Learning automata that update their action probabilities on the basis of the responses they receive from a random environment are considered. The automata update the probabilities regardless of whether the environment responds with a reward or a penalty. Learning automata are said to be ergodic if the distribution of the limiting action probability vector is independent of the initial distribution. An ergodic scheme is presented that can take into account a priori information about the action probabilities; it is the only scheme reported in the literature capable of doing so. The mean and the variance of the automaton's limiting distribution are derived, and it is shown that the mean depends on the a priori information. Further, the expressions for these quantities are shown to be generalizations of the corresponding expressions derived for the familiar linear reward-penalty (L_RP) scheme. Finally, it is shown that by constantly updating the parameter that quantifies the a priori information, a linear scheme can be obtained. This scheme is of a reward-reward flavor and yet is absolutely expedient; it falls within the class of absolutely expedient schemes presented by Aso and Kimura.
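The following is a minimal sketch of the ergodicity notion in the familiar L_RP baseline. It simulates the standard two-action symmetric linear reward-penalty automaton, using the same step size a for reward and penalty, in a stationary environment whose penalty probabilities are c1 and c2. It does not reproduce the paper's a priori-information scheme, whose update rule is not given in the abstract.

For this two-action case, the expected one-step change is E[p1(n+1) - p1(n) | p1(n)] = a(c2 - (c1 + c2) p1(n)). The limiting mean of p1 is therefore c2/(c1 + c2), whatever the starting value. The function names (`lrp_step`, `time_average`), the step size, the penalty probabilities and the run lengths are hypothetical choices made for illustration.

```python
import random


def lrp_step(p1, a, c, rng):
    """One update of a hypothetical two-action symmetric L_RP automaton.

    p1: probability of choosing action 0 (action 1 has probability 1 - p1).
    a:  common reward/penalty step size, 0 < a < 1 (assumed equal).
    c:  assumed penalty probabilities (c[0], c[1]) of the environment.
    """
    action = 0 if rng.random() < p1 else 1
    penalized = rng.random() < c[action]
    if action == 0:
        # Reward moves p1 toward 1; penalty shrinks p1 toward 0.
        return (1 - a) * p1 if penalized else p1 + a * (1 - p1)
    # Action 1: reward raises p2, i.e. shrinks p1; penalty shrinks p2, i.e. raises p1.
    return p1 + a * (1 - p1) if penalized else (1 - a) * p1


def time_average(p_init, a=0.01, c=(0.2, 0.6), steps=100_000, burn_in=5_000, seed=0):
    """Average of p1 over `steps` iterations after discarding `burn_in` iterations."""
    rng = random.Random(seed)
    p1 = p_init
    total = 0.0
    for t in range(burn_in + steps):
        p1 = lrp_step(p1, a, c, rng)
        if t >= burn_in:
            total += p1
    return total / steps


if __name__ == "__main__":
    c = (0.2, 0.6)
    predicted = c[1] / (c[0] + c[1])  # c2 / (c1 + c2) = 0.75 for these values
    for p0 in (0.05, 0.5, 0.95):
        print(f"p1(0) = {p0:.2f}  time-average of p1 = {time_average(p0, c=c):.3f}  predicted mean = {predicted:.3f}")
```

Widely separated starting points should yield nearly identical averages close to c2/(c1 + c2), which is the independence from the initial distribution that the ergodicity definition describes. A small a gives a tightly concentrated limiting distribution, but also slower mixing, which is why a burn-in period is discarded.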

Additional Metadata
Persistent URL dx.doi.org/10.1109/TSMC.1987.289367
Journal IEEE Transactions on Systems, Man and Cybernetics
Oommen, J. (1987). Ergodic Learning Automata Capable of Incorporating A Priori Information. IEEE Transactions on Systems, Man and Cybernetics, 17(4), 717–723. doi:10.1109/TSMC.1987.289367