Learning automata that update their action probabilities on the basis of the responses they receive from a random environment are considered. An ergodic scheme is presented that can take into account a priori information about the action probabilities; to the author's knowledge, it is the only scheme reported in the literature capable of doing so. The mean and the variance of the limiting distribution of the automaton are derived, and it is shown that the mean is not independent of the a priori information. It is also shown that the expressions for these quantities generalize the corresponding quantities derived for the familiar linear reward-penalty scheme. By continually updating the parameter that quantifies the a priori information, a resultant linear scheme can be obtained. This scheme is counterintuitive: it is shown to have a reward-reward flavor and yet is absolutely expedient. This demonstrates that absolutely expedient schemes have far more general properties than previously reported schemes.
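
The abstract does not give the update equations of the proposed a priori scheme, so only the baseline it generalizes can be illustrated here. The following is a minimal sketch, under stated assumptions, of the familiar two-action linear reward-penalty (L_R-P) automaton interacting with a stationary random environment; the parameter names (a, b), the penalty probabilities c, and the function name are illustrative choices, not taken from the paper.

```python
import random

def lrp_automaton(c, a=0.05, b=0.05, steps=20000, p1=0.5):
    """Minimal two-action linear reward-penalty (L_R-P) automaton (illustrative sketch).

    c     : (c1, c2) penalty probabilities of the stationary random environment
    a     : reward parameter
    b     : penalty parameter
    p1    : initial probability of choosing action 1 (p2 = 1 - p1)
    Returns the final probability of choosing action 1.
    """
    for _ in range(steps):
        # Choose an action according to the current action probabilities.
        action = 1 if random.random() < p1 else 2
        # The environment penalizes the chosen action with probability c[action - 1].
        penalized = random.random() < c[action - 1]

        if not penalized:
            # Reward: move the probability vector toward the chosen action.
            p1 = p1 + a * (1.0 - p1) if action == 1 else p1 - a * p1
        else:
            # Penalty: move the probability vector away from the chosen action.
            p1 = (1.0 - b) * p1 if action == 1 else b + (1.0 - b) * p1
    return p1

if __name__ == "__main__":
    # Hypothetical environment in which action 1 is penalized less often than action 2.
    print(lrp_automaton(c=(0.2, 0.6)))
```

Because reward and penalty parameters are both nonzero, this baseline scheme is ergodic: the action probabilities converge in distribution rather than to an absorbing state, which is the setting in which the paper's limiting mean and variance are derived.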

Additional Metadata
Conference: Proceedings of the 1986 IEEE International Conference on Systems, Man, and Cybernetics
Citation
Oommen, J. (1986). On how two-action ergodic learning automata can utilize a priori information. In Proceedings of the 1986 IEEE International Conference on Systems, Man, and Cybernetics (pp. 308–312).