The authors consider variable-structure stochastic automata (VSSA) that interact with an environment and dynamically learn the optimal action available to them. Like all VSSA, the automata are fully defined by a set of action probability updating rules. They examine the case in which the probability updating functions can assume only a finite number of values. These values discretize the probability space [0, 1], and hence they are called discretized learning automata. The discretized automata are linear because the subintervals of [0, 1] are of equal length. The authors prove the following results: (i) two-action discretized linear reward-penalty automata are ergodic and ε-optimal in all environments where the minimum penalty probability is less than 0.5; (ii) there exist discretized two-action linear reward-penalty automata that are ergodic and ε-optimal in all random environments; and (iii) discretized two-action linear reward-penalty automata with artificially created absorbing barriers are ε-optimal in all random environments.
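To make the discretization concrete, the following is a minimal sketch of one update step of a two-action discretized linear reward-penalty automaton. It assumes the common formulation in which the action probability is restricted to multiples of 1/n (the equal-length subintervals of [0, 1]) and moves one step toward the chosen action on a reward and one step away on a penalty; the function name, the clamping at the endpoints, and the parameter choices are illustrative, not taken from the paper.

```python
def discretized_lrp_step(p1, n, action, reward):
    """One hypothetical update of a two-action discretized L_RP automaton.

    p1     -- probability of choosing action 1, a multiple of 1/n
    n      -- discretization resolution (number of equal subintervals of [0, 1])
    action -- the action just chosen (1 or 2)
    reward -- True if the environment rewarded the action, False on a penalty
    """
    delta = 1.0 / n
    # Reward moves probability toward the chosen action;
    # penalty moves it toward the other action.
    toward_action1 = (action == 1) == reward
    if toward_action1:
        p1 = min(1.0, p1 + delta)
    else:
        p1 = max(0.0, p1 - delta)
    return p1
```

Clamping at 0 and 1 makes those states absorbing, as in family (iii); the ergodic variants discussed in (i) and (ii) would instead keep the probability in the interior, e.g. within [1/n, 1 - 1/n].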

Additional Metadata
Conference: Proceedings of the 1987 IEEE International Conference on Systems, Man and Cybernetics.
Citation
Oommen, J., & Christensen, J. P. R. (1987). On three families of asymptotically optimal linear reward-penalty learning automata. In Proceedings of the 1987 IEEE International Conference on Systems, Man and Cybernetics (pp. 923–928).