A learning automaton is a finite state machine that learns the optimal action from a set of actions offered to it by an environment. In this correspondence, the automata considered have a variable structure and hence are completely described by their action probability updating functions. The action probabilities can take only a finite number of prespecified values: the interval [0,1] is divided into a number of equal-length subintervals, so the permitted values increase linearly. An automaton updates its action probabilities only if the environment responds with a reward; hence these automata are called discretized linear reward-inaction (DLRI) automata. The asymptotic optimality of this family of automata is proved for all environments.
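The update rule described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a two-action automaton, a hypothetical resolution parameter N that splits [0,1] into N equal subintervals, and a fixed step of 1/N toward the rewarded action. The names TwoActionDLRI and simulate are invented for this example, and the environment is assumed to reward each action with a fixed probability.

```python
import random


class TwoActionDLRI:
    """Illustrative two-action discretized linear reward-inaction automaton.

    Assumption: the probability of action 0 is level / resolution, so it can
    only take the values 0, 1/N, 2/N, ..., 1.
    """

    def __init__(self, resolution, rng=None):
        if resolution < 2 or resolution % 2 != 0:
            raise ValueError("resolution must be an even integer >= 2")
        self.resolution = resolution
        self.level = resolution // 2  # start with both actions equally likely
        self.rng = rng if rng is not None else random.Random()

    def probability(self, action):
        p0 = self.level / self.resolution
        return p0 if action == 0 else 1.0 - p0

    def choose(self):
        # Sample an action according to the current action probabilities.
        return 0 if self.rng.random() < self.probability(0) else 1

    def update(self, action, rewarded):
        # Inaction on penalty: the probabilities are left unchanged.
        if not rewarded:
            return
        # On reward, move one discrete step toward the rewarded action,
        # clamped so the probability stays inside [0, 1].
        if action == 0:
            self.level = min(self.level + 1, self.resolution)
        else:
            self.level = max(self.level - 1, 0)


def simulate(reward_probs, resolution, steps, seed=0):
    """Run the automaton against a stationary random environment (assumed)."""
    rng = random.Random(seed)
    automaton = TwoActionDLRI(resolution, rng)
    for _ in range(steps):
        action = automaton.choose()
        rewarded = rng.random() < reward_probs[action]
        automaton.update(action, rewarded)
    return automaton
```

In this sketch the probabilities 0 and 1 are absorbing: once one action has probability 1 it is always chosen, and further rewards cannot move the level. A call such as simulate((0.8, 0.4), resolution=20, steps=5000) returns the automaton so its final probability(0) can be inspected; a larger resolution gives smaller steps and, informally, a slower but more cautious walk toward one of the two end states.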

Additional Metadata
Persistent URL dx.doi.org/10.1109/TSMC.1984.6313256
Journal IEEE Transactions on Systems, Man and Cybernetics
Citation
Oommen, J., & Hansen, E. (Eldon). (1984). The Asymptotic Optimality of Discretized Linear Reward-Inaction Learning Automata. IEEE Transactions on Systems, Man and Cybernetics, SMC-14(3), 542–545. doi:10.1109/TSMC.1984.6313256