A learning automaton is a machine that interacts with a random environment and simultaneously learns the optimal action that the environment offers to it. Learning automata with variable structure are considered. Such automata are completely defined by a set of probability updating rules. Unlike the variable-structure stochastic automata (VSSA) discussed in the literature, which update the probabilities in such a way that an action probability can take any real value in the interval [0, 1], the automata studied here have a discretized probability space, so that an action probability can assume only one of a finite number of distinct values in [0, 1]. The discretized automaton is termed linear if the subintervals of [0, 1] are of equal length and nonlinear otherwise. It is proven that 1) discretized two-action linear reward-inaction automata are absorbing and ε-optimal in all environments; 2) discretized two-action linear inaction-penalty automata are ergodic and expedient in all environments; 3) discretized two-action linear inaction-penalty learning automata with artificially created absorbing barriers are ε-optimal in all random environments; and 4) there exist nonlinear discretized reward-inaction automata that are ε-optimal in all random environments. The maximum advantage gained by rendering any finite-state discretized automaton nonlinear has also been derived.
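
To make the idea of a discretized probability space concrete, the following is a minimal Python sketch of a two-action linear reward-inaction scheme. It is an illustration under stated assumptions, not the paper's own formulation: the action probability is restricted to the N + 1 equally spaced values 0, 1/N, ..., 1; a reward moves the probability of the chosen action up by one step of 1/N; a penalty leaves it unchanged. The names `dlri_step`, `run_dlri`, `n`, and `penalty_probs` are hypothetical and chosen only for this example.

```python
import random


def dlri_step(k, n, penalty_probs, rng):
    """One update of an assumed two-action discretized reward-inaction rule.

    The automaton chooses action 0 with probability k / n, where k is an
    integer in 0..n. penalty_probs[i] is the probability that the
    environment penalizes action i.
    """
    action = 0 if rng.random() < k / n else 1
    rewarded = rng.random() >= penalty_probs[action]
    if rewarded:
        # Reward: shift one discrete step toward the rewarded action.
        k = min(k + 1, n) if action == 0 else max(k - 1, 0)
    # Penalty: inaction, so k is returned unchanged.
    return k


def run_dlri(n, penalty_probs, seed=0, max_steps=100_000):
    """Run from the middle state until k reaches 0 or n, or steps run out.

    Returns the final state k and the number of updates performed.
    """
    rng = random.Random(seed)
    k = n // 2
    for step in range(max_steps):
        if k in (0, n):
            return k, step
        k = dlri_step(k, n, penalty_probs, rng)
    return k, max_steps


if __name__ == "__main__":
    # Action 0 is penalized less often, so it is the better action here.
    final_k, steps = run_dlri(n=20, penalty_probs=(0.2, 0.6), seed=1)
    print(f"final P(action 0) = {final_k / 20:.2f} after {steps} updates")
```

In this sketch the states k = 0 and k = N cannot be left once entered, which mirrors the absorbing behavior the abstract attributes to reward-inaction automata. Increasing `n` makes the steps finer, which is the intuition behind ε-optimality: with a fine enough discretization the automaton is absorbed at the better action with probability close to 1.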

Additional Metadata
Persistent URL dx.doi.org/10.1109/TSMC.1986.4308951
Journal IEEE Transactions on Systems, Man and Cybernetics
Oommen, J. (1986). Absorbing and Ergodic Discretized Two-Action Learning Automata. IEEE Transactions on Systems, Man and Cybernetics, 16(2), 282–293. doi:10.1109/TSMC.1986.4308951