Learning automata are stochastic automata that interact with an unknown random environment. The fundamental problem is that of learning, through feedback, the action that has the highest probability of being rewarded by the environment. A class of algorithms known as estimator algorithms is presently among the fastest known. These algorithms are characterized by their use of running estimates of the probability of each possible action being rewarded. The improvements gained by rendering the various estimator algorithms discrete are investigated. This is done by restricting the probability of selecting an action to a finite, and hence discrete, subset of [0, 1]. This modification is proven to be ε-optimal in all stationary environments. In the paper, various discretized estimator algorithms (DEAs) are constructed. Subsequently, members of the family of DEAs are shown to be ε-optimal by deriving two sufficient conditions for ε-optimality: the properties of monotonicity and moderation. A conjecture is presented that these conditions are also necessary for ε-optimality. Experimental results indicate that the discrete modifications improve the performance of these algorithms to the extent that these automata constitute the fastest converging and most accurate learning automata reported to date.
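To make the abstract's scheme concrete, the following is a minimal sketch of a discretized pursuit-style estimator automaton, one representative member of the DEA family. The environment model (fixed Bernoulli reward probabilities), the resolution parameter `resolution`, the warm-up sampling, and all function and variable names are illustrative assumptions, not the paper's exact construction: action probabilities move only in multiples of a smallest step Δ = 1/(rN), i.e. on a finite subset of [0, 1], and mass is "pursued" toward the action with the highest running reward estimate.

```python
import random

def discretized_pursuit(reward_probs, resolution=100, steps=5000, seed=1):
    """Illustrative discretized pursuit estimator automaton (a sketch).

    reward_probs: true (unknown-to-the-learner) reward probability of
                  each action in a stationary random environment.
    resolution:   N; action probabilities move in multiples of 1/(r*N),
                  so they stay on a finite, discrete subset of [0, 1].
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    delta = 1.0 / (r * resolution)   # smallest allowed probability step
    p = [1.0 / r] * r                # action-selection probabilities
    wins = [0] * r                   # rewards received per action
    pulls = [0] * r                  # times each action was selected

    # Warm-up (an assumption of this sketch): sample every action a few
    # times so the reward estimates start from real data.
    for a in range(r):
        for _ in range(10):
            pulls[a] += 1
            wins[a] += rng.random() < reward_probs[a]

    for _ in range(steps):
        # Select an action according to the current probability vector.
        a = rng.choices(range(r), weights=p)[0]
        pulls[a] += 1
        wins[a] += rng.random() < reward_probs[a]
        # Running (maximum-likelihood) estimate of each reward probability.
        est = [wins[i] / pulls[i] for i in range(r)]
        best = max(range(r), key=lambda i: est[i])
        # Discrete pursuit update: shift mass toward the best-estimated
        # action, one fixed-size step delta per competing action.
        for i in range(r):
            if i != best:
                p[i] = max(0.0, p[i] - delta)
        p[best] = 1.0 - sum(p[i] for i in range(r) if i != best)
    return p, est
```

Because the non-best probabilities can only decrease in steps of Δ and are floored at 0, the vector p is absorbed at a unit vector; the monotonicity and moderation properties discussed in the paper are what guarantee that, with high probability, the absorbing action is the truly optimal one.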

Additional Metadata
Persistent URL dx.doi.org/10.1109/21.199471
Journal IEEE Transactions on Systems, Man and Cybernetics
Lanctot, J.K. (J.Kevin), & Oommen, J. (1992). Discretized Estimator Learning Automata. IEEE Transactions on Systems, Man and Cybernetics, 22(6), 1473–1483. doi:10.1109/21.199471