The problem of a stochastic learning automaton interacting with an unknown random environment is considered. The fundamental problem is that of learning, through interaction, the best action allowed by the environment (i.e., the action that is rewarded optimally). By using running estimates of the reward probabilities to learn the optimal action, an extremely efficient pursuit algorithm (PA) was reported in earlier works, and it is presently among the fastest algorithms known. The improvements gained by rendering the PA discrete are investigated. This is done by restricting the probability of selecting an action to a finite, and hence discrete, subset of [0, 1]. This improved scheme is proven to be ε-optimal in all stationary environments. Furthermore, the experimental results seem to indicate that the algorithm presented in the paper is faster than the fastest “nonestimator” learning automata reported to date, and also faster than the continuous pursuit automaton.
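The discretization idea described above can be sketched in a few lines: the automaton keeps running maximum-likelihood estimates of each action's reward probability and, at every step, moves the action-probability vector toward the currently best-estimated action in fixed increments of 1/(rN), where r is the number of actions and N a resolution parameter. The sketch below is illustrative only, assuming a Bernoulli reward environment and this particular update rule; the parameter names, the simulation harness, and the choice of starting from the uniform vector are not taken from the paper.

```python
import random

def discretized_pursuit(reward_probs, resolution=100, steps=20000, seed=0):
    """Hedged sketch of a discretized pursuit automaton.

    reward_probs[i] is the (unknown to the automaton) probability that
    action i is rewarded; the automaton only sees reward/penalty feedback.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    delta = 1.0 / (r * resolution)   # smallest discrete probability step, 1/(rN)
    p = [1.0 / r] * r                # action probabilities start uniform (assumption)
    rewarded = [0] * r               # times action i was rewarded
    chosen = [0] * r                 # times action i was chosen

    for _ in range(steps):
        # Sample an action according to the current probability vector.
        a = rng.choices(range(r), weights=p)[0]
        chosen[a] += 1
        if rng.random() < reward_probs[a]:
            rewarded[a] += 1
        # Running estimates of the reward probabilities.
        est = [rewarded[i] / chosen[i] if chosen[i] else 0.0 for i in range(r)]
        m = max(range(r), key=lambda i: est[i])
        # "Pursue" the best-estimated action: lower every other probability
        # by one discrete step delta and give the remaining mass to action m.
        for i in range(r):
            if i != m:
                p[i] = max(0.0, p[i] - delta)
        p[m] = 1.0 - sum(p[i] for i in range(r) if i != m)
    return p
```

Because each component of p can only take values on the grid {0, delta, 2*delta, ...}, the scheme converges in a bounded number of updates once the estimates single out one action, which is the intuition behind the speedup over the continuous PA.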

Additional Metadata
Persistent URL dx.doi.org/10.1109/21.105092
Journal IEEE Transactions on Systems, Man and Cybernetics
Citation
Oommen, J., & Lanctôt, J. K. (1990). Discretized Pursuit Learning Automata. IEEE Transactions on Systems, Man and Cybernetics, 20(4), 931–938. doi:10.1109/21.105092