The authors consider the problem of a learning machine (LM) interacting with a random environment which offers the LM M actions. They endeavor to obtain machines which yield an ordering of the individual actions in terms of their optimality. The problem is posed in its generality and a formal solution provided using a new mathematical model called the random race. In the simplest learning model, the learning multiple race track (LMRT) model, each racer runs on his own track, thus disallowing interference between the racers. However, in a more interactive setting, the learning single race track (LSRT) model, the racers run on a single track, and in this case, interferences between racers are specified in terms of overtaking rules. The order in which the racers reach the goal is defined as the ordering to which the learning machine converges. The LMRT scheme is examined, and various results are derived for the case when no a priori information is utilized and for the case when the LMRT uses the a priori information by giving the racers handicaps which are either uniformly or geometrically distributed. Analogous results for the LSRT are conjectured.

Additional Metadata
Conference Conference Proceedings of the 1991 IEEE International Conference on Systems, Man, and Cybernetics
Oommen, J, Ng, D.T.H. (D. T H), & Hansen, E.R. (E. R.). (1991). On using random races in learning machines which rank actions in stochastic environments. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (pp. 1465–1471).