Consider a learning machine (LM) interacting with an environment G. The environment offers the machine M actions. Traditionally, learning systems endeavor to compute the best action that the environment offers, and this is done without any estimation procedure. In this paper, we consider the problem of the LM computing not only the optimal action offered but also the ordering of the actions in terms of their optimality. The problem is posed in its generality and various norms of learning in this setting are formalized. Also various learning strategies are presented that use a new mathematical model called the random race. In this model the learning is modeled using M racers that are running toward a goal. At each instant, racer R<inf>i</inf> moves toward the goal with a probability of s<inf>i</inf> and stays where he is with a probability of (1 - s<inf>i</inf>). In the simplest learning model, the learning multiple race track (LMRT) model, the racers run on multiple tracks, and in this scenario, each racer has his own track, thus disallowing interference between the racers. However, in a more general setting, the learning single race track (LSRT) model, the racers run on a single track, and in this case, interferences between racers are specified in terms of overtaking rules. In this paper, we first examine the learning multiple race track (LMRT) model, and we have shown that in the absence of a priori information the LMRT is permutationally ε-optimal in all suggestive random environments. Subsequently, a variety of results are proven for the cases when the LMRT uses the a priori information by giving the racers handicaps that are either uniformly or geometrically distributed. Analogous results for the learning single race track (LSRT) model are also conjectured and these conjectures are strengthened by numerous simulations.

Additional Metadata
Persistent URL dx.doi.org/10.1109/21.260677
Journal IEEE Transactions on Systems, Man and Cybernetics
Citation
Ng, D.T.H. (D. T H), Oommen, J, & Hansen, E.R. (E. R.). (1993). Adaptive Learning Mechanisms for Ordering Actions Using Random Races. IEEE Transactions on Systems, Man and Cybernetics, 23(5), 1450–1465. doi:10.1109/21.260677