This paper proposes a game of guarding a territory in a grid world, in which a defender tries to intercept an invader before the invader reaches the territory. Two reinforcement learning algorithms, minimax-Q learning and Win-or-Learn-Fast Policy Hill-Climbing (WoLF-PHC), are introduced and applied so that both players learn their optimal policies simultaneously. Simulation results for the two algorithms are analyzed and compared.

2010 American Control Conference, ACC 2010
Department of Systems and Computer Engineering

Lu, X. (Xiaosong), & Schwartz, H.M. (2010). An investigation of guarding a territory problem in a grid world. In Proceedings of the 2010 American Control Conference, ACC 2010 (pp. 3204–3210).