This paper proposes a game of guarding a territory in a grid world, in which a defender tries to intercept an invader before the invader reaches the territory. Two reinforcement learning algorithms, minimax-Q learning and Win-or-Learn-Fast Policy Hill-Climbing (WoLF-PHC), are applied so that both players learn their optimal policies simultaneously. The two algorithms are introduced and compared, and their simulation results are analyzed.

Additional Metadata
Conference: 2010 American Control Conference, ACC 2010
Lu, X. (Xiaosong), & Schwartz, H.M. (2010). An investigation of guarding a territory problem in a grid world. In Proceedings of the 2010 American Control Conference, ACC 2010 (pp. 3204–3210).