A tabular Reinforcement Learning algorithm, such as Q-learning, is incapable of solving problems with large state spaces due to explosive growth of the table required to be kept in fast memory (e.g. RAM). Neural Networks offer a solution at the expense of representation, among other things. Yet if we analyze the learning task in terms of the goal arity, we can approach problems of larger state spaces with Discretized Q-learning algorithm, by dramatically reducing memory requirements for the Q table. This paper demonstrates the methodology using the well-known LightsOut puzzle. While analytical methods of solving some variants of this puzzle are known, we proceed under a different premise of putting the learning agent in a much more difficult position of having virtually no initial information about the rules of the puzzle and often not being able to perceive the symmetries inherent in the two-dimensional puzzle grid. Without the described savings a standard and already thrifty Q-learning solution to a 5×5 variant of the puzzle would require over 3 gigabytes of RAM with 4 bytes per table entry, while our Discretized Q-learning solution required only 5 bits per table entry, allowing us to solve the puzzle on a 1 gigabyte machine. Finally, we also consider a variant of this puzzle for which no analytical solutions are known. The Discretized Q-learning algorithm is equally applicable to the new puzzle and can solve it with exactly the same effort.

Additional Metadata
Keywords Game playing, Machine learning, Reinforcement learning
Conference IASTED International Conference on Artificial Intelligence and Applications, AIA 2006
Batalov, D.V. (Denis V.), & Oommen, J. (2006). Turning lights out with DQ-learning. In Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, AIA 2006 (pp. 451–456).