This work proposes a multi-agent policy-iteration learning algorithm. An exponential moving average (EMA) mechanism updates each Q-learning agent's policy so that it converges to an optimal policy against the policies of the other agents. The proposed EMA Q-learning algorithm is evaluated on a variety of matrix and stochastic games. Simulation results show that it converges in a wider variety of situations than state-of-the-art multi-agent reinforcement learning (MARL) algorithms.
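To make the core idea concrete, here is a minimal sketch of an EMA-style policy update for a Q-learning agent: the mixed policy for a state is nudged toward the greedy policy implied by the current Q-values. The function name `ema_policy_update` and the rate `eta` are illustrative assumptions, not the paper's exact notation or constants; the published algorithm uses its own learning-rate schedule.

```python
import numpy as np

def ema_policy_update(policy, q_values, state, eta=0.05):
    """Blend the mixed policy for `state` toward the greedy policy
    of the current Q-values via an exponential moving average step.

    `eta` is a hypothetical EMA rate chosen for illustration.
    """
    # Indicator vector for the greedy action under the current Q-values.
    greedy = np.zeros_like(policy[state])
    greedy[np.argmax(q_values[state])] = 1.0
    # EMA update: old policy decays, greedy target accumulates.
    policy[state] = (1.0 - eta) * policy[state] + eta * greedy
    return policy

# Toy usage: one state, three actions, uniform initial policy.
policy = np.array([[1 / 3, 1 / 3, 1 / 3]])
q = np.array([[0.1, 0.9, 0.2]])
for _ in range(200):
    policy = ema_policy_update(policy, q, state=0)
print(policy[0])  # probability mass concentrates on the greedy action
```

Because both the old policy and the greedy indicator sum to one, the update keeps the policy a valid probability distribution at every step, while the EMA smoothing lets it adapt gradually against the (possibly changing) policies of other agents.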

Additional Metadata
Persistent URL: dx.doi.org/10.1109/ADPRL.2013.6614986
Conference: 2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013
Citation: Awheda, M.D. (Mostafa D.), & Schwartz, H.M. (2013). Exponential moving average Q-learning algorithm. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL (pp. 31–38). doi:10.1109/ADPRL.2013.6614986