Exponential moving average Q-learning algorithm
A multi-agent policy iteration learning algorithm is proposed in this work. The Exponential Moving Average (EMA) mechanism is used to update the policy for a Q-learning agent so that it converges to an optimal policy against the policies of the other agents. The proposed EMA Q-learning algorithm is examined on a variety of matrix and stochastic games. Simulation results show that the proposed algorithm converges in a wider variety of situations than state-of-the-art multi-agent reinforcement learning (MARL) algorithms.
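As a rough illustration of the idea in the abstract, the sketch below pairs a standard tabular Q-learning update with an exponential moving average step that nudges the policy toward the currently greedy action. It is a simplified sketch under stated assumptions, not the update rule from the paper: the function names `q_update` and `ema_policy_update`, the single EMA rate `eta`, and the one-hot greedy target are all hypothetical choices made for this example.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place to q."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])


def ema_policy_update(policy, q, state, eta=0.05):
    """Move policy[state] toward the greedy one-hot vector by an EMA step.

    Assumption: the EMA target is a one-hot vector on the greedy action;
    the paper's actual target and rates may differ.
    """
    row = q[state]
    greedy = row.index(max(row))  # first action with the highest Q-value
    policy[state] = [
        (1.0 - eta) * p + eta * (1.0 if a == greedy else 0.0)
        for a, p in enumerate(policy[state])
    ]
    return policy[state]


# One step in a single-state, two-action game (illustrative values only).
q = [[0.0, 0.0]]
policy = [[0.5, 0.5]]
q_update(q, state=0, action=1, reward=1.0, next_state=0)
probs = ema_policy_update(policy, q, state=0, eta=0.05)
```

Because each new policy is a convex combination of the old policy and a probability vector, it remains a valid probability distribution after every step, and `eta` controls how quickly it tracks changes in the Q-values.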
Conference: 2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013
Organisation: Department of Systems and Computer Engineering
Awheda, M.D. (Mostafa D.), & Schwartz, H.M. (2013). Exponential moving average Q-learning algorithm. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL (pp. 31–38). doi:10.1109/ADPRL.2013.6614986