This paper addresses dual learning in the pursuit-evasion (PE) differential game and examines how quickly the players can learn their default control strategies. The players must learn their default control strategies simultaneously by interacting with each other, with each player's learning process driven by the rewards it receives from its environment. Learning is implemented with a two-stage algorithm that combines a particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with a Q-learning fuzzy inference system (QFIS) algorithm: PSO serves as a global optimizer that autonomously tunes the parameters of the fuzzy logic controller, while QFIS serves as a local optimizer. The two-stage algorithm is compared in simulation against the default control strategy, the PSO-based FLC algorithm alone, and the QFIS algorithm alone. Simulation results show that the players are able to learn their default control strategies, and that the two-stage algorithm outperforms both the PSO-based FLC algorithm and the QFIS algorithm with respect to learning time.
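The paper itself does not include code. As a rough illustration of the two-stage structure described above, the sketch below pairs a PSO global search over a fuzzy controller's parameter vector with a simple local refinement pass standing in for the QFIS stage. Note the assumptions: `episode_reward` is a hypothetical placeholder for simulating one pursuit-evasion episode (the paper evaluates candidates by running the PE game), the parameter dimension and all hyperparameters are invented, and the local stage here uses random-perturbation hill climbing rather than the paper's actual Q-learning update of the fuzzy consequents.

```python
import numpy as np

# Hypothetical stand-in for one pursuit-evasion episode: returns a scalar
# reward for a candidate vector of fuzzy-controller parameters. The real
# evaluation in the paper simulates the PE differential game.
def episode_reward(params, rng):
    target = np.linspace(-1.0, 1.0, params.size)  # placeholder optimum
    return -np.sum((params - target) ** 2) + 0.01 * rng.standard_normal()

def pso_stage(n_particles=20, dim=9, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Stage 1: PSO as a global optimizer over the FLC parameter vector."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-2.0, 2.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([episode_reward(p, rng) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([episode_reward(p, rng) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest

def local_stage(params, iters=200, step=0.05, seed=1):
    """Stage 2: local refinement around the PSO result; a stand-in for the
    QFIS update, which in the paper tunes the consequents by Q-learning."""
    rng = np.random.default_rng(seed)
    best, best_val = params.copy(), episode_reward(params, rng)
    for _ in range(iters):
        cand = best + step * rng.standard_normal(best.size)
        val = episode_reward(cand, rng)
        if val > best_val:
            best, best_val = cand, val
    return best

coarse = pso_stage()          # global search
tuned = local_stage(coarse)   # local fine-tuning
eval_rng = np.random.default_rng(2)
print("reward after stage 1:", episode_reward(coarse, eval_rng))
print("reward after stage 2:", episode_reward(tuned, eval_rng))
```

The design point the sketch captures is the division of labor: the global stage only needs to land in the right basin of the reward landscape, after which a cheap local stage can finish the tuning, which is why the combination converges faster than either method alone.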

Additional Metadata
Persistent URL: dx.doi.org/10.1109/ADPRL.2014.7010641
Conference: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2014)
Citation
Al-Talabi, A.A. (Ahmad A.), & Schwartz, H.M. (2014). A two stage learning technique for dual learning in the pursuit-evasion differential game. In IEEE SSCI 2014 - 2014 IEEE Symposium Series on Computational Intelligence - ADPRL 2014: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Proceedings. doi:10.1109/ADPRL.2014.7010641