An actor-critic reinforcement learning-based resource management in mobile edge computing systems
International Journal of Machine Learning and Cybernetics
Reinforcement learning (RL) has recently attracted great attention in the wireless communication field as an effective decision-making tool. In this paper, we investigate the joint offloading-decision and resource-allocation problem in mobile edge computing (MEC) systems using RL methods. Unlike existing work, our research focuses on improving mobile operators' revenue by maximizing the number of offloaded tasks while reducing energy expenditure and time delays. To capture the dynamic characteristics of the wireless environment, the above problem is modeled as a Markov decision process (MDP). Because the action space of this MDP mixes multidimensional continuous variables with discrete variables, traditional RL algorithms are not directly applicable. We therefore propose an actor-critic (AC) algorithm with eligibility traces to solve the problem. The actor introduces a parameterized normal distribution to generate the probabilities of continuous stochastic actions, and the critic employs a linear approximator to estimate state values, based on which the actor updates its policy parameters in the direction of performance improvement. Furthermore, an advantage function is designed to reduce the variance of the learning process. Simulation results indicate that the proposed algorithm finds the best strategy for maximizing the number of tasks executed by the MEC server while reducing energy consumption and time delays.
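The learning scheme described in the abstract — a Gaussian actor for continuous actions, a linear critic, eligibility traces, and a TD-error-based advantage estimate — can be sketched as follows. This is a minimal single-action-dimension illustration, not the paper's exact MEC formulation; the feature vector, step sizes, and fixed exploration standard deviation are all assumptions.

```python
import numpy as np

class ACEligibilityTraces:
    """Minimal actor-critic with eligibility traces (hypothetical sketch).

    Actor: Gaussian policy whose mean is linear in the state features.
    Critic: linear value approximator V(s) = w . phi(s).
    The TD error serves as the advantage estimate that drives both updates.
    """

    def __init__(self, n_features, gamma=0.9, lam=0.8,
                 alpha_w=0.1, alpha_theta=0.005, sigma=0.5, seed=0):
        self.gamma, self.lam = gamma, lam
        self.alpha_w, self.alpha_theta = alpha_w, alpha_theta
        self.sigma = sigma                    # fixed exploration std-dev (assumed)
        self.w = np.zeros(n_features)         # critic weights
        self.theta = np.zeros(n_features)     # actor weights: mu(s) = theta . phi(s)
        self.z_w = np.zeros(n_features)       # critic eligibility trace
        self.z_theta = np.zeros(n_features)   # actor eligibility trace
        self.rng = np.random.default_rng(seed)

    def act(self, phi):
        """Sample a continuous action from N(mu(s), sigma^2)."""
        return self.rng.normal(self.theta @ phi, self.sigma)

    def update(self, phi, action, reward, phi_next, done):
        """One TD(lambda) actor-critic update step."""
        v = self.w @ phi
        v_next = 0.0 if done else self.w @ phi_next
        delta = reward + self.gamma * v_next - v   # TD error = advantage estimate
        # Decay and accumulate eligibility traces.
        self.z_w = self.gamma * self.lam * self.z_w + phi
        # grad of log pi for a Gaussian mean: (a - mu) / sigma^2 * phi
        grad_log_pi = (action - self.theta @ phi) / self.sigma ** 2 * phi
        self.z_theta = self.gamma * self.lam * self.z_theta + grad_log_pi
        # Critic moves toward the TD target; actor climbs the policy gradient.
        self.w += self.alpha_w * delta * self.z_w
        self.theta += self.alpha_theta * delta * self.z_theta
        if done:  # traces are reset at episode boundaries
            self.z_w[:] = 0.0
            self.z_theta[:] = 0.0
        return delta
```

On a toy single-state task with reward `-(a - 1)**2`, repeatedly calling `act` and `update` drives the learned action mean `theta @ phi` toward the optimum at 1, while the critic's value baseline keeps the variance of the actor updates down — the role the paper's advantage function plays.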
Keywords: Actor-critic algorithm; Eligibility traces; Mobile edge computing; Reinforcement learning; Resource allocation
Organisation: Department of Systems and Computer Engineering
Fu, F. (Fang), Zhang, Z. (Zhicai), Yu, F. R., & Yan, Q. (Qiao). (2020). An actor-critic reinforcement learning-based resource management in mobile edge computing systems. International Journal of Machine Learning and Cybernetics. doi:10.1007/s13042-020-01077-8