## Qinghua Liu* and Qingping Li*

Table 1. Simulation parameters

| System parameter | Value |
|---|---|
| Number of small BSs V | 12 |
| Number of UEs N | 15 |
| Bandwidth of small BS [TeX:] $$W_{j}$$ | 10–20 MHz |
| Maximum number of users a small BS can access [TeX:] $$B_{i}$$ | 3–5 |
| Noise power [TeX:] $$\sigma^{2}$$ | -80 dBm |
| Computing ability of the MEC server [TeX:] $$F_{j}$$ | 10–15 Gcycles/s |
| Amount of data per task [TeX:] $$I_{i}$$ | 0.6–3 Mbits |
| Amount of computing resources [TeX:] $$D_{i}$$ | 0.4–0.7 Gcycles |
| Transmission power P | 0.1–0.5 W |
| Computing ability of UE i [TeX:] $$F_{i}$$ | 1–2 Gcycles/s |
| Weight factor for task offloading [TeX:] $$\mu_{i}$$ | 0.6 |
| Learning rate [TeX:] $$\alpha$$ | 0.2 |
| Reward discount factor [TeX:] $$\gamma$$ | 0.8 |
| Maximum offloading period T | 60 |

Following the introduction in Section 3.1, the exact definitions of the state, action, and reward function of the MDP used in the joint computation and communication resource allocation algorithm without power control are given below:

(1) State

At any time step t, if a UE chooses to offload its computing task through the small BS j, we say that the UE is in state [TeX:] $$\varphi_{j}(\forall j \in V);$$ if the UE chooses to perform the computing task locally, it is defined to be in state [TeX:] $$\varphi_{0}.$$ Therefore, the UE's state set can be expressed as [TeX:] $$S=\left\{\varphi_{0}, \varphi_{1}, \ldots, \varphi_{V}\right\}.$$

(2) Action

For each time step t, the UE must select and execute an action a in the current state [TeX:] $$s_{t}$$ according to the strategy used, and at the same time the UE transitions from the current state [TeX:] $$s_{t}$$ to the next state [TeX:] $$s_{t+1}\left(\forall s_{t}, s_{t+1} \in S\right).$$ We use [TeX:] $$A=\left\{\phi_{0}, \phi_{1}, \ldots, \phi_{V}\right\}$$ to represent the action space of the UE in the MDP. For a UE, [TeX:] $$a=\phi_{0}$$ means that it chooses to perform the computing task locally. Accordingly, [TeX:] $$a=\phi_{j}(\forall j \in V)$$ means that it chooses the small BS j to offload its computing task to the MEC server.

(3) Reward function

After the agent-environment interaction in each time step t, the UE, acting as the agent, receives feedback from the environment in the form of a reward r, which reflects how good or bad the outcome of performing an action in a given state is. The first optimization objective, maximizing the rate at which UEs benefit from MEC, is measured by the overall cost of the UEs, while the second optimization objective, the average UE cost, is directly the ratio of the total system cost to the number of UEs. The reward function in the resource allocation algorithm without power control can be specifically defined as:

Here, [TeX:] $$\lambda_{1} \text { and } \lambda_{2}$$ are normalization factors.
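Table 1 lists a learning rate [TeX:] $$\alpha=0.2$$ and a discount factor [TeX:] $$\gamma=0.8,$$ which suggests a tabular temporal-difference scheme over the state and action sets defined above. The following is a minimal sketch of such a Q-learning loop; the exploration rate `EPSILON`, the toy `V`, and all function names are illustrative assumptions, not taken from the paper, and the real reward would be the normalized-cost expression defined above.

```python
import random

# Tabular Q-learning sketch for the offloading MDP described above.
# States phi_0..phi_V and actions phi_0..phi_V are indexed 0..V.

V = 3          # number of small BSs (toy value; Table 1 uses 12)
ALPHA = 0.2    # learning rate (Table 1)
GAMMA = 0.8    # reward discount factor (Table 1)
EPSILON = 0.1  # exploration rate (assumed; not given in the paper)

Q = [[0.0] * (V + 1) for _ in range(V + 1)]  # Q[state][action]

def choose_action(state):
    """Epsilon-greedy action selection over {local, BS 1..V}."""
    if random.random() < EPSILON:
        return random.randrange(V + 1)
    row = Q[state]
    return row.index(max(row))

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

# Toy interaction: one positive-reward transition from phi_0 under action phi_2.
update(0, 2, reward=1.0, next_state=2)
```

With a zero-initialized table, this single update raises Q[0][2] to ALPHA * reward = 0.2, after which the greedy policy at state 0 prefers action 2.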

For UE i, [TeX:] $$\boldsymbol{T}_{i}$$ represents its computing task, defined as a two-tuple [TeX:] $$\left(b_{i}, c_{i}\right),$$ where [TeX:] $$b_{i}$$ (in bits) is the amount of task input data and [TeX:] $$c_{i}$$ (in CPU cycles per bit) is the number of CPU cycles required to compute each bit of the task. The values of [TeX:] $$b_{i} \text { and } c_{i}$$ depend on the nature of the specific task and can be obtained by offline measurement. We use [TeX:] $$d_{i}=j$$ to indicate that the i-th UE chooses to offload its task to the MEC server through the j-th small base station, that is, [TeX:] $$d_{i} \in\{0,1, \ldots, V\},$$ and [TeX:] $$d_{i}=0$$ indicates that UE i chooses to perform its computing task locally.

For UE i, if it chooses to execute the task locally, the delay incurred by computing task [TeX:] $$\boldsymbol{T}_{i}$$ is expressed as:

where [TeX:] $$F_{i}$$ represents the computing capability of UE i, measured in CPU cycles per second. We use [TeX:] $$V_{i}$$ to represent the energy consumed per second by UE i when performing local computation. The total energy consumed by executing task [TeX:] $$\boldsymbol{T}_{i}$$ locally is defined as:

In this paper, we consider the diverse quality-of-service requirements of UEs: delay-sensitive UEs (such as mobile phones and monitoring equipment) need the lowest possible delay but can accept higher energy consumption, while energy-sensitive UEs (such as sensor nodes and IoT devices) need minimum energy consumption and usually do not have strict delay requirements. Therefore, we adopt the composite computation-cost index of [16] to reflect the total cost of a terminal device executing a computing task under different quality-of-service requirements. Specifically, when UE i chooses to execute computing task [TeX:] $$\boldsymbol{T}_{i}$$ locally, the total cost can be defined as:

When a UE is in a low-power state, reducing energy consumption matters more than computation and transmission delay, so [TeX:] $$\mu_{i}$$ can be set to 0. Conversely, when a UE has sufficient power and needs to run delay-sensitive applications, [TeX:] $$\mu_{i}$$ can be set to 1. Therefore, for devices with different quality-of-service requirements and applications of different service types, the setting of the weighting factor [TeX:] $$\mu_{i}$$ is also one of the levers for improving the algorithm performance of the system.
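The local-execution cost model above can be sketched as follows, under the assumption (standard in this literature, e.g., [16]) that local delay is [TeX:] $$b_{i} c_{i} / F_{i}$$ and local energy is the per-second consumption [TeX:] $$V_{i}$$ times that delay; all function and variable names are illustrative, not from the paper.

```python
def local_cost(b_i, c_i, F_i, V_i, mu_i):
    """Weighted delay/energy cost of executing task T_i = (b_i, c_i) locally.

    b_i  : task input size in bits
    c_i  : CPU cycles required per bit
    F_i  : UE computing capability in cycles/s
    V_i  : energy drawn per second of local computation (J/s)
    mu_i : offloading weight factor in [0, 1] (1 = fully delay-sensitive)
    """
    delay = b_i * c_i / F_i   # local execution delay in seconds
    energy = V_i * delay      # local execution energy in joules
    return mu_i * delay + (1 - mu_i) * energy

# e.g., a 1 Mbit task at 500 cycles/bit on a 1 Gcycles/s UE drawing 0.5 J/s
cost = local_cost(b_i=1e6, c_i=500, F_i=1e9, V_i=0.5, mu_i=0.6)
```

Setting `mu_i=0` recovers the pure-energy objective for low-power UEs and `mu_i=1` the pure-delay objective, matching the two extremes discussed above.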

In view of previous research on MEC and mobile networks, and to simplify the analysis, we consider only a quasi-static situation: within one task offloading decision period T (e.g., hundreds of milliseconds), the set of active UEs and their wireless channel conditions remain unchanged; they may change across periods, but this has no effect on the algorithm performance of the system. We also assume that each small base station has exactly one physical channel, that the channels do not overlap, and that each UE can select a specific small base station through which to offload computing tasks to the MEC server.

The transmission power used by UE i to upload its task input data to the MEC server through the small BS is [TeX:] $$p_{i, j}.$$ In the resource allocation algorithm without power control, each UE uses a fixed transmission power determined by the relevant power control algorithm. Given the active UEs' decision profile [TeX:] $$q=\left(Q_{1}, \ldots, Q_{i}\right),$$ the data transmission rate when the i-th UE selects the j-th small BS to offload its task is:

where [TeX:] $$\sigma^{2}$$ represents the noise power at the j-th small base station, [TeX:] $$g_{i, j}$$ represents the power gain of the channel between the i-th UE and the j-th small base station, and the term over [TeX:] $$\left(n \in N \backslash\{i\} \text { and } d_{n}=d_{i}\right)$$ accounts for the interference from the other UEs that also choose the j-th small base station to offload to the MEC server. The sub-channel bandwidth [TeX:] $$W_{j}^{\text {sub }}$$ of small BS j is modeled as:
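One plausible reading of this rate model is that BS j's bandwidth [TeX:] $$W_{j}$$ is split evenly among the UEs that selected it, and the rate follows the Shannon formula with the co-channel UEs on the same BS treated as interference. The sketch below encodes that assumption; the equal-split rule and all names are illustrative, not confirmed by the paper.

```python
import math

def uplink_rate(W_j, p_i, g_i, sigma2, interferers):
    """Uplink rate of UE i on small BS j (bits/s).

    W_j         : bandwidth of small BS j in Hz
    p_i, g_i    : UE i's transmit power (W) and channel power gain to BS j
    sigma2      : noise power at BS j (W)
    interferers : list of (p_n, g_n) for the other UEs that chose BS j
    """
    n_users = 1 + len(interferers)        # UEs sharing BS j
    W_sub = W_j / n_users                 # assumed equal sub-channel split
    interference = sum(p * g for p, g in interferers)
    sinr = p_i * g_i / (sigma2 + interference)
    return W_sub * math.log2(1 + sinr)    # Shannon capacity of the sub-channel

# e.g., 10 MHz BS, 0.1 W transmit power, no co-channel interferers
rate = uplink_rate(W_j=10e6, p_i=0.1, g_i=1e-8, sigma2=1e-11, interferers=[])
```

Adding entries to `interferers` lowers the SINR and shrinks `W_sub`, capturing both congestion effects the text describes.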

Because the MEC server provides powerful computing capability (many telecom operators are able to make large-scale infrastructure investments), and because the amount of data in the computation result is small compared with the input data and the task computation load, the feedback delay is negligible. Therefore, when computing task [TeX:] $$T_{i}$$ is executed remotely on the MEC server via small base station j, the delay can be expressed as

The energy consumption mainly comes from the UE transmitting the task input data to the small base station, and can be expressed as:

When terminal device i chooses to offload its task to the MEC server through small base station j, we use the weighted sum of execution delay and energy consumption to describe the UE's total computation cost:
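The offloading cost just described can be sketched as follows: the delay is the upload time plus the remote execution time on the MEC server (result feedback neglected, as argued above), the energy is the transmit power times the upload time, and the total is the same weighted sum as in the local case. Function and parameter names are illustrative assumptions.

```python
def offload_cost(b_i, c_i, F_j, p_ij, r_ij, mu_i):
    """Weighted delay/energy cost of offloading task T_i via small BS j.

    b_i, c_i : task input bits and CPU cycles per bit
    F_j      : computing capability of the MEC server behind BS j (cycles/s)
    p_ij     : UE i's transmit power to BS j (W)
    r_ij     : achievable uplink rate (bits/s)
    mu_i     : offloading weight factor in [0, 1]
    """
    t_up = b_i / r_ij          # upload delay
    t_exec = b_i * c_i / F_j   # remote execution delay on the MEC server
    energy = p_ij * t_up       # transmission energy only; feedback neglected
    return mu_i * (t_up + t_exec) + (1 - mu_i) * energy

# e.g., 1 Mbit task, 500 cycles/bit, 10 Gcycles/s server, 5 Mbit/s uplink
cost = offload_cost(b_i=1e6, c_i=500, F_j=10e9, p_ij=0.1, r_ij=5e6, mu_i=0.6)
```

Comparing this against the local cost for the same task is exactly the offload-versus-local decision [TeX:] $$d_{i}$$ that the MDP's agent learns.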

In this section, MATLAB is used to verify the performance of the algorithm proposed in this paper, comparing it with the algorithms in [7] and [8], which are also based on reinforcement learning, and with the algorithms of [9] and [11], two recent algorithms for computation offloading decision and resource allocation.

The simulation scenario is an MEC system composed of multiple small base stations and multiple UEs that supports dense networking. The simulation area is 1,200 m × 1,200 m, and both the UEs and the small base stations are randomly distributed within it. Unless otherwise specified, the simulation parameters take the values in Table 1. In addition, in all simulation experiments without power control, all UEs use the maximum transmission power.
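As a minimal sketch of this topology, the V small BSs and N UEs from Table 1 can be dropped uniformly at random in the 1,200 m × 1,200 m area; this is illustrative setup code, not the authors' MATLAB script.

```python
import random

AREA = 1200.0  # side length of the square simulation area in meters
V, N = 12, 15  # number of small BSs and UEs (Table 1)

random.seed(0)  # fix the layout for reproducible experiments

# Uniform random (x, y) drops for base stations and user equipments.
bs_positions = [(random.uniform(0, AREA), random.uniform(0, AREA)) for _ in range(V)]
ue_positions = [(random.uniform(0, AREA), random.uniform(0, AREA)) for _ in range(N)]
```

From these positions one would derive the channel gains [TeX:] $$g_{i, j}$$ (e.g., via a path-loss model, which the excerpt does not specify) before running the offloading algorithm.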

Fig. 3 plots the relationship between task execution cost and the amount of task input data for the proposed algorithm, the algorithm in [7], and the algorithm in [8]. Here, the task offloading weight factor [TeX:] $$\mu_{i}$$ is set to 0.5, 0.6, and 0.7, respectively, and the three cases are analyzed comparatively. As can be seen from Fig. 3(a), 3(b), and 3(c), in all three cases the task execution energy consumption of the three algorithms increases as the amount of task input data increases, because a larger input data amount increases the task execution delay, which in turn increases the task execution energy consumption. Under the same task input data volume, the total task computation cost of the proposed algorithm is 5%–10% lower than that of the two comparison algorithms. This is because the proposed algorithm considers dynamic offloading and joint access and resource optimization under multiple servers, and measures whether a terminal device benefits from MEC according to its comprehensive cost. With the addition of power control, the average overhead of all terminal devices in the system is reduced to a certain extent.

Fig. 4 compares the task execution cost of the proposed algorithm when the offloading weight factor [TeX:] $$\mu_{i}$$ is set to 0.5, 0.6, and 0.7, respectively. When [TeX:] $$\mu_{i}$$ is 0.6, the task execution cost of the proposed algorithm is smallest. Therefore, in subsequent experiments, the task offloading weight factor [TeX:] $$\mu_{i}$$ is set to 0.6.

The two recent algorithms of [9] and [11] are selected for comparative experiments. In this experiment, the proposed algorithm's task offloading weight factor [TeX:] $$\mu_{i}$$ is set to 0.6.

Fig. 5 shows the relationship between task execution cost and the amount of task input data for the proposed algorithm, the algorithm in [9], and the algorithm in [11]. Under the same task input data volume, the total task computation cost of the proposed algorithm is more than 5% lower than that of the two comparison algorithms. In addition, as the minimum rate requirement for task transmission increases, the task execution energy consumption increases: the higher the minimum transmission rate requirement, the higher the required transmission power, and thus the more task execution overhead is consumed.

The reinforcement learning based joint computation and communication resource allocation algorithm optimizes the data task offloading and power control strategy for each terminal device. When a terminal device chooses to offload its task to the MEC server, the choice of power becomes more flexible; at the same time, the impact on other terminal devices is relatively reduced, providing better performance optimization for the coordinated operation of the entire system.

As can be seen from Fig. 6, the total task computation cost increases as the amount of computing resources required to complete the task increases, because more computation-intensive tasks lead to longer execution delays, which in turn consume more energy. Comparing the three algorithms, the algorithm of [9] has the highest cost and a linear growth trend, while the total costs of the algorithm in [11] and the proposed algorithm are successively lower, with the latter below the task execution cost of both comparison algorithms. This is because the algorithm of [9] only supports random offloading of tasks and is not optimized according to the amount of computing resources required to complete the task.

Fig. 7 compares the total task computation cost of each algorithm in the best and worst cases. The task execution energy consumption of the algorithm in [9] is the highest of the three algorithms in both cases. This is because it optimizes task offloading and resource allocation strategies against the expected system task execution energy consumption but fails to take user fairness into account, resulting in a large gap between the total task computation overheads of the two cases. The optimal energy consumption of the algorithm in [11] is lower than that of the algorithm in [9], which reflects to a certain extent the advantage of task offloading in saving device energy; however, because it does not account for channel gain and the load of the MEC server when determining the joint strategy, its worst-case task execution cost is still close to that of the algorithm in [9].

In addition, in the worst case the total task computation cost of the algorithm in [11] is lower than that obtained by the algorithm proposed in this paper, but the gap is small. The proposed algorithm is a reinforcement learning based joint computation and communication resource allocation algorithm in which each UE's data task offloading and power control strategies are jointly optimized. When a terminal device chooses to offload its task to the MEC server, the choice of power becomes more flexible; at the same time, the impact on other UEs is relatively reduced, providing better performance optimization for the coordinated operation of the entire system.

This paper introduced in detail the theoretical background of reinforcement learning used in the study, including several commonly used reinforcement learning algorithms and frameworks. To highlight the comparison of resource allocation effects and simulation results, a reinforcement learning based joint computation and communication resource allocation algorithm was proposed, analyzed in detail, and evaluated through simulation experiments, whose results were compared, analyzed, and summarized. By addressing the problems of computing task offloading and power control in a multi-base-station MEC system, the proposed approach alleviates the data processing pressure on the MEC server.

However, the joint resource allocation algorithm proposed in this paper still has some limitations. In the task offloading model, we assume that the wireless channel conditions of a group of terminal devices remain unchanged during a task offloading period, which is difficult to achieve in real application scenarios. Therefore, in the next stage of research, we will add the time-varying behavior of the wireless channel to the resource allocation system model, so that the algorithm's performance is closer to real scenarios, and further expand the application of reinforcement learning in MEC systems.

She received her Master's degree in Computer Application Technology from Shanghai University in 2011. She is currently a lecturer at Zhejiang Yuying College of Vocational Technology. Her research interests include computer information management technology and computer network technology.

- 1 P. Mach and Z. Becvar, "Mobile edge computing: a survey on architecture and computation offloading," *IEEE Communications Surveys & Tutorials*, vol. 19, no. 3, pp. 1628-1656, 2017. doi:10.1109/COMST.2017.2682318
- 2 C. Li, J. Tang, and Y. Luo, "Dynamic multi-user computation offloading for wireless powered mobile edge computing," *Journal of Network and Computer Applications*, vol. 131, pp. 1-15, 2019.
- 3 Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, "A survey on mobile edge computing: the communication perspective," *IEEE Communications Surveys & Tutorials*, vol. 19, no. 4, pp. 2322-2358, 2017.
- 4 J. Liu, Y. Mao, J. Zhang, and K. B. Letaief, "Delay-optimal computation task scheduling for mobile-edge computing systems," in *Proceedings of 2016 IEEE International Symposium on Information Theory (ISIT)*, Barcelona, Spain, 2016, pp. 1451-1455.
- 5 M. H. Chen, B. Liang, and M. Dong, "Joint offloading decision and resource allocation for multi-user multi-task mobile cloud," in *Proceedings of 2016 IEEE International Conference on Communications (ICC)*, Kuala Lumpur, Malaysia, 2016, pp. 1-6.
- 6 M. H. Chen, B. Liang, and M. Dong, "Joint offloading and resource allocation for computation and communication in mobile cloud with computing access point," in *Proceedings of the IEEE Conference on Computer Communications (INFOCOM)*, Atlanta, GA, 2017, pp. 1-9.
- 7 J. Li, H. Gao, T. Lv, and Y. Lu, "Deep reinforcement learning based computation offloading and resource allocation for MEC," in *Proceedings of 2018 IEEE Wireless Communications and Networking Conference (WCNC)*, Barcelona, Spain, 2018, pp. 1-6.
- 8 Z. Wei, B. Zhao, J. Su, and X. Lu, "Dynamic edge computation offloading for internet of things with energy harvesting: a learning method," *IEEE Internet of Things Journal*, vol. 6, no. 3, pp. 4436-4447, 2018.
- 9 W. Chen, Y. He, and J. Qiao, "Cost minimization for cooperative mobile edge computing systems," in *Proceedings of 2019 28th Wireless and Optical Communications Conference (WOCC)*, Beijing, China, 2019, pp. 1-5.
- 10 T. Q. Dinh, J. Tang, Q. D. La, and T. Q. Quek, "Offloading in mobile edge computing: task allocation and computational frequency scaling," *IEEE Transactions on Communications*, vol. 65, no. 8, pp. 3571-3584, 2017. doi:10.1109/TCOMM.2017.2699660
- 11 Z. Zhang, J. Wu, L. Chen, G. Jiang, and S. K. Lam, "Collaborative task offloading with computation result reusing for mobile edge computing," *The Computer Journal*, vol. 62, no. 10, pp. 1450-1462, 2019.
- 12 T. Alfakih, M. M. Hassan, A. Gumaei, C. Savaglio, and G. Fortino, "Task offloading and resource allocation for mobile edge computing by deep reinforcement learning based on SARSA," *IEEE Access*, vol. 8, pp. 54074-54084, 2020.
- 13 T. D. Parker, C. F. Slattery, J. Zhang, J. M. Nicholas, R. W. Paterson, A. J. Foulkes, et al., "Cortical microstructure in young onset Alzheimer's disease using neurite orientation dispersion and density imaging," *Human Brain Mapping*, vol. 39, no. 7, pp. 3005-3017, 2018.
- 14 D. Ramachandran and R. Gupta, "Smoothed sarsa: reinforcement learning for robot delivery tasks," in *Proceedings of 2009 IEEE International Conference on Robotics and Automation*, Kobe, Japan, 2009, pp. 2125-2132.
- 15 A. Larmo and R. Susitaival, "RAN overload control for machine type communications in LTE," in *Proceedings of 2012 IEEE Globecom Workshops*, Anaheim, CA, 2012, pp. 1626-1631.
- 16 X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," *IEEE/ACM Transactions on Networking*, vol. 24, no. 5, pp. 2795-2808, 2015. doi:10.1109/TNET.2015.2487344