## Sungwon Moon and Yujin Lim

| Parameter | Value |
|---|---|
| Number of CPU cycles to process one bit of data [TeX:] $$\left(C_m\right)$$ | 500 cycles/bit |
| Background noise power [TeX:] $$\left(\sigma^2\right)$$ | [TeX:] $$10^{-9} \mathrm{~W}$$ |
| Channel bandwidth (B) | 1 MHz |
| Length of time slot [TeX:] $$\left(\tau_0\right)$$ | 1 ms |
| Task arrival rate [TeX:] $$\left(\lambda_n\right)$$ | 1–3 Mbps |
| Local execution power of vehicle n [TeX:] $$\left(P_n^{lo}\right)$$ | 2 W |
| Transmission power of vehicle n [TeX:] $$\left(P_n^{tr}\right)$$ | 2 W |
| Maximum CPU cycle frequency of vehicle n [TeX:] $$\left(F_n^{\max}\right)$$ | 2.51 GHz |
| Discount factor of long-term reward [TeX:] $$(\gamma)$$ | 0.99 |
| Update parameter for target network [TeX:] $$(\tau)$$ | 0.001 |
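As a quick sanity check on these settings, the maximum local processing rate implied by the table can be computed directly from [TeX:] $$F_n^{\max}$$ and [TeX:] $$C_m$$. A minimal sketch, using only values from the table:

```python
# Sanity check on the simulation parameters (values from the table above).
F_MAX = 2.51e9   # maximum CPU cycle frequency of a vehicle (Hz)
C_M = 500        # CPU cycles needed to process one bit
TAU_0 = 1e-3     # length of one time slot (s)

# Bits a vehicle can process locally per second at full frequency.
local_rate_bps = F_MAX / C_M            # 5.02e6 bits/s = 5.02 Mbps

# Bits processed locally in a single 1 ms slot at full frequency.
bits_per_slot = local_rate_bps * TAU_0  # 5020 bits

print(local_rate_bps, bits_per_slot)
```

Since the task arrival rate is 1–3 Mbps, full-frequency local execution alone could keep up with arrivals, but only by spending up to the full 2 W local execution power, which is why offloading and power allocation matter.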

Fig. 2 shows the throughput of offloading versus local computing under different distance rates. The distance rate is defined as the ratio of the distance between the vehicle and the MECS to the communication radius of the MECS; for example, a distance rate of 0.5 corresponds to 50% of the RSU radius. The figure compares the amount of data executed on the MECS by offloading with the amount of data executed locally by the vehicle as the distance varies. As the distance between the vehicle and the MECS increases, the throughput by offloading decreases, while the throughput by local computing increases, because the channel condition is affected by path loss. The closer the vehicle is to the MECS, the lower the path loss and the better the channel condition; the vehicle therefore spends more transmission power on offloading as it approaches the MECS, which increases the offloading throughput. Conversely, when the vehicle moves away from the MECS, it spends less transmission power on offloading, resulting in lower offloading throughput than local execution.
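The trend can be sketched with a simple Shannon-rate model. Only B, [TeX:] $$\sigma^2$$, and [TeX:] $$P_n^{tr}$$ come from the parameter table; the communication radius and the [TeX:] $$d^{-\alpha}$$ path-loss form with exponent 3 are illustrative assumptions, not the paper's exact channel model:

```python
import math

B = 1e6          # channel bandwidth (Hz), from the parameter table
SIGMA2 = 1e-9    # background noise power (W), from the parameter table
P_TR = 2.0       # transmission power of the vehicle (W), from the parameter table
RADIUS = 500.0   # MECS communication radius in meters (illustrative assumption)
ALPHA = 3.0      # path-loss exponent (illustrative assumption)

def offload_rate(distance_rate: float) -> float:
    """Achievable uplink rate (bits/s) at a given distance rate,
    using a simple d^-alpha path-loss gain and the Shannon formula."""
    d = max(distance_rate * RADIUS, 1.0)  # avoid a zero distance
    gain = d ** (-ALPHA)                  # path-loss channel gain
    snr = P_TR * gain / SIGMA2
    return B * math.log2(1.0 + snr)

# The achievable offloading rate falls as the vehicle moves away.
near, far = offload_rate(0.2), offload_rate(0.8)
print(near, far)
```

Under these assumed numbers the rate at a distance rate of 0.2 is roughly double the rate at 0.8, matching the qualitative behavior in Fig. 2.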

In Fig. 3, power consumption and throughput under different distance rates are shown when the task arrival rate is 3 Mbps and the number of vehicles is 30. Throughput is defined as the total amount of data executed locally and by offloading. The larger the distance rate, the greater the path loss and the worse the channel condition, so a vehicle consumes more transmission power to offload to its neighboring MECS. As the distance rate increases, the offloading throughput decreases due to the deteriorated channel condition, even though more power is consumed than at a low distance rate. Compare1 makes offloading decisions considering power consumption and queuing delay, so it consumes slightly less power than Proposed. Compare2 focuses on minimizing power consumption, so it consumes even less power. However, relative to the power it consumes, Proposed processes about 18%-33% more data than Compare1 and Compare2. GD_T consumes the most power because it aims to maximize throughput regardless of power consumption. Conversely, GD_P processes the least data because it aims at the lowest power consumption.

Fig. 4 shows power consumption and throughput under different task arrival rates. We conducted experiments with the distance rate set to 0.4 and the number of vehicles set to 30. Proposed consumes about 23% more power than Compare1 but processes about 34% more data. Proposed also outperforms GD_T: GD_T's throughput grows by only about 12% relative to its growth in power consumption, while Proposed's throughput grows by about 34%. Proposed processes more data than Compare2 and GD_P, which focus on minimal power consumption, while consuming more power than they do. As the task arrival rate increases, the load on the system increases, and the increase in power consumption of Proposed relative to the load increase is lower than that of the other methods, because Proposed weighs throughput against power consumption. For Compare1 and Compare2, as the task arrival rate increases, power consumption increases and throughput rises sharply. Because these methods mainly consider power consumption, they achieve low throughput when the load is low.

Fig. 5 shows the power consumption and throughput under different numbers of vehicles. We conducted experiments with the distance rate set to 0.4 and the task arrival rate set to 3 Mbps. Proposed consumes about 25%-43% more power than Compare1 and Compare2, while processing about 33%-49% more data. Compare1 and Compare2 model the reward as a weighted sum, whereas Proposed models the reward as a logarithmic function. With a weighted-sum reward, a weighting coefficient sets the relative importance of the defined parameters; because the experimental performance varies with this coefficient, it must be selected carefully. In contrast, a logarithmic reward uses the ratio between the defined parameters. Proposed therefore balances power consumption against throughput, reducing power consumption while increasing throughput. GD_P and GD_T show the best performance in power consumption and throughput, respectively, since they aim solely to minimize power consumption or to maximize throughput.
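The sensitivity to the weighting coefficient can be illustrated with a toy calculation. The reward forms below are generic sketches, not the exact functions used by Compare1, Compare2, or Proposed:

```python
import math

def weighted_sum_reward(throughput, power, w):
    # Generic weighted-sum reward: w trades throughput against power.
    return w * throughput - (1.0 - w) * power

def log_ratio_reward(throughput, power):
    # Generic ratio-based logarithmic reward (illustrative form only):
    # grows with throughput, shrinks as power grows, no weight to tune.
    return math.log(1.0 + throughput) / math.log(math.e + power)

# Two candidate operating points: (throughput in Mbps, power in W).
a = (4.0, 1.5)   # higher throughput, higher power
b = (2.5, 0.8)   # lower throughput, lower power

# The weighted sum prefers a or b depending on the coefficient w ...
prefer_a_w07 = weighted_sum_reward(*a, 0.7) > weighted_sum_reward(*b, 0.7)
prefer_a_w02 = weighted_sum_reward(*a, 0.2) > weighted_sum_reward(*b, 0.2)
print(prefer_a_w07, prefer_a_w02)  # True False

# ... while the ratio-based reward ranks them with no coefficient at all.
print(log_ratio_reward(*a), log_ratio_reward(*b))
```

The flip between `w = 0.7` and `w = 0.2` is the coefficient-selection problem the text describes; the ratio-based form sidesteps it.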

Fig. 6 shows the task completion rate within the delay constraint under different task arrival rates. We conducted experiments with the distance rate set to 0.4 and the number of vehicles set to 30. The task completion rate within the delay constraint is defined as the ratio of the number of tasks completed while satisfying the delay constraint to the total number of tasks. The task completion rate is affected by the throughput: low throughput means many tasks remain in the queue to be processed, which increases the waiting time in the queue and lowers the task completion rate. GD_T has the highest task completion rate because its high throughput lets it process many tasks; Proposed comes within about 10% of GD_T. GD_P processes the least data due to its low throughput, so many tasks remain in its queue and its task completion rate is low. Proposed achieves a task completion rate about 16%-24% higher than Compare1 and Compare2.
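The metric itself is straightforward to compute. A minimal sketch with made-up delay samples; the 10 ms delay constraint is an illustrative assumption, not a value from the paper:

```python
def completion_rate(task_delays, delay_constraint):
    """Ratio of tasks finished within the delay constraint
    to the total number of tasks (the metric used in Fig. 6)."""
    completed = sum(1 for d in task_delays if d <= delay_constraint)
    return completed / len(task_delays)

# Illustrative per-task delays (ms) and a 10 ms constraint.
delays = [3.1, 4.7, 9.9, 12.4, 6.0, 15.2, 8.8, 10.0, 2.5, 11.1]
rate = completion_rate(delays, delay_constraint=10.0)
print(rate)  # 0.7
```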

It is difficult to update the optimal offloading policy in real time by observing the dynamic state of the MEC system. Moreover, in the real world, the environment is highly dynamic and large volumes of tasks arrive continuously. Existing methods usually model the reward function as a weighted sum, mainly focusing on power consumption. Because a weighted sum fixes the weight coefficients of its parameters, performance varies greatly with those coefficients, and it is difficult to select optimal coefficients in a dynamically changing environment. The logarithmic function we use does not require setting an optimal weight: the reward increases in proportion to throughput, and it decreases rapidly as energy consumption, used as the base of the logarithm, increases. This reflects the sharp decrease in QoE as power consumption grows. Our method achieves a higher task completion rate within the delay constraint due to its higher throughput, while consuming slightly more power than existing methods. It is more efficient because the additional data it processes outweighs the small additional power it consumes, which can improve QoE. However, it has the limitation that the status of the MECS is not considered; future research will address this.
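The two properties stated here can be exhibited numerically. The text says the reward grows with throughput and uses energy consumption as the base of the logarithm; the concrete function below, log base (1 + energy) of (1 + throughput), is an assumed form chosen only to show those properties, not the paper's exact reward:

```python
import math

def reward(throughput, energy):
    # Assumed illustrative form: log base (1 + energy) of (1 + throughput).
    # Larger throughput -> larger reward; larger energy enlarges the
    # base, so the reward drops off quickly as consumption grows.
    return math.log(1.0 + throughput) / math.log(1.0 + energy)

# Reward rises with throughput at fixed energy ...
r_low_t, r_high_t = reward(1.0, 2.0), reward(3.0, 2.0)
# ... and falls sharply as energy rises at fixed throughput.
r_low_e, r_high_e = reward(3.0, 1.0), reward(3.0, 4.0)
print(r_low_t, r_high_t, r_low_e, r_high_e)
```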

In this paper, we considered a dynamic MEC environment with stochastic channel conditions and task arrival rates. We proposed a DRL-based offloading method with resource allocation that maximizes the long-term reward in terms of throughput and power consumption by dynamically allocating CPU resources for local execution and transmission power for offloading. We described the system model and formulated the offloading problem as an MDP. DDPG is adopted to learn the optimal offloading policy, and each vehicle, as an agent, learns its offloading policy independently. We compared the proposed method with conventional methods through simulations; the results indicate that the proposed method minimizes the power consumption of the vehicle and maximizes the throughput of local execution and offloading according to the distance between the vehicle and the MECS and the task arrival rate.
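One DDPG ingredient fixed in the parameter table is the target-network update parameter [TeX:] $$\tau = 0.001$$. The soft (Polyak) target update it controls is standard and can be sketched as below, with plain lists standing in for network weights:

```python
TAU = 0.001  # target-network update parameter, from the parameter table

def soft_update(target_weights, online_weights, tau=TAU):
    """Polyak averaging used by DDPG so the target network slowly
    tracks the online network: target <- tau*online + (1-tau)*target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]

online = [1.0, -2.0, 0.5]   # illustrative online-network weights
target = [0.0, 0.0, 0.0]    # illustrative target-network weights

target = soft_update(target, online)
print(target)  # [0.001, -0.002, 0.0005]
```

With such a small τ, the target network changes by only 0.1% of the gap per step, which is what stabilizes the critic's bootstrap targets during learning.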

In future research, we will study an offloading method that selects which MECS should process the tasks in consideration of environmental factors such as MECS status. In addition, we will consider agents learning in cooperation with each other, rather than independently, in a dynamic MEC system. We will extend the offloading method using counterfactual multi-agent policy gradient learning, which determines how much each agent contributes to the overall reward when agents cooperate.

She received B.S., M.S., and Ph.D. degrees in Computer Science from Sookmyung Women's University, Korea, in 1995, 1997, and 2000, respectively, and a Ph.D. degree in Information Sciences from Tohoku University, Japan, in 2013. From 2004 to 2015, she was an associate professor in the Department of Information Media, Suwon University, Korea. She joined the faculty of IT Engineering at Sookmyung Women's University, Seoul, in 2016, where she is currently a professor. Her research interests include edge computing, intelligent agent systems, and artificial intelligence.
