<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">55614</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.055614</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Task Offloading Strategy Based on Multi-Agent Deep Reinforcement Learning for Offshore Wind Farm Scenarios</article-title>
<alt-title alt-title-type="left-running-head">A Task Offloading Strategy Based on Multi-Agent Deep Reinforcement Learning for Offshore Wind Farm Scenarios</alt-title>
<alt-title alt-title-type="right-running-head">A Task Offloading Strategy Based on Multi-Agent Deep Reinforcement Learning for Offshore Wind Farm Scenarios</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Song</surname><given-names>Zeshuang</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Wang</surname><given-names>Xiao</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>xwang9@gzu.edu.cn</email></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Wu</surname><given-names>Qing</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Tao</surname><given-names>Yanting</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Xu</surname><given-names>Linghua</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-6" contrib-type="author">
<name name-style="western"><surname>Yin</surname><given-names>Yaohua</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-7" contrib-type="author">
<name name-style="western"><surname>Yan</surname><given-names>Jianguo</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Electrical Engineering, Guizhou University</institution>, <addr-line>Guiyang, 550025</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Powerchina Guiyang Engineering Corporation Limited</institution>, <addr-line>Guiyang, 550081</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>Powerchina Guizhou Engineering Co</institution>., <institution>Ltd</institution>., <addr-line>Guiyang, 550001</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Xiao Wang. Email: <email>xwang9@gzu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>15</day><month>10</month><year>2024</year></pub-date>
<volume>81</volume>
<issue>1</issue>
<fpage>985</fpage>
<lpage>1008</lpage>
<history>
<date date-type="received">
<day>02</day>
<month>7</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>8</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 The Authors.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_55614.pdf"></self-uri>
<abstract>
<p>This research is the first application of Unmanned Aerial Vehicles (UAVs) equipped with Multi-access Edge Computing (MEC) servers to offshore wind farms, providing a new task offloading solution to address the challenge of scarce edge servers in offshore wind farms. The proposed strategy offloads the computational tasks in this scenario to other MEC servers and computes them proportionally, which effectively reduces the computational pressure on local MEC servers when wind turbine data are abnormal. Finally, the task offloading problem is modeled as a multi-agent deep reinforcement learning problem, and a task offloading model based on Multi-Agent Deep Reinforcement Learning (MADRL) is established. The Adaptive Genetic Algorithm (AGA) is used to explore the action space of the Deep Deterministic Policy Gradient (DDPG) algorithm, which effectively addresses the slow convergence of the DDPG algorithm in high-dimensional action spaces. The simulation results show that the proposed algorithm, AGA-DDPG, saves approximately 61.8%, 55%, 21%, and 33% of the overall overhead compared to local MEC, random offloading, TD3, and DDPG, respectively. The proposed strategy is potentially important for improving real-time monitoring, big data analysis, and predictive maintenance in offshore wind farm operation and maintenance systems.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Offshore wind</kwd>
<kwd>MEC</kwd>
<kwd>task offloading</kwd>
<kwd>MADRL</kwd>
<kwd>AGA-DDPG</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>61861007</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Guizhou Province Science and Technology Planning</funding-source>
<award-id>[2021]303</award-id>
</award-group>
<award-group id="awg3">
<funding-source>Guizhou Province Science Technology Support Plan</funding-source>
<award-id>[2022]264</award-id>
<award-id>[2023]096</award-id>
<award-id>[2023]409</award-id>
<award-id>[2023]412</award-id>
</award-group>
<award-group id="awg4">
<funding-source>Science Technology Project of POWERCHINA Guizhou Engineering</funding-source>
<award-id>DJ-ZDXM-2022-44</award-id>
</award-group>
<award-group id="awg5">
<funding-source>Project of POWERCHINA Guiyang Engineering Corporation Limited</funding-source>
<award-id>YJ2022-12</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Under the new power system, offshore wind power is an important means for China to realize the goals of &#x201C;2030 carbon peak&#x201D; and &#x201C;2060 carbon neutral&#x201D; [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>]. Offshore wind energy offers abundant resources and promising prospects, and is poised to become a cornerstone of future green energy [<xref ref-type="bibr" rid="ref-3">3</xref>]. Climate change and the increasing demand for renewable energy are driving the rapid development of offshore wind farms [<xref ref-type="bibr" rid="ref-4">4</xref>]. Beyond offering new opportunities for the energy industry, offshore wind power reduces greenhouse gas emissions, lowers energy costs, and fosters economic growth [<xref ref-type="bibr" rid="ref-5">5</xref>]. Real-time monitoring and optimization strategies are therefore crucial to the efficient operation of offshore wind farms [<xref ref-type="bibr" rid="ref-6">6</xref>]. Traditional monitoring methods rely heavily on sensor networks and remote data transmission, which are often costly and susceptible to disruptions from marine environments [<xref ref-type="bibr" rid="ref-7">7</xref>]. Edge computing is an emerging computing architecture that places computational resources near data sources to reduce latency and enhance responsiveness [<xref ref-type="bibr" rid="ref-8">8</xref>]. Task offloading plays a critical role in edge computing by shifting computational tasks from central data centers to edge nodes near data sources, effectively reducing data processing delays [<xref ref-type="bibr" rid="ref-9">9</xref>]. Therefore, designing task offloading strategies and resource allocation schemes that mitigate data processing latency and improve the responsiveness and real-time capability of offshore wind farm systems is a pressing issue.</p>
<p>Deep Reinforcement Learning (DRL) has gradually been applied to task offloading [<xref ref-type="bibr" rid="ref-10">10</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>]. In [<xref ref-type="bibr" rid="ref-12">12</xref>], the authors investigate a Multiple Input Multiple Output (MIMO) system with a stochastic wireless channel, employing the DDPG method for handling continuous-action DRL. However, DDPG&#x2019;s performance relies heavily on the critic network, making it sensitive to critic updates and resulting in poor stability and slow convergence during computational offloading. In [<xref ref-type="bibr" rid="ref-13">13</xref>], the authors addressed the cost of offloading from user devices and pricing strategies for MEC servers, proposing a MADRL algorithm to solve profit-based pricing problems. Nevertheless, the Deep Q-Network (DQN) algorithm used faces instability in handling complex task offloading scenarios, which hinders performance assurance. In [<xref ref-type="bibr" rid="ref-14">14</xref>], the authors studied joint optimization schemes for wireless resource coordination and partial task offloading scheduling. To address the slow convergence caused by high-dimensional actions in DDPG, noise exploration is introduced in the action outputs of the actor network. However, similar to DDPG, this method still requires traversing the entire action space, limiting its practical effectiveness. In [<xref ref-type="bibr" rid="ref-15">15</xref>], the authors formulated optimization problems such as latency, energy consumption, and operator costs during offloading as Markov Decision Processes (MDPs). They proposed a DRL-based solution but encountered low efficiency in experience replay utilization, leading to suboptimal learning efficiency. 
In [<xref ref-type="bibr" rid="ref-16">16</xref>], the authors modeled the resource allocation problem as a Markov game and proposed a generative adversarial LSTM framework to enhance resource allocation among unmanned aerial vehicles (UAVs) in machine-to-machine (M2M) communication. The study successfully addressed scenarios where multiple UAVs act as learning agents. However, the computational complexity of this algorithm may constrain its implementation in large-scale M2M networks, particularly in scenarios involving high-speed moving UAVs. While the existing literature has made some strides in applying Deep Reinforcement Learning to task offloading, it still faces numerous challenges. To address these challenges, this study introduces a novel task offloading strategy combining an Adaptive Genetic Algorithm with the Deep Deterministic Policy Gradient algorithm (AGA-DDPG), aimed at enhancing operational efficiency and reducing maintenance costs in offshore wind farms.</p>
<p>In summary, operational challenges faced by offshore wind farms, particularly in computational offloading and edge computing, remain substantial. Existing research predominantly focuses on onshore environments, resulting in limited exploration of offloading strategies in marine settings. Studies involving multiple users and multiple MEC servers encounter exponential growth in state and action spaces, leading to slow convergence in problem resolution. Moreover, current binary offloading models lack flexibility and efficiency, potentially leading to increased operational costs and reduced efficiency. In response to the aforementioned issues, this paper proposes a task offloading strategy based on multi-agent deep reinforcement learning for offshore wind farm scenarios. The specific contributions are as follows:</p>
<p>1. Innovative task offloading strategy: We introduce a novel approach utilizing UAVs as airborne MEC servers to optimize computational resource allocation under dynamic network conditions, significantly improving monitoring and maintenance efficiency in offshore wind farms.</p>
<p>2. Development of AGA-DDPG algorithm: We develop the AGA-DDPG model, which enhances the traditional Deep Deterministic Policy Gradient algorithm using an Adaptive Genetic Algorithm to overcome slow convergence in high-dimensional action spaces, thereby improving overall task offloading performance.</p>
<p>3. Multi-agent system framework: This study models the task offloading problem as a Multi-Agent Deep Reinforcement Learning challenge, providing a framework for centralized training and decentralized execution tailored for dynamic offshore environments, representing significant technological advancements for practical applications.</p>
<p>4. Empirical validation and performance evaluation: Through extensive simulation experiments, we validate the effectiveness of our proposed algorithms and demonstrate significant reductions in overall operational costs compared to existing methods, thereby offering practical solutions for the sustainable development of offshore wind farms.</p>
<p>Through these innovations, this research not only advances theoretical developments in task offloading and edge computing but also provides crucial technological support and implementation guidelines for enhancing the efficient operation of offshore wind farms worldwide.</p>
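To make the exploration mechanism concrete, the following minimal Python sketch illustrates the kind of adaptive genetic operator that can be used to explore DDPG's continuous action space: individuals with below-average fitness keep a high crossover/mutation rate to continue exploring, while fitter individuals receive a lower rate so that good candidate offloading actions are preserved. The function names, rate bounds `k1`/`k2`, and noise scale `sigma` are illustrative assumptions in the style of the classical adaptive GA, not the paper's exact implementation.

```python
import random

def adaptive_rates(fitness, f_avg, f_max, k1=0.9, k2=0.1):
    """Adaptive GA rate: below-average individuals keep the high rate k1
    to explore; fitter individuals get a rate that decreases linearly
    toward k2 so good candidates are preserved. k1/k2 are assumed values."""
    if f_max == f_avg or fitness < f_avg:
        return k1
    return k1 - (k1 - k2) * (fitness - f_avg) / (f_max - f_avg)

def mutate_action(action, rate, sigma=0.05):
    """Explore a continuous action vector (e.g., offloading ratios gamma
    in [0, 1]) by Gaussian perturbation, instead of exhaustively
    traversing the high-dimensional action space."""
    return [min(1.0, max(0.0, a + random.gauss(0.0, sigma)))
            if random.random() < rate else a
            for a in action]
```

A population of such candidate actions can then be evaluated by the critic network, with the fittest actions fed back into DDPG's policy update.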
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>MEC is being applied in multiple fields, such as healthcare, agriculture, industry, the Internet of Vehicles, and the IoT [<xref ref-type="bibr" rid="ref-17">17</xref>]. MEC&#x2019;s computation offloading technology offloads computing tasks to MEC servers with greater computing power. However, with the rapid increase in IoT terminal devices, MEC servers also face a shortage of their own resources [<xref ref-type="bibr" rid="ref-18">18</xref>]. Some researchers have focused on improving the resource utilization of MEC servers to alleviate computational pressure [<xref ref-type="bibr" rid="ref-19">19</xref>], while other studies have focused on collaboration among multiple MEC servers [<xref ref-type="bibr" rid="ref-20">20</xref>]. Algorithms for task offloading can be divided into traditional algorithms and DRL-based algorithms.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Conventional Methods for Task Offloading</title>
<p>Heuristic algorithms are widely used for task offloading. For instance, Chen et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] proposed a heuristic algorithm-based multi-user capability-constrained time optimization method. It provides a viable solution for optimizing workflow completion time under energy constraints. However, heuristic algorithms are noted for their high complexity and are not well-suited for long-term task offloading strategies. Vijayaram et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] introduced a distributed computing framework for efficient task computation offloading and resource allocation in mobile edge environments of wireless IoT devices. Task offloading is treated as a non-convex optimization problem and solved using a meta-heuristic algorithm. Similarly, Karatalay et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] investigated energy-efficient resource allocation in device-to-device (D2D) fog computing scenarios. They proposed a low-complexity heuristic resource allocation strategy to minimize overall energy consumption due to limited transmission power, computational resources, and task processing time. Nevertheless, heuristic algorithms exhibit poor adaptability and may not achieve the effectiveness of DRL over extended operational periods.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>DRL-Based Methods for Task Offloading</title>
<p>To meet the high demands brought by the explosive growth of computationally intensive and delay-sensitive tasks on mobile user devices, Li et al. [<xref ref-type="bibr" rid="ref-24">24</xref>] proposed a content caching strategy based on Deep Q-Network (DQN) and a computation offloading strategy based on a quantum ant colony algorithm. The content caching solution addresses latency and round-trip load issues associated with repeated requests to remote data centers. However, DQN algorithms face challenges related to convergence. Chen et al. [<xref ref-type="bibr" rid="ref-25">25</xref>] utilized drones to assist in task offloading, aiming to minimize the weighted sum of average latency and energy consumption. Their study highlighted slow convergence due to the high-dimensional action space involved in maneuvering drones for efficient offloading. Guo et al. [<xref ref-type="bibr" rid="ref-26">26</xref>] applied task offloading to emergency scenarios by leveraging the computational capabilities of redundant nodes in large-scale wireless sensor networks, employing a DDPG algorithm to optimize computation offloading strategies. Similarly, Truong et al. [<xref ref-type="bibr" rid="ref-27">27</xref>] formulated the optimization problem as a reinforcement learning model to minimize latency and energy consumption, proposing a DDPG-based solution. They noted challenges arising from the high-dimensional action space and low utilization of historical experience data, which contribute to slow convergence. Finally, Ke et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] proposed a DRL strategy based on an actor-critic network structure. This strategy adds noise to the output action of the actor network to avoid the complexity brought by the high-dimensional action space. To highlight the contribution of this article, we present <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Comparison between current studies and this study</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Reference</th>
<th>Offloading algorithms</th>
<th>Disadvantages compared with this study</th>
</tr>
</thead>
<tbody>
<tr>
<td>[<xref ref-type="bibr" rid="ref-21">21</xref>&#x2013;<xref ref-type="bibr" rid="ref-23">23</xref>]</td>
<td>Heuristic algorithm</td>
<td>The complexity of the heuristic algorithm is high, and it is not suitable for long-term task offloading.</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-12">12</xref>,<xref ref-type="bibr" rid="ref-24">24</xref>]</td>
<td>DQN</td>
<td>DQN algorithm has the problem of convergence difficulty.</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-25">25</xref>,<xref ref-type="bibr" rid="ref-26">26</xref>]</td>
<td>DDPG</td>
<td>There is a problem of slow convergence due to the high-dimensional action space.</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td>DDPG</td>
<td>There are problems of slow convergence caused by the high-dimensional action space and low utilization of historical experience data.</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-27">27</xref>,<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td>DDPG, actor and critic</td>
<td>There are problems of slow convergence caused by the high-dimensional action space and low utilization of historical experience data.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>System Model and Problem Description</title>
<sec id="s3_1">
<label>3.1</label>
<title>Network Model</title>
<p>As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, offshore wind farms lie in remote areas with no cellular coverage, so this paper proposes a space-air-maritime integrated network (SAMIN) to provide network access, task offloading, and other network functions for offshore Wind Turbine Generators (WTGs). The SAMIN comprises three network segments: the maritime segment, the aerial segment, and the space segment. The WTGs constitute the maritime segment, which uploads the WTG data and the computational tasks to be performed. In the aerial segment, flying UAVs serve as edge servers that provide task offloading to maritime users; solar-powered UAVs such as the Facebook Aquila can fly for months without recharging. In the space segment, one or more LEO satellites provide full coverage of the area of interest and connect the data-processing UAVs to cloud servers via a satellite backbone.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Computational model for multi-MEC collaboration in wind farms</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-1.tif"/>
</fig>
<p>The multi-UAV-assisted MEC scenario consists of n Wireless Sensor Networks (WSNs) and m UAVs equipped with edge servers, where each WSN comprises wind turbine monitoring devices and multiple data sensors. To accommodate the complexity of the changing network conditions in the MEC environment, software-defined networking (SDN) technology is applied to the system [<xref ref-type="bibr" rid="ref-29">29</xref>]. The SDN controller centrally trains and issues control commands to maintain communication with the MEC server cluster. The set of WSNs is denoted as N&#x003D; {1, 2, . . . , n}, n&#x2208;N, and the set of MEC servers is denoted as M&#x003D; {1, 2, . . . , m}, m&#x2208;M [<xref ref-type="bibr" rid="ref-30">30</xref>]. The data processing tasks are defined as <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x03C4;</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msubsup><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. 
Here, <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msubsup><mml:mi>&#x03C4;</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> are the task data size, the task computational complexity, and the maximum delay for completing the task, respectively [<xref ref-type="bibr" rid="ref-31">31</xref>]. The continuous task processing period T&#x003D;{1, 2, . . .} is divided into multiple time slots of size <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>&#x03C4;</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. To realistically simulate the WTG data, a data processing task is randomly generated at the beginning of each time slot. To improve task offloading efficiency and flexibility, it is assumed that the data processing tasks are divisible and that the offloading decision is determined by the ratio &#x03B3;, i.e., the local server offloads a fraction &#x03B3; of its computing tasks to other servers. In the following, the local MEC server is referred to as the offloading user. The symbols are summarized in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Symbol summary</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Symbols</th>
<th>Description</th>
<th>Symbols</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>M</td>
<td>MEC server collection</td>
<td><inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>Transmitted power</td>
</tr>
<tr>
<td>T</td>
<td>Time slot period</td>
<td><inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msup><mml:mi>&#x03B4;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>The noise variance</td>
</tr>
<tr>
<td>&#x03B3;</td>
<td>Offloading ratio</td>
<td><inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>State space</td>
</tr>
<tr>
<td><inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msubsup><mml:mi>&#x03C4;</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
<td>Maximum delay to complete the task</td>
<td><inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>Action space</td>
</tr>
<tr>
<td><inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>Task data size</td>
<td><inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>Reward mechanism</td>
</tr>
<tr>
<td><inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>Task computational complexity</td>
<td>K</td>
<td>Adaptive parameters</td>
</tr>
<tr>
<td><inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>MEC device correlation coefficient</td>
<td>W</td>
<td>Population size</td>
</tr>
<tr>
<td><inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msup><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>MEC computing power</td>
<td><inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>&#x03C4;</mml:mi></mml:math></inline-formula></td>
<td>Soft update factor</td>
</tr>
<tr>
<td><inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>B</mml:mi></mml:math></inline-formula></td>
<td>Transmission bandwidth</td>
<td>A_LR</td>
<td>Actor-network learning rate</td>
</tr>
<tr>
<td><italic>H</italic><sub><italic>k</italic></sub></td>
<td>Channel gain</td>
<td>C_LR</td>
<td>Critic network learning rate</td>
</tr>
<tr>
<td><inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>Task transfer rate</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</table-wrap>
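As a concrete illustration of the task model above, the following Python sketch randomly generates a divisible data processing task T_k = {I_k, F_k, τ_k^max} at the start of a time slot and splits it by an offloading ratio γ. The value ranges, units, and function names are illustrative assumptions, not parameters taken from the paper.

```python
import random

def generate_task():
    """Randomly generate a data-processing task T_k = {I_k, F_k, tau_max}
    at the start of a time slot; the ranges below are assumed for illustration."""
    return {
        "I_k": random.uniform(0.5, 2.0),      # task data size (Mbit, assumed)
        "F_k": random.uniform(0.5e9, 1.5e9),  # computational complexity (CPU cycles, assumed)
        "tau_max": 1.0,                       # maximum completion delay (s, assumed)
    }

def split_task(task, gamma):
    """Split a divisible task: a fraction gamma of the data and computation
    is offloaded to another MEC server; the rest is computed locally."""
    assert 0.0 <= gamma <= 1.0
    local = {k: (1.0 - gamma) * task[k] for k in ("I_k", "F_k")}
    remote = {k: gamma * task[k] for k in ("I_k", "F_k")}
    return local, remote
```

For example, with γ = 0.3, 30% of the task's data and cycles go to the target MEC server and 70% remain with the offloading user.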
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Communications Model</title>
<p>It is assumed that the communication mode between MEC servers follows orthogonal frequency division multiple access (OFDMA) [<xref ref-type="bibr" rid="ref-32">32</xref>&#x2013;<xref ref-type="bibr" rid="ref-34">34</xref>]. It is assumed that the total bandwidth of the connection between MECs is set to Bi, which can be divided into E subchannels. Assuming that the channel state between MEC servers in each time slot is time-varying and obeys a Markov distribution, the channel state can be modeled as follows:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:msub><mml:mi>&#x210F;</mml:mi><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mfrac><mml:mn>1</mml:mn><mml:msup><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>&#x03B4;</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mfrac><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:msqrt></mml:math></disp-formula>where <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mrow><mml:mi>&#x210F;</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the path loss coefficient and <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the distance between the MEC servers. <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a predefined transfer probability matrix for the channel state.</p>
<p>For example, suppose the possible channel states between MEC servers are [64, 128, 192, 256, 512]. If the current channel state <italic>h</italic><sub><italic>m</italic></sub>(<italic>t</italic>) is 192, the channel state in the next time slot, <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, shifts to another state, e.g., 256, with a given transition probability. In this way, the ever-changing channel states in the MEC environment are modeled.</p>
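The Markov evolution of the channel state can be sketched in a few lines of Python using the example states from the text. The entries of the transition matrix are hypothetical, since the paper only states that the matrix is predefined.

```python
import random

# Discrete channel states between MEC servers (example values from the text)
STATES = [64, 128, 192, 256, 512]

# Hypothetical row-stochastic transition matrix; each row gives the
# probabilities of moving from one state to each possible next state.
P_M = [
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.20, 0.50, 0.20, 0.05, 0.05],
    [0.10, 0.20, 0.40, 0.20, 0.10],
    [0.05, 0.10, 0.20, 0.50, 0.15],
    [0.05, 0.05, 0.10, 0.20, 0.60],
]

def next_channel_state(current):
    """Sample h_m(t+1) given h_m(t) according to the Markov chain."""
    i = STATES.index(current)
    return random.choices(STATES, weights=P_M[i], k=1)[0]
```

Calling `next_channel_state(192)` repeatedly reproduces the time-varying channel described above.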
<p>From the channel state model, the transmission rate between MEC servers can be obtained as:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the transmission bandwidth, <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mi>&#x03B2;</mml:mi></mml:math></inline-formula> is the bandwidth allocation ratio, <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the transmission power, and <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is Gaussian white noise.</p>
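Eq. (2) evaluates directly; the short sketch below implements the rate expression, and the parameter values in the usage comment are illustrative assumptions rather than values from the paper.

```python
import math

def transfer_rate(beta, B_i, p_n, h_m, N0):
    """Task transfer rate between MEC servers, Eq. (2):
    R = beta * B_i * log2(1 + p_n * |h_m|^2 / N0)."""
    return beta * B_i * math.log2(1.0 + p_n * abs(h_m) ** 2 / N0)

# Illustrative call: half the bandwidth of a 20 MHz link,
# 0.1 W transmit power, assumed channel gain and noise power.
rate = transfer_rate(beta=0.5, B_i=20e6, p_n=0.1, h_m=1e-3, N0=1e-9)
```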
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Computational Model</title>
<p>When no abnormality occurs in the wind turbine generator set, the computation task is offloaded to the local MEC server for computation. In this paper&#x2019;s formulation, the superscript <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>i</mml:mi></mml:math></inline-formula> denotes the offloading user and the superscript <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>j</mml:mi></mml:math></inline-formula> denotes the offloading target MEC server. The local delay and energy consumption are expressed as follows:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msup><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msup><mml:mi>E</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>where <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the computational power of the MEC server and <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the effective capacitance coefficient of the MEC device, consistent with <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>.</p>
<p>When abnormalities occur in the WTG, the local server comes under heavy computational load and offloads a fraction &#x03B3; of its computational tasks to other servers for computation; these local servers are referred to as offloading users in the following. From the above, the transmission delay for an offloading user to offload its task to the target MEC server is obtained as follows:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msup><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mfrac></mml:math></disp-formula></p>
<p>The energy consumption is expressed as follows:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msup><mml:mi>E</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p>The delay calculated on the target MEC server is expressed as follows:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msup><mml:mi>T</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mfrac></mml:math></disp-formula></p>
<p>The energy consumption calculated on the target MEC server is expressed as follows:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msup><mml:mi>E</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>where <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> denotes the computational resources allocated by the MEC server to the offloaded users.</p>
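<p>Eqs. (5)&#x2013;(8) can be combined into a small helper; the argument names are notational assumptions matching the symbols above.</p>

```python
def offload_costs(I_K, F_K, R_ij, p_ij, K_j, f_j):
    """Transmission and computation costs of offloading, per Eqs. (5)-(8).

    I_K: task data size (bits); F_K: required CPU cycles; R_ij: link rate (bit/s);
    p_ij: transmission power; K_j: device coefficient; f_j: allocated CPU frequency.
    """
    T_trans = I_K / R_ij             # Eq. (5): transmission delay
    E_trans = p_ij * T_trans         # Eq. (6): transmission energy
    T_comp = F_K / f_j               # Eq. (7): computation delay on the target MEC
    E_comp = K_j * F_K * f_j ** 2    # Eq. (8): computation energy on the target MEC
    return T_trans, E_trans, T_comp, E_comp
```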
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Description of the Problem</title>
<p>When abnormalities occur in a wind turbine, a rapid response to the abnormal state is needed. On the one hand, the computation speed of the data processing task has real-time requirements; on the other hand, we need to consider the service life of the equipment because the equipment is required to run for a long time in the WTG anomaly monitoring environment. Therefore, it is necessary to consider the delay and energy consumption requirements, and the overall overhead of the system is expressed as the weighted sum of the delay and energy consumption. The overall delay and energy consumption of the system can be expressed as follows:
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msup><mml:mi>T</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msubsup><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:msup><mml:mi>E</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Thus the overall overhead of the system can be obtained as follows:
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mi>T</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mi>E</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula>where <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the weighting factor between delay and energy consumption.</p>
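<p>The overall overhead of Eqs. (9)&#x2013;(11) can be sketched as follows, with the delay and energy terms passed in explicitly; variable names mirror the symbols above.</p>

```python
def system_overhead(gamma, a_n, T_local, T_trans, T_mec, E_local, E_trans, E_mec):
    """Weighted delay-energy overhead of Eqs. (9)-(11).

    gamma: offloading ratio; a_n: weighting factor between delay and energy.
    """
    T_total = (1 - gamma) * T_local + gamma * (T_trans + T_mec)  # Eq. (9)
    E_total = (1 - gamma) * E_local + gamma * (E_trans + E_mec)  # Eq. (10)
    return a_n * T_total + (1 - a_n) * E_total                   # Eq. (11)
```

<p>Setting &#x03B3; = 0 recovers the purely local cost, while &#x03B3; = 1 charges only the transmission and remote computation terms.</p>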
<p>To reduce the overall system overhead and efficiently use the system&#x2019;s channel and computational resources, the optimization objective is formulated as minimizing the overall system overhead. The system optimization problem is then stated as P1.
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x0393;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x03A5;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mi>m</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>M</mml:mi></mml:math></disp-formula>
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2264;</mml:mo><mml:msub><mml:mrow><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub></mml:math></disp-formula>
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2264;</mml:mo><mml:mn>1</mml:mn></mml:math></disp-formula>
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:msup><mml:mi>T</mml:mi><mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x2264;</mml:mo><mml:msubsup><mml:mi>&#x03C4;</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msubsup></mml:math></disp-formula>
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mn>1</mml:mn></mml:math></disp-formula>
<disp-formula id="eqn-17"><label>(17)</label><mml:math id="mml-eqn-17" display="block"><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2264;</mml:mo><mml:msub><mml:mrow><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub></mml:math></disp-formula>where <xref ref-type="disp-formula" rid="eqn-13">Eq. (13)</xref> is a constraint on MEC computing resources, <xref ref-type="disp-formula" rid="eqn-14">Eq. (14)</xref> is a weighting constraint on the ratio between delay and energy consumption, <xref ref-type="disp-formula" rid="eqn-15">Eq. (15)</xref> indicates that the task processing time must be less than the maximum allowed processing delay, <xref ref-type="disp-formula" rid="eqn-16">Eq. (16)</xref> is a constraint on the task offloading ratio, <xref ref-type="disp-formula" rid="eqn-17">Eq. (17)</xref> is an upper bound on the transmission power, <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0393;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the MEC server number selected by the agent, <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msup><mml:mi mathvariant="normal">&#x03A5;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the task offload ratio selected by the agent, and <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the transmission power allocation.</p>
<p>P1 indicates that <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is minimized through the offloading operations. In the offloading action, <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msup><mml:mi mathvariant="normal">&#x03A5;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are continuous variables, while <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0393;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is an integer variable. The feasible region formed by the optimization objective and constraints is non-convex, which can give rise to multiple local minima and thereby complicates the optimization. Additionally, P1 involves selecting a subset of MEC servers from a large set to offload tasks, which can be viewed as a combinatorial optimization problem: the goal is to identify the server combination that minimizes the overall system cost while satisfying the constraints. P1 is therefore a mixed-integer programming problem, and solving it by direct mathematical derivation is impractical. We therefore introduce DRL to solve the task offloading optimization problem.</p>
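<p>The constraints (13)&#x2013;(17) of P1 can be checked point-wise; this sketch reads Eq. (17) as an upper bound on the transmission power.</p>

```python
def is_feasible(f_i, f_i_max, a_m, T_total, tau_max, gamma, p_ij, p_ij_max):
    """Return True iff a candidate decision satisfies constraints (13)-(17) of P1."""
    return (
        0 <= f_i <= f_i_max       # (13): MEC computing resources
        and 0 <= a_m <= 1         # (14): delay/energy weighting factor
        and T_total <= tau_max    # (15): maximum allowed processing delay
        and 0 <= gamma <= 1       # (16): task offloading ratio
        and p_ij <= p_ij_max      # (17): transmission power budget
    )
```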
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>MADRL-Based Task Offloading Strategy</title>
<sec id="s4_1">
<label>4.1</label>
<title>MADRL-Based Task Offloading Model</title>
<p>P1 poses a complex task offloading optimization problem in a MEC environment with multiple servers and users. This scenario can be effectively modeled as a Markov Decision Process (MDP). In this framework, the state space (S) includes parameters such as the computational loads of individual servers, task statuses, and device energy levels. The action space (A) encompasses decisions such as selecting target MEC servers for task offloading, determining task offloading proportions, and allocating transmission power. The Transition Function (T) governs how the system state evolves from one moment to the next based on the chosen actions, while the Reward Function (R) provides feedback by penalizing system overhead costs so as to minimize overall expenditure. The data collected by the WSNs change dramatically when a WTG anomaly occurs, while the channel state varies from moment to moment. To address this challenge, and inspired by [<xref ref-type="bibr" rid="ref-35">35</xref>], we model problem P1 as a MADRL-based task offloading model. The offloading user is designated as an agent in MADRL. In the offshore wind network, a centralized training and distributed computing architecture is used: the SDN controller trains the agents in a centralized manner, and the network parameters are periodically distributed to the agents. The MADRL-based computing framework is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The computing framework based on MADRL</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-2.tif"/>
</fig>
<p>The state space, action space, and reward functions in MADRL are shown below.</p>
<sec id="s4_1_1">
<label>4.1.1</label>
<title>State Space</title>
<p>At the beginning of each time slot, all agents receive computational tasks from nearby WSNs. They also consider the available computing resources of all MEC servers during the current time slot. This indicates how much processing capacity is available for task execution, directly influencing the decision of whether tasks should be processed locally or offloaded to MEC servers. Additionally, there is information about the current network status, including channel conditions and potential device failures. These factors directly impact the efficiency and stability of task transmission within the network. To efficiently use the computational resources of the system and the channel resources in the MEC environment, the state space is defined as:
<disp-formula id="eqn-18"><label>(18)</label><mml:math id="mml-eqn-18" display="block"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup><mml:mo>}</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the task and <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the computational power of all MECs in the current time slot. Since the system&#x2019;s goal of minimizing the overall overhead requires collaboration among all agents, the state of agent <italic>m</italic> also includes the actions of the other agents.</p>
</sec>
<sec id="s4_1_2">
<label>4.1.2</label>
<title>Action Space</title>
<p>The action space aims to maximize expected long-term returns by efficiently utilizing available resources. Firstly, agents decide which MEC server to offload computational tasks to, based on factors such as server computing power, reliability, and geographical location, which impact task processing efficiency and latency. Secondly, agents determine the proportion of tasks to be offloaded to the chosen MEC server, taking into account its current workload and processing capabilities to ensure system balance and performance optimization. Lastly, agents allocate appropriate transmission power for tasks offloaded to MEC servers, directly influencing the stability and efficiency of data transmission. To efficiently utilize the spectrum, the transmission power of offloaded users is allocated. The appropriate offload ratio is selected based on the computing resources of the MEC cluster. The action space is represented as follows:
<disp-formula id="eqn-19"><label>(19)</label><mml:math id="mml-eqn-19" display="block"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msup><mml:mi mathvariant="normal">&#x0393;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="normal">&#x03A5;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>}</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0393;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the MEC server number selected by the agent, <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msup><mml:mi mathvariant="normal">&#x03A5;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the task offload ratio selected by the agent, and <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the transmission power allocation.</p>
<p>The variables of the action space are normalized. For example, suppose that <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> is 20 W and the agent selects the action [0.03,0.6,0.6]. This means that the agent chooses to offload 60% of its computational tasks to the MEC server numbered 3 and allocates 12 W of transmission power for the current task.</p>
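<p>The decoding of a normalized action can be sketched as follows. The mapping from the first component (0.03) to server number 3 is an assumed encoding, since the paper does not state it explicitly, and the power unit is left abstract.</p>

```python
def decode_action(action, num_servers, p_max):
    """Map a normalized action [server, ratio, power] to concrete offloading choices.

    Assumed encoding: the first component times 100 gives the server number,
    so [0.03, 0.6, 0.6] with p_max = 20 means server 3, a 60% offload ratio,
    and 12 units of allocated transmission power.
    """
    server_frac, ratio, power_frac = action
    server_id = min(int(round(server_frac * 100)), num_servers)
    return server_id, ratio, power_frac * p_max
```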
</sec>
<sec id="s4_1_3">
<label>4.1.3</label>
<title>Reward Function</title>
<p>The reward space quantifies the feedback agents receive based on their actions, guiding them towards making optimal decisions step by step. In this study, the reward is designed to encourage efficient task offloading and resource utilization. The reward function describes the relationships among multi-agent systems. In this system, the goal of optimization is to minimize the overall system overhead, so there is a cooperative relationship between agents. However, when the target MEC servers of two offloading users are consistent, there is a resource competition relationship between the agents. The goal of DRL is to maximize the expected long-term rewards, while the system goal is to minimize the overall system overhead. Therefore, we use the negative value of the overall cost as the reward after the decision. The reward function for individual agent <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mi>i</mml:mi></mml:math></inline-formula> can be defined as follows:
<disp-formula id="eqn-20"><label>(20)</label><mml:math id="mml-eqn-20" display="block"><mml:msup><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>The overall reward for agents is as follows:
<disp-formula id="eqn-21"><label>(21)</label><mml:math id="mml-eqn-21" display="block"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mspace width="thinmathspace" /></mml:mrow><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mi>M</mml:mi></mml:math></disp-formula></p>
<p>The main goal of the system is to maximize overall rewards.</p>
</sec>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>DRL-Based Online Computational Offloading Algorithm</title>
<p>DRL is an online paradigm that generates historical experience through constant interaction with the environment and learns from it. It uses deep neural networks, built on reinforcement learning, to fit the state value function and the policy &#x03C0;, aiming to maximize the expected long-term return [<xref ref-type="bibr" rid="ref-36">36</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>]. The proposed algorithm AGA-DDPG is an improvement of DDPG. First, the agent acquires its current state from the MEC environment. Then, the agent&#x2019;s actor network outputs the offloading action based on the acquired state <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. After that, the MEC environment provides an immediate reward <italic>r</italic><sub><italic>t</italic></sub> based on the offloading action <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. Finally, the critic network scores the offloading action; this score is recorded as the action state value Q. The experience groups (<inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>) are also collected and their priority is calculated. The prioritized experience groups are stored in the replay memory pool, and the actor and critic networks are trained on the experience sets drawn from it. The two key parts of the AGA-DDPG algorithm are the actor-critic network and the AGA exploration of the action space.</p>
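<p>The prioritized replay memory described above can be sketched as follows. The priority value here is a placeholder; the actual priority computation of AGA-DDPG is not reproduced in this sketch.</p>

```python
import random
from collections import deque

class PrioritizedReplay:
    """Replay memory pool that samples experience groups (s_t, a_t, r_t, s_t+1)
    with probability proportional to a stored priority."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, priority=1.0):
        # Each entry pairs a priority with the experience tuple.
        self.buffer.append((priority, (s, a, r, s_next)))

    def sample(self, batch_size):
        # Priority-weighted sampling (with replacement, as in random.choices).
        weights = [p for p, _ in self.buffer]
        picks = random.choices(list(self.buffer), weights=weights, k=batch_size)
        return [exp for _, exp in picks]
```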
<sec id="s4_2_1">
<label>4.2.1</label>
<title>Actor-Critic Network</title>
<p>The algorithm improves upon the DDPG, which is an online algorithm. The output of the offloading policy &#x03C0; is a deterministic action <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The purpose of the offloading strategy &#x03C0; is to enable the output action <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to maximize expected long-term rewards. The actor-network fits the offloading policy &#x03C0; using deep learning techniques.
<disp-formula id="eqn-22"><label>(22)</label><mml:math id="mml-eqn-22" display="block"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The actor target network then outputs the next action based on the next state.
<disp-formula id="eqn-23"><label>(23)</label><mml:math id="mml-eqn-23" display="block"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:msup><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The critic network in AGA-DDPG is a deep neural network (DNN) used to fit the Q-value function of state actions. The Q-value is the expected reward for the current action, so it can evaluate the quality of the output actions of the actor-network in the current state.
<disp-formula id="eqn-24"><label>(24)</label><mml:math id="mml-eqn-24" display="block"><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>&#x03C9;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The critic target network is used to fit the Q-value function of state action at the next state.
<disp-formula id="eqn-25"><label>(25)</label><mml:math id="mml-eqn-25" display="block"><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:msup><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>After the MEC environment gets the action output from the actor-network, it will calculate the reward under the current action. The ultimate goal of AGA-DDPG is to maximize the expected long-term reward:
<disp-formula id="eqn-26"><label>(26)</label><mml:math id="mml-eqn-26" display="block"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>&#x03C9;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:msup><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> is the discount factor.</p>
<p>The parameters of the critic network are updated by minimizing the following loss:
<disp-formula id="eqn-27"><label>(27)</label><mml:math id="mml-eqn-27" display="block"><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>M</mml:mi></mml:mfrac><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>&#x03C9;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p>The actor-network parameters are updated using the policy gradient:
<disp-formula id="eqn-28"><label>(28)</label><mml:math id="mml-eqn-28" display="block"><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:mi>J</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>M</mml:mi></mml:mfrac><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
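<p>The critic update in Eqs. (26)&#x2013;(27) can be illustrated with a minimal numerical sketch. The reward and Q-value numbers below are illustrative, &#x03BB; &#x003D; 0.8 is taken from Table 3, and plain NumPy arrays stand in for the network outputs; in the actual algorithm the loss is backpropagated through the critic network.</p>

```python
import numpy as np

def td_targets(rewards, next_q, lam=0.8):
    """Eq. (26): y_t = r_t + lambda * Q_w'(s_{t+1}, pi_theta'(s_{t+1}))."""
    return rewards + lam * next_q

def critic_loss(targets, q_values):
    """Eq. (27): mean squared TD-error over a minibatch of size M."""
    return np.mean((targets - q_values) ** 2)

rewards = np.array([1.0, 0.5, 2.0])     # illustrative rewards r_t
next_q  = np.array([10.0, 8.0, 12.0])   # target-critic estimates Q_w'
q_vals  = np.array([8.0, 7.0, 11.0])    # online-critic estimates Q_w
y = td_targets(rewards, next_q)
loss = critic_loss(y, q_vals)
```

<p>Minimizing this loss moves the online critic toward the targets y<sub>t</sub>; the actor is then updated along the gradient of the critic&#x2019;s score, as in Eq. (28).</p>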
<p>The output action is unstable because the parameters of the online networks change rapidly. Therefore, we use a soft update for the target networks, which makes the output action more stable.
<disp-formula id="eqn-29"><label>(29)</label><mml:math id="mml-eqn-29" display="block"><mml:msup><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mi>&#x03B8;</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula>
<disp-formula id="eqn-30"><label>(30)</label><mml:math id="mml-eqn-30" display="block"><mml:msup><mml:mi>&#x03C9;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mi>&#x03C9;</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mi>&#x03C9;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula>where <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mi>&#x03C4;</mml:mi></mml:math></inline-formula> is the soft update parameter.</p>
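<p>Eqs. (29)&#x2013;(30) apply the same rule to the actor and critic target parameters. A minimal sketch, assuming the parameters are stored as flat lists of floats:</p>

```python
def soft_update(target_params, online_params, tau=0.01):
    """Eqs. (29)-(30): theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

target = [0.0, 1.0]   # target-network parameters theta'
online = [1.0, 0.0]   # online-network parameters theta
target = soft_update(target, online, tau=0.01)
```

<p>With &#x03C4; &#x003D; 0.01, the value used in Section 5.1, the target network moves only 1% of the way toward the online network per update, which is what keeps the output action stable.</p>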
</sec>
<sec id="s4_2_2">
<label>4.2.2</label>
<title>Exploring the Action Space with AGA</title>
<p>In traditional DDPG, the action space is explored using a greedy strategy, which requires traversing the entire action space, resulting in low learning efficiency and slow network training [<xref ref-type="bibr" rid="ref-38">38</xref>]. Therefore, in the AGA-DDPG algorithm, instead of having the critic network score the actor-network&#x2019;s output directly, we use the AGA to explore the action space and let the critic network score the resulting actions <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msup><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
<p>First, the action <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> output by the actor network and (W&#x2013;1) randomly generated offloading decision schemes form the initial population, where W is the population size. The initialized population is represented as follows:
<disp-formula id="eqn-31"><label>(31)</label><mml:math id="mml-eqn-31" display="block"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>W</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math></disp-formula></p>
<p>The steps of the AGA algorithm, as illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, are described as follows:</p>
<p>Step 1: Initialize algorithm parameters, including the number and size of sub-populations, the number of iterations, crossover and mutation probabilities, adaptive control parameters, etc.</p>
<p>Step 2: Randomly generate an initial population and divide it into multiple sub-populations, assigning independent threads to each sub-population.</p>
<p>Step 3: Each sub-population executes basic genetic operations in parallel, which include fitness evaluation, selection, crossover and mutation, and preservation of elite individuals.</p>
<p>Step 4: Check the current iteration count; if it reaches the maximum number of iterations, merge the sub-populations and output the optimal solution set. Otherwise, proceed to Step 5.</p>
<p>Step 5: Determine if the sub-populations meet the migration condition. If so, each sub-population completes a migration communication operation and then proceeds to Step 3 to continue the iteration. If the migration condition is not met, directly proceed to Step 3.</p>
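<p>The steps above can be sketched as an island-model genetic algorithm. The sketch below is illustrative: a sequential loop stands in for the per-sub-population threads of Step 2, a placeholder fitness function stands in for the critic network&#x2019;s score of a candidate action vector, and all parameter values are assumptions rather than the paper&#x2019;s settings.</p>

```python
import random

def aga_search(fitness, n_islands=4, island_size=10, dim=5,
               generations=20, migrate_every=5, pc=0.8, pm=0.1, seed=0):
    """Island-model GA sketch of Steps 1-5."""
    rng = random.Random(seed)
    # Step 2: random initial population divided into sub-populations
    islands = [[[rng.random() for _ in range(dim)] for _ in range(island_size)]
               for _ in range(n_islands)]
    for gen in range(generations):
        for isl in islands:
            # Step 3: fitness evaluation, selection, crossover, mutation, elitism
            isl.sort(key=fitness, reverse=True)
            elite, parents = isl[0][:], isl[:island_size // 2]
            children = []
            while len(children) < island_size - 1:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, dim) if rng.random() < pc else 0
                child = a[:cut] + b[cut:]            # single-point crossover
                for i in range(dim):
                    if rng.random() < pm:            # mutation
                        child[i] = rng.random()
                children.append(child)
            isl[:] = [elite] + children              # preserve the elite individual
        # Step 5: periodic migration between sub-populations (ring topology)
        if (gen + 1) % migrate_every == 0:
            best = [max(isl, key=fitness) for isl in islands]
            for i, isl in enumerate(islands):
                isl[-1] = best[(i - 1) % n_islands][:]
    # Step 4: merge sub-populations and output the best solution found
    return max((ind for isl in islands for ind in isl), key=fitness)

# placeholder fitness standing in for the critic's Q-score of an action vector
fit = lambda a: -sum((x - 0.5) ** 2 for x in a)
best = aga_search(fit)
```

<p>Migration periodically shares each sub-population&#x2019;s best individual with its neighbor, which is what lets the parallel sub-populations of Step 2 escape local optima without a shared memory.</p>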

<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Sketch map of AGA</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-3.tif"/>
</fig>
</sec>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Prioritized Experience Replay</title>
<p>Experience replay is a key technique in DRL that enables an agent to store and reuse past experiences for learning. In traditional DDPG, experience replay groups are drawn by uniform random sampling [<xref ref-type="bibr" rid="ref-39">39</xref>]. However, this approach ignores the fact that different experience groups have different value for training. Therefore, we use the PER [<xref ref-type="bibr" rid="ref-40">40</xref>] technique to sample experience replay groups: experience groups with higher importance are drawn for training with a higher probability.</p>
<p>The calculation of priority is the core problem in PER, because the replay probability of each experience group is derived from its priority. The TD-error is an important indicator for evaluating the priority of an experience: when its absolute value is large, the neural network cannot accurately estimate the true value of the action, and giving such an experience a higher weight helps the network reduce the probability of wrong predictions. In addition, the overall task overhead is an important indicator of how well the network is trained. Therefore, the priority calculation takes into account both the absolute value of the TD-error and the overall task overhead. The experience group is scored as follows:
<disp-formula id="eqn-32"><label>(32)</label><mml:math id="mml-eqn-32" display="block"><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msubsup><mml:mi>&#x03B4;</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>z</mml:mi><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula>where <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>&#x03B4;</mml:mi></mml:math></inline-formula> is the score control parameter, <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msubsup><mml:mi>&#x03B4;</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:math></inline-formula> is the absolute value of TD-error, and <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>z</mml:mi><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is a function related to the overall task overhead.</p>
<p>After computing the score in <xref ref-type="disp-formula" rid="eqn-32">Eq. (32)</xref>, the experience groups are sorted by score, and each group is assigned an order number rank(&#x03C6;) &#x003D; {1, 2, 3, &#x2026;}, with rank 1 given to the highest-scoring group. The sampling value is defined according to the order number:
<disp-formula id="eqn-33"><label>(33)</label><mml:math id="mml-eqn-33" display="block"><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>u</mml:mi><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>Based on the sampling values we can obtain the sampling probability from the following equation:
<disp-formula id="eqn-34"><label>(34)</label><mml:math id="mml-eqn-34" display="block"><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>u</mml:mi><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msubsup><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>u</mml:mi><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
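<p>Eqs. (32)&#x2013;(34) can be sketched as follows. Consistent with higher-scoring groups receiving higher sampling probabilities, rank 1 is assigned to the highest-scoring group; the value of the score control parameter &#x03B4; below is an illustrative assumption.</p>

```python
import numpy as np

def per_probabilities(td_errors, task_costs, delta=0.5):
    """Rank-based prioritized sampling probabilities, Eqs. (32)-(34)."""
    td_errors = np.asarray(td_errors, dtype=float)
    task_costs = np.asarray(task_costs, dtype=float)
    scores = delta * np.abs(td_errors) + (1.0 - delta) * task_costs  # Eq. (32)
    order = np.argsort(-scores)                    # highest score first
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)   # rank 1 = highest score
    values = 1.0 / ranks                           # Eq. (33)
    return values / values.sum()                   # Eq. (34)

# with delta = 1 the score reduces to |TD-error|, so the largest
# TD-error gets rank 1 and the largest sampling probability
p = per_probabilities(td_errors=[2.0, 0.5, 1.0],
                      task_costs=[0.0, 0.0, 0.0], delta=1.0)
```

<p>Rank-based sampling makes the probabilities depend only on the ordering of the scores, not their magnitudes, which keeps the sampling robust to outlier TD-errors.</p>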
<p>The experience replay groups generated in each training round are assigned priorities according to the PER method. Experience groups with higher scores receive higher sampling probabilities, so that replay groups that are more worth training on are used effectively, which increases the training speed of the network.</p>
<p>The MADRL-based online task offloading algorithm (AGA-DDPG) is shown in Algorithm 1.</p>
<fig id="fig-10">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-10.tif"/>
</fig>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Complexity Analysis of AGA-DDPG</title>
<p>The main comparison baseline is the DDPG algorithm, so it is necessary to analyze its computational complexity. Referring to [<xref ref-type="bibr" rid="ref-41">41</xref>], the time complexity of DDPG can be expressed as follows:
<disp-formula id="eqn-35"><label>(35)</label><mml:math id="mml-eqn-35" display="block"><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>J</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mtext>O&#xA0;</mml:mtext></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:munderover><mml:mo 
movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>J</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In the above equation, <italic>I</italic> and <italic>J</italic> respectively represent the number of fully connected layers in the actor-network and critic network of the DDPG algorithm, while <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represent the number of neurons in the <italic>i</italic>-th layer of the actor-network and the <italic>j</italic>-th layer of the critic network. The proposed AGA-DDPG algorithm incorporates the AGA exploration process between the actor-network and critic network. AGA is an optimization process, so the time complexity of AGA-DDPG can be expressed as follows:</p>
<p><disp-formula id="eqn-36"><label>(36)</label><mml:math id="mml-eqn-36" display="block"><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>J</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:mi>K</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mtext>O&#xA0;</mml:mtext></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:munderover><mml:mo 
movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>J</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:mi>K</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>W</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>where <italic>K</italic> is the number of iterations of the AGA exploration. Similarly, <italic>W</italic> is the size of the initial population in the AGA algorithm.</p>
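<p>For the network sizes reported in Section 5.1 (hidden layers of 400 and 300 neurons) and the Table 3 values K<sub>max</sub> &#x003D; W &#x003D; 100, the terms of Eqs. (35)&#x2013;(36) can be evaluated directly. The state and action dimensions below are illustrative assumptions, not the paper&#x2019;s exact values:</p>

```python
def fc_multiplies(layer_sizes):
    """Sum of n_i * n_{i+1} over consecutive fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# illustrative state/action dimensions; hidden sizes 400 and 300 per Section 5.1
state_dim, action_dim = 20, 10
actor  = [state_dim, 400, 300, action_dim]
critic = [state_dim + action_dim, 400, 300, 1]
ddpg_cost = 2 * fc_multiplies(actor) + 2 * fc_multiplies(critic)  # Eq. (35)

K, W = 100, 100               # AGA iteration count K_max and population size W
aga_term = (K + 1) * K * W    # extra summation term of Eq. (36)
```

<p>The extra AGA term is additive and independent of the network widths, so for wide networks the per-step cost of AGA-DDPG remains of the same order as DDPG.</p>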
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Results</title>
<sec id="s5_1">
<label>5.1</label>
<title>Simulation Parameter Setting</title>
<p>The simulation experiment environment used was PyTorch 1.12.1 with Python 3.9. The simulation scenario is a circular area with a radius of 1000 m, the number of user nodes at the edge is 50, and the number of MEC servers is 10. The MEC computing power is randomly generated in the range 11 GHz&#x007E;15 GHz.</p>
<p>For the deep neural network, the actor and critic networks at each agent consist of a four-layer fully connected neural network with two hidden layers. The numbers of neurons in the two hidden layers are 400 and 300. The hidden layers use the ReLU activation function, while the output function of the actor-network is a sigmoid function. The soft update coefficient of the target network is <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:mi>&#x03C4;</mml:mi></mml:math></inline-formula> &#x003D; 0.01 and the memory size of the history experience group is set to <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mrow><mml:mi mathvariant="normal">&#x03A9;</mml:mi></mml:mrow></mml:math></inline-formula> &#x003D; 3 &#x00D7; 10<sup>4</sup>.</p>
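<p>The actor architecture just described (input &#x2192; 400 &#x2192; 300 &#x2192; output, ReLU hidden activations, sigmoid output) can be sketched as a plain forward pass. NumPy stands in here for the PyTorch implementation used in the experiments, and the state and action dimensions are illustrative assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """He-initialized fully connected layers as (weight, bias) pairs."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def actor_forward(params, s):
    """Four-layer actor: ReLU hidden layers, sigmoid output in (0, 1)."""
    h = s
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)            # ReLU activation
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))     # sigmoid output layer

state_dim, action_dim = 20, 10                    # illustrative dimensions
actor = init_mlp([state_dim, 400, 300, action_dim])
action = actor_forward(actor, rng.standard_normal(state_dim))
```

<p>The sigmoid output keeps every action component in (0, 1), which matches continuous offloading decisions such as task division ratios and normalized bandwidth allocations.</p>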
<p>The simulation parameters are shown in <xref ref-type="table" rid="table-3">Table 3</xref>.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Setting the simulation parameters</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:msub><mml:mi>&#x03C4;</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>1 ms</td>
<td><inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>100 KB&#x007E;1000 KB</td>
</tr>
<tr>
<td><inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:msubsup><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">m</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
<td>11 GHz&#x007E;15 GHz</td>
<td>K<sub>max</sub></td>
<td>100</td>
</tr>
<tr>
<td>Number of WSN</td>
<td>50</td>
<td>W</td>
<td>100</td>
</tr>
<tr>
<td>Number of MEC</td>
<td>10</td>
<td><inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:mi>&#x03C4;</mml:mi></mml:math></inline-formula></td>
<td>0.01</td>
</tr>
<tr>
<td>MEC bandwidth</td>
<td>100 MHz</td>
<td>&#x03A9;</td>
<td>3 &#x00D7; 10<sup>4</sup></td>
</tr>
<tr>
<td><inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:msup><mml:mi>&#x03B4;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>10<sup>&#x2212;9</sup> W</td>
<td>Rb</td>
<td>32</td>
</tr>
<tr>
<td><inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:msub><mml:mi>K</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>10<sup>&#x2212;31</sup> J</td>
<td><inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula></td>
<td>0.8</td>
</tr>
<tr>
<td><inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
<td>20 MHz</td>
<td><inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>c</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>1500</td>
</tr>
<tr>
<td><inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:msubsup><mml:mi>&#x03C4;</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
<td>80 ms&#x007E;100 ms</td>
<td>Number of time slots</td>
<td>50</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To verify the performance of the proposed algorithm, AGA-DDPG was compared with the following algorithms:
<list list-type="order">
<list-item>
<p>LC scheme: All calculation tasks are calculated in local MEC.</p></list-item>
<list-item>
<p>RC scheme: The offloading action is selected randomly, including the task division ratio, bandwidth allocation, and selected MEC server number.</p></list-item>
<list-item>
<p>Twin Delayed Deep Deterministic Policy Gradient (TD3): TD3 was developed to address the shortcomings of DDPG. Its core idea is that the critic network should update faster than the actor-network; when the critic network is well-trained, it can effectively guide the actor-network to improve its learning. Its actor-network and critic network use the same structure as those of the proposed algorithm.</p></list-item>
<list-item>
<p>DDPG: The original DDPG algorithm was selected as one of the comparative algorithms, which facilitates verifying the effectiveness of the improvements to DDPG. The structure of the actor network and critic network of DDPG was the same as that of AGA-DDPG, and experience groups were drawn by random sampling.</p></list-item>
</list></p>
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Convergence Performance</title>
<p>Simulation experiments are conducted for the proposed AGA-DDPG algorithm, which aims to maximize the expected long-term reward of the system. Network convergence can be judged when the overall average reward of the system stabilizes. Since the learning rate is a hyperparameter that affects the learning efficiency of DRL, the average reward variation of the actor-network and the critic network is plotted for different learning rates.</p>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows the effect of different learning rates on the average reward under the AGA-DDPG algorithm. When training starts, the cost savings from offloading computational tasks to other MEC servers, compared to performing all computations locally, increase rapidly. As training progresses, the average reward slowly increases over the long term with large fluctuations. When the number of training episodes reaches 400, the system stabilizes: the average reward essentially stops increasing, and network training is complete. The best training performance was obtained when the actor-network learning rate (A_LR) and the critic-network learning rate (C_LR) were 0.01 and 0.05, respectively. Because the update of the actor-network depends on the critic network, A_LR is set lower than C_LR. We use the same learning rates in the following simulation settings.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Average reward of the system under the AGA-DDPG algorithm</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-4.tif"/>
</fig>
<p>For DRL, the loss value of the network is an important indicator for determining whether the network converges. Therefore, we present the loss value change curves for the critic network.</p>
<p><xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows the variation in the loss value of the critic network with the number of iterations in the proposed algorithm. The loss of the critic network decreases sharply at the beginning of training and then enters a period of intense fluctuation, during which the critic network learns from historical experience and constantly updates its parameters. After training for 400 episodes, the critic network tends to stabilize. Due to the constantly time-varying channel state, the loss value fluctuates slightly within an acceptable range. The convergence of the critic network and the actor-network is interdependent: the critic network fits the expected reward function of the actor-network&#x2019;s output actions, so when the critic network converges, it provides better guidance for the actor-network. When the actor-network converges, its output actions are more accurate, and the reward fed back by the environment is greater.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Critic network loss value</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-5.tif"/>
</fig>
</sec>
<sec id="s5_3">
<label>5.3</label>
<title>Model Optimization</title>
<p>AGA-DDPG integrates an AGA exploration process between the actor and critic networks and incorporates a prioritized concept within the experience replay buffer to determine the sampling probabilities of experiences. To validate the improvements of AGA-DDPG over traditional DDPG, we compared the performance of AGA-DDPG, DDPG, and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms.</p>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> illustrates the variation of latency and energy consumption with training epochs for the three algorithms when the number of nodes in the WSN is 50. AGA-DDPG shows a rapid decrease in energy consumption and latency at the beginning of training, which is also observed to some extent in the other two approaches; however, AGA-DDPG converges faster. As training progresses, tasks are effectively offloaded to different MEC servers, thereby improving the system&#x2019;s resource utilization. In contrast to AGA-DDPG, both DDPG and TD3 exhibit poorer stability. AGA-DDPG not only demonstrates stability, as shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, but also achieves quicker decreases in energy consumption and latency early in training compared to the other algorithms. This can be attributed to two main factors. First, we employ the AGA to explore the action space between the actor and critic networks, so that an action better than the actor network&#x2019;s raw output is selected at each step. Second, using priorities to determine the sampling probabilities of experience batches ensures that batches with higher value are sampled with higher probability, avoiding unnecessary training on batches with lower value.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>(a) Impact of algorithm improvements on delay; (b) Impact of algorithm improvements on energy</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-6.tif"/>
</fig>
</sec>
<sec id="s5_4">
<label>5.4</label>
<title>Performance Analysis</title>
<p>The weight coefficient <inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> balancing delay and energy consumption affects the performance of the algorithm. To analyze this impact, we examined the delay and energy consumption while varying the weight coefficient.</p>
<p><xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows the influence of the weight coefficient on the average delay and average energy consumption of the system. As <inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> increases, AGA-DDPG weights the delay cost paid by the system more heavily and pays less attention to energy consumption. Conversely, as <inline-formula id="ieqn-91"><mml:math id="mml-ieqn-91"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> decreases, delay accounts for a smaller share of the overall system overhead, and the system attends more to energy consumption. WTG anomaly monitoring places greater emphasis on reducing delay, since its ultimate goal is real-time monitoring to avert WTG failures. We therefore set <inline-formula id="ieqn-92"><mml:math id="mml-ieqn-92"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to 0.6 so that delay receives greater weight. This choice matches the characteristics of WTG anomaly-detection applications, where timeliness is critical: it ensures the monitoring system can promptly detect and respond to turbine anomalies, thereby enhancing system reliability and efficiency. The simulation results also illustrate the performance trend of the system as <inline-formula id="ieqn-93"><mml:math id="mml-ieqn-93"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> varies: higher values of <inline-formula id="ieqn-94"><mml:math id="mml-ieqn-94"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> reduce system latency at the cost of higher energy consumption, whereas lower values of <inline-formula id="ieqn-95"><mml:math id="mml-ieqn-95"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> slightly increase latency while keeping energy consumption effectively under control.</p>
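<p>As an illustrative sketch (not the paper's implementation), the trade-off above can be expressed as a weighted cost C = a_m * T + (1 - a_m) * E; the function and sample values below are hypothetical and serve only to show how raising a_m shifts the optimization toward delay.</p>

```python
# Hypothetical sketch of the delay/energy trade-off governed by a_m.
# All numeric values are placeholders, not the paper's parameters.
def weighted_cost(delay, energy, a_m=0.6):
    """Weighted sum of delay and energy; a_m must lie in [0, 1]."""
    assert 0.0 <= a_m <= 1.0
    return a_m * delay + (1.0 - a_m) * energy

# Raising a_m makes delay dominate the overhead the agent minimises.
low = weighted_cost(2.0, 5.0, a_m=0.2)   # energy-weighted: 0.4 + 4.0 = 4.4
high = weighted_cost(2.0, 5.0, a_m=0.8)  # delay-weighted:  1.6 + 1.0 = 2.6
```

With a_m = 0.6, as in the paper, the same task profile yields a cost between these two extremes, tilted toward delay.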
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>The influence of the weight coefficient <inline-formula id="ieqn-96"><mml:math id="mml-ieqn-96"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> on the average delay and average energy consumption of the system</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-7.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-8">Fig. 8</xref> shows the impact of the different schemes on energy consumption and delay as the number of WSNs ranges from 10 to 50. For the MEC scheme, when the number of WSNs is low, the local server has ample capacity to handle the load, and the resulting energy consumption and latency increase roughly linearly with the number of WSNs. However, once the number of WSNs reaches 30, the local MEC server comes under greater computational pressure and incurs additional waiting delays, indicating that even with MEC, performance bottlenecks can arise under high load. The random offloading scheme follows a similar trend to MEC but is more volatile, generating more energy consumption and latency because offloading tasks to other MEC servers requires additional transmission latency and energy compared with local MEC; this extra overhead significantly increases the overall cost. When the number of WSNs is small, DDPG and TD3 produce similar delay and energy consumption, indicating that both algorithms perform relatively stably in smaller-scale networks. As the number of WSNs grows, however, the advantage of the AGA-DDPG algorithm becomes increasingly pronounced. This is because AGA-DDPG better optimizes the trade-off between energy consumption and latency in multi-sensor network environments by dynamically adjusting task offloading strategies to match varying load conditions. In addition, to further demonstrate the effectiveness of AGA-DDPG, the overall overhead of all algorithms is shown in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>.</p>
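<p>The bottleneck behavior described above can be sketched with a simple M/M/1-style queueing model: once the aggregate task arrival rate from the WSNs approaches the local server's service rate, waiting delay grows non-linearly rather than linearly. All rates below are hypothetical placeholders, not the paper's simulation parameters.</p>

```python
# Hypothetical M/M/1-style sketch of why the local MEC server becomes a
# bottleneck near 30 WSNs: mean sojourn time is 1 / (mu - lambda), which
# blows up as the aggregate arrival rate lambda approaches the service
# rate mu. The rates are illustrative only.
def avg_waiting_delay(n_wsns, task_rate=1.0, service_rate=35.0):
    """Mean M/M/1 sojourn time; infinite when the server is saturated."""
    arrival = n_wsns * task_rate          # aggregate task arrival rate
    if arrival >= service_rate:
        return float("inf")               # queue grows without bound
    return 1.0 / (service_rate - arrival)

# Delay rises slowly up to ~30 WSNs, then sharply as saturation nears.
delays = {n: avg_waiting_delay(n) for n in (10, 20, 30, 34)}
```

Under these assumed rates, the delay at 30 WSNs is five times that at 10 WSNs, mirroring the super-linear growth seen in the MEC curve.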
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>(a) Impact of the number of WSNs on average delay; (b) Impact of the number of WSNs on average energy consumption</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-8.tif"/>
</fig><fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Overall overhead comparison of all solutions</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_55614-fig-9.tif"/>
</fig>
<p>When all tasks are computed on the local MEC server, the overall overhead varies little but remains high, suggesting that local MEC can handle tasks efficiently yet encounters performance bottlenecks due to resource constraints. The random offloading scheme shows greater volatility because it does not account for the resources available at the offloading target: if the target has ample computational resources, the system overhead is lower than that of local MEC, but if it lacks sufficient computing power, offloading brings no benefit and instead causes network congestion and extra energy consumption. TD3 and DDPG markedly reduce the overall overhead but converge slowly and unstably. TD3 outperforms DDPG yet converges more slowly than AGA-DDPG, reflecting AGA-DDPG&#x2019;s advantages in optimization and resource utilization. Simulation results indicate that AGA-DDPG significantly reduces the overall overhead compared to the local MEC, random offloading, TD3, and DDPG schemes, achieving savings of 61.8%, 55%, 21%, and 33%, respectively. This underscores AGA-DDPG&#x2019;s ability to optimize resource allocation more effectively in task offloading decisions, thereby lowering system costs.</p>
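<p>The reported percentages can be read as relative overhead reductions, saving = 1 &#x2212; C_AGA / C_baseline. The sketch below uses hypothetical overhead values chosen only to reproduce the stated ratios; it is not the paper's computation.</p>

```python
# Sketch of how relative savings over each baseline are computed.
# The overhead values are hypothetical placeholders picked so that the
# resulting ratios match the percentages reported in the text.
def relative_saving(cost_ours, cost_baseline):
    """Fraction of the baseline overhead saved by our scheme."""
    return 1.0 - cost_ours / cost_baseline

c_aga = 100.0                              # hypothetical AGA-DDPG overhead
baselines = {
    "local MEC": c_aga / (1 - 0.618),      # implies a 61.8% saving
    "random":    c_aga / (1 - 0.55),       # implies a 55% saving
    "TD3":       c_aga / (1 - 0.21),       # implies a 21% saving
    "DDPG":      c_aga / (1 - 0.33),       # implies a 33% saving
}
savings = {k: round(relative_saving(c_aga, v), 3) for k, v in baselines.items()}
```

Note that the smaller saving over TD3 (21%) than over DDPG (33%) is consistent with TD3 being the stronger of the two baselines.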
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusions</title>
<p>In this paper, we applied task offloading strategies to offshore wind farm operations to address the network congestion and high latency of cloud-based maintenance methods. First, we propose deploying UAVs equipped with MEC servers at offshore wind farms, a novel approach to task offloading services that addresses the scarcity of edge servers in offshore wind environments and fills a significant gap in the existing literature. Second, the MADRL framework enables centralized training of agents with decentralized execution, which is crucial for dynamic offshore environments. Finally, by introducing the AGA to explore the action space of DDPG, we mitigate DDPG&#x2019;s slow convergence in high-dimensional action spaces. Experimental results demonstrate that the proposed AGA-DDPG algorithm offers significant advantages over other methods in terms of overall cost savings.</p>
<p>While this research demonstrates the potential of new task offloading strategies for offshore wind farms, we acknowledge discrepancies between the simulation environment and real-world conditions that may affect algorithm performance. Additionally, our model relies on simplifying assumptions, such as fixed task sizes and ideal communication conditions, which limit its applicability and generalizability in real-world settings. Future work should involve field experiments for validation, broader performance evaluations, and integration with IoT and big data analytics to further enhance the algorithm&#x2019;s efficiency and applicability in practical scenarios. These efforts will advance the use of edge computing in offshore wind farms and other offshore environments, providing reliable technical support for future intelligent operations and resource management.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to express their gratitude for the valuable feedback and suggestions provided by all the anonymous reviewers and the editorial team.</p>
</ack>
<sec><title>Funding Statement</title>
<p>This work was supported in part by the National Natural Science Foundation of China under grant 61861007; in part by the Guizhou Province Science and Technology Planning Project ZK [2021]303; in part by the Guizhou Province Science Technology Support Plan under grant [2022]264, [2023]096, [2023]409 and [2023]412; in part by the Science Technology Project of POWERCHINA Guizhou Engineering Co., Ltd. (DJ-ZDXM-2022-44); in part by the Project of POWERCHINA Guiyang Engineering Corporation Limited (YJ2022-12).</p>
</sec>
<sec><title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Conceptualization, Zeshuang Song and Xiao Wang; methodology, Xiao Wang and Zeshuang Song; software, Zeshuang Song and Qing Wu; validation, Zeshuang Song and Yanting Tao; formal analysis, Linghua Xu and Jianguo Yan; investigation, Yaohua Yin; writing&#x2014;original draft preparation, Zeshuang Song; writing&#x2014;review and editing, Zeshuang Song and Xiao Wang; visualization, Zeshuang Song and Qing Wu; supervision, Xiao Wang. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>To emulate real-world edge task offloading, we used Python to simulate and validate the proposed method. The data come from laboratory simulations rather than real-life datasets, so we do not present the simulated data.</p>
</sec>
<sec><title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Qin</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Qu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Song</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>Planning of MTDC system for offshore wind farm clusters considering wind power curtailment caused by failure</article-title>,&#x201D; (in Chinese), <source>Smart Power</source>, vol. <volume>51</volume>, no. <issue>6</issue>, pp. <fpage>21</fpage>&#x2013;<lpage>27</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Hao</surname></string-name></person-group>, &#x201C;<article-title>Development of offshore wind power and foundation technology for offshore wind turbines in China</article-title>,&#x201D; <source>Ocean Eng.</source>, vol. <volume>266</volume>, no. <issue>5</issue>, <year>Dec. 2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.oceaneng.2022.113256</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H. M.</given-names> <surname>Toonen</surname></string-name> and <string-name><given-names>H. J.</given-names> <surname>Lindeboom</surname></string-name></person-group>, &#x201C;<article-title>Dark green electricity comes from the sea: Capitalizing on ecological merits of offshore wind power?</article-title>&#x201D; <source>Renew. Sustain. Energ. Rev.</source>, vol. <volume>42</volume>, pp. <fpage>1023</fpage>&#x2013;<lpage>1033</lpage>, <year>2015</year>. doi: <pub-id pub-id-type="doi">10.1016/j.rser.2014.10.043</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>M. H.</given-names> <surname>Kim</surname></string-name></person-group>, &#x201C;<article-title>Review of recent offshore wind turbine research and optimization methodologies in their design</article-title>,&#x201D; <source>J. Mar. Sci. Eng.</source>, vol. <volume>10</volume>, no. <issue>1</issue>, <year>Jan. 2022</year>. doi: <pub-id pub-id-type="doi">10.3390/jmse10010028</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. S.</given-names> <surname>Hamed</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Vibration performance, stability and energy transfer of wind turbine tower via PD controller</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>64</volume>, no. <issue>2</issue>, pp. <fpage>871</fpage>&#x2013;<lpage>886</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2020.08120</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Song</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Remotely monitoring offshore wind turbines via ZigBee networks embedded with an advanced routing strategy</article-title>,&#x201D; <source>J. Renew. Sustain. Energy</source>, vol. <volume>5</volume>, no. <issue>1</issue>, <year>Jan. 2013</year>. doi: <pub-id pub-id-type="doi">10.1063/1.4773467</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Flagg</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<chapter-title>Cabled community observatories for coastal monitoring-developing priorities and comparing results</chapter-title>,&#x201D; in <source>Global Oceans 2020: Singapore&#x2013;US Gulf Coast</source>, <publisher-loc>Biloxi, MS, USA</publisher-loc>, <year>2020</year>, vol. <volume>3</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>. doi: <pub-id pub-id-type="doi">10.1109/IEEECONF38699.2020.9389268</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Kobari</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Du</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Yoshinaga</surname></string-name>, and <string-name><given-names>W.</given-names> <surname>Bao</surname></string-name></person-group>, &#x201C;<article-title>A reinforcement learning based edge cloud collaboration</article-title>,&#x201D; in <conf-name>2021 Int. Conf. Inf. Commun. Technol. Disaster Manag. (ICT-DM)</conf-name>, <publisher-loc>Hangzhou, China</publisher-loc>, <year>2021</year>, pp. <fpage>26</fpage>&#x2013;<lpage>29</lpage>. doi: <pub-id pub-id-type="doi">10.1109/ICT-DM52643.2021.9664025</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>He</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Cui</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Xu</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>Ren</surname></string-name></person-group>, &#x201C;<article-title>An edge-computing framework for operational modal analysis of offshore wind-turbine tower</article-title>,&#x201D; <source>Ocean Eng.</source>, vol. <volume>287</volume>, no. <issue>1</issue>, <year>Nov. 2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.oceaneng.2023.115720</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Joint deployment and trajectory optimization in UAV-assisted vehicular edge computing networks</article-title>,&#x201D; <source>J. Commun. Netw.</source>, vol. <volume>24</volume>, no. <issue>1</issue>, pp. <fpage>47</fpage>&#x2013;<lpage>58</lpage>, <year>Sep. 2021</year>. doi: <pub-id pub-id-type="doi">10.23919/JCN.2021.000026</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Gong</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>When deep reinforcement learning meets federated learning: Intelligent multitimescale resource management for multiaccess edge computing in 5G ultradense network</article-title>,&#x201D; <source>IEEE Internet Things J.</source>, vol. <volume>8</volume>, no. <issue>4</issue>, pp. <fpage>2238</fpage>&#x2013;<lpage>2251</lpage>, <year>Feb. 2021</year>. doi: <pub-id pub-id-type="doi">10.1109/JIOT.2020.3026589</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Xue</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Wu</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Cost optimization of UAV-MEC network calculation offloading: A multi-agent reinforcement learning method</article-title>,&#x201D; <source>Ad Hoc Netw.</source>, vol. <volume>136</volume>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.adhoc.2022.102981</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Decentralized computation offloading for multi-user mobile edge computing: A deep reinforcement learning approach</article-title>,&#x201D; <source>EURASIP J. Wirel. Commun. Netw.</source>, vol. <volume>2020</volume>, no. <issue>1</issue>, <year>Sep. 2022</year>. doi: <pub-id pub-id-type="doi">10.1186/s13638-020-01801-6</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Liu</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>An efficient computation offloading and resource allocation algorithm in RIS empowered MEC</article-title>,&#x201D; <source>Comput. Commun.</source>, vol. <volume>197</volume>, pp. <fpage>113</fpage>&#x2013;<lpage>123</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.comcom.2022.10.012</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Song</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Nicolas</surname></string-name></person-group>, &#x201C;<article-title>Optimization scheme of trusted task offloading in IIoT scenario based on DQN</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>74</volume>, no. <issue>1</issue>, pp. <fpage>2055</fpage>&#x2013;<lpage>2071</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2023.031750</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. H.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Zhou</surname></string-name>, and <string-name><given-names>G.</given-names> <surname>Yu</surname></string-name></person-group>, &#x201C;<article-title>Generative adversarial LSTM networks learning for resource allocation in UAV-Served M2M communications</article-title>,&#x201D; <source>IEEE Wirel. Commun. Lett.</source>, vol. <volume>10</volume>, no. <issue>7</issue>, pp. <fpage>1601</fpage>&#x2013;<lpage>1605</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1109/LWC.2021.3075467</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P. K. R.</given-names> <surname>Maddikunta</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Incentive techniques for the internet of things: A survey</article-title>,&#x201D; <source>J. Netw. Comput. Appl.</source>, vol. <volume>206</volume>, <year>Oct. 2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.jnca.2022.103464</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Yang</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Dai</surname></string-name></person-group>, &#x201C;<article-title>Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing</article-title>,&#x201D; <source>IEEE Internet Things J.</source>, vol. <volume>10</volume>, no. <issue>8</issue>, pp. <fpage>6818</fpage>&#x2013;<lpage>6835</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/JIOT.2022.3228246</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Shen</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Shi</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Qi</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Task partitioning and offloading in DNN-task enabled mobile edge computing networks</article-title>,&#x201D; <source>IEEE Trans. Mob. Comput.</source>, vol. <volume>22</volume>, no. <issue>4</issue>, pp. <fpage>2435</fpage>&#x2013;<lpage>2445</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/TMC.2021.3114193</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Long</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Gong</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>D. T.</given-names> <surname>Hoang</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Niyato</surname></string-name></person-group>, &#x201C;<article-title>Hierarchical multi-agent deep reinforcement learning for energy-efficient hybrid computation offloading</article-title>,&#x201D; <source>IEEE Trans. Veh. Technol.</source>, vol. <volume>72</volume>, no. <issue>1</issue>, pp. <fpage>986</fpage>&#x2013;<lpage>1001</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/TVT.2022.3202525</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Lu</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Energy-aware and mobility-driven computation offloading in MEC</article-title>,&#x201D; <source>J. Grid Computing</source>, vol. <volume>21</volume>, no. <issue>2</issue>, <year>Jun. 2023</year>. doi: <pub-id pub-id-type="doi">10.1007/s10723-023-09654-1</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Vijayaram</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Vasudevan</surname></string-name></person-group>, &#x201C;<article-title>Wireless edge device intelligent task offloading in mobile edge computing using hyper-heuristics</article-title>,&#x201D; <source>EURASIP J. Adv. Signal Process</source>, vol. <volume>2022</volume>, no. <issue>1</issue>, <year>Dec. 2022</year>. doi: <pub-id pub-id-type="doi">10.1186/s13634-022-00965-1</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O.</given-names> <surname>Karatalay</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Psaromiligkos</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Champagne</surname></string-name></person-group>, &#x201C;<article-title>Energy-efficient resource allocation for D2D-assisted fog computing</article-title>,&#x201D; <source>IEEE Trans. Green Commun. Netw.</source>, vol. <volume>6</volume>, no. <issue>4</issue>, pp. <fpage>1990</fpage>&#x2013;<lpage>2002</lpage>, <year>Dec. 2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TGCN.2022.3190085</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Du</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Han</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Auction design for edge computation offloading in SDN-based ultra dense networks</article-title>,&#x201D; <source>IEEE Trans. Mob. Comput.</source>, vol. <volume>21</volume>, no. <issue>5</issue>, pp. <fpage>1580</fpage>&#x2013;<lpage>1595</lpage>, <year>May 2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TMC.2020.3026319</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Lei</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Song</surname></string-name></person-group>, &#x201C;<article-title>Multi-agent DDPG enpowered UAV trajectory optimization for computation task offloading</article-title>,&#x201D; in <conf-name>Int. Conf. Commun. Technol. (ICCT)</conf-name>, <publisher-loc>Nanjing, China</publisher-loc>, <year>Nov. 2022</year>, pp. <fpage>608</fpage>&#x2013;<lpage>612</lpage>. doi: <pub-id pub-id-type="doi">10.1109/ICCT56141.2022.10073166</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Deep reinforcement learning-based one-to-multiple cooperative computing in large-scale event-driven wireless sensor networks</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>23</volume>, no. <issue>6</issue>, <year>Mar. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/s23063237</pub-id>; <pub-id pub-id-type="pmid">36991947</pub-id></mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T. P.</given-names> <surname>Truong</surname></string-name>, <string-name><given-names>N. N.</given-names> <surname>Dao</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Cho</surname></string-name></person-group>, &#x201C;<article-title>HAMEC-RSMA: Enhanced aerial computing systems with rate splitting multiple access</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>10</volume>, pp. <fpage>52398</fpage>&#x2013;<lpage>52409</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2022.3173125</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Ke</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Deng</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Ge</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Deep reinforcement learning-based adaptive computation offloading for MEC in heterogeneous vehicular networks</article-title>,&#x201D; <source>IEEE Trans. Veh. Technol.</source>, vol. <volume>69</volume>, no. <issue>7</issue>, pp. <fpage>7916</fpage>&#x2013;<lpage>7929</lpage>, <year>Jul. 2020</year>. doi: <pub-id pub-id-type="doi">10.1109/TVT.2020.2993849</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. R.</given-names> <surname>Haque</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Unprecedented smart algorithm for uninterrupted SDN services during DDoS attack</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>70</volume>, no. <issue>1</issue>, pp. <fpage>875</fpage>&#x2013;<lpage>894</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2022.018505</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Ke</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Multi-agent deep reinforcement learning-based partial task offloading and resource allocation in edge computing environment</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>11</volume>, no. <issue>15</issue>, <year>Aug. 2022</year>. doi: <pub-id pub-id-type="doi">10.3390/electronics11152394</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>He</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yuan</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Peng</surname></string-name></person-group>, &#x201C;<article-title>Joint sensing, communication, and computation resource allocation for cooperative perception in fog-based vehicular networks</article-title>,&#x201D; in <conf-name>13th Int. Conf. Wirel. Commun. Signal Process. (WCSP)</conf-name>, <publisher-loc>Changsha, China</publisher-loc>, <year>2021</year>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>W.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>A federated learning and deep reinforcement learning-based method with two types of agents for computation offload</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>23</volume>, no. <issue>4</issue>, <year>Feb. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/s23042243</pub-id>; <pub-id pub-id-type="pmid">36850846</pub-id></mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>You</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Chae</surname></string-name>, and <string-name><given-names>B. H.</given-names> <surname>Kim</surname></string-name></person-group>, &#x201C;<article-title>Energy-efficient resource allocation for mobile-edge computation offloading</article-title>,&#x201D; <source>IEEE Trans. Wirel. Commun.</source>, vol. <volume>16</volume>, no. <issue>3</issue>, pp. <fpage>1397</fpage>&#x2013;<lpage>1411</lpage>, <year>2016</year>. doi: <pub-id pub-id-type="doi">10.1109/TWC.2016.2633522</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Kang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Bai</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Deng</surname></string-name></person-group>, &#x201C;<article-title>JUTAR: Joint user-association, task-partition, and resource-allocation algorithm for MEC networks</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>23</volume>, no. <issue>3</issue>, <year>Feb. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/s23031601</pub-id>; <pub-id pub-id-type="pmid">36772641</pub-id></mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. M.</given-names> <surname>Seid</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>H. N.</given-names> <surname>Abishu</surname></string-name>, and <string-name><given-names>T. A.</given-names> <surname>Ayall</surname></string-name></person-group>, &#x201C;<article-title>Blockchain-enabled task offloading with energy harvesting in multi-uav-assisted IoT networks: A multi-agent drl approach</article-title>,&#x201D; <source>IEEE J. Sel. Areas Commun.</source>, vol. <volume>40</volume>, no. <issue>12</issue>, pp. <fpage>3517</fpage>&#x2013;<lpage>3532</lpage>, <year>Dec. 2022</year>. doi: <pub-id pub-id-type="doi">10.1109/JSAC.2022.3213352</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Luo</surname></string-name>, <string-name><given-names>T. H.</given-names> <surname>Luan</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Shi</surname></string-name>, and <string-name><given-names>P.</given-names> <surname>Fan</surname></string-name></person-group>, &#x201C;<article-title>Deep reinforcement learning based computation offloading and trajectory planning for multi-uav cooperative target search</article-title>,&#x201D; <source>IEEE J. Sel. Areas Commun.</source>, vol. <volume>41</volume>, no. <issue>2</issue>, pp. <fpage>504</fpage>&#x2013;<lpage>520</lpage>, <year>Feb. 2023</year>. doi: <pub-id pub-id-type="doi">10.1109/JSAC.2022.3228558</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Ye</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Zhou</surname></string-name></person-group>, &#x201C;<article-title>Deadline-aware task offloading with partially-observable deep reinforcement learning for multi-access edge computing</article-title>,&#x201D; <source>IEEE Trans. Netw. Sci. Eng.</source>, vol. <volume>9</volume>, no. <issue>6</issue>, pp. <fpage>3870</fpage>&#x2013;<lpage>3885</lpage>, <year>Nov. 2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TNSE.2021.3115054</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Deng</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yin</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Guan</surname></string-name>, <string-name><given-names>N. N.</given-names> <surname>Xiong</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Mumtaz</surname></string-name></person-group>, &#x201C;<article-title>Intelligent delay-aware partial computing task offloading for multiuser industrial internet of things through edge computing</article-title>,&#x201D; <source>IEEE Internet Things J.</source>, vol. <volume>10</volume>, no. <issue>4</issue>, pp. <fpage>2954</fpage>&#x2013;<lpage>2966</lpage>, <year>Feb. 2023</year>. doi: <pub-id pub-id-type="doi">10.1109/JIOT.2021.3123406</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Cao</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zhan</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Research on hybrid computation offloading strategy for MEC based on DDPG</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>12</volume>, no. <issue>3</issue>, <year>Feb. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/electronics12030562</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Shi</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Shang</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Task scheduling based on adaptive priority experience replay on cloud platforms</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>12</volume>, no. <issue>6</issue>, <year>Mar. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/electronics12061358</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ni</surname></string-name>, and <string-name><given-names>P.</given-names> <surname>Zhao</surname></string-name></person-group>, &#x201C;<article-title>Learning-based joint optimization of energy delay and privacy in multiple-user edge-cloud collaboration MEC systems</article-title>,&#x201D; <source>IEEE Internet Things J.</source>, vol. <volume>9</volume>, no. <issue>2</issue>, pp. <fpage>1491</fpage>&#x2013;<lpage>1502</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/JIOT.2021.3088607</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>