<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">31698</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2022.031698</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Novel Method for Routing Optimization in Software-Defined Networks</article-title>
<alt-title alt-title-type="left-running-head">A Novel Method for Routing Optimization in Software-Defined Networks</alt-title>
<alt-title alt-title-type="right-running-head">A Novel Method for Routing Optimization in Software-Defined Networks</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Alkhalaf</surname><given-names>Salem</given-names></name><email>s.alkhalaf@qu.edu.sa</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Alturise</surname><given-names>Fahad</given-names></name></contrib>
<aff id="aff-1"><institution>Department of Computer, College of Science and Arts in ArRass Qassim University</institution>, <addr-line>Ar Rass, Qassim</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Salem Alkhalaf. Email: <email>s.alkhalaf@qu.edu.sa</email></corresp>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2022-07-25"><day>25</day>
<month>07</month>
<year>2022</year></pub-date>
<volume>73</volume>
<issue>3</issue>
<fpage>6393</fpage>
<lpage>6405</lpage>
<history>
<date date-type="received"><day>25</day><month>4</month><year>2022</year></date>
<date date-type="accepted"><day>07</day><month>6</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Alkhalaf and Alturise</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Alkhalaf and Alturise</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_31698.pdf"></self-uri>
<abstract>
<p>Software-defined networking (SDN) is a new network architecture characterized by programmability, ease of use, centralized control, and protocol independence, and it has attracted great attention since its inception. With the SDN architecture, network management becomes more efficient, and programmable interfaces make network operations more flexible and able to meet the needs of different users. The mainstream communication protocol of SDN is OpenFlow, whose flow table structure contains a Match Field: the switch matches the header of each received packet against this field and performs the corresponding actions according to the matching result, which removes the dependence on any particular protocol and avoids the need to design new ones. In order to effectively optimize routing for SDN, this paper proposes a novel algorithm based on reinforcement learning. The proposed technique can maximize multiple objectives to dynamically update the routing strategy, has good generality, and does not rely on any specific network state. Because it employs a policy-based reinforcement learning method, it achieves finer-grained control of the routing strategy than many Q-learning-based algorithms. The performance of the method is tested by experiments using the OMNeT&#x002B;&#x002B; simulator. The experimental results reveal that our PPO-based SDN routing control method offers better performance and stability than existing algorithms.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Reinforcement learning</kwd>
<kwd>routing algorithm</kwd>
<kwd>software-defined network</kwd>
<kwd>optimization</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Software-Defined Networking (SDN) breaks the vertical structure of the traditional network model, separates control from forwarding, and provides more possibilities for network innovation and evolution. It is considered to be the main architecture of future networks. In recent years, however, network scale has continued to expand, network traffic has grown exponentially, and the requirements for network control have become increasingly sophisticated [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-5">5</xref>]. Traditional network routing schemes rely primarily on the shortest path method, which has drawbacks such as sluggish convergence and difficulty coping with network congestion [<xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;<xref ref-type="bibr" rid="ref-9">9</xref>]. As a result, establishing an effective optimization technique for network routing through the controller is a critical step in ensuring network services and advancing SDN [<xref ref-type="bibr" rid="ref-10">10</xref>].</p>
<p>Machine learning, particularly deep learning, has gained a lot of attention because of its outstanding performance in large-scale data processing, classification, and intelligent decision-making [<xref ref-type="bibr" rid="ref-11">11</xref>&#x2013;<xref ref-type="bibr" rid="ref-14">14</xref>]. Many researchers employ SDN networks to overcome the challenges in network operation and administration [<xref ref-type="bibr" rid="ref-15">15</xref>&#x2013;<xref ref-type="bibr" rid="ref-17">17</xref>]. The authors of [<xref ref-type="bibr" rid="ref-18">18</xref>] presented a machine learning-based route pre-design approach. This method first extracts network traffic characteristics using a clustering algorithm (such as a Gaussian mixture model or K-means), then predicts traffic demand using a supervised learning method (such as an extreme learning machine), and finally handles traffic routing using analytic hierarchy process (AHP)-based adaptive dynamic algorithms. The authors of [<xref ref-type="bibr" rid="ref-19">19</xref>] devised a heuristic technique to improve SDN routing, but the algorithm&#x2019;s performance does not remain stable as the network evolves. The ant colony algorithm and the genetic algorithm [<xref ref-type="bibr" rid="ref-20">20</xref>&#x2013;<xref ref-type="bibr" rid="ref-21">21</xref>] are further examples of heuristic algorithms that have been applied to routing problems; however, these algorithms generalize poorly. When the network&#x2019;s condition changes, the parameters of these heuristic algorithms must also change, making the process difficult to repeat consistently.</p>
<p>Reinforcement learning is frequently used to tackle route optimization problems because it can perform dynamic decision management by continually interacting with the environment. Reference [<xref ref-type="bibr" rid="ref-22">22</xref>] offers a reward scheme for network service-quality measurements using Q-Learning. An end-to-end adaptive hypertext transfer protocol (HTTP) streaming intelligent transmission architecture, based on a partially observable Markov decision process with a Q-Learning decision algorithm, is shown in reference [<xref ref-type="bibr" rid="ref-23">23</xref>]. These Q-Learning-based reinforcement learning algorithms require a Q-table to be maintained and queried for control decisions [<xref ref-type="bibr" rid="ref-24">24</xref>]. However, the state space of an SDN network is very large, and Q-Learning-based algorithms cannot describe such states well. At the same time, as a value-based reinforcement learning method, Q-Learning can only output control actions in a discrete action space, so the decision space is very limited. It is therefore a huge challenge to design a dynamic routing strategy that can perform fine-grained analysis and control of SDN networks.</p>
<p>This work proposes the proximal policy optimization (PPO) technique for making routing decisions in the SDN control plane. The benefits of the proposed algorithm are as follows:
<list list-type="bullet">
<list-item><p>First, the approach employs a neural network to precisely determine the Q value of the SDN network state, bypassing the Q table&#x2019;s restrictions and inefficiencies.</p></list-item>
<list-item><p>Simultaneously, the algorithm is part of the reinforcement learning strategy technique, which may produce a finer-grained control scheme for network decision-making.</p></list-item>
<list-item><p>Finally, the system may adapt the reinforcement learning reward function according to different optimization targets to dynamically optimize the routing method. Because it is not dependent on any specific network state and has great generalization performance, this technique successfully performs black-box optimization.</p></list-item>
</list></p>
<p>The performance of the algorithm is evaluated by experiments based on the OMNeT&#x002B;&#x002B; simulation software. The experimental findings reveal that the proposed algorithm outperforms both the classic shortest path-based static routing approach and the QAR method reported in [<xref ref-type="bibr" rid="ref-25">25</xref>] in terms of performance and stability.</p>
</sec>
<sec id="s2"><label>2</label><title>Background</title>
<sec id="s2_1"><label>2.1</label><title>SDN and Knowledge Plane Network</title>
<p>The conventional Internet architecture is primarily based on the OSI seven-layer or TCP/IP four-layer protocol model, with data transfer between network devices taking place at each layer via the corresponding network protocols (switching, routing, labeling, security, and other protocols). The general workflow consists of three steps: neighbor establishment, information sharing, and path selection. In addition, the transmission of information between network devices adopts a typical distributed architecture: devices exchange information in a relay (&#x201C;baton-passing&#x201D;) fashion, build up database information, and then transmit data according to the relevant path algorithm (such as Dijkstra&#x2019;s shortest path algorithm). Each device computes independently, with its own controller and forwarding hardware, and communicates with its peers through protocols.</p>
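The per-device shortest path computation mentioned above can be illustrated with a minimal Dijkstra sketch. This is a generic textbook implementation, not code from this paper, and the toy topology `net` is invented purely for illustration:

```python
import heapq

def dijkstra(graph, source):
    """Shortest path costs from source.
    graph: {node: {neighbor: link_cost}} adjacency mapping."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Invented 3-node topology: A->C direct costs 4, but A->B->C costs 3.
net = {"A": {"B": 1, "C": 4}, "B": {"C": 2}, "C": {}}
print(dijkstra(net, "A"))  # {'A': 0.0, 'B': 1.0, 'C': 3.0}
```

In the distributed architecture described above, every device runs such a computation independently over its own copy of the link-state database.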
<p>This distributed architecture facilitated the Internet&#x2019;s flourishing in the past when protocol specifications were incomplete. However, with the gradual unification and improvement of communication equipment protocols, the distributed architecture has gradually reached a bottleneck, and many problems have been highlighted, such as redundant transmission table information, difficult traffic control, and inability of equipment to customize transmission. The root cause of these problems is that the data and control of network equipment are coupled in the traditional architecture, and the equipment does not have an open programmable interface, so it is impossible to separate data forwarding and data control.</p>
<p>The concept of SDN is proposed to solve this problem. Its core idea is to centralize all the information on the network to a core controller, so that the controller can directly manipulate the underlying infrastructure in a centralized way. This method can provide great flexibility for data transfer, so it has been widely used in recent years [<xref ref-type="bibr" rid="ref-26">26</xref>].</p>
<p>SDN separates the control plane from the data plane, allowing the controller in the control plane to fully comprehend all the information in the network, resulting in a more efficient and responsive overall routing system. A knowledge plane network paradigm is presented in [<xref ref-type="bibr" rid="ref-27">27</xref>], which adds a knowledge plane to the typical SDN design. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, knowledge-defined networking (KDN) processes the information collected by the lower planes and uses machine learning methods to make network management decisions, thereby improving the overall efficiency of SDN.</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Illustration of KDN structure</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-1.png"/></fig>
<p>The routing strategy optimization algorithm proposed in this paper is also developed based on this control model architecture, and the reinforcement learning method will be introduced in the knowledge plane to centrally and dynamically manage the routing scheme of the data plane.</p>
</sec>
<sec id="s2_2"><label>2.2</label><title>Reinforcement Learning and PPO Algorithms</title>
<p>Reinforcement learning is a machine learning paradigm that uses Markov decision processes to learn the interdependence of states, actions, and rewards between a decision-making agent and its environment. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> depicts the interaction process in reinforcement learning between the decision agent and the environment.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>The interactive process of decision-making subject and environment in reinforcement learning</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-2.png"/></fig>
<p>PPO [<xref ref-type="bibr" rid="ref-28">28</xref>] is a model-free policy gradient method built on the Actor-Critic architecture and derived from TRPO [<xref ref-type="bibr" rid="ref-29">29</xref>]. The algorithm&#x2019;s purpose is to find a decision policy <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> in the aforementioned interaction process that maximizes the cumulative return. Here, <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>s</mml:mi></mml:math></inline-formula> is the model&#x2019;s input, a vector that describes the present environment; in network control, it can be the traffic matrix of the entire network together with the topology adjacency matrix. The decision function outputs an action, which can be a vector of weights describing the network links in the routing strategy, and an optimal routing scheme can be uniquely determined through these weights. A neural network with parameters <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula> can fit the decision function.</p>
<p>The policy gradient algorithm works by estimating the policy gradient and using gradient ascent to update the policy parameters. The policy loss <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and its gradient are estimated by running the policy in the environment to obtain samples:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
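The estimator in Eq. (2) can be made concrete with a toy example: a two-armed bandit with a one-parameter softmax policy, updated by gradient ascent on the sampled log-probability weighted by the return. This is an illustrative sketch invented for exposition, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(theta):
    """Softmax policy over two arms; logit of arm 1 is fixed at 0."""
    logits = np.array([theta, 0.0])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def grad_log_pi(theta, a):
    """d/dtheta of log pi_theta(a): (1 - p0) for arm 0, -p0 for arm 1."""
    p0 = policy(theta)[0]
    return (1.0 - p0) if a == 0 else -p0

theta, lr = 0.0, 0.1
for _ in range(2000):
    a = rng.choice(2, p=policy(theta))
    reward = 1.0 if a == 0 else 0.0        # arm 0 is the good arm
    theta += lr * grad_log_pi(theta, a) * reward   # Eq. (2) sample estimate

print(policy(theta)[0])  # probability of the good arm approaches 1
```

The same log-probability-times-return estimate underlies the full trajectory form of Eq. (2); PPO refines it with the advantage and clipping terms introduced below.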
<p>The primary goal of policy gradient approaches is to reduce the variance of the gradient estimates so that better policies may be developed. Under the Actor-Critic architecture, the state-action value function, the state value function, and the advantage function are defined as follows:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>s</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msup><mml:mi>A</mml:mi><mml:mrow><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msup><mml:mi>A</mml:mi><mml:mrow><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the advantage function, which assesses the quality of a given action choice in a given state; the algorithm&#x2019;s ultimate purpose is to optimize the policy so as to constantly raise this function. In the Actor-Critic architecture, the Actor network is the ultimate decision-making policy network, while the Critic network estimates the advantage function of the current policy. On this basis, the PPO method transforms the optimization goal into the surrogate objective function below, which improves the model&#x2019;s stability and efficiency during training.
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mi>L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>i</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03F5;</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>&#x03F5;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Among them, <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the advantage function, <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>&#x03F5;</mml:mi></mml:math></inline-formula> is a clipping hyperparameter, and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the importance sampling ratio, whose expression is as follows:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mrow><mml:mtext>old</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
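Eqs. (6) and (7) can be sketched directly in a few lines of NumPy. The sample log-probabilities and advantages below are invented for illustration; the sign is flipped only because optimizers conventionally minimize:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate objective of Eq. (6), negated for minimization.
    logp_new/logp_old: log pi(a_t|s_t) under the new and old policies."""
    ratio = np.exp(logp_new - logp_old)      # r_t(theta), Eq. (7)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.mean(np.minimum(ratio * adv, clipped * adv))

# Two invented samples: one where the new policy raised the action
# probability (positive advantage), one where it lowered it.
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.9, 0.1]))
adv = np.array([1.0, -1.0])
print(ppo_clip_loss(logp_new, logp_old, adv))  # -0.2 up to float rounding
```

Note how the clip keeps both ratios (1.8 and 0.2) inside [0.8, 1.2], so neither sample can push the update arbitrarily far from the old policy.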
<p>Finally, when the algorithm runs, two networks must be set up, one for the Actor and one for the Critic. The Actor network makes decisions for the agent in order to collect interaction trajectories, and the parameters are then updated according to the objective function above.</p>
</sec>
</sec>
<sec id="s3"><label>3</label><title>Algorithm Design</title>
<p>This section introduces the proposed algorithm. By implementing the algorithm in SDN, performance variables such as throughput, latency, and jitter may be automatically adjusted, allowing for real-time network control and considerably reducing the network operation and maintenance burden.</p>
<sec id="s3_1"><label>3.1</label><title>Task Modeling</title>
<p>The most significant aspect of using reinforcement learning to enhance SDN routing is determining the environmental state, the reward, and the decision-action space available to the decision-making agent. As a dynamic optimization algorithm, reinforcement learning does not need to perceive quantities that remain unchanged throughout the whole process when making a decision. For example, in routing optimization the specific physical links and the corresponding network topology are already given, so the SDN controller only needs to continuously plan the routing scheme according to the current network traffic information. Therefore, for a specific SDN network, the state <italic>s</italic> input to the agent can be represented by the vector obtained from the traffic matrix of the current network load, whose representation is learned during the training process of the agent.</p>
<p>After the agent perceives the traffic situation, it needs to output an action <italic>a</italic> that uniquely determines an optimal solution for the current network environment. Here, the action is set to a vector representing all link weights. Through the link weight vector, the Floyd algorithm can be used to uniquely determine a set of optimal routing schemes for the network.</p>
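The mapping from a link weight vector to a unique set of routes can be sketched with a standard Floyd&#x2013;Warshall implementation; the 3-node weight matrix is invented for illustration:

```python
def floyd_warshall(weights):
    """All-pairs shortest paths from a link-weight matrix.
    weights[i][j]: agent-assigned weight of link i->j
    (float('inf') where no direct link exists)."""
    n = len(weights)
    dist = [row[:] for row in weights]
    # nxt[i][j]: first hop on the best path i -> j (for rule installation)
    nxt = [[j if weights[i][j] != float("inf") else None for j in range(n)]
           for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

INF = float("inf")
w = [[0,   2, INF],
     [2,   0,   1],
     [INF, 1,   0]]
dist, nxt = floyd_warshall(w)
print(dist[0][2])  # 3: route 0 -> 1 -> 2, since no direct 0 -> 2 link
```

Once the agent changes any entry of the weight matrix, rerunning this computation yields the new, uniquely determined routing scheme.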
<p>The reward obtained by the agent is related to the overall performance indicators of the network, such as network delay, throughput, or a comprehensive reward considering various parameters. For example, <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref> is a reward function that comprehensively considers multiple performance indicators.
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>j</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>h</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mi>B</mml:mi><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>Among them, <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the current state of the network, and <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the action generated by the control agent; this action adjusts the network link weights, and the network&#x2019;s point-to-point optimal paths are recalculated according to the new weights to obtain the unique optimal routing scheme. Assume that the best path after adjusting the link weights is <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The function <italic>h</italic> represents the cost of adjusting this path, such as the effect of the action on switching operations. <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>&#x03B1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are adjustable weights, <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mrow><mml:mtext mathvariant="italic">delay</mml:mtext></mml:mrow></mml:math></inline-formula> represents the delay on the path, BW is the bandwidth, and loss is the packet loss rate; the operations and maintenance team can define the reward function in a variety of ways.</p>
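A direct transcription of Eq. (8) as printed might look as follows. The weight values and the measurements passed in are purely illustrative, and as the text notes, operators are free to choose how each measured term enters the reward:

```python
def reward(h_cost, delay_ij, bw_ij, loss_ij,
           alpha=0.3, beta=0.5, gamma=0.2):
    """Composite per-path reward, transcribed term by term from Eq. (8):
    R = -h(a_t) - alpha*delay_ij + beta*BW_ij + gamma*loss_ij.
    alpha, beta, gamma in [0, 1) are the adjustable weights; the values
    here are invented defaults, not the authors' choices."""
    return -h_cost - alpha * delay_ij + beta * bw_ij + gamma * loss_ij

# Invented measurements: small switch cost, 10 ms delay,
# 8 units of bandwidth, 1% packet loss.
r = reward(h_cost=0.1, delay_ij=10.0, bw_ij=8.0, loss_ij=0.01)
print(r)  # 0.902 up to float rounding
```

Changing the optimization target then only means swapping this function, which is what lets the framework dynamically re-target the routing strategy without touching the agent.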
</sec>
<sec id="s3_2"><label>3.2</label><title>Proposed Framework</title>
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> depicts the proposed framework. The PPO agent interacts with the environment through three variables: Status, Action, and Reward. The status is the traffic matrix (TM) of the present network load, and the agent&#x2019;s action on the environment is to modify the weight vector of the network&#x2019;s links. Through the weight vector, a routing scheme for the network can be uniquely determined. The pseudo code of the training process is shown in Algorithm 1.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>Overall model architecture</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-3.png"/></fig>
<fig id="fig-9">
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-9.png"/>
</fig>
<p>In the above algorithm, lines 6 to 11 form the sampling process, and lines 12 to 16 form the training process for the model parameters. The <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>E</mml:mi><mml:mi>n</mml:mi><mml:mi>v</mml:mi></mml:math></inline-formula> in line 8 encapsulates the environment outside the PPO decision agent. In addition, the decision function <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:msub><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> usually represents a high-dimensional normal distribution, and the action a is sampled from this distribution.</p>
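<p>The parameter-update phase of PPO optimizes the standard clipped surrogate objective; a per-sample sketch of that objective, with the clip range eps as an assumed hyperparameter (the paper does not state its value), is:</p>

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss for a single sample.

    ratio = pi_theta(a|s) / pi_theta_old(a|s); eps is the clip range.
    Returns the negated objective, to be minimized by gradient descent.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return -min(unclipped, clipped)
```

<p>Clipping the probability ratio keeps each policy update close to the sampling policy, which is what makes reusing the collected experience fragments stable.</p>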
<p>The PPO agent&#x2019;s training goal is to discover the optimal action a for the input state s in order to maximize the cumulative reward. The whole working process can be summarized as follows: using the SDN controller&#x2019;s network analysis and measurement, the PPO agent collects the precise network state s and identifies the best action, that is, a set of link weights <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>&#x03C9;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03C9;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03C9;</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. The flow paths are then recalculated using the updated weights; the Floyd shortest-path algorithm may be used to construct the specific paths from these weights. The SDN controller accordingly generates new rules to build the new routes. Following the path update, the next round of network analysis and measurement produces the reward r and the new network state, allowing for iterative network optimization.</p>
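<p>The Floyd shortest-path computation over the adjusted link weights can be sketched as a generic Floyd-Warshall implementation (a textbook version, not the paper&#x2019;s code); a production controller would also track next hops to install flow rules:</p>

```python
def floyd_warshall(weights):
    """All-pairs shortest-path distances via the Floyd-Warshall algorithm.

    weights[i][j] is the (adjusted) link weight from node i to node j,
    float('inf') if there is no direct link, and 0 on the diagonal.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Relax path i -> j through intermediate node k.
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

<p>Because the agent only outputs weights and the paths are derived deterministically from them, one weight vector always maps to one routing scheme.</p>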
<p>Traditional machine learning-based routing optimization algorithms learn a routing scheme from the network data of a specific configuration, so they can only work under that configuration; when the network hardware changes, the learned routing scheme becomes invalid. The proposed approach is adaptive because it updates the agent&#x2019;s gradient from current experience fragments gathered in interactions with the network environment. That is, when physical equipment in the network changes locally, the agent can still make a rational decision and obtain the best routing scheme for the optimization objective. Furthermore, compared with existing algorithms, the proposed algorithm directly outputs the actions, allowing more precise network routing management. It also establishes a direct relationship between the output action vector and routing performance, which makes it easier to train the agent&#x2019;s neural network parameters.</p>
</sec>
</sec>
<sec id="s4"><label>4</label><title>Experimental Results</title>
<sec id="s4_1"><label>4.1</label><title>Environment</title>
<p>The computer hardware configuration used in the experiment is an NVIDIA GeForce 1080Ti GPU, 32 GB of memory, and an i9-9900K CPU, with Ubuntu 16.04 as the operating system. We used TensorFlow as the code framework to build the algorithm and OMNeT&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-30">30</xref>] as the network simulation software. The differences in routing performance between the proposed algorithm, the traditional shortest-path-based mainstream routing algorithm, and a randomly generated routing strategy are compared.</p>
<p>The Sprint network topology [<xref ref-type="bibr" rid="ref-31">31</xref>] is used in the experiment; it has 25 nodes and 53 links, each with the same bandwidth. The experiment sets several different levels of traffic load (TL) for this topology to simulate real network scenarios. Each TL level represents a proportion of the network&#x2019;s overall capacity. For each level of traffic load, the gravity model [<xref ref-type="bibr" rid="ref-32">32</xref>] is used to generate a variety of different traffic matrices.</p>
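<p>A gravity-model traffic matrix of this kind can be generated by assigning each node a random &#x201C;mass&#x201D; and making the demand between two nodes proportional to the product of their masses, scaled to the target load; the sketch below is a generic illustration (the paper&#x2019;s exact generator may differ):</p>

```python
import random

def gravity_traffic_matrix(n, total_traffic, seed=0):
    """Illustrative gravity-model generator: demand between nodes i and j
    is proportional to mass[i] * mass[j], scaled so all off-diagonal
    demands sum to total_traffic."""
    rng = random.Random(seed)
    mass = [rng.random() for _ in range(n)]
    tm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                tm[i][j] = mass[i] * mass[j]
    scale = total_traffic / sum(sum(row) for row in tm)
    return [[v * scale for v in row] for row in tm]
```

<p>Varying the seed at a fixed total_traffic yields the multiple distinct traffic matrices used for one TL level.</p>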
<p>The PPO decision agent is trained and tested on various levels of traffic load to verify the efficiency of the proposed method.</p>
</sec>
<sec id="s4_2"><label>4.2</label><title>Convergence and Efficiency Evaluation</title>
<p>The experiment first evaluated the proposed algorithm&#x2019;s training under different levels of traffic load, using four loads: 10&#x0025;, 40&#x0025;, 70&#x0025;, and 100&#x0025; of the full network capacity. For each traffic load, 250 traffic matrices are randomly generated, of which 200 are used as training environments and 50 as test environments. During training, the model&#x2019;s average performance in the test environment is measured every 1000 training steps for each traffic load level. OMNeT&#x002B;&#x002B; is used to obtain performance parameters such as network latency given the traffic matrix and routing scheme. The network delay of the PPO model during training under different traffic loads is shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. As can be seen from <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, as training continues, the routing scheme given by the model reduces the network delay continuously and finally converges to the optimal value.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>Comparison of the average network latency of the training process</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-4.png"/></fig>
<p>Next, to validate the effectiveness of the proposed algorithm, its delay performance is compared with that of 50,000 randomly generated routing schemes under the above four levels of traffic load. After the proposed model is trained to convergence in the training environments under these traffic loads, it is tested multiple times in the test environments. The 50,000 random routing schemes provide representative comparison data for the PPO performance. Finally, network delay data is obtained by simulating these schemes, and the box plot shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref> is drawn.</p>
<fig id="fig-5"><label>Figure 5</label><caption><title>Comparison of network delay of the proposed and random routing algorithms</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-5.png"/></fig>
<p>As can be seen from <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, the upper and lower quartiles of network latency are represented by the top and bottom of each rectangle, while the median latency is represented by the line in the center of the rectangle. The maximum and minimum delay values are represented by the upper and lower ends of the whiskers extending from the rectangle. The experimental findings demonstrate that the proposed algorithm&#x2019;s latency is extremely close to the best result among the randomly generated routing configurations, with very small variation, demonstrating the algorithm&#x2019;s usefulness in SDN optimization.</p>
</sec>
<sec id="s4_3"><label>4.3</label><title>Model Performance Comparison</title>
<p>The experiment compares the performance difference between the proposed algorithm and the QAR routing algorithm [<xref ref-type="bibr" rid="ref-25">25</xref>] in optimizing the average delay and maximum delay of the network, and gives the performance of the traditional shortest path-based routing algorithm as a reference. The QAR algorithm builds a model using the Q-Learning algorithm in reinforcement learning and generates routing decisions by constructing a state-action value function for the network state, which is a typical technique for SDN network routing.</p>
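<p>For contrast with the policy-based PPO approach, QAR&#x2019;s value-based learning follows the standard tabular Q-learning update; the sketch below is generic and does not reproduce QAR&#x2019;s particular state/action encoding or reward design:</p>

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step of the kind QAR relies on.

    Q maps (state, action) -> value; the max over the next state's
    known actions is the bootstrapped target.
    """
    best_next = max((v for (st, _), v in Q.items() if st == s_next),
                    default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

<p>Because the table enumerates discrete state-action pairs, this style of method cannot output a continuous weight vector, which is the limitation discussed below.</p>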
<p>In the specific experiments, after the proposed model and the QAR model are trained to convergence at each load intensity level, 1000 traffic matrices of the same level are used as network inputs to test the average network delay and the end-to-end maximum delay of the different models. <xref ref-type="fig" rid="fig-6">Figs. 6</xref> and <xref ref-type="fig" rid="fig-7">7</xref> show the averaged results.</p>
<fig id="fig-6"><label>Figure 6</label><caption><title>Comparison of the average network latency of the proposed and existing algorithms</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-6.png"/></fig>
<fig id="fig-7"><label>Figure 7</label><caption><title>End-to-end maximum delay</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-7.png"/></fig>
<p>Experiments on network throughput are also conducted to demonstrate the model&#x2019;s broad applicability to other network optimization metrics. The experimental configuration is basically the same as above; the only difference is that the reward function is positively correlated with the throughput rate. The experimental results are shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>.</p>
<fig id="fig-8"><label>Figure 8</label><caption><title>Comparison of network throughput of the proposed and existing algorithms</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_31698-fig-8.png"/></fig>
<p>The experimental results show that the three algorithms differ little in performance when the load intensity is low. However, when the traffic intensity is high, the traditional shortest-path-based routing algorithm causes problems such as traffic congestion, while the QAR algorithm and the proposed algorithm can effectively avoid them. The experimental data also show that as the traffic intensity continues to increase, the routing schemes given by the two reinforcement learning-based models outperform the traditional algorithm by an increasing margin. In particular, for network throughput, when the traffic intensity exceeds 50&#x0025;, the throughput of the static shortest-path routing algorithm decreases rapidly, while the two reinforcement learning-based routing methods avoid this problem to a certain extent. Therefore, introducing machine learning methods such as reinforcement learning into SDN routing management can indeed improve routing efficiency.</p>
<p>When the proposed method is compared with the QAR algorithm, it is clear that the proposed approach outperforms QAR in overall performance, for the following reasons. During training, the QAR algorithm must continuously optimize its Q table, and the state space of a complex network is very large and difficult to handle with a Q table alone. Moreover, because of the limitations of value-based methods, QAR outputs only discrete, single actions to the SDN controller when controlling a specific route and cannot perform fine-grained control. In contrast, the proposed algorithm uses a neural network to fit the network state value and directly outputs the action vector related to network routing, achieving more refined control and thus better optimization performance.</p>
<p>In conclusion, compared with the shortest-path routing method at TL&#x2009;&#x003D;&#x2009;70&#x0025;, the proposed algorithm reduces the average network delay by 29.3&#x0025;, reduces the end-to-end maximum delay by 17.4&#x0025;, and improves the throughput by 31.77&#x0025;. Such superior performance indicates that the proposed algorithm can have a significant impact in SDN.</p>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Conclusion</title>
<p>On the basis of a knowledge-plane network, this research provides an SDN routing optimization technique based on reinforcement learning. It uses the PPO deep reinforcement learning algorithm to improve SDN network routing, allowing real-time intelligent control and administration. According to the experimental findings, the proposed method exhibits good convergence and efficacy. Compared with existing routing solutions, it can increase network performance by providing robust and high-quality routing services.</p>
<p>The proposed algorithm deals with the network routing optimization problem relatively effectively. However, because it is an on-policy technique, it is difficult in practice to collect a large number of training samples for large-scale complex networks. Although the proposed model incorporates importance sampling during training, the overall sample utilization is still low. Therefore, combining it with model-based reinforcement learning algorithms may yield a more efficient solution to the problem of SDN routing optimization.</p>
</sec>
</body>
<back>
<ack>
<p>We would like to thank the anonymous reviewers for their valuable and helpful comments, which substantially improved this paper. We would also like to thank all of the editors for their professional advice and support.</p>
</ack>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> The researchers would like to thank the Deanship of Scientific Research, Qassim University for funding the publication of this project.</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Flor</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Jalalitabar</surname></string-name></person-group>, &#x201C;<article-title>Routing and spectrum allocation in spectrum-sliced elastic optical networks: A primal-dual framework</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>10</volume>, no. <issue>22</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>22</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Ning</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lv</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhao</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Multi-dimensional routing, wavelength, and timeslot allocation (RWTA) in quantum key distribution optical networks (QKD-ON)</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>11</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Virgillito</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ferrari</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Damico</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Curri</surname></string-name></person-group>, &#x201C;<article-title>Statistical assessment of open optical networks</article-title>,&#x201D; <source>Photonics</source>, vol. <volume>6</volume>, no. <issue>2</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ricciardi</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Sembroiz</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Palimieri</surname></string-name></person-group>, &#x201C;<article-title>A hybrid load-balancing and energy-aware RWA algorithm for telecommunication networks</article-title>,&#x201D; <source>Computer Communications</source>, vol. <volume>77</volume>, no. <issue>3</issue>, pp. <fpage>85</fpage>&#x2013;<lpage>99</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Muro</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Garrich</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Castreno</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zahir</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Marino</surname></string-name></person-group>, &#x201C;<article-title>Emulating software-defined disaggregated optical networks in a containerized framework</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>11</volume>, no. <issue>5</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>17</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Ji</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Xiong</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Software-defined networking (SDN) controlled all optical switching networks with multi-dimensional switching architecture</article-title>,&#x201D; <source>Optical Fiber Technology</source>, vol. <volume>20</volume>, no. <issue>2</issue>, pp. <fpage>353</fpage>&#x2013;<lpage>357</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Xue</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Tangdiongga</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Calabretta</surname></string-name></person-group>, &#x201C;<article-title>Low-latency optical wireless data-center networks using nanoseconds semiconductor-based wavelength selectors and arrayed waveguide grating router</article-title>,&#x201D; <source>Photonics</source>, vol. <volume>9</volume>, no. <issue>3</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>17</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Yan</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Nejabati</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Simeonidou</surname></string-name></person-group>, &#x201C;<article-title>Data-driven network analytics and network optimization in SDN-based programmable optical networks</article-title>,&#x201D; in <conf-name>IEEE Int. Conf. on Optical Network Design and Modeling</conf-name>, <conf-loc>Dublin, Ireland</conf-loc>, pp. <fpage>234</fpage>&#x2013;<lpage>238</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhong</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Luo</surname></string-name></person-group>, &#x201C;<article-title>A Fault-tolerant transmission scheme in SDN-based industrial iot (IIoT) over fiber-wireless networks</article-title>,&#x201D; <source>Entropy</source>, vol. <volume>24</volume>, no. <issue>2</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Kondepu</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Jackson</surname></string-name></person-group>, &#x201C;<article-title>Fully SDN-enabled all-optical architecture for data center virtualization with time and space multiplexing</article-title>,&#x201D; <source>Journal of Optical Communications and Networking</source>, vol. <volume>7</volume>, no. <issue>10</issue>, pp. <fpage>90</fpage>&#x2013;<lpage>101</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Jackson</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Kondepu</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Ou</surname></string-name></person-group>, &#x201C;<article-title>COSIGN: A complete SDN enabled all-optical architecture for data centre virtualization with time and space multiplexing</article-title>,&#x201D; in <conf-name>IEEE European Conf. on Optical Communication (ECOC)</conf-name>, <conf-loc>Gothenburg, Sweden</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>3</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Luis</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Furukawa</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Rademacher</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Puttnam</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Wada</surname></string-name></person-group>, &#x201C;<article-title>Demonstration of an SDM network testbed for joint spatial circuit and packet switching</article-title>,&#x201D; <source>Photonics</source>, vol. <volume>5</volume>, no. <issue>3</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sangiah</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>A Q-learning-based approach for deploying dynamic service function chains</article-title>,&#x201D; <source>Symmetry</source>, vol. <volume>10</volume>, no. <issue>11</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Beheshti</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Shamsuddin</surname></string-name></person-group>, &#x201C;<article-title>Memetic binary particle swarm optimization for discrete optimization problems</article-title>,&#x201D; <source>Information Sciences</source>, vol. <volume>299</volume>, no. <issue>3</issue>, pp. <fpage>58</fpage>&#x2013;<lpage>84</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Cugini</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Sambo</surname></string-name></person-group>, &#x201C;<article-title>Enhancing GMPLS signaling protocol for encompassing quality of transmission (QoT) in all-optical networks</article-title>,&#x201D; <source>Journal of Lightwave Technology</source>, vol. <volume>26</volume>, no. <issue>19</issue>, pp. <fpage>3318</fpage>&#x2013;<lpage>3328</lpage>, <year>2008</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Amin</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Rojas</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Aqdus</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ramzan</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Perez</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>A survey on machine learning techniques for routing optimization in SDN</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>104582</fpage>&#x2013;<lpage>104611</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Fadlullah</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Tang</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Mao</surname></string-name></person-group>, &#x201C;<article-title>State-of-the-art deep learning: Evolving machine intelligence toward tomorrow&#x2019;s intelligent network traffic control systems</article-title>,&#x201D; <source>IEEE Communications Surveys &#x0026; Tutorials</source>, vol. <volume>19</volume>, no. <issue>4</issue>, pp. <fpage>2432</fpage>&#x2013;<lpage>2455</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Genesan</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Hwang</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Liem</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Rahman</surname></string-name></person-group>, &#x201C;<article-title>SDN-Enabled FiWi-IoT smart environment network traffic classification using supervised ML models</article-title>,&#x201D; <source>Photonics</source>, vol. <volume>8</volume>, no. <issue>6</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Liu</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Dynamic routing and spectrum assignment based on multilayer virtual topology and ant colony optimization in elastic software-defined optical network</article-title>,&#x201D; <source>Optical Engineering</source>, vol. <volume>56</volume>, no. <issue>7</issue>, pp. <fpage>8877</fpage>&#x2013;<lpage>8884</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Parsaei</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Mohammad</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Javidan</surname></string-name></person-group>, &#x201C;<article-title>A new adaptive traffic engineering method for tele-surgery using ACO algorithm over software defined networks</article-title>,&#x201D; <source>European Research in Telemedicine</source>, vol. <volume>6</volume>, no. <issue>4</issue>, pp. <fpage>173</fpage>&#x2013;<lpage>180</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Ren</surname></string-name></person-group>, &#x201C;<article-title>Multi-attribute-based QoS-aware virtual network function placement and service chaining algorithms in smart cities</article-title>,&#x201D; <source>Computers &#x0026; Electrical Engineering</source>, vol. <volume>96</volume>, no. <issue>2</issue>, pp. <fpage>3077</fpage>&#x2013;<lpage>3089</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Hu</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Hao</surname></string-name></person-group>, &#x201C;<article-title>Q-FDBA: Improving QoE fairness for video streaming</article-title>,&#x201D; <source>Multimedia Tools and Applications</source>, vol. <volume>77</volume>, no. <issue>9</issue>, pp. <fpage>10787</fpage>&#x2013;<lpage>10806</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Guerrero</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Lamata</surname></string-name></person-group>, &#x201C;<article-title>Reinforcement learning and physics</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>11</volume>, no. <issue>18</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>24</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Ju</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>A SDN-based active measurement method to traffic QoS sensing for smart network access</article-title>,&#x201D; <source>Wireless Networks</source>, vol. <volume>2</volume>, pp. <fpage>592</fpage>&#x2013;<lpage>604</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Akyildiz</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>QoS-aware adaptive routing in multi-layer hierarchical software defined networks: A reinforcement learning approach</article-title>,&#x201D; in <conf-name>IEEE Int. Conf. on Services Computing</conf-name>, <conf-loc>New York, USA</conf-loc>, pp. <fpage>25</fpage>&#x2013;<lpage>33</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Xiong</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>SDN-MQTT based communication system for battlefield UAV swarms</article-title>,&#x201D; <source>IEEE Communications Magazine</source>, vol. <volume>57</volume>, no. <issue>8</issue>, pp. <fpage>41</fpage>&#x2013;<lpage>47</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Mestres</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Rodrigueznatal</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Carner</surname></string-name></person-group>, &#x201C;<article-title>Knowledge-defined networking</article-title>,&#x201D; <source>ACM Special Interest Group on Data Communication</source>, vol. <volume>47</volume>, no. <issue>3</issue>, pp. <fpage>2</fpage>&#x2013;<lpage>10</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lan</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Guo</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Hu</surname></string-name></person-group>, &#x201C;<article-title>DROM: Optimizing the routing in software-defined networks with deep reinforcement learning</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>6</volume>, pp. <fpage>64533</fpage>&#x2013;<lpage>64539</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Meng</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Shi</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Pan</surname></string-name></person-group>, &#x201C;<article-title>An off-policy trust region policy optimization method with monotonic improvement guarantee for deep reinforcement learning</article-title>,&#x201D; <source>IEEE Transactions on Neural Networks and Learning Systems</source>, vol. <volume>4</volume>, no. <issue>5</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>13</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Ramos</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Almeida</surname></string-name> and <string-name><given-names>U.</given-names> <surname>Moreno</surname></string-name></person-group>, &#x201C;<article-title>Integrated robotic and network simulation method</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>19</volume>, no. <issue>20</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Josbert</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Ping</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Wei</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Industrial networks driven by SDN technology for dynamic fast resilience</article-title>,&#x201D; <source>Information</source>, vol. <volume>12</volume>, no. <issue>10</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Roughan</surname></string-name></person-group>, &#x201C;<article-title>Simplifying the synthesis of internet traffic matrices</article-title>,&#x201D; <source>ACM SIGCOMM Computer Communication Review</source>, vol. <volume>35</volume>, no. <issue>3</issue>, pp. <fpage>93</fpage>&#x2013;<lpage>106</lpage>, <year>2005</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>