<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="review-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">73540</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.073540</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Review</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A State-of-the-Art Survey of Adversarial Reinforcement Learning for IoT Intrusion Detection</article-title>
<alt-title alt-title-type="left-running-head">A State-of-the-Art Survey of Adversarial Reinforcement Learning for IoT Intrusion Detection</alt-title>
<alt-title alt-title-type="right-running-head">A State-of-the-Art Survey of Adversarial Reinforcement Learning for IoT Intrusion Detection</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Al-Haija</surname><given-names>Qasem Abu</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>qsabuhaija@just.edu.jo</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Tamimi</surname><given-names>Shahad Al</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Cybersecurity, Faculty of Computer &#x0026; Information Technology, Jordan University of Science and Technology</institution>, <addr-line>P.O. Box 3030, Irbid, 22110</addr-line>, <country>Jordan</country></aff>
<aff id="aff-2"><label>2</label><institution>Department of Cybersecurity, King Hussein School of Computing Sciences, Princess Sumaya University for Technology</institution>, <addr-line>P.O. Box 1438, Amman, 11941</addr-line>, <country>Jordan</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Qasem Abu Al-Haija. Email: <email>qsabuhaija@just.edu.jo</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2026</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>10</day><month>2</month><year>2026</year>
</pub-date>
<volume>87</volume>
<issue>1</issue>
<elocation-id>2</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>09</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>11</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2026 The Authors.</copyright-statement>
<copyright-year>2026</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_73540.pdf"></self-uri>
<abstract>
<p>Adversarial Reinforcement Learning (ARL) models for intelligent devices and Network Intrusion Detection Systems (NIDS) improve system resilience against sophisticated cyber-attacks. As a core component of ARL, Adversarial Training (AT) enables NIDS agents to discover and prevent new attack paths by exposing them to adversarial examples, thereby increasing detection accuracy, reducing False Positives (FPs), and enhancing network security. To develop robust decision-making capabilities for real-world network disruptions and hostile activity, NIDS agents are trained in adversarial scenarios to monitor the current state and notify management of any abnormal or malicious activity. The accuracy and timeliness of the IDS are crucial to the network&#x2019;s availability and reliability. This paper analyzes ARL applications in NIDS, revealing State-of-The-Art (SoTA) methodology, issues, and future research prospects. This includes Reinforcement Machine Learning (RML)-based NIDS, which enables an agent to interact with the environment to achieve a goal, and Deep Reinforcement Learning (DRL)-based NIDS, which can solve complex decision-making problems. Additionally, this survey study addresses cybersecurity adversarial circumstances and their importance for ARL and NIDS. Architectural design, RL algorithms, feature representation, and training methodologies are examined in the ARL-NIDS study. This comprehensive study evaluates ARL for intelligent NIDS research, benefiting cybersecurity researchers, practitioners, and policymakers. The report promotes cybersecurity defense research and innovation.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Reinforcement learning</kwd>
<kwd>network intrusion detection</kwd>
<kwd>adversarial training</kwd>
<kwd>deep learning</kwd>
<kwd>cybersecurity defense</kwd>
<kwd>intrusion detection system</kwd>
<kwd>machine learning</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>A security mechanism known as the NIDS is developed to maintain surveillance on network traffic and identify any suspicious or illegal activity. NIDS are vital in detecting and countering network attacks by analyzing data packets and traffic patterns [<xref ref-type="bibr" rid="ref-1">1</xref>]. Misuse detection is also referred to as rule-based intrusion detection [<xref ref-type="bibr" rid="ref-2">2</xref>]. DRL has numerous applications in the real world [<xref ref-type="bibr" rid="ref-3">3</xref>]. NIDS are key security defense technologies that monitor computer networks or systems for network-based threats or malicious attacks that might compromise the system&#x2019;s functionality. A misuse-based NIDS relies on known attack patterns for intrusion detection; however, it is vulnerable to zero-day attacks and has a lengthy processing time. An anomaly-based IDS detects concealed attacks on computer systems by identifying atypical traffic patterns. This approach can identify zero-day attacks, and the discovered patterns can be stored in a database for future detection using signature-based methods. The agent and environment can then be trained using a DRL algorithm [<xref ref-type="bibr" rid="ref-4">4</xref>]. In recent years, several authors have adapted classical methodologies to network information security. On smaller, lower-dimensional datasets, Machine Learning (ML) algorithms can achieve strong classification results.</p>
<p>Recently, a new approach combining RL with advances in Deep Learning (DL), called DRL, has been developed. In the generative setup, the generator is trained to produce fresh data, while the discriminator model attempts to distinguish between genuine and generated data; an RL approach can also be used to train the generator. However, such DL-based systems are extremely vulnerable to adversaries and have clear limitations [<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-6">6</xref>]. As a branch of ML, ARL studies the behavior of several agents in a hostile or competitive environment. Its main focus is on building algorithms and techniques that enable agents to acquire and adapt knowledge in contexts with other agents whose goals conflict with their own. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the core targets of ARL, and <xref ref-type="fig" rid="fig-2">Fig. 2</xref> presents the Artificial Intelligence (AI) taxonomy, in which RL is a subclass of ML. To our knowledge, this is the first comprehensive survey considering the studies employing ARL-IDS.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>ARL targets</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-1.tif"/>
</fig><fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>AI taxonomy</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-2.tif"/>
</fig>
<p><bold>Robustness:</bold> ARL aims to develop agents that can survive and thrive in the face of hostile strategies and forces. This requires learning policies that remain robust to attacks that exploit their flaws.</p>
<p><bold>Counteracting Strategies:</bold> ARL enables agents to anticipate and counter opposing strategies. They may need to predict their opponents&#x2019; actions, adjust their policies, and capitalize on their weaknesses.</p>
<p><bold>Exploration and Exploitation:</bold> ARL algorithms should enable agents to devise effective strategies and leverage successful approaches against opponents, striking a balance between exploration and exploitation.</p>
<p><bold>Stability and Convergence:</bold> ARL aims to maintain learning-algorithm stability and convergence in competitive environments. Algorithms that can converge to Nash equilibria or other acceptable solutions are needed.</p>
<p><bold>Transferability:</bold> ARL algorithms should transfer across environments, so that agents may apply what they&#x2019;ve learned in one situation to perform well in another, previously unseen, and potentially hostile one.</p>
<p><bold>Real-World Applications:</bold> ARL aims to develop algorithms and strategies for addressing adversarial or competitive real-world scenarios. Gaming, cybersecurity, negotiation systems, and multi-agent simulations are examples of these applications.</p>
<p><bold>Scope of Survey:</bold> By incorporating RL and AT, ARL has surpassed conventional ML methods. In ARL, the generator plays the role of the adversary against the discriminator, attempting to deceive it. Tasks that require the generation of realistic data samples, such as those addressed by Generative Adversarial Networks (GANs), benefit from ARL [<xref ref-type="bibr" rid="ref-7">7</xref>]. The underlying concept behind ARL is to train a generator to replicate real data effectively, while training a discriminator to distinguish between the two types of data. Additionally, ARL aims to improve data quality by using AT to generate more representative samples, combining AT with RL to enhance the efficacy and data quality of the generative model.</p>
<sec id="s1_1">
<label>1.1</label>
<title>Motivation</title>
<p>The increasing demand for ARL in AI and ML inspired this study. Additionally, there is a growing need for enhancements to NIDS&#x2019;s adaptability, robustness, and intelligence in dynamic and adversarial network environments. However, traditional ML- and DL-based NIDS often struggle to detect evolving and unseen attack patterns (i.e., Zero-Day Attacks) due to their static learning mechanisms. ARL presents a viable framework to address these challenges by combining RL with adversarial learning to provide ongoing self-enhancement and robustness against complex cyber-attacks. ARL needs an exhaustive examination for distinct reasons:
<list list-type="bullet">
<list-item>
<p><bold>Autonomous Adaptation:</bold> Unlike traditional RL, ARL introduces new challenges and opportunities, including security, scalability, sample efficiency, Transfer Learning (TL), and robustness. Furthermore, it may enable autonomous decision-making in hostile conditions, making NIDS more adaptive to complex and changing traffic behaviors.</p></list-item>
<list-item>
<p><bold>Dynamic Threat Landscape:</bold> Network attacks evolve rapidly, rendering static detection models ineffective. ARL agents can continuously learn optimal defense strategies by interacting with the environment and enhancing their responsiveness to emerging threats.</p></list-item>
<list-item>
<p><bold>Adversarial Robustness:</bold> ARL explicitly models the attacker-defender interaction, allowing NIDS to anticipate and counter adversarial attacks such as evasion, poisoning, and spoofing.</p></list-item>
<list-item>
<p><bold>Bridging Theory and Practice:</bold> By studying ARL-driven NIDS, this research aims to reduce the gap between theoretical frameworks and real-world network defense systems through dynamic, risk-aware learning.</p></list-item>
<list-item>
<p><bold>Cross-Domain Relevance:</bold> ARL has proven useful in robotics, cybersecurity, and finance; thus, extending it to NIDS allows leveraging shared advances in autonomous learning and adversarial defense.</p></list-item>
</list></p>
</sec>
<sec id="s1_2">
<label>1.2</label>
<title>Problem Statement</title>
<p>Traditional NIDS struggle to maintain effectiveness in dynamic, adversarial environments, where attackers constantly evolve their strategies to evade detection. While ML has improved detection capability, it remains vulnerable to adversarial manipulation, such as evasion, poisoning, and reward-based attacks. RL provides adaptive learning capability, but on its own it lacks robustness against intelligent adversaries. ARL is an emerging area that enables proactive and adaptable threat assessment in complex circumstances. However, the fragmented nature of existing research, diverse threat models, and inconsistent evaluation frameworks hinder the systematic development of ARL-NIDS. This study surveys and classifies SoTA ARL-NIDS to identify major trends, limitations, and future directions for integrating intelligent security mechanisms and applications that prevent and detect malicious or abnormal activities, giving researchers an integrated, classified, and directly comparable view of the field.</p>
</sec>
<sec id="s1_3">
<label>1.3</label>
<title>Research Gaps</title>
<p>Despite advances in utilizing ARL-NIDS, the majority of current research examines these techniques in isolation or fails to demonstrate practical applicability in hostile environments. To adapt to cyber-attack environments, research is needed to build an ARL-based NIDS framework that exhibits dynamic learning, resilience, and scalability. This survey study addresses such gaps by integrating SoTA advances and representing the ARL-NIDS architecture that unites theory with actual network security implementations. Moreover, the study aims to situate ARL-NIDS within real-world scenarios and the cyber risks that may affect them. Despite this progress, ARL-NIDS still faces significant gaps and limitations. One gap is incomplete coverage of attack types: while ARL models are effective at detecting evasion or poisoning attempts, they remain weak against Advanced Persistent Threats (APTs) that subtly undermine their internal models. In addition, generalization remains a weakness, as models trained on one dataset or domain cannot adapt to different network environments without retraining. Another limitation is the lack of a standardized assessment framework. Without standard benchmarks, comparing ARL approaches impartially or establishing best practices is a challenge. Finally, the complexity of ARL models makes them less interpretable, which reduces confidence among practitioners and limits adoption in regulated industries, such as healthcare or finance.</p>
</sec>
<sec id="s1_4">
<label>1.4</label>
<title>Study Organization</title>
<p>ARL aims to provide resilient, adaptable, and effective learning algorithms and methods for agents in competitive or adversarial contexts to achieve desired outcomes despite competing goals. The rest of the paper is organized as follows: <xref ref-type="sec" rid="s1">Section 1</xref> introduces our research. <xref ref-type="sec" rid="s2">Section 2</xref> provides background on NIDS and RL. <xref ref-type="sec" rid="s3">Section 3</xref> reviews related works, examining 154 past research studies. <xref ref-type="sec" rid="s4">Section 4</xref> presents the ARL-NIDS methodology. <xref ref-type="sec" rid="s5">Section 5</xref> presents findings and discussion. <xref ref-type="sec" rid="s6">Section 6</xref> concludes our study and outlines future directions. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> depicts the study&#x2019;s sections and course. A survey of this kind can disclose key issues and solutions that advance the field. Our contribution is an early global appraisal of existing knowledge regarding the technology, its applications, and ARL-based IDS detection of attacks and harmful activity.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Overview of the proposed survey study</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-3.tif"/>
</fig>
</sec>
</sec>
<sec id="s2">
<label>2</label>
<title>Background</title>
<p>This section provides a detailed background, organized as follows:
<list list-type="simple">
<list-item><label>A.</label><p><bold>Reinforcement Learning (RL)</bold></p></list-item>
</list></p>
<p>RL is an ML paradigm in which an agent interacts with its environment to accomplish a goal. RL and ARL thus require separate but linked algorithms and formulations. In RL, the agent learns to make choices by optimizing the accumulated reward through interactions with the environment. ARL, by contrast, introduces adversaries that challenge the agent&#x2019;s learning process, often modeled as a two-player game: the opponent aims to minimize the payoff, while the agent seeks to maximize it.</p>
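<p>The two-player framing above can be made concrete with a minimal, purely illustrative sketch: a one-shot zero-sum game in which the agent maximizes a reward that the adversary simultaneously minimizes. The payoff matrix below is hypothetical and not taken from any cited work.</p>

```python
# A one-shot zero-sum game between an RL agent (row player, maximizes reward)
# and an adversary (column player, minimizes the same reward).
# payoff[agent_action][adversary_action] = reward to the agent (hypothetical values)
payoff = [
    [3.0, -1.0],  # agent action 0 vs. adversary actions 0 and 1
    [1.0, 0.5],   # agent action 1 vs. adversary actions 0 and 1
]

def minimax_action(payoff):
    """Pick the agent action with the best worst-case (security) value."""
    worst_case = [min(row) for row in payoff]  # the adversary minimizes each row
    best = max(range(len(payoff)), key=lambda a: worst_case[a])
    return best, worst_case[best]

action, value = minimax_action(payoff)
print(action, value)  # action 1 guarantees a reward of at least 0.5
```

<p>Action 0 has the higher best-case payoff (3.0), but the adversary can force a loss of 1.0; the worst-case-optimal agent therefore prefers action 1, illustrating how an adversary reshapes the agent&#x2019;s notion of a good policy.</p>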
<p><bold>MDP and Bellman Equations for RL:</bold> Unlike supervised and unsupervised learning, which utilize labeled data for model training and pattern discovery, RL relies on rewards and punishments. By adapting to rewards and punishments from the environment, the agent maximizes cumulative reward over time. RL algorithms aim to make the best decisions in complex and uncertain scenarios, and require agents, environments, states, actions, rewards, and policies. RL is used in robotics, gaming, autonomous vehicles, recommendation systems, and finance, framing automation as an agent interacting with an environment and learning through trial and error [<xref ref-type="bibr" rid="ref-8">8</xref>]. RL elements are described in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. The following core elements of the Markov Decision Process (MDP) concept are discussed for their importance.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The core elements of an MDP-based RL system</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-4.tif"/>
</fig>
<p><bold>RL:</bold> The training of ML models to make a sequence of decisions.</p>
<p><bold>Agent Function:</bold> The agent chooses an action, receives a reward from the environment, and transitions to a new state.</p>
<p><bold>Reward Function:</bold> The feedback the agent receives based on the action it performed.
<list list-type="bullet">
<list-item>
<p>If the feedback is positive, it receives a reward.</p></list-item>
<list-item>
<p>If the feedback is negative, it receives a punishment.</p></list-item>
</list></p>
<p><bold>The environment</bold> provides the agent with a state.</p>
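<p>A minimal sketch of this agent-environment loop is given below; the environment dynamics and the policy are hypothetical toy constructs, not an actual NIDS.</p>

```python
import random

def environment_step(state, action):
    """Hypothetical dynamics: the action matching the state earns a reward,
    any other action earns a punishment (a negative reward)."""
    reward = 1.0 if action == state else -1.0
    next_state = random.choice([0, 1])  # the environment provides a new state
    return next_state, reward

def run_episode(steps=5, seed=0):
    random.seed(seed)
    state, total = 0, 0.0
    for _ in range(steps):
        action = state  # a trivial policy: mirror the current state
        state, reward = environment_step(state, action)
        total += reward  # positive feedback is a reward, negative a punishment
    return total

print(run_episode())  # the mirroring policy earns +1.0 per step, so 5.0
```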
<p><bold>Bellman optimality equation</bold> for an MDP&#x2019;s state-value function V(s).
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">&#x03B3;</mml:mi><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><bold>Value Function:</bold> The value function V(s) represents the expected cumulative reward starting from state s under a policy &#x03C0;:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msup><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>E</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:mrow></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><bold>Action-Value Function (Q-Function):</bold> The action-value function Q(s, a) represents the expected cumulative reward starting from state s, taking action a, and thereafter following policy &#x03C0;:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msup><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>E</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:mrow></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><bold>Bellman Equation for Value Function:</bold> The Bellman equation expresses the value of a state as the immediate reward plus the discounted value of the next state:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msup><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:msup><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
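<p>As a concrete illustration, Eq. (4) can be solved by iterative policy evaluation. The small MDP below (states, transition model P, reward function R, and policy &#x03C0;) is invented purely for illustration.</p>

```python
# Iterative policy evaluation for the Bellman equation of V^pi, Eq. (4).
# The MDP and the policy below are hypothetical toy constructs.
gamma = 0.9
states, actions = [0, 1], ["a", "b"]
P = {  # P[(s, a)] = list of (next_state, probability) pairs
    (0, "a"): [(0, 0.5), (1, 0.5)], (0, "b"): [(1, 1.0)],
    (1, "a"): [(0, 1.0)],           (1, "b"): [(1, 1.0)],
}
R = {(0, "a"): 1.0, (0, "b"): 0.0, (1, "a"): 2.0, (1, "b"): 0.5}
pi = {0: {"a": 0.5, "b": 0.5}, 1: {"a": 1.0, "b": 0.0}}  # pi(a|s)

def policy_evaluation(tol=1e-8):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # V(s) = sum_a pi(a|s) sum_s' P(s'|s,a) [R(s,a) + gamma V(s')]
            v_new = sum(pi[s][a] * sum(p * (R[(s, a)] + gamma * V[s2])
                                       for s2, p in P[(s, a)])
                        for a in actions)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if tol > delta:  # largest update fell below tolerance: converged
            return V

V = policy_evaluation()
print(V)
```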
<p><bold>Bellman Equation for Q-Function:</bold> Similarly, the Bellman equation for the Q-function is:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msup><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:msub><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi 
mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><bold>Optimal Policy (Bellman Optimality Equation):</bold> RL aims to maximize the cumulative reward:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msup><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msup><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:munder><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi 
mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
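As a concrete illustration of Eqs. (6) and (7), the Bellman optimality backup can be iterated to a fixed point (value iteration). The sketch below uses an invented two-state, two-action MDP; the transition probabilities and rewards are illustrative assumptions, not values from the survey.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative values only).
# P[a][s][s'] is the transition probability, R[s][a] the immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions under action 1
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality backup of Eqs. (6)/(7) to a fixed point."""
    V = np.zeros(R.shape[0])
    while True:
        # Q[s, a] = R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)                 # V*(s) = max_a Q*(s, a)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)    # optimal values and greedy policy
        V = V_new

V_star, policy = value_iteration(P, R, gamma)
```

At convergence, V* satisfies the Bellman optimality equation exactly, which is the stopping criterion the loop checks.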
<p><bold>Policy Gradient (PG) (for Policy-Based Methods):</bold> In policy gradient methods, we aim to optimize the policy by adjusting the parameters &#x03B8;. The gradient of the expected cumulative reward with respect to &#x03B8; is:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mtext>J</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>E&#xA0;</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
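The expectation in Eq. (8) can be estimated by sampling: the log-derivative (REINFORCE) trick averages &#x2207;<sub>&#x03B8;</sub> log &#x03C0;<sub>&#x03B8;</sub>(a|s) Q(s, a) over actions drawn from the policy itself. A minimal sketch on a one-state softmax policy, with invented rewards, compared against the analytic gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# One-state "bandit" with two actions and fixed rewards (illustrative values).
rewards = np.array([1.0, 3.0])
theta = np.zeros(2)           # policy parameters; pi_theta = softmax(theta)

def pg_estimate(theta, n_samples=20000):
    """Monte Carlo estimate of Eq. (8): E_pi[grad log pi(a|s) * Q(s, a)]."""
    pi = softmax(theta)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        a = rng.choice(2, p=pi)
        glog = -pi.copy()      # grad_theta log softmax = one_hot(a) - pi
        glog[a] += 1.0
        grad += glog * rewards[a]   # here Q(s, a) is just the immediate reward
    return grad / n_samples

# Exact gradient for comparison: sum_a pi(a) * (e_a - pi) * r_a
pi = softmax(theta)
exact = sum(pi[a] * ((np.eye(2)[a] - pi) * rewards[a]) for a in range(2))
est = pg_estimate(theta)
```

The estimate is positive for the higher-reward action and negative for the other, so a gradient-ascent step shifts probability mass toward the better action.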
<p><bold>Baseline RL for Intrusion Detection:</bold> In the RL framework, the IDS agent interacts with the environment (network traffic). The agent observes states <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow></mml:math></inline-formula>, takes actions <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow></mml:math></inline-formula>, and receives rewards <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow></mml:math></inline-formula> to optimize its detection policy <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:math></inline-formula>
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:munder><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
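The tabular update of Eq. (9) can be sketched on a toy detection task. The environment below (benign vs. suspicious traffic as states, allow vs. flag as actions, &#x00B1;1 rewards) is invented for illustration and is far simpler than real network traffic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy IDS environment: state 0 = benign traffic, state 1 = suspicious traffic.
# Actions: 0 = allow, 1 = flag. Correct decisions earn +1, incorrect ones -1.
n_states, n_actions = 2, 2
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

def step(state, action):
    reward = 1.0 if action == state else -1.0  # flag suspicious, allow benign
    next_state = rng.integers(n_states)        # next traffic type is random
    return reward, next_state

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    reward, next_state = step(state, action)
    # Eq. (9): Q(s,a) <- Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)]
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

greedy = Q.argmax(axis=1)   # learned policy: allow benign, flag suspicious
```

After training, the greedy policy allows benign traffic and flags suspicious traffic, as the reward structure intends.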
<p>Additionally, the notation for the mathematical formulas of the MDP and the Bellman approach for the RL algorithm (as shown in Algorithm 1: MDP-RL Algorithm) is provided in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>List of notations</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th></th>
<th>Keyword</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>S</bold></td>
<td>State function</td>
<td>Determine all possible states for the agent.</td>
</tr>
<tr>
<td><bold>a</bold></td>
<td>Actions function</td>
<td>Predict the agent&#x2019;s state-specific behavior.</td>
</tr>
<tr>
<td><bold>P</bold></td>
<td>Transition probability</td>
<td>Post-activity state alteration.</td>
</tr>
<tr>
<td><bold>r</bold></td>
<td>Detection accuracy reward</td>
<td>Offer rewards for prominent levels of performance.</td>
</tr>
<tr>
<td><bold>E</bold></td>
<td>Expected value</td>
<td>Expectation or reward accumulation average.</td>
</tr>
<tr>
<td><bold>Q</bold></td>
<td>Q-value</td>
<td>AKA the action-value function, is a key concept for evaluating state-action pairings.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi mathvariant="bold-italic">&#x03B3;</mml:mi></mml:math></inline-formula></td>
<td>Discount factor</td>
<td>Reduce future rewards to strike a balance between short-term and long-term advantages.</td>
</tr>
<tr>
<td><bold>V</bold></td>
<td>Value function</td>
<td>Policy evaluation and strategy optimization.</td>
</tr>
<tr>
<td><bold>&#x03C0;</bold></td>
<td>Policy (&#x03C0;)</td>
<td>Acts of each condition according to the agent&#x2019;s plan.</td>
</tr>
<tr>
<td><bold>V(s)</bold></td>
<td>State-value function</td>
<td>An agent&#x2019;s estimated cumulative benefit from state (s) under a policy (agent&#x2019;s state value).</td>
</tr>
<tr>
<td><bold>max</bold><sub><bold>a</bold></sub></td>
<td>Maximization over actions</td>
<td>An MDP involves choosing the action <italic>a</italic> that maximizes anticipated value or reward. The ideal action for maximizing future rewards is often found in Bellman equations.</td>
</tr>
<tr>
<td><bold>V(s<sup>&#x2032;</sup>)</bold></td>
<td>Value of next state</td>
<td>The following state&#x2019;s value function represents the predicted return from s<sup>&#x2032;</sup> forward. It updates state values in Bellman equations and other recursive computations.</td>
</tr>
<tr>
<td><bold>&#x03B1;</bold></td>
<td>Learning rate</td>
<td>Learning rate affects RL speed. A higher rate enables quick adaptation but can lead to instability, while a lower rate offers stable updates but slows adaptation.</td>
</tr>
<tr>
<td><bold>s</bold><sup><bold>&#x2032;</bold></sup></td>
<td>Next state</td>
<td>Agents modify the environment, showing the system&#x2019;s new state based on network packets and adversarial perturbations. This state indicates whether the IDS successfully detected an intrusion or was misled by an attack, which in turn influences policy decisions and responses.</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-23">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-23.tif"/>
</fig>
<p>Consequently, fields like DRL have made great strides, combining RL algorithms with DNNs to tackle more difficult problems and deliver SOTA outcomes. Within AI, RL remains a research hotspot and shows promise on tough challenges. However, DRL is vulnerable to adversarial attacks, which restricts its applicability in critical real-world applications and systems. Even so, DRL can perform well in various situations with limited manual intervention. A key criterion for classifying RL algorithms is whether the agent learns a model of its surroundings, including functions that predict state transitions and rewards. The resulting categories of RL approaches are shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>RL classification</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-5.tif"/>
</fig>
<p><bold>Model-based RL:</bold> The agent learns a model of the environment and then uses it to predict outcomes and plan for the future, updating and restructuring the model periodically. Model-based algorithms are said to be sample-efficient: the agent needs relatively few data samples to maximize rewards and perform successfully, because it does not need to observe every outcome directly. A model can be acquired by learning world models, model-based value expansion, model-based fine-tuning, or by being given the model outright.</p>
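The model-based loop can be sketched minimally: estimate transition probabilities from counted samples, then plan inside the learned model with Bellman backups. The dynamics below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# True (hidden) dynamics of a toy 2-state, 2-action environment (illustrative).
P_true = np.array([[[0.8, 0.2], [0.3, 0.7]],
                   [[0.4, 0.6], [0.1, 0.9]]])  # P_true[a][s][s']
R_true = np.array([[1.0, 0.0], [0.0, 2.0]])    # R_true[s][a]

# 1) Learn a model from sampled transitions: counts -> empirical probabilities.
counts = np.zeros_like(P_true)
for _ in range(20000):
    s, a = rng.integers(2), rng.integers(2)
    s2 = rng.choice(2, p=P_true[a, s])
    counts[a, s, s2] += 1
P_hat = counts / counts.sum(axis=2, keepdims=True)

# 2) Plan in the learned model by repeating the Bellman optimality backup.
gamma, V = 0.9, np.zeros(2)
for _ in range(500):
    V = (R_true + gamma * np.einsum("asn,n->sa", P_hat, V)).max(axis=1)
```

Because planning reuses every sampled transition through the learned model, the agent never has to experience each outcome directly, which is what "sample-efficient" means here.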
<p><bold>Model-free RL:</bold> Instead of constructing or being given a model, the agent uses its experience to select optimal actions in each state. Model-free methods divide into value-based approaches, which learn value functions (covered by Q-Learning algorithms), and policy-based approaches, which optimize the policy directly. The agent&#x2019;s main goal is to behave optimally, so it learns by exploring its environment rather than only exploiting it. Policy-based (or PG) methods are sometimes called on-policy. In policy optimization, algorithms optimize the parameters &#x03B8; directly by gradient ascent on a performance objective, or indirectly by maximizing a local estimate of that objective. The primary aim is a policy gradient that reinforces good actions and penalizes bad ones. PG methods can learn an acceptable policy even when the Q-function is too difficult to learn, converge quickly, can learn stochastic policies, and model continuous action spaces more simply. Policy-based techniques maximize a performance measure, such as the parametrized policy&#x2019;s true value function, over all starting states [<xref ref-type="bibr" rid="ref-9">9</xref>].
<list list-type="simple">
<list-item><label><bold>B</bold>.</label><p><bold>Adversarial Reinforcement Learning (ARL)</bold></p></list-item>
</list></p>
<p>ARL combines AT and RL, two strong ML paradigms designed to solve problems in dynamic and hostile settings. AT, which stems from the discipline of adversarial ML, exposes models to adversarial examples during training to improve their resilience to possible attacks. RL, in contrast, concerns agents that interact with an environment to learn optimal strategies via trial and error, maximizing cumulative rewards over time. ARL combines these two approaches, enabling agents to acquire robust policies for making informed choices in uncertain and hostile environments. By pairing AT&#x2019;s adversarial robustness with RL&#x2019;s decision-making skills, ARL offers new pathways for handling difficult real-world challenges in various disciplines, including cybersecurity and ML [<xref ref-type="bibr" rid="ref-10">10</xref>]. As a powerful DRL method, ARL incorporates the DQN model for IDS and is specially designed for Internet of Things (IoT) and Wireless Sensor Networks (WSNs). Findings show that DRL significantly improves detection accuracy compared to traditional ML methods like K-Nearest Neighbors (KNN) and Q-learning. By constantly learning and adapting, the ARL framework can dynamically identify threats in hostile situations surrounding vital public infrastructure. ARL enhances the overall security posture of NIDS in complex and evolving threat environments, treating the actor as the IDS instrument and the state as packets from the network to improve classification judgments and accuracy [<xref ref-type="bibr" rid="ref-11">11</xref>]. The ARL taxonomy is provided in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, where ARL is classified by model adjustment, application domain, technique, framework feature, and adaptability.</p>
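The core ARL idea (training the defender while an adversary is in the loop) can be sketched as follows, assuming an invented two-state detection task: an adversary perturbs the agent's observations during training, so the Q-values are learned under attack rather than in clean conditions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of ARL-style training: the defender (IDS agent) learns Q-values while
# an adversary perturbs its observations (illustrative toy task, not the
# survey's DQN setup).
n_states, n_actions = 2, 2
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

def adversary(state, strength=0.2):
    """Observation attack: with probability `strength`, flip what the agent sees."""
    return 1 - state if rng.random() < strength else state

state = 0
for _ in range(20000):
    obs = adversary(state)                      # agent sees a perturbed state
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[obs].argmax())
    reward = 1.0 if action == state else -1.0   # reward depends on the TRUE state
    next_state = rng.integers(n_states)
    next_obs = adversary(next_state)
    Q[obs, action] += alpha * (reward + gamma * Q[next_obs].max() - Q[obs, action])
    state = next_state
```

Even with 20% of observations flipped, the correct action remains better in expectation for each observed state, so the defender still learns to allow benign and flag suspicious traffic; the learned values simply shrink toward the attack-discounted expected rewards.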
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>ARL taxonomy</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-6.tif"/>
</fig>
<p>In ARL, adversarial attacks manipulate the incentive system so that the agent makes poor choices; the agent may learn harmful behaviors or fail to fulfill its goals as a result. Such attacks are challenging to identify and counteract, necessitating robust defenses to ensure agent performance and reliability. <xref ref-type="fig" rid="fig-7">Fig. 7</xref> categorizes the taxonomy of adversarial attacks, and <xref ref-type="table" rid="table-2">Table 2</xref> outlines the differences between RL, DRL, and ARL.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Latest adversarial attacks</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-7.tif"/>
</fig><table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Overall comparison between RL, ARL &#x0026; DRL</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th></th>
<th>RL</th>
<th>ARL</th>
<th>DRL</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Objective</bold></td>
<td>Acquiring skills in maximizing cumulative reward via choice sequences.</td>
<td>Protecting ML models from adversarial attacks, or leveraging such attacks against them.</td>
<td>Combining DL with RL to address complicated sequential decision-making problems.</td>
</tr>
<tr>
<td><bold>Scope</bold></td>
<td>Interactive settings for the purpose of learning optimal policies.</td>
<td>Focuses on the security of ML models and how to break them.</td>
<td>Improved by using DNNs to manage complex state and action spaces.</td>
</tr>
<tr>
<td><bold>Usage</bold></td>
<td>Numerous Apps in various fields, including robotics, gaming, recommendation systems, etc.</td>
<td>Applicable within security, classification of images, NLP, etc.</td>
<td>Implemented in areas that require sequential decision-making, such as healthcare, autonomous vehicles, &#x0026; banking.</td>
</tr>
<tr>
<td><bold>Complexity</bold></td>
<td>Very high</td>
<td>Very high</td>
<td>High</td>
</tr>
<tr>
<td><bold>Cost</bold></td>
<td>Very high</td>
<td>Very high</td>
<td>High</td>
</tr>
<tr>
<td><bold>Algorithms</bold></td>
<td>Includes Monte Carlo methods, policy gradients, Q-learning, etc.</td>
<td>Defense tactics include AT, disruption creation, etc.</td>
<td>It supports RL/DL algorithms, including SAC, PPO, A3C, and DQN.</td>
</tr>
<tr>
<td><bold>Classification</bold></td>
<td>A subfield of ML focused on learning optimal behavior through interaction and reward.</td>
<td>A subfield of ML, it aims to be more resistant to attacks from adversaries.</td>
<td>Subfield of RL that uses DNN to approximate functions.</td>
</tr>
<tr>
<td><bold>Output accuracy</bold></td>
<td>Very high</td>
<td>Moderate</td>
<td>High</td>
</tr>
<tr>
<td><bold>Missing data?</bold></td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td><bold>Feedback</bold></td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><list list-type="simple">
<list-item><label><bold>C</bold>.</label><p><bold>Intrusion Detection System (IDS)</bold></p></list-item>
</list></p>
<p>IDS is a crucial component of today&#x2019;s cybersecurity techniques. IDS is generally a software or hardware technology that observes network and system activity to detect vulnerabilities and respond to unauthorized use, criminal activity, and possible security breaches. An IDS is used similarly to a firewall, serving as a watchful guardian that provides real-time alerts or implements automated actions to secure systems containing sensitive data. It analyzes recurring activity patterns and compares them with known malicious activity signatures or unusual deviations, as shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>. IDS plays a crucial role in proactively identifying and removing cyber threats, thereby enhancing the robustness and security of technological systems, whether within an organization&#x2019;s internal networks or beyond its perimeter. Thus, NIDS aim to accurately detect and classify complex attacks in real-time [<xref ref-type="bibr" rid="ref-12">12</xref>]. There are various varieties of IDS; consider <xref ref-type="fig" rid="fig-9">Fig. 9</xref> to demonstrate IDS taxonomy, detection approach, detection methodology, architecture, response method, analysis target, and analysis time.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>General IDS model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-8.tif"/>
</fig><fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Overall IDS taxonomy</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-9.tif"/>
</fig>
<p><bold>NIDS</bold> are crucial for detecting and preventing network security threats. NIDS is also the first step in building security situational awareness, as it is the primary technology for detecting various network attacks and analyzing network data. A NIDS can detect patterns or signatures already present in its training dataset, so it must be updated in real time to keep meeting classification standards. Its basic goal is to monitor and control network connections while blocking unauthorized ones. <xref ref-type="fig" rid="fig-10">Fig. 10</xref> illustrates the NIDS architecture, which is intended to protect data and systems and to help determine whether an intrusion occurred.</p>
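Signature matching against known patterns can be sketched as rule predicates over packet fields. The signatures and packet records below are hypothetical and far simpler than real NIDS rule languages such as Snort's.

```python
# Hypothetical signatures over toy packet records (illustrative only).
SIGNATURES = {
    "syn_scan": lambda p: p["flags"] == "SYN" and p["payload_len"] == 0,
    "oversized_payload": lambda p: p["payload_len"] > 1400,
}

def inspect(packet):
    """Return the names of every signature the packet matches."""
    return [name for name, rule in SIGNATURES.items() if rule(packet)]

alerts = [
    inspect(p)
    for p in [
        {"flags": "SYN", "payload_len": 0},     # looks like a SYN-scan probe
        {"flags": "ACK", "payload_len": 120},   # ordinary traffic, no match
        {"flags": "PSH", "payload_len": 1500},  # unusually large payload
    ]
]
```

This also shows the limitation the text notes: only patterns already encoded as signatures can fire, so the rule set must be kept up to date.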
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Common NIDS architecture</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-10.tif"/>
</fig>
<p><list list-type="bullet">
<list-item>
<p><bold>Hybrid-based IDS (Hybrid-IDS):</bold> Hybrid-IDS integrates various detection techniques, such as signature-based, anomaly-based, and behavior-based methods, to provide comprehensive security measures. By combining these approaches, Hybrid-IDS can effectively detect and mitigate a wide range of cyber threats, providing a more robust defense mechanism than a single-method IDS. Leveraging ML and DL algorithms, Hybrid-IDS can analyze complex patterns and behaviors in network traffic, enhancing threat detection accuracy while minimizing FP. Hybrid-IDS&#x2019;s continuous evolution and refinement ensure their ability to adapt to the ever-changing landscape of cybersecurity threats, making them an asset in safeguarding networks and systems from malicious activities [<xref ref-type="bibr" rid="ref-13">13</xref>].</p></list-item>
<list-item>
<p><bold>Host-based IDS (Host-IDS):</bold> Host-IDS is a security mechanism that monitors and analyzes the internal activities and behaviors of a single host or endpoint within a network. It focuses on detecting suspicious or malicious activities that may indicate a security breach or unauthorized access to the host. Host-IDS examines log files, system calls, file integrity, and network traffic on the specific host to identify any anomalies or signs of intrusion. By providing detailed insights into the activities occurring on a particular host, a Host-IDS plays a crucial role in enhancing an organization&#x2019;s overall security posture by enabling proactive threat detection and response at the individual host level [<xref ref-type="bibr" rid="ref-14">14</xref>].</p></list-item>
</list></p>
<p><bold>Anomaly Detection (AD):</bold> AD is a critical aspect of cybersecurity that involves identifying unusual or suspicious behavior in a system. It is vital for detecting security threats, such as network intrusions, fraud, or system malfunctions. By leveraging ML and DL techniques, AD systems can analyze vast amounts of data to establish patterns of normal behavior and flag deviations that may indicate anomalies. These systems are designed to continuously adapt and learn from new data, thereby enhancing their ability to detect. AD is essential for proactively identifying and mitigating security risks in various domains, including network security, financial transactions, and industrial processes [<xref ref-type="bibr" rid="ref-15">15</xref>]. Moreover, <xref ref-type="table" rid="table-3">Table 3</xref> illustrates the key differences between the various types of IDS. Finally, IDS seeks to categorize hostile network events in real-time, learn from past experiences, eliminate mistakes, and strengthen network defenses against attacks. It removes the need for humans to develop criteria and indicators to detect and prevent attacks. RL can educate IDS systems on how to respond properly to incentives and penalties [<xref ref-type="bibr" rid="ref-16">16</xref>].</p>
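The baseline-profile idea behind AD can be sketched with a simple statistical detector: fit the mean and spread of a normal traffic feature, then flag observations beyond a z-score threshold. The feature (a hypothetical packets-per-second rate) and the 3-sigma threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Baseline "normal" traffic feature (hypothetical packets/second rate).
normal = rng.normal(loc=100.0, scale=10.0, size=1000)
observed = np.concatenate([normal, [300.0, 5.0]])  # two injected anomalies

# Fit the normal profile, then flag deviations beyond 3 standard deviations.
mu, sigma = normal.mean(), normal.std()
z = np.abs(observed - mu) / sigma
anomalies = np.where(z > 3.0)[0]
```

Both injected points are flagged; a handful of genuinely normal samples may also exceed the threshold, which mirrors the high false-positive rate attributed to AD in Table 3.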
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Overall classification differences of IDS types</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th></th>
<th>NIDS</th>
<th>Hybrid-IDS</th>
<th>Host-IDS</th>
<th>AD</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Definition</bold></td>
<td>Monitors network traffic for suspicious activity.</td>
<td>Combines features of both NIDS and HIDS for better detection.</td>
<td>Monitors a specific host/system for signs of intrusion.</td>
<td>Detects deviations from normal behavior patterns.</td>
</tr>
<tr>
<td><bold>Location of deployment</bold></td>
<td>At strategic points in the network, such as routers and firewalls.</td>
<td>Both network and host levels.</td>
<td>Installed on individual devices or servers.</td>
<td>It can be applied at the network or host level.</td>
</tr>
<tr>
<td><bold>Detection approach</bold></td>
<td>Examine network packets for known threats.</td>
<td>Uses network and host-based logs for comprehensive analysis.</td>
<td>Analyzes system logs, processes, and file integrity to ensure optimal system performance and integrity.</td>
<td>Identifies unusual behavior rather than predefined signatures.</td>
</tr>
<tr>
<td><bold>Strengths</bold></td>
<td>Can detect external attacks before they reach hosts.</td>
<td>Provides a broad and accurate detection mechanism.</td>
<td>Effective at detecting internal threats and system misuse.</td>
<td>Can detect zero-day and novel attacks.</td>
</tr>
<tr>
<td><bold>Weaknesses</bold></td>
<td>May not detect encrypted or host-specific attacks.</td>
<td>More complex to manage and configure.</td>
<td>Cannot detect attacks that do not generate system logs.</td>
<td>High False Positive Rate (FPR) due to deviations from normal behavior.</td>
</tr>
<tr>
<td><bold>Example threats</bold></td>
<td>DDoS attacks, port scanning, and malware propagation.</td>
<td>Advanced Persistent Threats combined attack vectors.</td>
<td>Unauthorized file access, privilege escalation, &#x0026; malware.</td>
<td>Unknown malware, insider threats, unusual login patterns.</td>
</tr>
<tr>
<td><bold>FP</bold></td>
<td>Moderate</td>
<td>Lower than individual IDS types.</td>
<td>Low if properly configured.</td>
<td>High, due to deviations from standard behavior.</td>
</tr>
<tr>
<td><bold>Real-time detection</bold></td>
<td>Yes</td>
<td>Yes</td>
<td>Usually, but depends on system logs.</td>
<td>Yes, but requires proper training for accuracy.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3">
<label>3</label>
<title>Literature Review</title>
<p>The following section discusses previous ARL-based NIDS studies and how they have identified attacks and malicious behavior in various capacities, employing different methodologies and strategies. ML was first applied to intrusion detection in 1999, when Wenke Lee and his team developed an AD model to assess network data flows via log auditing, laying the groundwork for future ML advances in IDS. The main concept is addressed in 154 related studies that apply the ARL technique to enhance current NIDS built on similar technology; <xref ref-type="table" rid="table-4">Table 4</xref> presents a comparison of relevant studies from the past few years, including contributions, datasets, models, and results. Furthermore, <xref ref-type="fig" rid="fig-11">Fig. 11</xref> illustrates the advancement of ARL over the last 11 years. Researchers have recently developed and enhanced ARL for use in various environments and applications.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Analysis of past studies</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Ref.</th>
<th>Contribution</th>
<th>Dataset</th>
<th>Model</th>
<th>Results</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="5"><bold>Algorithm-oriented classification</bold></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-16">16</xref>]</bold></td>
<td>RL is employed for NIDS with attention mechanisms; model efficacy is compared under adversarial attacks, and resilience, accuracy, &#x0026; precision are evaluated for IDS.</td>
<td>NSL-KDD and CICIDS2017 datasets.</td>
<td>DQN model.</td>
<td>Achieved 97.4% accuracy on the NSL-KDD dataset, 98.7% accuracy on the CICIDS2017 dataset, displayed high accuracy along with a low FPR against adversarial assaults, and beat current IDS models in terms of efficacy.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-17">17</xref>]</bold></td>
<td>Multi-armed bandit for an unsupervised smart home AD to enhance IDS.</td>
<td>MAGPIE dataset.</td>
<td>Multi-armed bandit-based IDS-DQN model.</td>
<td>Improved the accuracy and efficiency of smart infrastructure network-based attack detection.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-18">18</xref>]</bold></td>
<td>Proposed a DQN-IoT-IDS and showed its superiority. They provided insights into the potential of RL in strengthening cybersecurity defenses for IoT devices.</td>
<td>TON-IoT GPS &#x0026; TON-IoT-2020, Modbus.</td>
<td>DQN model.</td>
<td>DQN Accuracy: 0.7969, Precision: 0.7678, Recall: 0.7838 for TON-IoT GPS Dataset. Thus, TON-IoT Modbus Dataset: DQN Accuracy: 0.8051, Precision: 0.8013, Recall: 0.8051.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-19">19</xref>]</bold></td>
<td>The following study suggests utilizing DRL to identify IDS features.</td>
<td>NSL-KDD, UNSW-NB15, CICIDS2017.</td>
<td>DQN model.</td>
<td>DRL-based feature selection is evaluated in terms of accuracy, performance, &#x0026; robustness against adversarial attacks, and compared to information gain, the chi-square test, &#x0026; genetic algorithms.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-20">20</xref>]</bold></td>
<td>This paper summarizes the advancements made in using DRL in IoT systems.</td>
<td>NSL-KDD, BoTNeTIoT-L015, N-BaIoT.</td>
<td></td>
<td>The results demonstrate that DRL-based IDS has successfully detected various threats with accuracy rates ranging from 82% to 98.5%.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-21">21</xref>]</bold></td>
<td>A novel ensemble-based LSTM model was created to detect corrupted agent behavior in MARL systems. Detailed results and baselines applying attacks show an increased performance.</td>
<td>MADDPG dataset.</td>
<td>Binary LSTM via MARL model.</td>
<td>The ensemble model detects irregularities better than baseline methods. It is highlighted since it boosts average recall by 7% in StarCraft II. It exhibits strong detection capabilities, accurately identifying approximately 100% of attacks with increasing accuracy as the attack rate increases.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-22">22</xref>]</bold></td>
<td>The study uses ADVERSARIALuscator to enhance the detection of metamorphic malware by IDS. Opcode-level obfuscations that resemble real malware enhance detection and train other subsystems for protection.</td>
<td>Malicia dataset.</td>
<td>DRL model.</td>
<td>The study&#x2019;s findings show that the ADVERSARIALuscator system is effective at creating malware using obfuscation, making it more difficult for IDS to identify.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-23">23</xref>]</bold></td>
<td>They created an RL-based framework for log analysis in HPC systems (ReLog), providing a fresh perspective on AD by framing it as a series of decision-making challenges. They also offer a method that utilizes existing logs to generate sufficient training data using a GAN.</td>
<td>SKaMPI dataset.</td>
<td>The ReLog framework is based on the RL model.</td>
<td>The gathered dataset yielded a detection accuracy of 93% for ReLog. Using ML approaches, the research proved that log-based AD in HPC systems is successful. The methodology has shown potential in identifying suspicious or hostile individuals in high-performance computing settings.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-24">24</xref>]</bold></td>
<td>Proposed an adversarial DRL framework (X-Swarm) to create metamorphic malware that can bypass sophisticated ATP safeguards. Despite the complexity of the underlying MDP, it overcomes the difficulties existing RL and DRL approaches face in cybersecurity.</td>
<td>Malicia dataset.</td>
<td>PPO model for efficient policy learning.</td>
<td>According to the findings, X-Swarm was able to develop metamorphic malware versions that were almost identical to the original yet could avoid detection by ATP. The system&#x2019;s agents successfully obfuscate each piece of malware as they remain close to the initial strain.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-25">25</xref>]</bold></td>
<td>The study introduces an ARL framework to identify weaknesses in the control policies of DNN-driven autonomous vehicles that were not apparent during online testing.</td>
<td>Highway driving scenarios with varying velocity to evaluate collision cases.</td>
<td>An ARL agent will test the deep control policies of autonomous vehicles.</td>
<td>The findings showed that the malicious agent could exploit the control rules, representing a significant improvement over the traditional method of manual testing.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-26">26</xref>]</bold></td>
<td>Used RL models to optimize IoT decision-making for IoBT twin instrument gateways. They introduced Intelligent Dual Function Sensor Gateways to secure &#x0026; optimize IoT networks.</td>
<td>Network simulation parameters.</td>
<td>Drift-Diffusion RL model.</td>
<td>They demonstrated that the proposed technique optimizes decision thresholds and enhances IoT security. The model&#x2019;s relevance is evaluated using analysis of variance, showing how disrepute affects redundant transmission.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-27">27</xref>]</bold></td>
<td>The research enhances IoT IDS using ML and DL. DRL algorithms adapt to the changing IoT environment and enhance IDS. IDS resilience against assaults is also examined using Adversarial ML algorithms.</td>
<td>UNSW-NB15, Bot-IoT, AWID, CICIDS2017, and IoTID20 datasets.</td>
<td>DRL, Q-Learning model.</td>
<td>The study found the Extreme Learning Machine (ELM) ELM-IDS promising for IoT AD. The work combines advanced ML and DL models to improve IDS against IoT smart home cyberattacks. The study focuses on innovative technologies to secure IoT.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-28">28</xref>]</bold></td>
<td>They proposed a novel classifier model for IDS in networking that utilizes RL to enhance performance. It investigates multi-agent and ARL approaches to counter training bias and improve IDS performance on imbalanced datasets.</td>
<td>NSL-KDD and AWID datasets.</td>
<td>AE-RL model</td>
<td>Findings indicated that the AE-RL model was competitive, with results comparable to those of SOTA classifiers. For imbalanced datasets and circumstances in which false negatives are critical, AE-RL is more beneficial since it requires less time for training and prediction.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-29">29</xref>]</bold></td>
<td>Examined the resistance of ML-based NIDS to adversarial attacks by launching real-world attacks to test the NIDS model&#x2019;s vulnerability.</td>
<td>Kitsune Dataset and CIC-IDS2017.</td>
<td>MLP, LR, DT, SVM, IF, and KitNET.</td>
<td>Findings provide insight into how effective defensive mechanisms can prevent adversarial assaults and reveal the susceptibility of ML and DL models to such attacks.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-30">30</xref>]</bold></td>
<td>Aimed to test DNN models used in NIDS with malicious samples, assessing how well black-box evasion attacks work against them.</td>
<td>NSL-KDD dataset.</td>
<td>DNN model.</td>
<td>DNN model accurately identified various types of network traffic. However, black-box attacks, notably the ZOO algorithm, revealed NIDS flaws and reduced the model&#x2019;s effectiveness.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-31">31</xref>]</bold></td>
<td>A powerful Context-Adaptive IDS using DRL agents that can accurately identify complex threats. It demonstrates improved resistance to hostile assaults and practical use in response to changing attack patterns.</td>
<td>NSL-KDD, UNSW-NB15, and AWID datasets.</td>
<td>GNB, KNN, and QDA models.</td>
<td>In the research, RF, ADB, and KNN were high-accuracy classifiers for NSL-KDD and UNSW-NB15, whereas RF, ADB, and QDA performed well on AWID. Improvements in accuracy and a decrease in FPR demonstrated that the model balanced accuracy and FPR across datasets.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-32">32</xref>]</bold></td>
<td>New ML-based method for automatic construction of fault-tolerant algorithms (RBcast). Its automation improved accuracy &#x0026; efficiency.</td>
<td>Fault-Tolerant Distributed Algorithms.</td>
<td>RBcast algorithms based on RL.</td>
<td>An algorithm to synthesize efficient and accurate RBcast, where the automated method generates distributed problem-solving algorithms better than conventional methods.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-33">33</xref>]</bold></td>
<td>Designed a hybrid simulator with a state-action-dependent function that boosts model expressiveness, established an ARL framework to improve simulation fidelity, and showed how the hybrid simulator can generalize to learn multiple motor skills across tasks.</td>
<td>Trajectory dataset.</td>
<td>PPO model.</td>
<td>They found that after 30 rollouts of randomized starting states, the suggested strategy achieved better target rewards in the target settings than baseline strategies in five of six domain-specific adaptation studies. The hybrid simulator proved to be versatile and capable of efficiently learning new motor abilities.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-34">34</xref>]</bold></td>
<td>Developed a GAN-based framework to generate adversarial low-rate DDoS traffic, exposing vulnerabilities in IDS models.</td>
<td>LR-DDoS 2022; Port Scan &#x0026; Slowloris</td>
<td>Deep-learning IDS models evaluated under adversarial traffic.</td>
<td>GAN attacks achieved very high evasion success (&#x007E;99.9%).</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-35">35</xref>]</bold></td>
<td>The primary objective of this research is to demonstrate that DRL models are effective intrusion detection tools for networks.</td>
<td>NSL-KDD and AWID</td>
<td>Various ML and DL models are applied.</td>
<td>Examined multiple ML-IDSs on the AWID multiclass data set. Weighted averages were used to calculate the performance metrics. The DDQN and RF models performed well in all measures.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-36">36</xref>]</bold></td>
<td>A new botnet flow-generating architecture is introduced, using RL to make adversarial botnet flows harder to detect while evading target detectors.</td>
<td>ISOT 2010 dataset.</td>
<td>CNN-based DL model.</td>
<td>The study&#x2019;s findings indicate that the agent was successfully trained on 40,000 flows, in addition to the datasets required for training the detection models.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-37">37</xref>]</bold></td>
<td>Concise ARL review, comparing attack methods with protective systems and highlighting their pros and cons.</td>
<td>CIFAR-10 dataset.</td>
<td>DQN model.</td>
<td>The research findings suggest that disruption during agent pathfinding is possible since the model produces probabilistic outputs for input locations.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-38">38</xref>]</bold></td>
<td>Discovered how fine-tuning protects against certain adversaries and can uncover new antagonistic rules. They showed that adversarial policies outperform off-distribution baselines in RL, exposing weaknesses.</td>
<td>Stable Baselines dataset.</td>
<td>PPO model.</td>
<td>In competitive simulated robot environments, self-play adversarial techniques outperformed those using off-distribution starting locations. Fine-tuning methods blocked various attackers, but new aggressive strategies exposed flaws in AT that affected RL via a diverging victim policy network.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-39">39</xref>]</bold></td>
<td>The RADIAL-RL framework trains DRL agents using adversarial loss functions to enhance their resilience. Experiments on Atari games and MuJoCo challenges demonstrate that RADIAL agents outperform existing techniques.</td>
<td>Atari games dataset.</td>
<td>RADIAL-DQN &#x0026;<break/> RADIAL-A3C.</td>
<td>RADIAL agents demonstrate superior performance and better rewards than current approaches on Atari games and MuJoCo tasks. The agents&#x2019; degree of generalizability and resistance to adversarial assaults are both improved by the robust training strategy.</td>
</tr>
<tr>
<td align="center" colspan="5"><bold>Dataset-oriented studies</bold></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-41">41</xref>]</bold></td>
<td>Proposed MANDA, a MANifold- and Decision-boundary-based AE detector for ML-IDS, and showed that the algorithm adapts to every AE attack.</td>
<td>NSL-KDD, CICIDS2017 datasets.</td>
<td>Multilayer perceptron model.</td>
<td>MANDA is quite good at detecting adaptive assaults and achieves better ASR results on the CICIDS dataset than on NSL-KDD.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-42">42</xref>]</bold></td>
<td>Developed a model to identify &#x0026; categorize different forms of IoT network activities, such as assaults.</td>
<td>IoTID20 dataset.</td>
<td>5 supervised models: SNN, DT, BT, KNN, and SVM.</td>
<td>The results of the IoTID20 dataset were encouraging, achieving a classification accuracy ranging from 99.4% to 99.9%.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-43">43</xref>]</bold></td>
<td>Abstracted ML concepts for security domain issues and a novel data-centered IDS taxonomy are introduced to solve domain challenges using ML.</td>
<td>DARPA199, KDD99, NSL-KDD, &#x0026; UNSW-NB15</td>
<td>SVM, DT, NB, KNN, LSTM, CNN, autoencoder, GAN, and DNN models.</td>
<td>Based on data sources and monitoring methodologies, the research categorizes IDSs, with an emphasis on ML approaches. It focuses on how ML may use various kinds of data to improve IDS.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-44">44</xref>]</bold></td>
<td>Analyzed how adversarial attacks crafted with ML approaches can compromise conventional NIDS. Examining different perturbation techniques that affect NIDS performance stresses the need for strong defenses against malicious cyberattacks.</td>
<td>NSL-KDD and UNSW-NB15 datasets.</td>
<td>LR, LDA, DDA, and BAG models.</td>
<td>Adversarial algorithms based on perturbation methods, such as PSO, GA, &#x0026; GAN, may target conventional NIDS, particularly tree-based classifiers. Accordingly, additional research is needed to strengthen cybersecurity and render NIDS more resilient to hostile attacks.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-45">45</xref>]</bold></td>
<td>A new hybrid feature selection for IDS in an IoT setting. The goal is to improve AD performance by employing dimensionality reduction to reduce training time complexity &#x0026; false alarm rates.</td>
<td>IoTID20 dataset.</td>
<td>ANN, KNN, and Ensemble classifiers with majority voting models.</td>
<td>The suggested model outperforms existing models in accuracy, precision, recall, and F1-measure, attaining an extremely high classification rate of 99.9% through a feature selection strategy that incorporates both IMF and UMF theories.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-46">46</xref>]</bold></td>
<td>New NIDS combining statistical learning and DRL. This framework addresses data imbalance and scarcity issues by integrating generative models with neural networks.</td>
<td>NSL-KDD dataset.</td>
<td>DRL models.</td>
<td>Outperformed IDSs in terms of accuracy, precision, recall, and F1-score. By reducing the disparity in data and scarcity, they enhanced NIDS, particularly with limited training data. The findings show that DRL improves IDS.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-47">47</xref>]</bold></td>
<td>The SAVAER-DNN model is introduced to enhance detection rates for unknown attacks, showing superior performance to other SOTA IDS models.</td>
<td>NSL-KDD, UNSW-NB15 datasets.</td>
<td>SAVAER-DNN model.</td>
<td>The SAVAER-DNN model outperforms earlier classifiers on the UNSW-NB15 dataset in terms of accuracy, detection rate, F1 score, ROC, &#x0026; AUC evaluations, showing improved attack detection.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-48">48</xref>]</bold></td>
<td>The study aims to assess the performance of ML-based IDS via IoT networks against adversarial attacks.</td>
<td>Bot-IoT, Kitsune, CIFAR-10.</td>
<td>CVNN model.</td>
<td>Model distillation and AT make the LSTM-CNN model more robust, showing resistance to certain evasion assaults.</td>
</tr>
<tr>
<td align="center" colspan="5"><bold>Application domain (IoT, cloud, UAV, edge computing, etc.)</bold></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-49">49</xref>]</bold></td>
<td>It investigates the efficacy of RL-based techniques to improve road safety and cybersecurity.</td>
<td>UK Road Safety dataset.</td>
<td>They proposed Q-learning, policy iteration, and DRL.</td>
<td>Experimental findings reveal that RL-based algorithms outperform rule-based systems in terms of collision prevention and lane-changing safety.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-50">50</xref>]</bold></td>
<td>Studied Boost-Defense, which uses AdaBoost &#x0026; DT for classification and provided a comprehensive experimental investigation to assess the system&#x2019;s performance, focusing on accuracy, precision, recall, and AUC.</td>
<td>TON-IoT-2020 dataset.</td>
<td>Boost-DT.</td>
<td>Both Windows &#x0026; Linux datasets are classified well. The confusion matrix was evaluated on the network layer dataset. Assessments are also compared to recent research models. The proposed model surpasses others in classification accuracy while making fewer errors.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-51">51</xref>]</bold></td>
<td>The study contributes by developing an IDS to detect RPL attacks in the evolving and imbalanced data environment of 6LoWPAN.</td>
<td>Adversary dataset.</td>
<td>ARL model to RPL attacks in 6LoWPAN.</td>
<td>The suggested technique outperforms black-box adversaries in terms of accuracy, recall, precision, and F1 in identifying RPL assaults in 6LoWPAN.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-52">52</xref>]</bold></td>
<td>Examined how adversarial attacks affect ML algorithms in cognitive nets. It underlines ML vulnerability to hostile perturbations &#x0026; the need for effective defenses.</td>
<td>Telemetry dataset for NIDS classification.</td>
<td>Various ML models.</td>
<td>The findings revealed that the SVM and DNN models remained vulnerable to adversarial attacks, with carefully crafted samples having a significant impact on their performance.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-53">53</xref>]</bold></td>
<td>Employed ML/DL to construct a broad SE-based IoT-IDS. It uses modern technologies to secure IoT networks.</td>
<td>AWID, CICIDS DDoS, UNSW-NB15</td>
<td>MLP_NN, CNN&#x002B;SVM, AE&#x002B;DBN models.</td>
<td>Various models &#x0026; datasets are used for IDS design, achieving an accuracy of 75% to 99% in detecting IoT device breaches and intrusions.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-54">54</xref>]</bold></td>
<td>Developed an edge computing VANET IDPS framework driven by RL.</td>
<td>VANETs dataset.</td>
<td>Trained the RL model with offline labeled data to avoid deploying a raw, untrained model.</td>
<td>The RL model enhances system efficiency and improves scheduling policy.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-55">55</xref>]</bold></td>
<td>Focused on implementing ML models for UAV-IDS, specifically using a predictive model implemented in MATLAB.</td>
<td>UAV-IDS-2020 dataset.</td>
<td>UAV IDS ConvNet model.</td>
<td>Demonstrated high detection accuracy for different UAV communication modes, with percentages ranging from 90% to 100%.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-56">56</xref>]</bold></td>
<td>Used ML/DL models to identify instances of power theft. It investigated evasion attempts that target detection models and suggested defensive measures to counter them.</td>
<td>Real-time electricity consumption dataset.</td>
<td>DQN and DDQN models.</td>
<td>AT makes power theft detection more immune to evasion attempts. The defensive strategy notably enhances detector accuracy and overall performance while reducing attack success.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-57">57</xref>]</bold></td>
<td>Investigated complex IDS challenges using DRL techniques to enhance the Model-Based DRL approaches.</td>
<td>Windows-based datasets.</td>
<td>DRL model.</td>
<td>The combined DRL approach to reducing cyber threats in massive data contexts is promising for improving cybersecurity and IDS.</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>The advancement of ARL over recent years</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-11.tif"/>
</fig>
<sec id="s3_1">
<label>3.1</label>
<title>Analysis of Past Studies</title>
<p><list list-type="simple">
<list-item><label><bold>A</bold>.</label><p><bold>Algorithm-Oriented Classification</bold></p></list-item>
</list></p>
<p>To begin with, ref. [<xref ref-type="bibr" rid="ref-17">17</xref>] examines advertising aesthetics and their significance in advertising research. The study highlights seven important advertising aesthetic themes, including originality, textuality, social dimensions, cross-cultural variances, and the media&#x2019;s role in defining aesthetic options. Thus, the recommendation is to incorporate aesthetic ideas into advertising studies and explore advertising aesthetics as a subject of investigation. It demonstrates how aesthetics can enhance consumer engagement in advertising and offers suggestions for further developing theory and practice. Additionally, the study [<xref ref-type="bibr" rid="ref-18">18</xref>] applies RL, especially DQN, to IoT NIDS. The paper compares DQN against Support Vector Machine (SVM), Naive Bayes (NB), and Multilayer Perceptron (MLP) using the TON-IoT dataset. DQN enhances the accuracy, precision, and recall of IoT IDS systems, outperforming existing ones. Furthermore, the research [<xref ref-type="bibr" rid="ref-19">19</xref>] demonstrates how the DRL-based system can automatically learn useful features for effective IDS, thereby enhancing IDS performance. Ref. [<xref ref-type="bibr" rid="ref-20">20</xref>] emphasizes the need to use DRL to develop IDS models for protecting IoT networks; its comparison of field surveys and evaluations highlights DRL-based IDS applications, datasets, metrics, IDS and RL taxonomy, and prospects. The research emphasizes the general need for novel models, benchmark datasets, integration with other DL strategies, TL, real-time adaptive IDSs, and lightweight threat detection models when implementing DRL for IDS in IoT systems.
According to the study, DRL may improve IDS performance and address security concerns, and the authors recommend further research. The study [<xref ref-type="bibr" rid="ref-21">21</xref>] proposes an ensemble model that combines binary and prediction Long Short-Term Memory (LSTM) networks to identify compromising actor actions within MARL systems. The algorithm outperforms baselines in detecting abnormalities, with high recall and accuracy rates. A thorough review of detection findings demonstrates the effectiveness of the ensemble approach, highlighting its superior performance in various situations. The adversarial system described in [<xref ref-type="bibr" rid="ref-22">22</xref>] improves IDS detection of metamorphic malware. The algorithm generates opcode-level obfuscations that imitate malware to enhance detection and train subsystems for protection. The Malicia dataset offers a comprehensive collection of malicious files for testing purposes. DRL agents, such as PPO, develop sophisticated malware samples that may evade powerful IDS. The system&#x2019;s ability to generate disguised malware demonstrates RL&#x2019;s promise in cybersecurity protection against emerging threats. In the following research [<xref ref-type="bibr" rid="ref-23">23</xref>], the author presents ReLog, an RL-based instrument that enables log analytics within high-performance computing systems. On the gathered data, ReLog achieves a detection accuracy of 93% by treating AD as a sequential decision problem. This framework demonstrates that log-based AD is effective in HPC settings and can identify suspicious or malicious users. However, the paper [<xref ref-type="bibr" rid="ref-24">24</xref>] presents X-Swarm, an adversarial DRL platform that generates metamorphic malware versions that may surpass sophisticated ATP protections. Despite the complexity of the underlying MDP, it addresses the limits of existing RL and DRL algorithms in cybersecurity.
By employing the Malicia dataset, X-Swarm increases the apparent legitimacy of malware, making ATP detection difficult. The system obfuscates malware versions using the PPO method. The metamorphic malware versions developed closely resemble the original strain yet avoid ATP detection. X-Swarm&#x2019;s swarm assault simulation poses a significant threat to networks and highlights opportunities for cybersecurity improvement.</p>
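The DQN-based IDS formulation discussed above treats each flow as a state, the predicted label as the action, and correctness as the reward. A minimal sketch of that loop, assuming a linear Q-function and synthetic two-feature flows (all values here are invented for illustration; this is not the surveyed DQN architecture or the TON-IoT data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-feature flows: attack flows have larger feature values on average.
def sample_flow():
    label = int(rng.integers(0, 2))               # 0 = benign, 1 = attack
    x = rng.normal(loc=label, scale=0.5, size=2)
    return np.array([1.0, x[0], x[1]]), label     # prepend a bias term

W = np.zeros((2, 3))                              # linear Q(s, a) = W[a] @ s
alpha, eps = 0.1, 0.1                             # learning rate, exploration rate

for _ in range(5000):
    s, y = sample_flow()
    # Epsilon-greedy action = predicted label for this flow.
    a = int(rng.integers(0, 2)) if rng.random() < eps else int(np.argmax(W @ s))
    r = 1.0 if a == y else -1.0                   # reward: was the label correct?
    W[a] += alpha * (r - W[a] @ s) * s            # one-step Q update (length-1 episodes)

# Evaluate the greedy policy on fresh flows.
tests = [sample_flow() for _ in range(1000)]
accuracy = sum(int(np.argmax(W @ s) == y) for s, y in tests) / len(tests)
print(f"greedy accuracy: {accuracy:.2f}")
```

Because each classification decision is a length-one episode, the update degenerates to a bandit-style regression of Q toward the reward; full DQN as used in the surveyed work adds a replay buffer, a target network, and a deep Q-network in place of the linear map.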
<p>Additionally, the research in [<xref ref-type="bibr" rid="ref-25">25</xref>] aimed to test the control strategies of autonomous cars powered by DNN using an ARL framework. This research analyzes accident situations and model resilience by evaluating these rules across virtual highway driving scenarios with varying velocity limits. The findings demonstrate that the adversarial agent outperforms manual testing techniques and successfully exploits control rules. In [<xref ref-type="bibr" rid="ref-26">26</xref>], RL algorithms improve IoT decision-making in IoBT twin sensor gateways. It offers an Intelligent Attack Detection System and Intelligent Dual Function Sensor Gateways to improve IoT network security and efficiency. Bayesian optimization optimizes IoBT decision thresholds. The research [<xref ref-type="bibr" rid="ref-27">27</xref>] employs sophisticated ML and DL to improve IoT IDS. It investigates the use of DRL techniques to adapt to the changing IoT environment and enhance IDS. The work utilizes Adversarial ML to evaluate IDS against threats, thereby improving IoT smart home security. The work [<xref ref-type="bibr" rid="ref-28">28</xref>] demonstrated the efficacy and competitiveness of the AE-RL model for IDS compared to more conventional classifiers. As AE-RL reduces false negatives and addresses imbalanced datasets, it shows promise in improving IDS. In another reference [<xref ref-type="bibr" rid="ref-29">29</xref>], the researchers examined the resistance of ML-based NIDS to adversarial attacks by launching real-world attacks against the NIDS models. The findings reveal the susceptibility of ML and DL models to such attacks and offer insight into how effective defensive mechanisms can prevent adversarial assaults.
To assess the effectiveness of black-box attacks, the research in [<xref ref-type="bibr" rid="ref-30">30</xref>] investigates the generation of adversarial instances against DNN models in NIDS. The NSL-KDD dataset is used to train and evaluate a DNN model, which performs well but is susceptible to adversarial attacks. The research in [<xref ref-type="bibr" rid="ref-31">31</xref>] presents a Context-Adaptive IDS that utilizes DRL agents to accurately identify sophisticated attacks; experiments on the NSL-KDD, UNSW-NB15, and AWID datasets showed improved accuracy and decreased FPR. In this study [<xref ref-type="bibr" rid="ref-32">32</xref>], the authors use RL, which enables an agent to learn through trial and error rather than requiring a dataset of fault-tolerant distributed algorithms&#x2014;in other words, it is not supervised. Creating such a dataset would be complicated and tedious, since it would require a vast collection of pre-existing algorithms that handle the challenge.</p>
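Black-box attacks of the kind studied above (e.g., the ZOO algorithm) work from score queries alone, estimating gradients by finite differences. A toy sketch of that idea, assuming a frozen logistic-regression "detector" whose weights and input flow are entirely made up (real attacks additionally constrain perturbations so the traffic stays valid):

```python
import numpy as np

# Hypothetical frozen detector with query-only access: logistic regression
# over three flow features; the weights are invented for illustration.
w = np.array([1.5, -0.7, 2.0])
b = -0.5

def detector(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))    # returns P(attack)

def zo_grad(f, x, h=1e-4):
    """Zeroth-order gradient estimate: central finite differences per coordinate."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, 0.2, 1.2])                    # flow initially flagged as attack
start = detector(x)

for _ in range(100):                             # descend the estimated attack score
    x = x - 0.1 * zo_grad(detector, x)
    if detector(x) < 0.5:                        # crossed the decision boundary
        break

print(f"attack score: {start:.3f} -> {detector(x):.3f}")
```

The attacker never sees `w` or `b`; every gradient estimate costs two score queries per feature, which is why query efficiency is a central concern in the ZOO line of work.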
<p>Moreover, Ref. [<xref ref-type="bibr" rid="ref-33">33</xref>] presented an adversarial learning framework for system identification that improves simulation fidelity. It builds a hybrid simulator with a state-action-dependent function to make models more expressive and demonstrates how to generalize it to acquire different motor abilities across tasks. Achieving better task rewards in the target contexts, the suggested strategy surpassed baseline strategies in five of six domain-specific adaptation studies. The hybrid simulator proved versatile when acquiring a wide range of motor skills.</p>
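PPO appears repeatedly in the surveyed work (X-Swarm, the hybrid-simulator framework). Its core is the clipped surrogate objective L = E[min(r_t A_t, clip(r_t, 1-eps, 1+eps) A_t)], where r_t is the new-to-old policy probability ratio and A_t the advantage estimate. A minimal NumPy illustration with made-up ratios and advantages:

```python
import numpy as np

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return np.mean(np.minimum(unclipped, clipped))

# Hypothetical batch of probability ratios and advantage estimates.
ratio = np.array([0.9, 1.0, 1.5, 0.5])
adv = np.array([1.0, -1.0, 2.0, -2.0])

# Per-sample terms: min(0.9, 0.9) = 0.9; min(-1.0, -1.0) = -1.0;
# min(3.0, 1.2*2) = 2.4; min(-1.0, 0.8*-2) = -1.6; mean = 0.175
print(ppo_clip_objective(ratio, adv))
```

The third sample shows the clip at work (the ratio 1.5 is capped at 1.2, limiting how far a favorable update can push the policy), while the fourth shows the `min` taking the more pessimistic clipped value for a negative advantage.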
<p>However, the study in [<xref ref-type="bibr" rid="ref-34">34</xref>] developed a GAN-based framework to generate adversarial low-rate DDoS traffic, revealing that small perturbations can significantly weaken intrusion-detection systems. Using the Low-Rate DDoS 2022 dataset and a public Port Scan/Slowloris dataset, the authors evaluated several deep-learning IDS models against the generated adversarial traffic. The results showed a very high evasion success rate of approximately 99.9%, demonstrating the critical vulnerability of existing IDS models to adversarial low-rate DDoS attacks. In [<xref ref-type="bibr" rid="ref-36">36</xref>], the researcher utilizes RL to generate hostile botnet flows; its novel botnet flow-generating architecture fools target detectors. It examines assault methods and introduces ML model evasion. The MCF Project, the ISOT botnet dataset, and benign flows are used for training and testing in the experiments. CNNs with manually specified features are used as the DL and DT detectors. With 40,000 flows trained, the agent discovered evasive variants, and 100,000 botnet and innocuous stream target predictions were correct. The article [<xref ref-type="bibr" rid="ref-37">37</xref>] builds upon previous work on RL models by examining ARL to enhance AI security. The authors compare adversarial attack methods and defense mechanisms, recommend further research, and outline the benefits and drawbacks of current approaches. They employ the CIFAR-10 dataset and a weighted probabilistic output model based on impact factors to forecast adversarial automated route-finding instances. Since the model produces probability outputs for input locations, the results show that agents can interfere with pathfinding. Agent route planning is evaluated for perturbation effects utilizing the energy point, key point, path, and included angle.
This work, as referenced in [<xref ref-type="bibr" rid="ref-38">38</xref>], examines adversarial policies in competitive simulated robot environments. Rather than outplaying victims through skill, adversarial policies exploit flaws in victim policies. Victory rates vary across victims and settings. Victim policy networks vary in response to adversarial policies, and the dimensionality of the observation space influences attack susceptibility. Fine-tuning protects victims against known attackers. High-dimensional adversarial strategies outperform baseline agents in certain cases. Designing strong policies requires an understanding of adversarial policies in competitive circumstances.</p>
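The RL-driven evasion pipeline used in the botnet-flow and malware studies above can be caricatured as a bandit problem: the agent repeatedly applies small feature mutations to a malicious flow and is rewarded for each drop in the detector's score. A self-contained sketch under that framing, with an invented detector and made-up flow features (the surveyed systems instead mutate real traffic or opcodes against trained CNN/DT detectors):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frozen detector: flags flows whose mean feature value is high.
def detector_score(flow):
    return 1.0 / (1.0 + np.exp(-4.0 * (flow.mean() - 1.0)))

# Action set: small tweaks to individual flow features (stand-ins for
# semantics-preserving edits such as padding packets or adding idle time).
actions = [(i, d) for i in range(4) for d in (-0.1, 0.1)]

q = np.zeros(len(actions))                  # bandit-style action values
eps, alpha = 0.2, 0.1
flow0 = np.array([1.6, 1.4, 1.5, 1.7])      # botnet flow, initially detected

for _ in range(300):                        # training episodes
    flow = flow0.copy()
    for _ in range(10):
        a = int(rng.integers(len(actions))) if rng.random() < eps else int(np.argmax(q))
        i, d = actions[a]
        before = detector_score(flow)
        flow[i] += d
        r = before - detector_score(flow)   # reward = drop in detection score
        q[a] += alpha * (r - q[a])

# Greedy rollout with the learned action values until the flow evades.
flow = flow0.copy()
for _ in range(40):
    i, d = actions[int(np.argmax(q))]
    flow[i] += d
    if detector_score(flow) < 0.5:
        break

print(f"score: {detector_score(flow0):.2f} -> {detector_score(flow):.2f}")
```

The agent only ever queries the detector's score, mirroring the black-box threat model that makes these evasion attacks practical against deployed NIDS.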
<p>The paper [<xref ref-type="bibr" rid="ref-39">39</xref>] introduces the RADIAL-RL architecture for resilient DRL agents that rely on adversarial loss functions. When assessed on Atari games and MuJoCo tasks, RADIAL agents outperform other methods while remaining computationally efficient. Robust adversarial training improves both defense and generalization. Agent performance against formidable opponents is evaluated effectively via GWC. Both RADIAL-DQN and RADIAL-A3C outperform baseline models on various tasks. Lastly, the study [<xref ref-type="bibr" rid="ref-40">40</xref>] provides an overview of a systematic approach to defense mechanisms that use ML and DL integrated into network applications to combat adversarial attacks.
<list list-type="simple">
<list-item><label><bold>B</bold>.</label><p><bold>Dataset-Oriented Studies</bold></p></list-item>
</list></p>
<p>To start with [<xref ref-type="bibr" rid="ref-41">41</xref>], a MANifold and Decision boundary-based AE detection strategy for ML-based IDS is introduced in the paper as MANDA. The primary goal is to develop an efficient AE detector that can handle various AE attacks without requiring customization for each IDS model. This method relies on creating AEs in the problem space and mapping them to genuine network events. As noted in [<xref ref-type="bibr" rid="ref-42">42</xref>], the research made a significant contribution to cybersecurity across IoT systems by demonstrating that supervised learning models can effectively identify and categorize actions occurring in IoT networks. In another reference [<xref ref-type="bibr" rid="ref-43">43</xref>], the authors aimed to find new types of intrusions, decrease the number of false alarms, and increase the accuracy of IDS by using various ML and DL techniques to improve intrusion detection and prevention decisions. The research [<xref ref-type="bibr" rid="ref-44">44</xref>] looks at how conventional NIDSs may be compromised by malicious actors utilizing ML approaches. Study findings highlight the need for robust safeguards in cybersecurity by examining the impact of perturbation approaches on the performance of NIDS. Meanwhile, Ref. [<xref ref-type="bibr" rid="ref-45">45</xref>] develops and validates a hybrid feature selection technique for ML-based IDS in IoT ecosystems. The IoTID20 dataset, Weka tool, and Python are used to reduce false alarm rates and training time complexity through dimensionality reduction, thereby improving AD performance. The suggested system selects important characteristics based on entropy to enhance detection accuracy. ANN, KNN, and Ensemble classifiers with majority voting are employed to assess the hybrid feature selection strategy. The suggested model outperforms current techniques in accuracy, precision, recall, and F1-measure. 
The research offers insights into improving IDS effectiveness in IoT contexts through novel feature selection and ML methods.</p>
<p>To address data scarcity and imbalance, the study [<xref ref-type="bibr" rid="ref-46">46</xref>] presents an innovative NIDS model that integrates DRL with statistical approaches, employing generative models and neural networks. The model improves accuracy, precision, recall, and F1-score in intrusion detection, outperforming current IDSs that use LR and SVM classifiers on the NSL-KDD dataset. In particular, the findings demonstrate that the proposed method enhances IDS capabilities even when training data are scarce. In [<xref ref-type="bibr" rid="ref-47">47</xref>], the authors aim to improve detection rates for unknown attacks by introducing the SAVAER-DNN model for NIDS. The model is trained and evaluated on the NSL-KDD and UNSW-NB15 datasets, demonstrating that it outperforms the most recent and advanced IDS techniques. SAVAER-DNN improves overall accuracy, detection rates, and F1-scores by using data augmentation to generate unseen attack samples, and its ROC curves and AUC values show strong performance in detecting network cyber threats. Lastly, the research [<xref ref-type="bibr" rid="ref-48">48</xref>] evaluates the effectiveness of ML-based IDS in protecting IoT networks from malicious attacks, using the Bot-IoT, Kitsune, and CIFAR-10 datasets to evaluate ML-based security solutions. It utilizes a CVNN model to identify devices in IoT networks and finds that iterative attack strategies are more effective than one-step attacks at fooling ML-based IDS models. The authors also illustrate how adversarial attacks may degrade the efficacy of SVM, DT, and RF classifiers. The LSTM-CNN model exhibits improved robustness after AT and model distillation and is resilient to various evasion attacks.
<list list-type="simple">
<list-item><label><bold>C</bold>.</label><p><bold>Application Domain</bold></p></list-item>
</list></p>
<p>The authors, as cited in [<xref ref-type="bibr" rid="ref-49">49</xref>], emphasize RL&#x2019;s potential for improving road safety and cybersecurity, highlighting its accuracy, flexibility, and efficiency. In [<xref ref-type="bibr" rid="ref-50">50</xref>], researchers proposed a Boost-Defense system based on the AdaBoost technique, which demonstrated excellent classification performance on various datasets. Confusion matrix evaluation was conducted on a network-layer dataset to demonstrate the model&#x2019;s performance, and the suggested Boost-Defense system outperformed current research models in classification accuracy and error rates. In [<xref ref-type="bibr" rid="ref-51">51</xref>], the study focuses on detecting diverse types of attacks, including direct-resource-topology, indirect-resource-topology, sub-optimization-topology, and isolation-topology attacks. The research [<xref ref-type="bibr" rid="ref-52">52</xref>] emphasized the necessity of testing ML models for worst-case situations and adversarial resistance rather than relying exclusively on standard measures such as accuracy. Adversarial attacks may significantly impact the efficiency and safety of ML models used in neural networking applications, demanding the development of effective countermeasures. In [<xref ref-type="bibr" rid="ref-53">53</xref>], the authors employ an SE-based intrusion detection design and architecture to improve IoT cybersecurity, detecting cyberattacks using powerful ML methods. The algorithms are trained and evaluated using the AWID and CICIDS datasets, achieving identification rates of 75.0% to 99.0%. According to the research, ML and DL methods such as SVM and CNN may enhance IoT security. Overall, the study sheds light on how ML might address IoT cyber risks. 
The research [<xref ref-type="bibr" rid="ref-54">54</xref>] also proposes an RL-driven VANET IDPS architecture for edge computing to improve the accuracy and processing efficiency of intrusion detection and prevention decisions. Due to the scarcity of training data for VANETs, a GAN is used to generate training data from existing attack data. Incorporating RL into IDPS decision-making enhances system effectiveness and planning policy. A case study demonstrates how the suggested architecture operates in a VANET IDPS at the network&#x2019;s edge, proving its worth in improving transportation system security. To enhance the dependability and safety of autonomous cars, this method emphasizes the importance of using ARL to identify failure cases in targeted models and to understand the limitations of deep control strategies. The study in [<xref ref-type="bibr" rid="ref-55">55</xref>] employs MATLAB 2020b to build a predictive ML model for UAV-IDS. For UAV communication networks and cyberattack detection, the research uses the UAV-IDS-2020 dataset, which comprises encrypted Wi-Fi traffic logs with binary output labels for normal and abnormal UAV operations. A shallow ConvNet is used for training, testing, and tuning. Experimental findings show 90%&#x2013;100% detection accuracy across various UAV communication modalities. The model can identify breaches in UAV communication networks, demonstrating its potential to enhance cybersecurity. The study in [<xref ref-type="bibr" rid="ref-56">56</xref>] employs DL to detect electricity theft. To enhance threat identification, it analyzes evasion attacks on detection models and proposes countermeasures. ML-based models for power theft detection are trained and evaluated using data collected from smart meters; 69,680 benign samples were generated from 130 consumers&#x2019; readings for training and testing. 
Power theft detectors are trained on a global scale to identify theft using DRL models such as DQN and Double Deep Q-Networks (DDQN). The models use FFNN, CNN, and GRU neural network architectures to achieve higher performance. AT makes power theft detection systems more resilient to evasion attacks; the proposed defensive strategy enhances performance, lowers attack success rates, and improves detector accuracy. Lastly, the authors of [<xref ref-type="bibr" rid="ref-57">57</xref>] delve into the use of DRL techniques for cyber defense, specifically in addressing difficulties with IDS. This investigation of model-based DRL methodologies&#x2019; capabilities aims to improve cybersecurity techniques. The research examines the effectiveness of Host-IDS in mitigating multiple types of cyberattacks using Windows-specific datasets. A hybrid DL model enhances IDS in big-data settings by combining DL methods with IDS. With the hybrid model exhibiting encouraging results in identifying and mitigating cyber threats, the findings reveal that DRL approaches are effective in enhancing cybersecurity measures and have the potential to strengthen IDS.</p>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Critical Analysis of Past Studies</title>
<p>The latest studies on ARL-NIDS reveal a promising, albeit fragmented, array of strategies for increasing detection accuracy, flexibility, and resistance in hostile environments. Early contributions, for instance, the studies [<xref ref-type="bibr" rid="ref-28">28</xref>,<xref ref-type="bibr" rid="ref-35">35</xref>], demonstrated the feasibility of RL within IDS, while later works introduced AT to harden models against perturbations and new threats [<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-39">39</xref>]. Subsequently, numerous studies proposed innovative algorithms, ranging from multi-armed bandit models [<xref ref-type="bibr" rid="ref-19">19</xref>] and LSTM-based multi-agent RL [<xref ref-type="bibr" rid="ref-21">21</xref>] to hierarchical and DRL frameworks [<xref ref-type="bibr" rid="ref-56">56</xref>,<xref ref-type="bibr" rid="ref-58">58</xref>], to enhance adaptability and robustness. Additionally, various works have integrated ARL into specific domains, such as IoT [<xref ref-type="bibr" rid="ref-27">27</xref>,<xref ref-type="bibr" rid="ref-51">51</xref>], vehicular and edge computing [<xref ref-type="bibr" rid="ref-54">54</xref>], UAV safety [<xref ref-type="bibr" rid="ref-55">55</xref>], and smart grid systems [<xref ref-type="bibr" rid="ref-46">46</xref>], demonstrating context-aware performance improvements. However, a recurring challenge identified in these studies is the limited generalization and robustness of the models: they are typically evaluated on benchmark datasets such as NSL-KDD, CICIDS2017, and IoTID20 [<xref ref-type="bibr" rid="ref-42">42</xref>,<xref ref-type="bibr" rid="ref-45">45</xref>,<xref ref-type="bibr" rid="ref-46">46</xref>], achieving high accuracy in controlled settings but degrading under realistic adversarial perturbations. 
Furthermore, while adversarial DRL approaches such as [<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-25">25</xref>] enable sophisticated attack and defense simulations, their computational complexity and lack of interpretability hinder real-time deployment. The review [<xref ref-type="bibr" rid="ref-48">48</xref>] further emphasizes the absence of standardized evaluation metrics and defense-aware RL training strategies. In summary, while ARL-based NIDS research has demonstrated significant conceptual and technological advances, it remains hindered by scalability issues, dataset bias, and the need for robust, interpretable, and trustworthy mechanisms that can withstand dynamic and adversarial network environments.</p>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Methodology of ARL-NIDS Approach</title>
<p>To begin the ARL-NIDS comparative methodology section: ARL is a branch of RL that addresses settings in which the agent encounters adversaries or opponents actively seeking to obstruct its learning, exploit its weaknesses, or constrain its decision-making framework. The aim is to learn robust policies that can withstand adversarial attacks or disturbances. The following section outlines the conceptual framework of ARL-NIDS for detecting malicious and unusual activities. ARL-NIDS may employ RL models for detecting such activities, and it involves several key steps to ensure its effectiveness and robustness in detecting and mitigating cyber threats. Algorithm 2 illustrates the overall ARL-NIDS algorithm and guides the construction of an ARL-NIDS framework that can enhance network defenses. The generalized ARL-NIDS structure is represented in <xref ref-type="fig" rid="fig-12">Fig. 12</xref>.</p>
<fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>General ARL-NIDS architecture, illustrating the generic approach for ARL-NIDS</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-12.tif"/>
</fig>
<fig id="fig-24">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-24.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-12">Fig. 12</xref> illustrates the process of selecting data for the ARL-NIDS model, which enhances NIDS by training it to identify and respond to attacks in network traffic. The NIDS agent detects unusual activities, and the reward function evaluates the agent&#x2019;s actions to label them as either normal or an attack. This feedback trains and evaluates the NIDS agent to enhance security detection. Twelve evaluation metrics measure performance: accuracy, F1-score, recall, FPR, True Negative Rate (TNR), log loss, precision, ROC-AUC, PR-AUC, confusion matrix, Matthews Correlation Coefficient (MCC), and Cohen&#x2019;s Kappa. Results are optimized over training epochs, during which the model learns from its experiences. ARL aims to achieve optimal behavior in decision-making tasks.</p>
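As an illustrative aid (not taken from any surveyed system), several of the listed threshold-based metrics can be derived directly from a single binary confusion matrix. The following minimal Python sketch computes accuracy, precision, recall, F1, FPR, TNR, and MCC, with label 1 denoting an attack:

```python
# Minimal sketch: deriving several ARL-NIDS evaluation metrics from a
# binary confusion matrix. Label 1 = attack, label 0 = normal traffic.
import math

def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # a.k.a. TPR
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0             # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": acc, "precision": precision, "recall": recall,
            "f1": f1, "fpr": fpr, "tnr": tnr, "mcc": mcc}
```

Log loss, ROC-AUC, and PR-AUC additionally require the classifier's probability scores rather than hard labels, so they are omitted from this sketch.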
<p>To account for the hostile character of the environment in ARL, where an opponent influences the environment, the Bellman equation may need to be adapted. How the Bellman equation is adjusted depends on whether the adversary&#x2019;s activities are modeled explicitly or implicitly, among other aspects of formulating ARL problems. <xref ref-type="fig" rid="fig-13">Fig. 13</xref> illustrates ARL-NIDS combining adversarial threats and defenses, where the system is trained on network traffic and evaluated against various adversarial threats. Attackers may attempt poisoning, model theft, or extraction, which are countered through poison detection, adversarial training, and verification. The framework remains flexible by maintaining reliable classification performance across the IDS input, RL process, and attack stages. Furthermore, <xref ref-type="fig" rid="fig-14">Fig. 14</xref> illustrates a conceptual framework for ARL-NIDS. It integrates adversarial components, namely the Adversarial Actor and Adversarial Critic, with an agent responsible for packet classification.</p>
<fig id="fig-13">
<label>Figure 13</label>
<caption>
<title>General ARL-NIDS combining adversarial threats &#x0026; defenses</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-13.tif"/>
</fig><fig id="fig-14">
<label>Figure 14</label>
<caption>
<title>Overall, ARL-NIDS process</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-14.tif"/>
</fig>
<p>The opponent manipulates the agent&#x2019;s input or reward signal to create ideal yet challenging circumstances, thereby enhancing the agent&#x2019;s flexibility in responding to complex attacks. The agent interacts with the environment by classifying network packets and assessing its detection accuracy, which adversarial critics then evaluate. This cyclical engagement fosters adaptive learning and flexibility in intrusion detection.</p>
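This cyclical adversary-agent engagement can be sketched in a few lines. The sketch below is purely illustrative: `classify` and `perturb` are hypothetical placeholders standing in for the agent's packet classifier and the adversary's input manipulation, not components of any surveyed system.

```python
import random

# Illustrative adversary-agent loop: the adversary perturbs each observed
# sample, the agent classifies it, and the episode reward reflects
# detection correctness (+1 correct, -1 incorrect).
def adversarial_episode(samples, classify, perturb, rng=random):
    reward = 0.0
    for features, label in samples:
        observed = perturb(features, rng)   # adversary corrupts the input
        verdict = classify(observed)        # agent labels it (0/1)
        reward += 1.0 if verdict == label else -1.0
    return reward
```

Over many episodes, the agent's policy is updated to maximize this reward while the adversary adapts its perturbations, producing the adaptive learning cycle described above.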
<p><bold>Q-Learning Update with Adversarial Perturbation:</bold> <xref ref-type="disp-formula" rid="eqn-10">Formula (10)</xref> extends RL to adversarial IDS by modifying the Q-learning update under adversarial settings. This equation updates the RL action-value function, calculating the cumulative payoff for action <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> in state <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, considering environmental adversaries. The agent critic&#x2013;environment loop of the ARL-NIDS in <xref ref-type="fig" rid="fig-14">Fig. 14</xref> refines the agent&#x2019;s policy after analyzing rewards and future states [<xref ref-type="bibr" rid="ref-28">28</xref>].
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B3;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mtext>&#x2032;</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mspace width="thinmathspace" /><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mi 
mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
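The update in Formula (10) can be sketched as plain tabular Q-learning. The state and action names below are illustrative placeholders under assumed hyperparameters, not the implementation of any surveyed system:

```python
from collections import defaultdict

# Tabular Q-learning update per Formula (10):
#   Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
class QLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)       # Q(s, a), default 0.0
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next):
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error
        return self.q[(s, a)]
```

In the adversarial setting, the state fed into `update` would be the perturbed observation produced by the adversary, so the learned values must remain useful under corrupted inputs.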
<p><bold>Adversarial Perturbation of States:</bold> <xref ref-type="disp-formula" rid="eqn-11">Formula (11)</xref> illustrates how the attacker perturbs the state to deceive the IDS. The adversarial actor crafts a perturbation &#x03B4; that alters observable traffic properties before the data reach the agent. This is evident in <xref ref-type="fig" rid="fig-14">Fig. 14</xref>, where the Adversarial Actor acts on the Network Environment, injecting adversarial samples [<xref ref-type="bibr" rid="ref-34">34</xref>].
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:msubsup><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
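Formula (11) can be sketched by sampling a bounded perturbation and adding it to the observed traffic features. This is a hedged illustration assuming a simple uniform-noise adversary A(s_t); the `eps` budget and feature-vector representation are assumptions, not part of any surveyed method:

```python
import random

# Sketch of Formula (11): s'_t = s_t + delta, delta ~ A(s_t).
# Here A(s_t) is modeled as uniform noise within an eps budget per feature.
def perturb_state(state, eps=0.05, rng=random):
    delta = [rng.uniform(-eps, eps) for _ in state]   # delta ~ A(s_t)
    return [s + d for s, d in zip(state, delta)]
```

A stronger adversary would instead choose δ to maximize the agent's classification loss (e.g., gradient-based attacks), but the interface is the same: the agent only ever observes the perturbed state.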
<p><bold>Policy Gradient with Advantage Function:</bold> <xref ref-type="disp-formula" rid="eqn-12">Formula (12)</xref> gives the actor update rule in an actor-critic method: the agent adjusts its policy parameters <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:math></inline-formula> to enhance action selection in the face of hostile influence. The advantage function <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> helps the agent identify better-than-average actions. The agent actor learns from the agent critic&#x2019;s feedback, as seen in <xref ref-type="fig" rid="fig-14">Fig. 14</xref>.
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03C0;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
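The actor update in Formula (12) can be sketched with a linear softmax policy over discrete actions, for which the gradient of log π has a closed form. All names, feature encodings, and the two-action setup below are illustrative assumptions, not any surveyed implementation:

```python
import math

# Softmax policy pi_theta(a|s) over discrete actions with linear logits.
def softmax_probs(theta, features):
    logits = [sum(w * f for w, f in zip(row, features)) for row in theta]
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# One policy-gradient step per Formula (12):
#   theta <- theta + alpha * grad_theta log pi(a_t|s_t) * A(s_t, a_t)
def policy_gradient_step(theta, features, action, advantage, alpha=0.1):
    probs = softmax_probs(theta, features)
    for a in range(len(theta)):
        indicator = 1.0 if a == action else 0.0
        for j, f in enumerate(features):
            # d/d theta[a][j] of log pi(action|s) = (1[a==action] - pi(a)) * f
            theta[a][j] += alpha * (indicator - probs[a]) * f * advantage
    return theta
```

A positive advantage increases the probability of the taken action; a negative advantage (an action judged worse than average by the critic) decreases it.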
<p><bold>Adversarial Reward Shaping:</bold> <xref ref-type="disp-formula" rid="eqn-13">Formula (13)</xref> reshapes the reward function under adversarial influence, where <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:mtext>C</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represents the cost induced by adversarial attacks (e.g., false negatives or evasion success) and the parameter <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow></mml:math></inline-formula> balances standard detection rewards against adversarial penalties. In <xref ref-type="fig" rid="fig-14">Fig. 14</xref>, this corresponds to the Adversarial Critic, which evaluates the success of the attack against the IDS [<xref ref-type="bibr" rid="ref-34">34</xref>].
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:msubsup><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mo>.</mml:mo><mml:mrow><mml:mtext>C</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
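Formula (13) reduces to a one-line reward adjustment. In the sketch below, `cost_fn` is a hypothetical stand-in for C(s_t, a_t) (e.g., an evasion-success penalty) and the default λ is an assumption, not a value from any surveyed study:

```python
# Sketch of Formula (13): r'_t = r_t - lambda * C(s_t, a_t).
# cost_fn(state, action) plays the role of the adversarial cost C(s_t, a_t).
def shaped_reward(r, state, action, cost_fn, lam=0.5):
    return r - lam * cost_fn(state, action)
```

Setting λ = 0 recovers the standard detection reward, while larger λ values penalize the agent more heavily whenever the adversary succeeds, pushing training toward attack-robust policies.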
<p><xref ref-type="table" rid="table-5">Table 5</xref> provides a summary of the key components of ARL-NIDS, mapping each component to its corresponding formula and components. To clarify the significance of each term used in the formulas, <xref ref-type="table" rid="table-6">Table 6</xref> explains each term mentioned.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>ARL-NIDS key component summary</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Component</th>
<th>Description</th>
<th>Mapping with formulas</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Adversarial actor</bold></td>
<td>Generates adversarial actions or perturbations to challenge the agent</td>
<td><xref ref-type="disp-formula" rid="eqn-11">Formula (11)</xref></td>
</tr>
<tr>
<td><bold>Adversarial critic</bold></td>
<td>Evaluates the agent&#x2019;s response and assigns adversarial rewards</td>
<td><xref ref-type="disp-formula" rid="eqn-13">Formula (13)</xref></td>
</tr>
<tr>
<td><bold>Agent actor</bold></td>
<td>Classifies network packets and learns optimal policies</td>
<td><xref ref-type="disp-formula" rid="eqn-11">Formula (11)</xref></td>
</tr>
<tr>
<td><bold>Agent critic</bold></td>
<td>Assesses the agent&#x2019;s performance and updates its policy</td>
<td><xref ref-type="disp-formula" rid="eqn-10">Formula (10)</xref></td>
</tr>
<tr>
<td><bold>Environment</bold></td>
<td>Simulates network conditions and provides feedback</td>
<td><xref ref-type="disp-formula" rid="eqn-12">Formula (12)</xref></td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Notation table</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Notation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>s</bold><sub><bold>t</bold></sub></td>
<td>Current state of the network environment (traffic features at time t).</td>
</tr>
<tr>
<td><bold>s</bold><sub><bold>t</bold></sub> <bold>&#x002B;</bold> <bold>1</bold></td>
<td>The next state of the network environment after the action is <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>.</td>
</tr>
<tr>
<td><bold>s<sup>&#x2032;</sup></bold><sub><bold>t</bold></sub></td>
<td>An adversarial perturbed state generated by adversarial actors.</td>
</tr>
<tr>
<td><bold>a</bold><sub><bold>t</bold></sub></td>
<td>Action taken by the agent, e.g., classify traffic as benign/malicious or apply a detection rule.</td>
</tr>
<tr>
<td><bold>r</bold><sub><bold>t</bold></sub></td>
<td>Reward signal based on detection accuracy, FP, or robustness under attack.</td>
</tr>
<tr>
<td><bold>&#x03B1;</bold></td>
<td>Learning rate controlling the update speed of value or policy functions.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi mathvariant="bold-italic">&#x03B3;</mml:mi></mml:math></inline-formula></td>
<td>The discount factor determines the importance of future rewards.</td>
</tr>
<tr>
<td><bold>Q</bold> <bold>(s</bold><sub><bold>t</bold></sub>, <bold>a</bold><sub><bold>t</bold></sub><bold>)</bold></td>
<td>Action-value function representing expected return.</td>
</tr>
<tr>
<td><bold>A</bold> <bold>(s</bold><sub><bold>t</bold></sub>, <bold>a</bold><sub><bold>t</bold></sub><bold>)</bold></td>
<td>The advantage function estimates the relative value of action <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msub><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> at state <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow></mml:math></inline-formula></td>
<td>Adversarial perturbation introduced to deceive the IDS.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:math></inline-formula></td>
<td>Policy parameters updated during training.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi mathvariant="bold-italic">&#x03BB;</mml:mi></mml:math></inline-formula></td>
<td>The weighting factor controls the influence of adversarial cost on the reward function.</td>
</tr>
<tr>
<td><bold>C</bold> <bold>(s</bold><sub><bold>t</bold></sub>, <bold>a</bold><sub><bold>t</bold></sub><bold>)</bold></td>
<td>The Adversarial Cost Function measures the effectiveness of adversarial actions&#x2014;e.g., misclassification rate, perturbation strength, or evasion success.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="table-7">Table 7</xref> examines the most recent studies that utilize diverse AI models, including RL, ML, and DRL, in various environments. Thus, it summarizes the most common models, environments, and IDS. In addition, the table also explores the latest dataset employed and the novel techniques. Lastly, the attacks detected in previous studies are listed in the table.</p>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>General analysis of comparative models</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">Ref.</th>
<th align="center" rowspan="2">Dataset</th>
<th align="center" rowspan="2">Feature selection</th>
<th colspan="4">Algorithm</th>
<th colspan="3">Environment</th>
<th colspan="3">IDS</th>
<th align="center" rowspan="2">Classification method</th>
<th align="center" rowspan="2">Attacks detected</th>
</tr>
<tr>
<th>ARL</th>
<th>RL</th>
<th>DRL</th>
<th>ML</th>
<th>SDN</th>
<th>IoT</th>
<th>NN</th>
<th>AD</th>
<th>HIDS</th>
<th>NIDS</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-58">58</xref>]</bold></td>
<td>CORA-ML<break/>CITESEER</td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td colspan="2">&#x00D7;</td>
<td>GNNs for node classification</td>
<td>Non-targeted Poison Attack, Target Evasion.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-59">59</xref>]</bold></td>
<td>Atari games dataset</td>
<td>Universal perturbation generation for feature categories.</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td colspan="2">&#x00D7;</td>
<td>DAP was proposed to attack the DRL system.</td>
<td>ZOO, AutoZOOM, Decision-Based, UAPs, Derivative-Free, and SAI-FGSM attacks.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-60">60</xref>]</bold></td>
<td>CSI 300.<break/>DJIA.<break/>S&#x0026;P 500.</td>
<td>Fails to offer portfolio selection and feature selection techniques.</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td colspan="2">&#x00D7;</td>
<td>Enhance investment portfolios employing RL and constraint solvers.</td>
<td>The study does not specify or list the specific attacks detected.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-61">61</xref>]</bold></td>
<td>CTU dataset<break/>BOTNET dataset.</td>
<td>The feature set used for detection is reported in Table I.</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td>Trained using best practices from the related literature.</td>
<td>Neris, Rbot, Virut, Menti, Murlo.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-62">62</xref>]</bold></td>
<td>Michigan Speedway environment in TORCS</td>
<td>&#x00D7;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td colspan="2">&#x00D7;</td>
<td>Trained via ensemble policy networks &#x0026; asymmetric reward functions.</td>
<td>Random perturbations, Adversarial attacks</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-63">63</xref>]</bold></td>
<td>Robust Gymnasium: A Unified Modular Benchmark for Robust RL.</td>
<td>&#x00D7;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td colspan="2">&#x00D7;</td>
<td>DRL policies (PPO/DDPG/DQN in experiment)</td>
<td>Adversarial perturbations on observations (worst-case)</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-64">64</xref>]</bold></td>
<td>Atari games (Pong &#x0026; Breakout).<break/>Autonomous driving (TORCS).<break/>Continuous robot control (MuJoCo).</td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td colspan="2">&#x00D7;</td>
<td>&#x00D7;</td>
<td>Critical Point Attack, Antagonist Attack.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-65">65</xref>]</bold></td>
<td>IoT traffic generation patterns dataset.</td>
<td>IoT sensors measure humidity, temperature, and the dew point.</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td></td>
<td>Supervised ML methods (SVM, NB, DT) to categorize data and identify IoT network assaults.</td>
<td>Man-In-The-Middle (MitM) attack, ARP attacks.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-66">66</xref>]</bold></td>
<td>CICIDS2017 dataset</td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td>Not mentioned.</td>
<td>DoS, DDoS, XSS, phishing, eavesdropping, password attack, SQL injection, MITM, and malware.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-67">67</xref>]</bold></td>
<td>Large-Scale IDS Dataset.</td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>Multi-agent DRL agents (policy networks) &#x002B; GAN for data augmentation</td>
<td>Device, email, Logon, HTTP &#x0026; file Attack</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-68">68</xref>]</bold></td>
<td>CSE-CIC-IDS2018 dataset.<break/>NSL-KDD dataset.</td>
<td>Multi-agent feature selection (MAFS) to reduce redundant features.</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>DQL (policy network) combined with GCN/CNN for feature extraction.</td>
<td>DoS, Probe, R2L &#x0026; U2R attacks.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-69">69</xref>]</bold></td>
<td>IIoT dataset publicly released by the U.S. Dept. of Energy&#x2019;s Oak Ridge National Laboratory.</td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>Deep RL agents (DQN variants/DNN classifiers)</td>
<td>IIoT-relevant attacks (anomalies, injection, DoS variants).</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-70">70</xref>]</bold></td>
<td>Robust Gymnasium: A Unified Modular Benchmark for Robust RL dataset.</td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td></td>
<td>DDPG/PPO/DQN &#x002B; robustness regularizer.</td>
<td>Not specified.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-71">71</xref>]</bold></td>
<td>DS2OS dataset</td>
<td>&#x00D7;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td>&#x221A;</td>
<td>Authors employ data encoding and data normalization.</td>
<td>
<list list-type="bullet">
<list-item>
<p>DoS.</p></list-item>
<list-item>
<p>Scan.</p></list-item>
<list-item>
<p>Mal-control.</p></list-item>
<list-item>
<p>Mal-operation.</p></list-item>
<list-item>
<p>Spying.</p></list-item>
<list-item>
<p>Data probing.</p></list-item>
<list-item>
<p>Wrong setup.</p></list-item>
</list>
</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-72">72</xref>]</bold></td>
<td>NSL-KDD dataset.</td>
<td>&#x00D7;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>Deep feed-forward NN within DQL agent (DQN style)</td>
<td>
<list list-type="bullet">
<list-item>
<p>DoS.</p></list-item>
<list-item>
<p>Probe.</p></list-item>
<list-item>
<p>R2L.</p></list-item>
<list-item>
<p>U2R.</p></list-item>
</list>
</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-73">73</xref>]</bold></td>
<td>CIC-DDoS 2019 dataset.</td>
<td>&#x00D7;</td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>DQN-CNN (packet level), policy-gradient methods (flow level); AT integrated</td>
<td>DDoS/dataset attacks as per CIC-DDoS 2019.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-74">74</xref>]</bold></td>
<td>KDD99 dataset &#x0026; CIC-IDS 2017 dataset.</td>
<td>Feature engineering &#x0026; selection (RFE/selection compared).</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td colspan="4">&#x00D7;</td>
<td>CNN, RNN, &#x0026; Transformer models for classification</td>
<td>Web-based attacks (DDoS, SQLi, XSS).</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-75">75</xref>]</bold></td>
<td>Uses realistic scenarios (MITRE ATT&#x0026;CK mapping; simulation datasets)</td>
<td>&#x00D7;</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td></td>
<td></td>
<td></td>
<td colspan="4">&#x00D7;</td>
<td>Simulated network environment.</td>
<td>MITRE ATT&#x0026;CK techniques.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In short, the methodology and recent advancements demonstrate the growing integration of reinforcement learning with AD in next-generation networked environments. The authors of [<xref ref-type="bibr" rid="ref-76">76</xref>] employ DDQN for AD in Cyber-Physical Systems (CPS), emphasizing RL&#x2019;s importance in managing complex and dynamic conditions. Moreover, the authors in [<xref ref-type="bibr" rid="ref-77">77</xref>] apply DRL to business process automation, highlighting data efficiency and weak supervision as essential design factors. Similarly, a privacy-enhanced DRL-based IDS tailored for CPS, which balances detection accuracy with confidentiality, was developed in [<xref ref-type="bibr" rid="ref-78">78</xref>]. In [<xref ref-type="bibr" rid="ref-79">79</xref>], the Synthetic Minority Over-sampling Technique (SMOTE) is employed before model training to balance an imbalanced dataset and enhance detection fidelity; the same work proposes an RL framework that integrates oversampling and undersampling strategies to address the class-imbalance issues common in intrusion detection. Collectively, these works reinforce that NIDS are an appropriate application area for ARL, as they must operate in adversarial settings, adapt to evolving threats, and ensure robust detection across diverse and imbalanced datasets. This positions ARL-NIDS as a promising methodology for achieving adaptive, resilient, and intelligent cybersecurity defenses.</p>
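The SMOTE balancing step mentioned above can be sketched in a minimal, hand-rolled form that shows the core idea: synthesizing minority-class samples by interpolating between a minority point and one of its minority-class neighbors. This is only an illustration under assumed inputs (the benign/attack feature arrays are synthetic placeholders, not drawn from any cited dataset); production work would use a library implementation.

```python
import numpy as np

# Minimal SMOTE-style oversampler (illustrative sketch, not a cited method).
rng = np.random.default_rng(0)

def smote_oversample(X_min, n_new, k=3):
    """Generate n_new synthetic minority samples from X_min."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # distances to all points
        neighbors = np.argsort(d)[1:k + 1]             # k nearest, skipping self
        j = rng.choice(neighbors)
        lam = rng.random()                             # interpolation factor
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Placeholder data: 100 benign flows vs. only 8 attack flows (4 features each).
X_benign = rng.normal(0.0, 1.0, size=(100, 4))
X_attack = rng.normal(3.0, 0.5, size=(8, 4))

X_new = smote_oversample(X_attack, n_new=92)           # oversample the minority
X_attack_balanced = np.vstack([X_attack, X_new])
print(X_attack_balanced.shape)  # (100, 4), now matching the benign class size
```

Because each synthetic point is a convex combination of two real minority points, the new samples stay inside the minority-class region rather than drifting into benign feature space.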
</sec>
<sec id="s5">
<label>5</label>
<title>Comparative Analysis Results</title>
<p>This comparative analysis and discussion section presents the results of this survey, which focuses on the ARL-NIDS approach to detecting malicious and abnormal activities. The objective is to synthesize earlier research, highlight methodological variations, and assess algorithms, datasets, and settings across the literature, covering the latest applications, dataset selection, ARL across various algorithms and extensions, cybersecurity risks, security challenges, and an overall comparison and discussion. This section delves into the following:
<list list-type="simple">
<list-item><label><bold>A</bold>.</label><p><bold>ARL-NIDS Applications</bold></p></list-item>
</list></p>
<p>The increasing complexity of cyber threats in modern digital ecosystems has catalyzed the development of intelligent ID mechanisms. ARL has emerged as a promising paradigm that enables adaptive and flexible NIDS. By integrating adversarial perturbations that mimic malicious behavior, the ARL framework enhances the model&#x2019;s detection capabilities in response to evolving attacks. Recent literature reflects its adoption in various domains, including CPS, IIoT, smart health services, and web safety. These systems benefit from DRL architectures that dynamically learn optimal defense strategies, improve anomaly detection accuracy, and withstand adversarial perturbations. Recent progress in ARL-NIDS has shown its versatility and efficiency in addressing real-world challenges. For instance, ref. [<xref ref-type="bibr" rid="ref-76">76</xref>] proposed a DDQN framework for AD in next-generation CPS, demonstrating increased adaptability in dynamic environments. Business process AD has also benefited from ARL, as demonstrated by [<xref ref-type="bibr" rid="ref-77">77</xref>], who employed weakly supervised learning to improve data efficiency. Similarly, an ARL model in [<xref ref-type="bibr" rid="ref-78">78</xref>] introduced privacy protection within ID, addressing data privacy, a significant concern in sensitive infrastructure, while ref. [<xref ref-type="bibr" rid="ref-79">79</xref>] focused on feature selection using DRL to adapt detection performance. Healthcare and vehicular systems are emerging frontiers for ARL-NIDS. RL for NIDS utilizing the CSE-CIC-IDS2018 and NSL-KDD datasets is discussed in [<xref ref-type="bibr" rid="ref-80">80</xref>], whereas ref. [<xref ref-type="bibr" rid="ref-81">81</xref>] addressed the coexistence of anomalies in intelligent connected vehicles using RL-based data validity analysis and discussed oversampling and undersampling to balance imbalanced datasets via RL and DRL. 
Regarding UAVs and vehicles, the authors in [<xref ref-type="bibr" rid="ref-82">82</xref>] define a quantitative model of an appropriate driving style, supporting data validity in RL and DRL. In another context, both studies [<xref ref-type="bibr" rid="ref-83">83</xref>,<xref ref-type="bibr" rid="ref-84">84</xref>] employ RL and DRL in cybersecurity and network security to counter adversarial attack simulations. For IoT environments, it is vital to implement robust cybersecurity frameworks that can help mitigate cyberattacks across IoT layers; the authors in [<xref ref-type="bibr" rid="ref-85">85</xref>,<xref ref-type="bibr" rid="ref-86">86</xref>] explore IDS and botnet detection with RL within the IoT environment. In industrial contexts, the study in [<xref ref-type="bibr" rid="ref-87">87</xref>] used a dynamic reward mechanism and Principal Component Analysis (PCA) to detect IIoT attacks, and in [<xref ref-type="bibr" rid="ref-88">88</xref>], an adaptive RL model for secure routing in smart healthcare is developed. These applications underscore the growing significance of ARL in safeguarding heterogeneous and mission-critical systems. The study [<xref ref-type="bibr" rid="ref-89">89</xref>] leverages DRL in conjunction with IDS within an IoT context to detect evolving threats and manage detection complexity. In [<xref ref-type="bibr" rid="ref-90">90</xref>], the authors introduce a novel approach for computing a policy for prioritizing alerts using ARL. Using multi-agent RL, the authors in [<xref ref-type="bibr" rid="ref-91">91</xref>,<xref ref-type="bibr" rid="ref-92">92</xref>] enhance both centralized and decentralized IDS approaches based on the NSL-KDD dataset. Regarding hybrid and multi-class network approaches, DRL is employed within NIDS in [<xref ref-type="bibr" rid="ref-93">93</xref>], where the authors also train GANs with the NSL-KDD dataset. 
Furthermore, multi-agent adversarial detection is employed via TL in [<xref ref-type="bibr" rid="ref-94">94</xref>], where the NIDS detects anomalies against evasion attacks. Finally, <xref ref-type="table" rid="table-8">Table 8</xref> provides an overview of the latest applications of ARL-NIDS, including brief details.</p>
<table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>ARL-NIDS latest applications</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Application</th>
<th>Description</th>
<th>Ref.</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Healthcare &#x0026; smart medical systems</bold></td>
<td>ARL-NIDS can protect medical equipment, hospital networks, and telemedicine platforms from cyberattacks, ultimately benefiting patients.</td>
<td>[<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-71">71</xref>,<xref ref-type="bibr" rid="ref-88">88</xref>]</td>
</tr>
<tr>
<td><bold>Financial networks/Banking systems</bold></td>
<td>ARL-enhanced IDS identify suspicious transactions, fraud, and network attacks targeting financial institutions.</td>
<td>[<xref ref-type="bibr" rid="ref-60">60</xref>]</td>
</tr>
<tr>
<td><bold>Robotics &#x0026; industrial automation</bold></td>
<td>ARL was used to detect cyberattacks in autonomous robots, smart industrial systems, and automated production lines.</td>
<td>[<xref ref-type="bibr" rid="ref-25">25</xref>,<xref ref-type="bibr" rid="ref-47">47</xref>,<xref ref-type="bibr" rid="ref-62">62</xref>,<xref ref-type="bibr" rid="ref-75">75</xref>]</td>
</tr>
<tr>
<td><bold>Food/Supply chain systems</bold></td>
<td>ARL-NIDS can protect against cyber threats related to supply chain networks, smart stocks, and food processing IoT systems.</td>
<td>[<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-87">87</xref>]</td>
</tr>
<tr>
<td><bold>Smart grid &#x0026; energy systems</bold></td>
<td>ARL-based IDS protect the power grid, energy distribution networks, and smart meters against cyberattacks.</td>
<td>[<xref ref-type="bibr" rid="ref-51">51</xref>,<xref ref-type="bibr" rid="ref-56">56</xref>,<xref ref-type="bibr" rid="ref-78">78</xref>]</td>
</tr>
<tr>
<td><bold>Connected vehicles &#x0026; transportation</bold></td>
<td>Detects anomalous behavior in connected vehicles, traffic networks, and autonomous driving systems.</td>
<td>[<xref ref-type="bibr" rid="ref-26">26</xref>,<xref ref-type="bibr" rid="ref-54">54</xref>,<xref ref-type="bibr" rid="ref-81">81</xref>]</td>
</tr>
<tr>
<td><bold>Web-based &#x0026; industry 5.0 platforms</bold></td>
<td>Detects web attacks, phishing, and IIoT threats that have become prevalent in advanced industrial setups over the last few years.</td>
<td>[<xref ref-type="bibr" rid="ref-66">66</xref>,<xref ref-type="bibr" rid="ref-74">74</xref>,<xref ref-type="bibr" rid="ref-82">82</xref>]</td>
</tr>
<tr>
<td><bold>IoT consumer devices</bold></td>
<td>Secures smart homes, wearable devices, and consumer IoT ecosystems from malware or network intrusions.</td>
<td>[<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-50">50</xref>,<xref ref-type="bibr" rid="ref-55">55</xref>,<xref ref-type="bibr" rid="ref-69">69</xref>]</td>
</tr>
<tr>
<td><bold>Next-generation CPS</bold></td>
<td>ARL-enhanced IDS for CPS environments integrating multiple physical and digital systems.</td>
<td>[<xref ref-type="bibr" rid="ref-76">76</xref>,<xref ref-type="bibr" rid="ref-83">83</xref>]</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><list list-type="simple">
<list-item><label><bold>B</bold>.</label><p><bold>ARL-NIDS within Various Models</bold></p></list-item>
</list></p>
<p>Depending on the ARL algorithm, this sub-section compares ARL with other algorithms and approaches. <xref ref-type="table" rid="table-9">Table 9</xref> presents the comparison of ARL with existing algorithms/models. ARL-NIDS and related advanced network security approaches use RL and adversarial perturbations to identify complex and dynamic cyber threats. This contrasts with traditional IDS models, signature-based IDS, and traditional ML approaches such as Decision Tree (DT), SVM, and Random Forests (RF). ARL-NIDS dynamically adjusts optimal defense strategies through interactions with the network environment, utilizes multi-agent DRL for cooperative anomaly detection, and optimizes defense strategies in response to adversarial attacks. The latest research has advanced ARL-NIDS by incorporating hybrid models that integrate methodologies such as GANs for data augmentation and TL for multi-adversarial detection. These methodologies enable ARL-NIDS to proactively detect abnormalities in complex network contexts, including IoT, IIoT, and CPS, even in the presence of advanced evasion or stealth attacks [<xref ref-type="bibr" rid="ref-95">95</xref>]. ARL-NIDS offers a substantial advantage in terms of flexibility, robustness, and predictive performance compared to traditional IDS frameworks [<xref ref-type="bibr" rid="ref-96">96</xref>]. Traditional supervised models rely on labeled datasets and often struggle against new attacks, whereas ARL-NIDS continuously refine their policies through RL, enabling them to predict and counteract previously unseen threats [<xref ref-type="bibr" rid="ref-97">97</xref>]. Hybrid ARL-NIDS architectures that integrate GANs or multi-agent collaboration outperform traditional DL-based NIDS in detecting stealthy attacks [<xref ref-type="bibr" rid="ref-93">93</xref>,<xref ref-type="bibr" rid="ref-98">98</xref>]. 
However, such sophisticated systems often require substantial processing resources and careful tuning of reward functions to ensure stability and prevent overfitting. Despite these obstacles, ARL-NIDS demonstrate increased flexibility in adversarial contexts compared to traditional ML models, providing a proactive security solution rather than a reactive one. Furthermore, training-time and policy-teaching adversarial attacks against RL algorithms are presented in [<xref ref-type="bibr" rid="ref-99">99</xref>]. In [<xref ref-type="bibr" rid="ref-100">100</xref>], the authors employ RL for an autonomous defense approach for software-defined networks.</p>
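The policy-refinement idea described above, where an RL-based detector improves from reward feedback rather than fixed signatures, can be sketched with a toy tabular Q-learning agent. All states, actions, and rewards below are hypothetical placeholders, not the design of any surveyed system.

```python
import random

# Toy allow/alert agent learning from reward feedback (illustrative sketch).
random.seed(0)

states = ["benign", "attack"]     # ground-truth class of an observed flow
actions = ["allow", "alert"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, epsilon = 0.5, 0.1         # learning rate, exploration rate

def reward(state, action):
    # +1 for correct decisions, -1 for missed attacks or false alarms
    correct = "alert" if state == "attack" else "allow"
    return 1.0 if action == correct else -1.0

for _ in range(2000):
    s = random.choice(states)
    # epsilon-greedy action selection
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(s, act)])
    # one-step update (episodes are terminal here, so no bootstrap term)
    Q[(s, a)] += alpha * (reward(s, a) - Q[(s, a)])

policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in states}
print(policy)  # learned policy: alert on attacks, allow benign traffic
```

The agent needs no labeled signature database up front; mislabeled decisions are penalized by the reward signal, which is the property that lets RL-based NIDS adapt as traffic patterns shift.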
<table-wrap id="table-9">
<label>Table 9</label>
<caption>
<title>ARL comparison with existing algorithms</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Algorithm/Model</th>
<th align="center">Accuracy/<break/>Detection</th>
<th>Adaptability</th>
<th>Computational cost</th>
<th>Robustness (Adversarial)</th>
<th>Scalability</th>
<th>Ref.</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>ARL-NIDS (Single-Agent DRL)</bold></td>
<td>High</td>
<td>High, learns dynamically from the environment</td>
<td>High</td>
<td>High</td>
<td>Low</td>
<td>[<xref ref-type="bibr" rid="ref-90">90</xref>]</td>
</tr>
<tr>
<td><bold>Multi-Agent DRL (Collaborative)</bold></td>
<td>Very high</td>
<td>Very high</td>
<td>High</td>
<td>Very high</td>
<td>High</td>
<td>[<xref ref-type="bibr" rid="ref-91">91</xref>,<xref ref-type="bibr" rid="ref-92">92</xref>]</td>
</tr>
<tr>
<td><bold>DRL &#x002B; GAN hybrid</bold></td>
<td>Very high</td>
<td>High</td>
<td>Very high</td>
<td>Very high</td>
<td>Medium</td>
<td>[<xref ref-type="bibr" rid="ref-93">93</xref>]</td>
</tr>
<tr>
<td><bold>TL &#x002B; Multi-adversarial detection</bold></td>
<td>High</td>
<td>Very High</td>
<td>Medium-High</td>
<td>High</td>
<td>Medium</td>
<td>[<xref ref-type="bibr" rid="ref-94">94</xref>]</td>
</tr>
<tr>
<td><bold>Policy teaching/Environment poisoning</bold></td>
<td>High</td>
<td>High</td>
<td>High</td>
<td>Very high</td>
<td>Low-Medium</td>
<td>[<xref ref-type="bibr" rid="ref-99">99</xref>]</td>
</tr>
<tr>
<td><bold>Strategically timed attacks</bold></td>
<td>High</td>
<td>Medium-High</td>
<td>Medium-High</td>
<td>High</td>
<td>Low-Medium</td>
<td>[<xref ref-type="bibr" rid="ref-98">98</xref>]</td>
</tr>
<tr>
<td><bold>Traditional ML (DT, SVM, RF)</bold></td>
<td>Medium</td>
<td>Low</td>
<td>Low</td>
<td>Low</td>
<td>Low</td>
<td>[<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-43">43</xref>,<xref ref-type="bibr" rid="ref-65">65</xref>]</td>
</tr>
<tr>
<td><bold>ELM model</bold></td>
<td>Medium-High</td>
<td>Medium</td>
<td>Low</td>
<td>Medium</td>
<td>Low</td>
<td>[<xref ref-type="bibr" rid="ref-45">45</xref>,<xref ref-type="bibr" rid="ref-72">72</xref>]</td>
</tr>
<tr>
<td><bold>DL (DNN, CNN, LSTM)</bold></td>
<td>High</td>
<td>Medium-High</td>
<td>High</td>
<td>Medium</td>
<td>Low-Medium</td>
<td>[<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-55">55</xref>]</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><list list-type="simple">
<list-item><label><bold>C</bold>.</label><p><bold>ARL-NIDS Extensions</bold></p></list-item>
</list></p>
<p>ARL-NIDS has demonstrated significant dynamic detection capabilities, reducing the incidence of cyberattacks in network environments. However, as with other sophisticated AI systems, it is susceptible to new attack strategies and operational challenges. Recent research has focused on extending ARL-NIDS to enhance its strength, adaptability, and overall efficiency in real-world applications. These extensions draw on insights into adversarial attacks, AD, social robotics, and moving target defense to improve the ARL system&#x2019;s detection ability and flexibility [<xref ref-type="bibr" rid="ref-101">101</xref>]. <xref ref-type="fig" rid="fig-15">Fig. 15</xref> identifies the ARL extensions used to acquire knowledge &#x0026; improve efficiency. Various approaches have been proposed to extend the classical ARL-NIDS framework. Malicious attack characterization and optimal adversarial approaches allow ARL models to anticipate and defend against sophisticated threats, improving the stability and convergence of RL policies [<xref ref-type="bibr" rid="ref-102">102</xref>]. Recent extensions of ARL have significantly strengthened its theoretical foundations and expanded its applicability in security-critical domains. In [<xref ref-type="bibr" rid="ref-103">103</xref>], the authors formalized optimal attack strategies against RL policies, exposing vulnerabilities in well-trained agents by leveraging policy structures. In [<xref ref-type="bibr" rid="ref-104">104</xref>], snooping attacks are introduced, in which adversaries deduce the internal states or decision patterns of DRL agents, highlighting the dangers of information leakage during deployment. Furthermore, robust ARL&#x2019;s stability and convergence characteristics in linear quadratic systems provide the mathematical assurances necessary for secure incorporation into control systems, as examined in [<xref ref-type="bibr" rid="ref-105">105</xref>].</p>
<fig id="fig-15">
<label>Figure 15</label>
<caption>
<title>Latest ARL extensions: employing ARL extensions to acquire knowledge &#x0026; improve efficiency [<xref ref-type="bibr" rid="ref-101">101</xref>, <xref ref-type="bibr" rid="ref-104">104</xref>&#x2013;<xref ref-type="bibr" rid="ref-112">112</xref>]</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-15.tif"/>
</fig>
<p>DRL overviews outline strategies for enhancing training effectiveness and making policies more generalizable in dynamic network settings [<xref ref-type="bibr" rid="ref-106">106</xref>]. Beyond that, various technical insights offer a detailed look at AI-driven cybersecurity policies, demonstrating how ARL can be employed to develop proactive defense systems in dynamic threat environments [<xref ref-type="bibr" rid="ref-107">107</xref>]. All these works extend ARL beyond simply reacting to obstacles, moving it toward strategic foresight, system-level resilience, and formal assurance. Moreover, research has investigated the integration of ARL with social robotics methodologies, enabling systems to collectively acquire knowledge from various agents and environmental feedback, thereby enhancing AD in complex network architectures [<xref ref-type="bibr" rid="ref-108">108</xref>]. Deep Sarsa and hybrid approaches are indicative of DRL algorithms that can be applied in the real world to detect network problems, even when attackers attempt to conceal them [<xref ref-type="bibr" rid="ref-109">109</xref>,<xref ref-type="bibr" rid="ref-110">110</xref>].</p>
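For concreteness, the tabular Sarsa update that Deep Sarsa generalizes with a neural Q-function can be written in a few lines. The state names, reward, and initial Q-values below are illustrative placeholders.

```python
# On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
alpha, gamma = 0.1, 0.9
Q = {("s0", "alert"): 0.2, ("s1", "allow"): 0.5}

def sarsa_update(Q, s, a, r, s_next, a_next):
    """Apply one Sarsa step using the action a_next the policy actually chose."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

# One transition: took 'alert' in s0, got reward 1.0, reached s1,
# where the current policy chose 'allow'.
sarsa_update(Q, "s0", "alert", 1.0, "s1", "allow")
print(round(Q[("s0", "alert")], 3))  # 0.2 + 0.1*(1.0 + 0.9*0.5 - 0.2) = 0.325
```

Unlike Q-learning, the bootstrap term uses the action the current policy actually selected (a'), which is what makes Sarsa on-policy; Deep Sarsa keeps this rule but replaces the table with a neural network.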
<p>When used in conjunction with ARL, Moving Target Defense (MTD) aims to make it challenging for intruders to access a network, thereby complicating their ability to navigate it [<xref ref-type="bibr" rid="ref-111">111</xref>]. Research has identified open security weaknesses and potential challenges associated with RL training, emphasizing the requirement for adaptive adversarial training [<xref ref-type="bibr" rid="ref-110">110</xref>,<xref ref-type="bibr" rid="ref-112">112</xref>]. ARL enhances conventional RL by incorporating adversarial dynamics that simulate hostile or uncertain environments, making the system more robust and adaptable. In [<xref ref-type="bibr" rid="ref-113">113</xref>], the authors analyze cascaded fuzzy reward methodologies for robotic path planning in textiles; ARL could improve such frameworks by training agents to operate under adversarial perturbations or dynamic constraints, thereby augmenting resilience in practical applications. Likewise, the research in [<xref ref-type="bibr" rid="ref-114">114</xref>] demonstrates how RL can mitigate DoS attacks in smart grids. By treating attackers as adversarial agents, the ARL approach may enhance this defense, informing defenders on how to respond as threats evolve. In [<xref ref-type="bibr" rid="ref-115">115</xref>], DRL is employed across various IDS functions; the DRL model is trained on the NSL-KDD dataset, and its performance is further evaluated against several adversarial attacks. These extensions make ARL a valuable tool for creating intelligent systems that function effectively, remain secure, and can manage issues even in the presence of threats. The ARL-NIDS extensions aim to enhance flexibility, resilience, and operational effectiveness in response to advanced cyber threats. 
By integrating adversarial training, multi-agent collaborative detection approaches, robust moving target defense, and refined adversarial perturbations, ARL-NIDS can achieve a high degree of proactive threat mitigation. These extensions strengthen system performance and provide a framework for future research on AI-driven cybersecurity solutions.</p>
<p><list list-type="simple">
<list-item><label><bold>D</bold>.</label><p><bold>ARL-NIDS Datasets</bold></p></list-item>
</list></p>
<p>The widespread improvement of IDS has shaped the growth of RL algorithms in the cybersecurity sector. Traditional IDS strategies often struggle to manage the evolving and adaptive characteristics of contemporary cyberattacks, whereas DRL is a promising approach to enhancing IDS because it can continually learn and adapt. In [<xref ref-type="bibr" rid="ref-116">116</xref>], the authors examine how adversarial perturbations can compromise DRL-based IDS agents by injecting small Fast Gradient Sign Method (FGSM) and Basic Iterative Method (BIM) perturbations into network traffic. Their findings highlight the fragility of DRL models when exposed to adversarial examples and emphasize the need for datasets that include such perturbation patterns to evaluate the robustness of ID. In contrast, the study [<xref ref-type="bibr" rid="ref-117">117</xref>] proposes an active defense mechanism that utilizes an ARL-enhanced honeypot system integrated into an industrial control network. Their model simulates adversarial conditions within an MDP structure, enabling the agent to learn complex attack behavior and enhance accuracy against DDoS variants, such as NetBIOS and LDAP. The system performs well on imbalanced datasets, suggesting that synthetic and adversarially rich data are required to train and validate such models. DRL-based IDS, although highly efficient, depend heavily on the quality and properties of the dataset used for training and evaluation. Recent DRL studies on adversarial perturbations for AD and NIDS have likewise emphasized the significant role of diverse datasets in evaluating the strength and adaptability of algorithms.</p>
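The single-step FGSM perturbation discussed above can be sketched against a toy logistic-regression "attack score". The weights and sample below are illustrative placeholders, not any cited model or dataset.

```python
import numpy as np

# FGSM sketch: nudge a malicious sample in the direction that increases the
# classifier's loss, pushing its attack score toward "benign".
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -2.0, 0.5])   # placeholder classifier weights
b = 0.1
x = np.array([0.8, 0.1, 0.6])    # feature vector of a malicious flow
y = 1.0                          # true label: attack

# Gradient of the cross-entropy loss w.r.t. the input:
# dL/dx = (sigmoid(w.x + b) - y) * w
grad_x = (sigmoid(w @ x + b) - y) * w

# FGSM step: x' = x + eps * sign(dL/dx)
epsilon = 0.3
x_adv = x + epsilon * np.sign(grad_x)

score_before = sigmoid(w @ x + b)    # ~0.80: flagged as an attack
score_after = sigmoid(w @ x_adv + b)
print(score_before > score_after)    # True: the attack score drops
```

BIM is the iterated version of the same step: several smaller FGSM moves, each clipped to keep the total perturbation within the budget, which is why it degrades detectors more severely than a single FGSM step.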
<p>Studies such as [<xref ref-type="bibr" rid="ref-118">118</xref>] stress the importance of large-scale, diverse datasets for benchmarking ARL models in real-world situations. Both the nature of the data and the traffic to be analyzed are central to this subsection. <xref ref-type="table" rid="table-10">Table 10</xref> lists and compares the most recent datasets used, along with various basic parameters, such as the type of traffic, the nature of the data, the tasks associated with them, and additional information, to enhance the utilization of datasets.</p>
<table-wrap id="table-10">
<label>Table 10</label>
<caption>
<title>Overall comparison of datasets employed in recent years, based on past studies</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">Ref.</th>
<th align="center" rowspan="2">Dataset publicly?</th>
<th align="center" rowspan="2">Dataset</th>
<th align="center" rowspan="2">Nature of data</th>
<th align="center" colspan="2">Nature of traffic</th>
<th align="center" rowspan="2">Labeled dataset</th>
<th align="center" rowspan="2">Balanced dataset</th>
<th align="center" rowspan="2"># of Instances</th>
<th align="center" rowspan="2"># of Classes</th>
<th align="center" rowspan="2">Traffic type</th>
<th align="center" rowspan="2">Associated tasks</th>
</tr>
<tr>
<th>Normal</th>
<th>Attack</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2"><bold>[<xref ref-type="bibr" rid="ref-7">7</xref>]</bold></td>
<td>&#x221A;</td>
<td>UNSW-NB15 dataset</td>
<td>Network packet</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>2,540,044 instances</td>
<td>10</td>
<td>Real</td>
<td rowspan="2">Classification</td>
</tr>
<tr>
<td>&#x221A;</td>
<td>Unicauca dataset</td>
<td>IoT botnet attack traces</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>3,577,296 instances</td>
<td>75</td>
<td>Emulated</td>

</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-18">18</xref>]</bold></td>
<td>&#x221A;</td>
<td>TON-IoT-2020</td>
<td>Heterogeneous nature</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>3,435,084 instances</td>
<td>9</td>
<td>Emulated &#x0026; Real</td>
<td>Classification &#x0026; clustering</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-29">29</xref>]</bold></td>
<td>&#x221A;</td>
<td>Kitsune Dataset</td>
<td>Network traffic AD</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>27,170,754 instances</td>
<td>10</td>
<td>Real</td>
<td>Classification, clustering, causal-discovery</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-42">42</xref>]</bold></td>
<td>&#x221A;</td>
<td>IoTID20 dataset</td>
<td>IoT traffic</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>625,783 instances</td>
<td>9</td>
<td>Real</td>
<td rowspan="3">Classification</td>
</tr>
<tr>
<td rowspan="2"><bold>[<xref ref-type="bibr" rid="ref-53">53</xref>]</bold></td>
<td>&#x221A;</td>
<td>AWID datasets</td>
<td>Wireless network traffic</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>210,900,113 instances</td>
<td>20</td>
<td>Real</td>

</tr>
<tr>

<td>&#x221A;</td>
<td>MQTT-20</td>
<td>MQTT-specific network traffic</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>358,000 instances</td>
<td>5</td>
<td>Emulated</td>

</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-66">66</xref>]</bold></td>
<td>&#x221A;</td>
<td>CIC-IDS-2017 dataset</td>
<td>Network intrusion flows</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>2,830,540 instances</td>
<td>15</td>
<td rowspan="5">Emulated</td>
<td>Classification</td>
</tr>
<tr>
<td rowspan="2"><bold>[<xref ref-type="bibr" rid="ref-68">68</xref>]</bold></td>
<td>&#x221A;</td>
<td>CSE-CIC-IDS2018 dataset</td>
<td>Realistic multi-attack traffic</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>16,233,002 instances</td>
<td>14</td>
<td></td>

</tr>
<tr>
<td>&#x221A;</td>
<td>NSL-KDD dataset</td>
<td>Captured From Real-World Networks</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>148,517 instances</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-71">71</xref>]</bold></td>
<td>&#x221A;</td>
<td>DS2OS dataset</td>
<td>IoT smart service communications</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>357,952 instances</td>
<td>8</td>
<td></td>

</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-109">109</xref>]</bold></td>
<td>&#x221A;</td>
<td>NSL-KDD dataset</td>
<td>Captured From Real-World Networks</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>148,517 instances</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-112">112</xref>]</bold></td>
<td>&#x221A;</td>
<td>UNSW-NB15</td>
<td>Network packet</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>2,540,044 instances</td>
<td>10</td>
<td>Real</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-115">115</xref>]</bold></td>
<td>&#x221A;</td>
<td>NSL-KDD Dataset</td>
<td>Captured From Real-World Networks</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>148,517 instances</td>
<td>5</td>
<td>Emulated</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-116">116</xref>]</bold></td>
<td>&#x221A;</td>
<td>CIC-DDoS-2019 dataset</td>
<td>DoS attack traffic</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>12,794,627 instances</td>
<td>13</td>
<td>Real</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-119">119</xref>]</bold></td>
<td>&#x221A;</td>
<td>NSL-KDD dataset</td>
<td>Captured From Real-World Networks</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>148,517 instances</td>
<td>5</td>
<td>Emulated</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-122">122</xref>]</bold></td>
<td>&#x221A;</td>
<td>UNSW-NB15 dataset</td>
<td>Network packet</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>2,540,044 instances</td>
<td>10</td>
<td>Real</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-123">123</xref>]</bold></td>
<td>&#x221A;</td>
<td>CIC-IDS-2017 dataset</td>
<td>Network intrusion flows</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>2,830,540 instances</td>
<td>15</td>
<td>Emulated</td>
<td></td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-133">133</xref>]</bold></td>
<td>&#x221A;</td>
<td>MovieLens dataset</td>
<td>User movie rating preferences</td>
<td align="center" colspan="2">No traffic</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>25 million ratings</td>
<td>20K&#x002B; movies</td>
<td>Real</td>
<td>Prediction</td>
</tr>
<tr>
<td></td>
<td>&#x221A;</td>
<td>Last.fm dataset</td>
<td>Music listening behavior logs</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>360K instances.</td>
<td>N/A</td>
<td>Real</td>
<td>User behavior analysis</td>
</tr>
<tr>
<td></td>
<td>&#x221A;</td>
<td>Yelp dataset</td>
<td>Business reviews with ratings</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>8M reviews</td>
<td>N/A</td>
<td>Real</td>
<td>Classification/<break/>Prediction</td>
</tr>
<tr>
<td></td>
<td>&#x221A;</td>
<td>Taobao dataset</td>
<td>E-commerce user transaction records</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>100M interactions</td>
<td>N/A</td>
<td>Real</td>
<td>Recommendation/<break/>Prediction</td>
</tr>
<tr>
<td></td>
<td>&#x221A;</td>
<td>RecSys15 dataset</td>
<td>Session-based clickstream interactions</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>7M sessions</td>
<td>N/A</td>
<td>Real</td>
<td>Recommendation</td>
</tr>
<tr>
<td></td>
<td>&#x221A;</td>
<td>Ant Financial News dataset</td>
<td>Financial news text articles</td>
<td></td>
<td></td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>500k articles</td>
<td>N/A</td>
<td>Real</td>
<td>Classification/<break/>Prediction</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-134">134</xref>]</bold></td>
<td>&#x221A;</td>
<td>Drebin datasets</td>
<td>Android malware application features</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>5.5k apps</td>
<td>2</td>
<td>Real</td>
<td align="center" rowspan="2">Classification</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-135">135</xref>]</bold></td>
<td>&#x221A;</td>
<td>Kinetics dataset</td>
<td>Human action video clips</td>
<td align="center" colspan="2">No traffic</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>650k video clips</td>
<td>600</td>
<td>Real</td>

</tr>
</tbody>
</table>
</table-wrap>
<p>In [<xref ref-type="bibr" rid="ref-119">119</xref>], the authors developed a DRL-based NIDS grounded in an MDP formulation to enhance the robustness of DRL-IDS under stochastic games, employing the NSL-KDD dataset. Furthermore, the study [<xref ref-type="bibr" rid="ref-120">120</xref>] explored FPGA-accelerated decentralized RL in UAV networks, where customized UAV traffic datasets were crucial for validating AD. Additionally, the contributors in [<xref ref-type="bibr" rid="ref-121">121</xref>] utilize multi-attribute monitoring datasets to evaluate unsupervised, reward-based AD.</p>
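<p>To make the MDP framing used by such DRL-NIDS studies concrete, the following minimal sketch casts detection as a tiny tabular Q-learning problem in which states are discretized flow profiles and actions are allow/alert decisions. It is illustrative only: the states, rewards, and ground-truth labels are invented for the example and are not taken from any cited work or from the NSL-KDD dataset.</p>

```python
import random

# Toy illustration: intrusion detection framed as an MDP-style decision
# problem. States are discretized flow profiles, actions are
# {0: allow, 1: alert}, and the reward is +1 for a correct decision,
# -1 otherwise. (All values here are invented for the sketch.)
random.seed(0)

N_STATES, N_ACTIONS = 4, 2
# Hypothetical ground truth: states 2 and 3 correspond to malicious flows.
is_malicious = {0: False, 1: False, 2: True, 3: True}

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, epsilon = 0.1, 0.2

for episode in range(2000):
    s = random.randrange(N_STATES)                      # observe a flow
    if random.random() < epsilon:                       # epsilon-greedy
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
    r = 1.0 if (a == 1) == is_malicious[s] else -1.0    # reward signal
    Q[s][a] += alpha * (r - Q[s][a])                    # incremental update

policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)  # converges to [0, 0, 1, 1]: alert only on malicious states
```

<p>In a realistic ARL-NIDS pipeline, the state would be a feature vector extracted from live traffic and the reward would come from labels or analyst feedback, but the same update rule applies.</p>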
<p>Moreover, the study [<xref ref-type="bibr" rid="ref-122">122</xref>] explores a DRL model for the edge computing paradigm, based on IDS packet sampling within network traffic, utilizing two datasets: UNSW-NB15 and CIC-IDS-2017. Furthermore, a lightweight DRL-based IDS for UAV simulation was developed in [<xref ref-type="bibr" rid="ref-123">123</xref>] to create a robust system for preventing and protecting UAVs and IoT devices from malicious activities and cyberattacks. In another reference, the authors of [<xref ref-type="bibr" rid="ref-124">124</xref>] examine adversarial ML for cybersecurity resilience and network security enhancement, identifying security and network challenges, gaps, and limitations. The research findings suggest that future adversarial ML pipelines should utilize various robust algorithms to enhance the accuracy and effectiveness of the model.</p>
<p>Regarding Software-Defined Networks (SDNs), the study [<xref ref-type="bibr" rid="ref-125">125</xref>] demonstrated the importance of traffic sampling datasets for scaling ARL models. Broader perspectives on hierarchical RL [<xref ref-type="bibr" rid="ref-126">126</xref>,<xref ref-type="bibr" rid="ref-127">127</xref>] have further revealed dataset-driven exploration of intrinsic options in multi-layered environments. Beyond networking, ARL has been applied to financial systems [<xref ref-type="bibr" rid="ref-128">128</xref>] and robotics [<xref ref-type="bibr" rid="ref-129">129</xref>], both of which relied heavily on simulated and real-world datasets to ensure adversarial robustness. Recent studies, such as [<xref ref-type="bibr" rid="ref-130">130</xref>], which examine backdoor detection using DRL, emphasize the ease with which datasets can be poisoned. On the other hand, classic studies, such as those in [<xref ref-type="bibr" rid="ref-131">131</xref>,<xref ref-type="bibr" rid="ref-132">132</xref>], emphasize the importance of cybersecurity-specific simulation datasets for validating adversarial RL. In summary, these studies suggest that the choice and variety of datasets remain crucial for the development of ARL-based NIDS, facilitating a smoother transition from controlled settings to real-world use. Outside the networking domain, the study [<xref ref-type="bibr" rid="ref-133">133</xref>] explores a GAN-based user model combined with RL for recommendation systems, while biometric authentication for mobile environments is proposed in [<xref ref-type="bibr" rid="ref-134">134</xref>]. The study [<xref ref-type="bibr" rid="ref-135">135</xref>] discusses the significant role of biometric verification systems, their strengths, limitations, and mitigation strategies; it also addresses domain adaptation in ARL, based on MDPs and zero-shot RL, to train agents across different visual backgrounds. The primary goal of that study is to improve the generalization of DRL agents.</p>
<p>In short, the datasets referenced within this survey study, including NSL-KDD, CICIDS2017, UNSW-NB15, and IoTID20, are among the most extensively utilized benchmarks in the NIDS and RL literature [<xref ref-type="bibr" rid="ref-117">117</xref>,<xref ref-type="bibr" rid="ref-121">121</xref>,<xref ref-type="bibr" rid="ref-136">136</xref>]. Their adoption across numerous studies makes them representative baselines for evaluating ARL-NIDS models under controlled experimental conditions. However, these datasets also exhibit multiple limitations that influence ARL performance. For instance, NSL-KDD and UNSW-NB15 often exhibit data imbalance and outdated attack profiles, leading to biased learning in adversarial environments. Similarly, CICIDS2017 and IoTID20 offer more modern and diversified network traffic, but their limited real-time adaptability constrains the transferability of ARL models to dynamic IoT or edge computing scenarios. Recognizing these limitations is critical to ensuring the robustness and generalization of ARL-NIDS frameworks. Hence, future research should focus on developing hybrid datasets that integrate synthetic adversarial samples with real-world network traffic to enhance the robustness and adaptability of evaluated models across heterogeneous cybersecurity environments.</p>
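<p>As a hedged illustration of the imbalance problem noted above, the sketch below quantifies an imbalance ratio and applies naive random oversampling before training. The class proportions are fabricated for the example and merely mimic the kind of skew reported for NSL-KDD-style benchmarks; they are not the real dataset counts.</p>

```python
from collections import Counter
import random

# Illustrative only: the label counts below are invented to mimic the
# skew of NSL-KDD-style benchmarks, not taken from the real dataset.
random.seed(42)
labels = ["normal"] * 800 + ["dos"] * 150 + ["probe"] * 40 + ["u2r"] * 10

counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())
print(dict(counts), imbalance_ratio)  # 'normal' outnumbers 'u2r' 80:1

# Naive random oversampling: replicate minority-class samples (with
# replacement) until every class matches the majority count.
target = max(counts.values())
balanced = []
for cls, n in counts.items():
    samples = [cls] * n
    balanced += samples + random.choices(samples, k=target - n)

balanced_counts = Counter(balanced)
print(dict(balanced_counts))  # every class now has 800 samples
```

<p>Simple oversampling of real flows carries its own risk in adversarial settings, since duplicated minority samples give a poisoning adversary disproportionate influence; this is one motivation for the hybrid synthetic-plus-real datasets suggested above.</p>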
<p><list list-type="simple">
<list-item><label><bold>E</bold>.</label><p><bold>ARL-NIDS Risks</bold></p></list-item>
</list></p>
<p>ARL has recently emerged as a powerful technique for improving the adaptability and intelligence of NIDS. ARL enhances AD by enabling agents to learn from dynamic environments, improving traffic analysis, and supporting adaptive defense strategies. The work in [<xref ref-type="bibr" rid="ref-124">124</xref>] demonstrates how adversarial learning can compromise cybersecurity resilience by corrupting policies and manipulating environments. Its findings highlight the need for AT, reward shaping, and uncertainty modeling as robust defensive mechanisms in ARL, helping ensure that agents behave securely and reliably in real-world deployment situations. However, this integration also exposes NIDS to novel cybersecurity risks, such as adversarial attacks, system weaknesses, and cyber threats. <xref ref-type="table" rid="table-11">Table 11</xref> summarizes the latest cybersecurity risks in ARL-NIDS. These risks challenge detection efficiency and call into question the strength, scalability, and reliability of ARL-NIDS in real-world deployments [<xref ref-type="bibr" rid="ref-134">134</xref>]. The integration of ARL into NIDS introduces significant risks stemming from adversarial dynamics, system complexity, and a maturing cyber threat environment. A key area of concern is the vulnerability of ARL models to optimization risk, where unstable reinforcement learning can lead to unstable policies and domain-transfer errors [<xref ref-type="bibr" rid="ref-135">135</xref>]. Additionally, the study [<xref ref-type="bibr" rid="ref-136">136</xref>] highlights the vulnerability of IoT networks to sophisticated cyber threats due to their inherent limitations and constrained computational resources. Using an Extreme Learning Machine (ELM), the authors show improved intrusion detection accuracy on high-dimensional and imbalanced datasets. Research on IoT emphasizes the need for lightweight, scalable detection models to mitigate the risk of unseen intrusions in IoT environments and to minimize real-time hazards. In addition, ARL-based systems face challenges from adaptive cyber-attacks that exploit the weaknesses of RL-driven responses, making them susceptible to reward manipulation and misleading policy updates [<xref ref-type="bibr" rid="ref-137">137</xref>]. <xref ref-type="fig" rid="fig-16">Fig. 16</xref> shows the cybersecurity triad (CIA), which identifies three main aspects: confidentiality, integrity, and availability.</p>
<table-wrap id="table-11">
<label>Table 11</label>
<caption>
<title>Latest cybersecurity risks</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" colspan="2">Risk</th>
<th>Impact</th>
<th>Ref.</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Threat</bold></td>
<td>Data leakage</td>
<td>Leakage of sensitive data during training and evaluation due to weak ARL defense models.</td>
<td>[<xref ref-type="bibr" rid="ref-135">135</xref>]</td>
</tr>
<tr>
<td></td>
<td>Evolving threat models</td>
<td>Adversarial multi-agent adaptation creates instability in NIDS policies.</td>
<td>[<xref ref-type="bibr" rid="ref-138">138</xref>,<xref ref-type="bibr" rid="ref-142">142</xref>]</td>
</tr>
<tr>
<td></td>
<td>Adversarial simulation threats</td>
<td>Synthetic adversarial environments mislead RL training.</td>
<td>[<xref ref-type="bibr" rid="ref-143">143</xref>]</td>
</tr>
<tr>
<td></td>
<td>Adaptive threat evolution</td>
<td>Continuous adaptation of malicious traffic reduces the robustness of RL policies.</td>
<td>[<xref ref-type="bibr" rid="ref-139">139</xref>,<xref ref-type="bibr" rid="ref-144">144</xref>]</td>
</tr>
<tr>
<td></td>
<td>Supply-chain &#x0026; dataset poisoning</td>
<td>Poisoned training data or generated flows corrupt model behavior &#x0026; create persistent blind spots.</td>
<td>[<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-41">41</xref>]</td>
</tr>
<tr>
<td></td>
<td>Insider threats &#x0026; misconfiguration</td>
<td>Trusted insiders or misconfigured RL rewards enable subtle long-term exploitation.</td>
<td>[<xref ref-type="bibr" rid="ref-26">26</xref>,<xref ref-type="bibr" rid="ref-99">99</xref>]</td>
</tr>
<tr>
<td></td>
<td>Regulatory &#x0026; compliance risk</td>
<td>Data use and automated decision-making can create legal/regulatory exposure if mishandled.</td>
<td>[<xref ref-type="bibr" rid="ref-50">50</xref>,<xref ref-type="bibr" rid="ref-53">53</xref>]</td>
</tr>
<tr>
<td><bold>Attack</bold></td>
<td>Reward manipulation</td>
<td>Attackers inject misleading feedback into ARL to degrade detection accuracy.</td>
<td>[<xref ref-type="bibr" rid="ref-137">137</xref>,<xref ref-type="bibr" rid="ref-144">144</xref>]</td>
</tr>
<tr>
<td></td>
<td>Evasion attacks</td>
<td>Crafted adversarial traffic bypasses IDS by exploiting weak policy generalization.</td>
<td>[<xref ref-type="bibr" rid="ref-138">138</xref>,<xref ref-type="bibr" rid="ref-141">141</xref>]</td>
</tr>
<tr>
<td></td>
<td>DoS attacks</td>
<td>Exploitation of stochastic vulnerabilities in ARL leads to DoS.</td>
<td>[<xref ref-type="bibr" rid="ref-145">145</xref>]</td>
</tr>
<tr>
<td></td>
<td>Policy poisoning/training-time attacks</td>
<td>Training-time poisoning, also known as environment poisoning, can produce faulty policies (backdoors).</td>
<td>[<xref ref-type="bibr" rid="ref-61">61</xref>,<xref ref-type="bibr" rid="ref-130">130</xref>]</td>
</tr>
<tr>
<td></td>
<td>Adv. policy attacks (multi-agent)</td>
<td>Malicious agents craft adversarial policies that destabilize multi-agent detection systems.</td>
<td>[<xref ref-type="bibr" rid="ref-59">59</xref>]</td>
</tr>
<tr>
<td></td>
<td>Adversarial evasion (test-time)</td>
<td>Small, crafted perturbations or mutated live traffic let malicious flows bypass detectors.</td>
<td>[<xref ref-type="bibr" rid="ref-115">115</xref>]</td>
</tr>
<tr>
<td></td>
<td>Snooping/observation attacks</td>
<td>Attackers observe agent behavior to infer policies and craft targeted exploits.</td>
<td>[<xref ref-type="bibr" rid="ref-102">102</xref>,<xref ref-type="bibr" rid="ref-104">104</xref>]</td>
</tr>
<tr>
<td><bold>Vulnerability</bold></td>
<td>Training instability</td>
<td>ARL models fail to generalize across domains, making them vulnerable to adversarial drift.</td>
<td>[<xref ref-type="bibr" rid="ref-135">135</xref>,<xref ref-type="bibr" rid="ref-145">145</xref>]</td>
</tr>
<tr>
<td></td>
<td>Adversarial perturbation weakness</td>
<td>Small, crafted perturbations severely disrupt ARL decision-making.</td>
<td>[<xref ref-type="bibr" rid="ref-138">138</xref>,<xref ref-type="bibr" rid="ref-141">141</xref>]</td>
</tr>
<tr>
<td></td>
<td>Incomplete defense mechanisms</td>
<td>The lack of layered defense strategies in RL-based systems leaves IDS vulnerable to compromise.</td>
<td>[<xref ref-type="bibr" rid="ref-140">140</xref>,<xref ref-type="bibr" rid="ref-142">142</xref>]</td>
</tr>
<tr>
<td></td>
<td>Poor generalization/domain shift</td>
<td>Models trained on data from a specific area, such as simulations or emulations, often do not perform well in different networks.</td>
<td>[<xref ref-type="bibr" rid="ref-60">60</xref>,<xref ref-type="bibr" rid="ref-135">135</xref>]</td>
</tr>
<tr>
<td></td>
<td>Lack of explainability/interpretability</td>
<td>Opaque policies hinder incident investigation and human oversight.</td>
<td>[<xref ref-type="bibr" rid="ref-101">101</xref>,<xref ref-type="bibr" rid="ref-110">110</xref>]</td>
</tr>
<tr>
<td></td>
<td>Scalability &#x0026; computational cost</td>
<td>High computation/latency requirements impede real-time deployment on edge or constrained devices.</td>
<td>[<xref ref-type="bibr" rid="ref-55">55</xref>,<xref ref-type="bibr" rid="ref-120">120</xref>,<xref ref-type="bibr" rid="ref-122">122</xref>,<xref ref-type="bibr" rid="ref-123">123</xref>]</td>
</tr>
<tr>
<td></td>
<td>Synchronization/coordination failures</td>
<td>In multi-agent or distributed deployments, miscoordination produces false alarms or gaps.</td>
<td>[<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-34">34</xref>,<xref ref-type="bibr" rid="ref-67">67</xref>,<xref ref-type="bibr" rid="ref-118">118</xref>]</td>
</tr>
<tr>
<td></td>
<td>Dataset imbalance &#x0026; labeling bias</td>
<td>Imbalanced or mislabeled datasets produce biased detectors with high false positive/negative rates.</td>
<td>[<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-43">43</xref>,<xref ref-type="bibr" rid="ref-46">46</xref>,<xref ref-type="bibr" rid="ref-80">80</xref>]</td>
</tr>
<tr>
<td></td>
<td>Backdoor &#x0026; hidden-trigger policies</td>
<td>Subtle backdoors enable attackers to trigger undetected behavior when conditions are met.</td>
<td>[<xref ref-type="bibr" rid="ref-99">99</xref>,<xref ref-type="bibr" rid="ref-130">130</xref>]</td>
</tr>
<tr>
<td></td>
<td>Sensor/telemetry spoofing (data integrity)</td>
<td>Falsified telemetry (including flow features and timestamps) corrupts state observations for RL agents.</td>
<td>[<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-41">41</xref>,<xref ref-type="bibr" rid="ref-104">104</xref>]</td>
</tr>
<tr>
<td></td>
<td>Firmware/platform security gaps</td>
<td>Device firmware and edge node weaknesses allow adversaries to tamper with detection points.</td>
<td>[<xref ref-type="bibr" rid="ref-20">20</xref>,<xref ref-type="bibr" rid="ref-55">55</xref>]</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-16">
<label>Figure 16</label>
<caption>
<title>Cybersecurity triad (CIA): confidentiality, integrity, and availability</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-16.tif"/>
</fig>
<p><bold>Vulnerabilities</bold> are flaws or weaknesses in a system&#x2019;s design, implementation, or configuration that could be exploited. A vulnerability is passive: it does not cause harm on its own, but it opens the door. ARL-NIDS systems are vulnerable to issues related to model interpretability, scalability, and resource consumption. Limitations in explainability hinder the justification of decisions, while decentralized or distributed ARL introduces synchronization issues and increased processing costs [<xref ref-type="bibr" rid="ref-135">135</xref>].</p>
<p><bold>Threats:</bold> A threat is any potential cause of harm that can exploit a vulnerability. It can arise from human intervention (such as hacking), natural causes (such as earthquakes), or accidental events (such as misconfiguration). ARL-NIDS systems face external dangers such as zero-day attacks, botnets, and insider threats. The dynamic training process can also be manipulated to induce biased learning or introduce poisoned data streams, rendering the agent unable to generalize [<xref ref-type="bibr" rid="ref-121">121</xref>].</p>
<p><bold>Attacks:</bold> An attack is the actual action taken by a malicious actor to exploit a vulnerability. It is deliberate and active, targeting confidentiality, integrity, and availability. ARL agents are susceptible to attacks on adversarial reinforcement learning, including: (1) <bold>Snooping:</bold> observing the agent to infer its policy, (2) <bold>Exploration manipulation:</bold> misleading the agent during training, (3) <bold>Evasion:</bold> where adversaries craft packets or flows to bypass detection, (4) <bold>Poisoning:</bold> corrupting training datasets or reward signals, and (5) <bold>Backdoor:</bold> embedding hidden malicious policies [<xref ref-type="bibr" rid="ref-138">138</xref>].</p>
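<p>A minimal sketch of the poisoning category, using a single-state toy learner, shows how flipping a fraction of reward signals can invert the learned detection policy. Nothing here reproduces a specific cited attack; the reward scheme and poisoning rate are invented for the illustration.</p>

```python
import random

# Toy reward-poisoning sketch (illustrative; not from any cited study).
# For a single "malicious" state, an attacker flips the reward sign on a
# fraction of training steps, steering the agent away from alerting.
def train_policy(poison_rate):
    random.seed(1)
    Q = {0: 0.0, 1: 0.0}                 # action-values: 0=allow, 1=alert
    for _ in range(3000):
        if random.random() < 0.3:        # epsilon-greedy exploration
            a = random.randrange(2)
        else:
            a = max(Q, key=Q.get)
        r = 1.0 if a == 1 else -1.0      # true reward: alerting is correct
        if random.random() < poison_rate:
            r = -r                       # adversary flips the feedback
        Q[a] += 0.1 * (r - Q[a])
    return max(Q, key=Q.get)

clean = train_policy(0.0)     # clean training learns to alert (action 1)
poisoned = train_policy(0.9)  # heavy poisoning inverts the policy (action 0)
print(clean, poisoned)
```

<p>The expected reward seen by the poisoned agent for alerting is negative once the flip rate exceeds 50%, which is why the learned policy inverts; defenses such as reward shaping and uncertainty modeling mentioned above aim to detect or dampen exactly this kind of signal corruption.</p>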
<p>This survey study contributes to the understanding of ARL in cybersecurity by examining ARL-NIDS within the IoT environment, as highlighted by various researchers in related past studies. Several studies attempted to mitigate adversarial reward manipulation by introducing adaptive or hierarchical reward functions to stabilize learning dynamics under adversarial perturbations [<xref ref-type="bibr" rid="ref-111">111</xref>,<xref ref-type="bibr" rid="ref-129">129</xref>]. Others, however, ignored the issue, assuming static environments that do not reflect dynamic attack surfaces. The transferability challenge, i.e., how ARL models trained on one dataset generalize to another, has only been partially explored, primarily in simulation-based research with limited real-world validation [<xref ref-type="bibr" rid="ref-135">135</xref>]. Likewise, few studies have analyzed the robustness trade-offs between model adaptability and computational complexity, despite their importance in large-scale or resource-constrained systems, such as IoT and UAV networks [<xref ref-type="bibr" rid="ref-123">123</xref>,<xref ref-type="bibr" rid="ref-139">139</xref>]. This survey identifies these gaps as critical future research directions, encouraging empirical benchmarking of ARL models across heterogeneous datasets and environments.</p>
<p>Ref. [<xref ref-type="bibr" rid="ref-139">139</xref>] proposed an AI-enabled adaptive cybersecurity system that leverages RL to respond to new threats in real time. The research indicates the limitations of static rule-based systems in managing zero-day attacks and describes agents that learn through reinforcement to derive effective security policies. The most significant threat addressed is that legacy systems are slow and inflexible, meaning they may not adapt quickly to new attack vectors. In addition, ref. [<xref ref-type="bibr" rid="ref-140">140</xref>] provides an intensive examination of ML methods for protecting blockchain networks, addressing the dangers associated with consensus manipulation, weaknesses in smart contracts, and data integrity breaches. The research examines DL, Sybil-attack analysis, and AD to enhance risk resilience, particularly in relation to side-channel effects and insider threats in decentralized systems.</p>
<p>In addition, investigations in [<xref ref-type="bibr" rid="ref-141">141</xref>] examine the attack surface of ARL systems, explaining how adversarial ML can compromise model integrity through model-stealing, poisoning, and evasion attacks. The authors emphasize the risk of growing dependence on opaque ML models in critical infrastructure and call for strong countermeasures, as adversarial AI can covertly enable exploitation. In another context, a review of RL applications in cybersecurity [<xref ref-type="bibr" rid="ref-142">142</xref>] identifies significant risks associated with penetration testing, IDS, and malware response. The authors also stress the challenges of sample efficiency and model robustness in adversarial environments, suggesting that RL-based systems should be carefully configured to avoid exploitation by adaptive threats. In addition, the study [<xref ref-type="bibr" rid="ref-143">143</xref>] uses DRL for adversarial cyber-attack simulation, explaining how RL agents can learn optimal attack strategies against network defenses. The research exposes the risk of intelligent adversaries exploiting system weaknesses through learned behaviors and proposes RL-based defensive agents to counteract these evolving threats.</p>
<p>In [<xref ref-type="bibr" rid="ref-144">144</xref>], a framework for adaptive RL-based automated incident response was introduced, targeting the risk of delayed or ineffective reactions to cyber incidents. By modeling response strategies as RL problems, the system improves detection accuracy and scalability in dynamic threat scenarios, while addressing the essential requirement for autonomous defense mechanisms that reduce FP. Another article [<xref ref-type="bibr" rid="ref-145">145</xref>] explored DRL for advanced threat detection, focusing on the risk of adversarial evasion in malware and intrusion scenarios. The authors demonstrate how DRL can enhance detection accuracy and flexibility while mitigating overfitting and adversarial manipulation, which can compromise system reliability under real-world conditions. The authors of [<xref ref-type="bibr" rid="ref-146">146</xref>] suggest a hybrid security architecture to make blockchain web applications adaptable and capable of learning. The study addresses risks such as SQL injection attacks, model poisoning, and IoT data falsification, and demonstrates how decentralized consensus and adversarial training can jointly mitigate these risks while maintaining transparency and accountability.</p>
<p>Although ARL remains effective in enhancing decision-making and flexibility, it is nevertheless quite susceptible to a variety of complex attacks that target both the operational and training phases. Among the most common types of attacks is poisoning, in which the adversary attempts to damage the policy optimization process by manipulating the learning environment or the reward function. In safety-critical systems, such as autonomous vehicles and cybersecurity frameworks, these manipulations can alter agent learning trajectories and weaken the robustness of policies. Another risk to ARL models is the evasion attack, which tricks the trained agent into making poor decisions without altering the environment, by subtly modifying input states or observations during testing. The agent may also be overfitted to hostile conditions or rewards by an exploration manipulation attack, which exploits the balance between exploration and exploitation. Other studies have shown the seriousness of these risks in the smart transportation and cybersecurity industries. In [<xref ref-type="bibr" rid="ref-147">147</xref>], the risks of adversarial interactions in multi-agent driving environments are highlighted, where manipulated agents may induce unsafe lane changes. In [<xref ref-type="bibr" rid="ref-148">148</xref>], the authors presented a comprehensive survey of implementing ML-based IDS to detect adversarial attacks and of robust security controls that strengthen defense systems against them. Deep PackGen [<xref ref-type="bibr" rid="ref-149">149</xref>] introduced a DRL framework for generating adversarial network packets, revealing how attackers can craft synthetic payloads that bypass IDS. The study identifies effective defense strategies and customized approaches to combat sophisticated evasion techniques, and assesses the risks posed by adversarial samples and adversaries to inform adaptive defense strategies.</p>
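<p>To illustrate the evasion idea in the simplest possible setting, the sketch below applies an FGSM-style perturbation against a white-box logistic detector. This is not the Deep PackGen method; the linear detector weights, flow features, and perturbation budget are all hypothetical, and real detectors are nonlinear with far larger feature spaces.</p>

```python
import math

# Illustrative evasion sketch (hypothetical detector, not a cited method):
# given white-box access to a linear flow classifier, nudge each feature
# against the sign of its weight (an FGSM-style step) so a malicious flow
# drops below the alert threshold.
w = [1.5, -0.8, 2.0]           # hypothetical detector weights
b = -1.0

def score(x):                   # probability the flow is flagged malicious
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

x_mal = [1.0, 0.2, 0.9]         # a flow the detector currently flags
eps = 0.6                       # perturbation budget per feature

# Move each feature opposite to the gradient sign of the malicious score.
x_adv = [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x_mal, w)]

print(score(x_mal) > 0.5, score(x_adv) > 0.5)  # True False
```

<p>Attackers rarely have genuine white-box access, but the same gradient-sign principle, estimated via queries or surrogate models, underlies many of the evasion attacks surveyed above.</p>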
<p>In addition, a deep dive into AI and ML in cybersecurity, as outlined in [<xref ref-type="bibr" rid="ref-150">150</xref>], highlights the transformative potential of intelligent systems while cautioning against risks such as data imbalance, adversarial manipulation, and a lack of explainability. The study calls for scalable, transparent, and privacy-preserving models to address the evolving threat landscape and ensure the trustworthy deployment of AI. The early work in [<xref ref-type="bibr" rid="ref-151">151</xref>] investigates how AI integrates ML models to optimize agent behavior in cybersecurity simulations, discusses development strategies that enhance such models and approaches, and compares RL with DRL. The work in [<xref ref-type="bibr" rid="ref-152">152</xref>] reviews adversarial ML in biometric recognition, exposing vulnerabilities of face and fingerprint systems to spoofing and evasion attacks. The paper underscores the risk of data-stationarity assumptions in ML models and advocates adversarial-aware design to secure biometric authentication systems. Finally, the dissertation in [<xref ref-type="bibr" rid="ref-153">153</xref>] addresses chemical-plant cybersecurity, where red-team outputs drive simulated plant dynamics. The study shows how attackers can force a plant shutdown through sensor manipulation, while the defender uses RL and digital twins to predict and reduce the risk. This work highlights the danger of cyber-physical disruption and the need for resilient control systems.</p>
<p><list list-type="simple">
<list-item><label><bold>F</bold>.</label><p><bold>RL-NIDS Security Challenges</bold></p></list-item>
</list></p>
<p>Since ARL has intelligently revolutionized cybersecurity systems, integrating them into dynamic environments such as smart networks, IoT networks, and CPS, novel security challenges have emerged. The adaptive nature of RL agents makes them robust defense mechanisms; however, they are vulnerable to sophisticated attacks that exploit their learning processes. Recent literature highlights growing concern about adversarial threats, data integrity, and system hardening, and calls for a deep understanding of these novel security challenges. This section synthesizes the latest and most significant security challenges facing ARL-NIDS-based security architectures, providing a basis for developing flexible and reliable intelligent systems. <xref ref-type="table" rid="table-12">Table 12</xref> provides a brief overview of the latest ARL-NIDS security challenges. Recent advancements in RL have introduced both opportunities and risks in cybersecurity applications. The most pressing are adversarial attacks that manipulate agent behavior, data poisoning that corrupts learning inputs, and reward manipulation that distorts policy adaptation. These threats are compounded by the lack of interpretability in RL decisions, which hinders the detection and explanation of anomalous behavior. In addition, deploying RL in resource-constrained environments, such as those found in edge and IoT systems, poses scalability and real-time adaptation problems. As attackers evolve, defense mechanisms require RL models with robust, explainable, and privacy-preserving features that can withstand adversarial conditions and maintain operational integrity. Furthermore, <xref ref-type="fig" rid="fig-17">Fig. 17</xref> presents the key ARL-NIDS security issues. As shown in the figure, the security issues are classified into four main categories: system-level threats, threats to integrity, adversarial manipulation, and privacy concerns.
RL&#x2019;s evolving landscape in cybersecurity requires a proactive approach to recognizing and mitigating emerging threats. The key challenges in <xref ref-type="fig" rid="fig-17">Fig. 17</xref>, which emphasize adversarial manipulation and privacy risks, highlight the complexity of ensuring the integrity of intelligent systems. These issues call for interdisciplinary strategies that combine technical hardening with ethical and legal foresight. Future research should focus on developing flexible RL architectures that are robust, scalable, transparent, and privacy-preserving, thereby ensuring safe deployment in critical infrastructure.</p>
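The evasion threat summarized above can be sketched with a deliberately simple model (ours, not taken from any cited paper): a linear anomaly scorer flags traffic when a weighted feature sum exceeds a threshold, and an attacker who knows the weights nudges each feature by at most a small budget (an L-infinity constraint) to slip under it. The weights, features, threshold, and budget are all hypothetical.

```python
def score(w, x):
    """Linear anomaly score: weighted sum of (normalized) flow features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def evade(w, x, budget):
    """Shift each feature against its weight's sign by at most `budget`."""
    return [xi - budget if wi > 0 else xi + budget for wi, xi in zip(w, x)]

w = [0.8, 0.5, 0.3]   # hypothetical feature weights (e.g., packet rate, size, fan-out)
x = [1.0, 1.0, 1.0]   # a malicious flow's normalized features
threshold = 1.2

print(score(w, x) > threshold)               # True: flow is detected as-is
x_adv = evade(w, x, budget=0.3)
print(score(w, x_adv) > threshold)           # False: the perturbed flow now evades
```

Against a deep RL agent the perturbation is computed from gradients rather than by inspection, but the principle is identical: a small, bounded change to the observation flips the agent's decision without changing the underlying attack.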
<table-wrap id="table-12">
<label>Table 12</label>
<caption>
<title>Latest ARL-NIDS security challenges summary</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Security challenge</th>
<th>Description</th>
<th>Ref.</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Adversarial attacks</bold></td>
<td>Manipulation of inputs to mislead RL agents.</td>
<td>[<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>,<xref ref-type="bibr" rid="ref-38">38</xref>,<xref ref-type="bibr" rid="ref-59">59</xref>,<xref ref-type="bibr" rid="ref-95">95</xref>,<xref ref-type="bibr" rid="ref-96">96</xref>,<xref ref-type="bibr" rid="ref-99">99</xref>,<xref ref-type="bibr" rid="ref-102">102</xref>]</td>
</tr>
<tr>
<td><bold>Evasion techniques</bold></td>
<td>Attackers bypass detection by altering behavior or data patterns.</td>
<td>[<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-56">56</xref>,<xref ref-type="bibr" rid="ref-61">61</xref>,<xref ref-type="bibr" rid="ref-94">94</xref>,<xref ref-type="bibr" rid="ref-115">115</xref>]</td>
</tr>
<tr>
<td><bold>Data poisoning</bold></td>
<td>Corrupting training data to degrade model performance.</td>
<td>[<xref ref-type="bibr" rid="ref-99">99</xref>,<xref ref-type="bibr" rid="ref-130">130</xref>,<xref ref-type="bibr" rid="ref-141">141</xref>]</td>
</tr>
<tr>
<td><bold>Model robustness</bold></td>
<td>Ensuring RL models remain effective under adversarial conditions.</td>
<td>[<xref ref-type="bibr" rid="ref-29">29</xref>,<xref ref-type="bibr" rid="ref-39">39</xref>,<xref ref-type="bibr" rid="ref-62">62</xref>,<xref ref-type="bibr" rid="ref-70">70</xref>,<xref ref-type="bibr" rid="ref-105">105</xref>,<xref ref-type="bibr" rid="ref-129">129</xref>]</td>
</tr>
<tr>
<td><bold>Privacy risks</bold></td>
<td>Exposure of sensitive data during learning or inference.</td>
<td>[<xref ref-type="bibr" rid="ref-78">78</xref>,<xref ref-type="bibr" rid="ref-141">141</xref>]</td>
</tr>
<tr>
<td><bold>Generalization limitations</bold></td>
<td>Deficient performance when RL agents face unseen or dynamic environments.</td>
<td>[<xref ref-type="bibr" rid="ref-46">46</xref>,<xref ref-type="bibr" rid="ref-63">63</xref>,<xref ref-type="bibr" rid="ref-135">135</xref>]</td>
</tr>
<tr>
<td><bold>Reward manipulation</bold></td>
<td>Attackers exploit reward functions to misguide learning.</td>
<td>[<xref ref-type="bibr" rid="ref-64">64</xref>,<xref ref-type="bibr" rid="ref-99">99</xref>,<xref ref-type="bibr" rid="ref-132">132</xref>]</td>
</tr>
<tr>
<td><bold>Interpretability and explainability</bold></td>
<td>Difficulty in understanding RL decisions, hindering trust, and debugging.</td>
<td>[<xref ref-type="bibr" rid="ref-101">101</xref>,<xref ref-type="bibr" rid="ref-110">110</xref>]</td>
</tr>
<tr>
<td><bold>Scalability in multi-agent systems</bold></td>
<td>Coordination and security in large-scale MARL environments.</td>
<td>[<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-67">67</xref>,<xref ref-type="bibr" rid="ref-92">92</xref>,<xref ref-type="bibr" rid="ref-118">118</xref>,<xref ref-type="bibr" rid="ref-138">138</xref>]</td>
</tr>
<tr>
<td><bold>Detection of sophisticated threats</bold></td>
<td>Challenges in identifying stealthy or polymorphic malware.</td>
<td>[<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-24">24</xref>,<xref ref-type="bibr" rid="ref-73">73</xref>,<xref ref-type="bibr" rid="ref-74">74</xref>]</td>
</tr>
<tr>
<td><bold>Real-time adaptation</bold></td>
<td>Need for fast, adaptive responses to evolving threats.</td>
<td>[<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-26">26</xref>,<xref ref-type="bibr" rid="ref-144">144</xref>]</td>
</tr>
<tr>
<td><bold>FP/Negatives in IDS</bold></td>
<td>Balancing sensitivity and specificity in IDS.</td>
<td>[<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-42">42</xref>,<xref ref-type="bibr" rid="ref-85">85</xref>]</td>
</tr>
<tr>
<td><bold>Limited training data</bold></td>
<td>Scarcity of labeled data for supervised RL models.</td>
<td>[<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-77">77</xref>,<xref ref-type="bibr" rid="ref-80">80</xref>]</td>
</tr>
<tr>
<td><bold>Resource constraints in edge/IoT</bold></td>
<td>Limited computational power affects the deployment and security of RL.</td>
<td>[<xref ref-type="bibr" rid="ref-54">54</xref>,<xref ref-type="bibr" rid="ref-123">123</xref>,<xref ref-type="bibr" rid="ref-136">136</xref>]</td>
</tr>
<tr>
<td><bold>Secure collaboration in FL</bold></td>
<td>Ensuring integrity and privacy in distributed RL training.</td>
<td>[<xref ref-type="bibr" rid="ref-86">86</xref>,<xref ref-type="bibr" rid="ref-138">138</xref>]</td>
</tr>
<tr>
<td><bold>Backdoor and Trojan attacks</bold></td>
<td>Hidden malicious behaviors are embedded during the training process.</td>
<td>[<xref ref-type="bibr" rid="ref-130">130</xref>,<xref ref-type="bibr" rid="ref-141">141</xref>]</td>
</tr>
<tr>
<td><bold>Policy exploitation and manipulation</bold></td>
<td>Attackers reverse-engineer or manipulate learned policies.</td>
<td>[<xref ref-type="bibr" rid="ref-38">38</xref>,<xref ref-type="bibr" rid="ref-103">103</xref>,<xref ref-type="bibr" rid="ref-104">104</xref>]</td>
</tr>
<tr>
<td><bold>Cyber-physical system vulnerabilities</bold></td>
<td>RL-based systems in CPS face unique risks due to physical interactions.</td>
<td>[<xref ref-type="bibr" rid="ref-76">76</xref>,<xref ref-type="bibr" rid="ref-78">78</xref>,<xref ref-type="bibr" rid="ref-81">81</xref>]</td>
</tr>
<tr>
<td><bold>Alert prioritization</bold></td>
<td>Difficulty in ranking threats effectively when under attack.</td>
<td>[<xref ref-type="bibr" rid="ref-66">66</xref>,<xref ref-type="bibr" rid="ref-90">90</xref>]</td>
</tr>
<tr>
<td><bold>Secure feature selection</bold></td>
<td>Ensuring selected features are resilient to adversarial manipulation.</td>
<td>[<xref ref-type="bibr" rid="ref-45">45</xref>,<xref ref-type="bibr" rid="ref-79">79</xref>,<xref ref-type="bibr" rid="ref-93">93</xref>]</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-17">
<label>Figure 17</label>
<caption>
<title>Recent key ARL-NIDS cybersecurity challenges</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-17.tif"/>
</fig>
<p><list list-type="simple">
<list-item><label><bold>G</bold>.</label><p><bold>Overall Comparison</bold></p></list-item>
</list></p>
<p>This section compares the most significant ARL-NIDS approaches of recent years. The comparison is summarized in <xref ref-type="table" rid="table-13">Table 13</xref>, which outlines the model employed, the dataset used, the strengths, weaknesses, limitations, results, and future directions. These comparisons are valuable because they provide a bird&#x2019;s-eye view of the models used in previous studies, their primary goals, and the purpose of their research, all of which are relevant to our intended strategy, ARL-NIDS. Researchers can draw on the documented strengths, weaknesses, limitations, and future directions to develop innovative studies that leverage ARL with other algorithms, classification methods, or related tasks.</p>
<table-wrap id="table-13">
<label>Table 13</label>
<caption>
<title>Overall comparison</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Ref.</th>
<th>Model</th>
<th>Dataset</th>
<th>Strength</th>
<th>Weakness</th>
<th>Limitation</th>
<th>Results</th>
<th>Future direction</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-3">3</xref>]</bold></td>
<td>Adversarial attack analysis</td>
<td>Multiple RL agents</td>
<td>Comprehensive taxonomy of adversarial threats</td>
<td>Lacks empirical validation</td>
<td>Theoretical focus</td>
<td>Framework for attack classification</td>
<td>Empirical testing across RL domains</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-6">6</xref>]</bold></td>
<td>Adversarial RL System</td>
<td>Custom simulation</td>
<td>Adaptive defense in dynamic environments</td>
<td>Limited scalability</td>
<td>Simulated only</td>
<td>Improved threat response</td>
<td>Real-world IoT/CPS deployment</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-10">10</xref>]</bold></td>
<td>DQN-based ARL IDS</td>
<td>Custom NIDS dataset</td>
<td>Combines deep Q-learning with AT</td>
<td>High training time</td>
<td>Limited scalability</td>
<td>Improved detection under adversarial noise</td>
<td>Lightweight ARL for edge devices</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-11">11</xref>]</bold></td>
<td>DQN</td>
<td>NSL-KDD</td>
<td>Manages class imbalance with synthetic oversampling</td>
<td>Computational overhead</td>
<td>May not generalize to unseen attacks</td>
<td>Enhanced AD accuracy</td>
<td>Hybrid oversampling &#x002B; federated RL</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-13">13</xref>]</bold></td>
<td>DRL &#x002B; GAN</td>
<td>IoT malware traces</td>
<td>Evades black-box detectors using GAN-generated features</td>
<td>Vulnerable to transfer attacks</td>
<td>Focused on evasion, not detection</td>
<td>High evasion success rate</td>
<td>Combining with explainable AI</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-17">17</xref>]</bold></td>
<td>Multi-Armed Bandit RL</td>
<td>Smart infrastructure logs</td>
<td>Efficient decision-making under adversarial conditions</td>
<td>Limited interpretability</td>
<td>Narrow domain scope</td>
<td>Improved detection in adversarial settings</td>
<td>MARL for collaborative defense</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-18">18</xref>]</bold></td>
<td>DRL for IoT attack detection</td>
<td>IoT traffic logs</td>
<td>Tailored for IoT cyberattack patterns</td>
<td>Dataset imbalance</td>
<td>Narrow IoT scope</td>
<td>High detection accuracy</td>
<td>Cross-domain generalization</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-27">27</xref>]</bold></td>
<td>DRL-based IDS review</td>
<td>Multiple datasets</td>
<td>Synthesizes DRL applications in IoT security</td>
<td>No experimental results</td>
<td>Survey-based</td>
<td>Identifies key trends and gaps</td>
<td>Benchmarking across IoT platforms</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-51">51</xref>]</bold></td>
<td>Adversarial RL-based IDS</td>
<td>6LoWPAN</td>
<td>Robust to evolving data distributions</td>
<td>High training complexity</td>
<td>Dataset-specific tuning</td>
<td>Effective detection in low-power networks</td>
<td>Federated learning integration</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-56">56</xref>]</bold></td>
<td>RL-based smart grid detector</td>
<td>Smart grid traffic</td>
<td>Counters evasion attacks with adaptive learning</td>
<td>Sensitive to reward design</td>
<td>Limited to the smart grid context</td>
<td>Reduced false negatives</td>
<td>Broader infrastructure application</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-67">67</xref>]</bold></td>
<td>Multi-Agent DRL &#x002B; Generative</td>
<td>Enlarged IoT dataset</td>
<td>Reduces bias and improves agent coordination</td>
<td>Complex agent synchronization</td>
<td>High resource demand</td>
<td>Enhanced multi-agent IDS</td>
<td>Decentralized agent optimization</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-73">73</xref>]</bold></td>
<td>RL &#x002B; AT</td>
<td>Packet &#x0026; flow-level data</td>
<td>Dual-level intrusion detection</td>
<td>Complex feature engineering</td>
<td>High computational cost</td>
<td>Improved detection granularity</td>
<td>Feature selection optimization</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-76">76</xref>]</bold></td>
<td>Double deep Q-network</td>
<td>CPS anomaly data</td>
<td>Detects anomalies in CPS</td>
<td>Limited explainability</td>
<td>Specific CPS architecture</td>
<td>High AD precision</td>
<td>Privacy-preserving mechanisms</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-86">86</xref>]</bold></td>
<td>Federated RL Botnet detection</td>
<td>IoT botnet traces</td>
<td>Privacy-preserving collaborative learning</td>
<td>Communication overhead</td>
<td>Requires synchronized updates</td>
<td>Effective botnet mitigation</td>
<td>Lightweight federated RL</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-91">91</xref>]</bold></td>
<td>Collaborative MARL</td>
<td>IoT intrusion dataset</td>
<td>Enhances detection via agent cooperation</td>
<td>Coordination complexity</td>
<td>Centralized bottlenecks</td>
<td>Improved detection rates</td>
<td>Decentralized multi-agent learning</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-116">116</xref>]</bold></td>
<td>Honeypot IDS via ARL</td>
<td>Industrial control network logs</td>
<td>Simulates attacker behavior for proactive defense</td>
<td>Limited to honeypot scenarios</td>
<td>Narrow industrial scope</td>
<td>Increased detection of stealthy attacks</td>
<td>Expand to hybrid honeypot-cloud systems</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-124">124</xref>]</bold></td>
<td>Adversarial machine learning taxonomy</td>
<td>Literature synthesis</td>
<td>Comprehensive categorization of AML threats and defenses</td>
<td>No empirical validation</td>
<td>Theoretical focus lacks implementation benchmarks</td>
<td>Structured taxonomy of evasion, poisoning, inference attacks</td>
<td>Empirical testing across real-world cybersecurity platforms</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-130">130</xref>]</bold></td>
<td>BIRD: Backdoor detect-remove</td>
<td>RL benchmark environments</td>
<td>Generalizable backdoor mitigation</td>
<td>May miss stealthy triggers</td>
<td>Synthetic benchmarks only</td>
<td>Effective removal of known backdoors</td>
<td>Real-time detection in deployed systems</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-137">137</xref>]</bold></td>
<td>Adaptive reinforcement learning (ARCS framework)</td>
<td>20K cybersecurity incidents</td>
<td>Real-time incident response optimization</td>
<td>Requires extensive labeled data</td>
<td>Limited generalization to zero-day threats</td>
<td>27.3% faster resolution, 31.2% higher defense effectiveness</td>
<td>Expand to multi-stage attack handling and dynamic policy generation</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-144">144</xref>]</bold></td>
<td>ARCS: Adaptive RL for incident response.</td>
<td>Cyber incident logs</td>
<td>Automates incident response optimization</td>
<td>Requires extensive historical data</td>
<td>Predefined incident types</td>
<td>Improved response time and accuracy</td>
<td>Zero-day threat handling</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-145">145</xref>]</bold></td>
<td>DRL for advanced threat detection</td>
<td>Mixed cybersecurity datasets</td>
<td>Addresses evolving threat vectors with adaptive learning</td>
<td>High model complexity</td>
<td>Requires continuous retraining</td>
<td>High detection accuracy across threat types</td>
<td>Continuous learning with minimal supervision</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-146">146</xref>]</bold></td>
<td>Blockchain &#x002B; Adaptive adversarial learning</td>
<td>150K cyber threat samples (25 attack types)</td>
<td>Tamper-proof, privacy-preserving, scalable threat detection</td>
<td>High computational cost</td>
<td>Limited real-time performance in high-throughput systems</td>
<td>98.7% detection accuracy, 45% reduction in FP</td>
<td>Integration with federated learning and smart contract automation</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-149">149</xref>]</bold></td>
<td>Deep PackGen (DRL for packet generation)</td>
<td>Public NIDS datasets</td>
<td>Generates functional adversarial packets for evasion testing</td>
<td>Limited to packet-level perturbations</td>
<td>May not generalize to flow-based or encrypted traffic</td>
<td>66.4% adversarial success rate; 45% out-of-distribution evasion</td>
<td>Extend to encrypted traffic and hybrid flow-packet perturbation models</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-150">150</xref>]</bold></td>
<td>AI/ML review framework</td>
<td>Multi-domain synthesis</td>
<td>Broad coverage of AI techniques in cybersecurity</td>
<td>Lacks experimental depth</td>
<td>High-level overview, not implementation-specific</td>
<td>Identify key trends in IDS and threat intelligence</td>
<td>Roadmap for XAI, federated learning, &#x0026; quantum-secure models</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-151">151</xref>]</bold></td>
<td>Monte Carlo &#x002B; Q-learning in adversarial simulation</td>
<td>Custom security game</td>
<td>Evaluates attacker-defender dynamics in stochastic environments</td>
<td>Simplified simulation environment</td>
<td>Not scalable to enterprise-grade systems</td>
<td>Q-learning is effective for defense.</td>
<td>Extend to multi-agent and real-world network simulations.</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-152">152</xref>]</bold></td>
<td>Adversarial ML in biometric systems</td>
<td>Biometric datasets (face spoofing)</td>
<td>Highlights vulnerabilities in adaptive biometric recognition systems</td>
<td>Focused on biometrics only</td>
<td>Limited cross-domain applicability</td>
<td>Demonstrated spoofing and template compromise techniques</td>
<td>Develop cross-domain adversarial defenses for biometric and IoT systems</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-153">153</xref>]</bold></td>
<td>ARL for chemical plant resilience</td>
<td>Tennessee Eastman process simulation</td>
<td>Models attacker-defender dynamics in industrial control systems</td>
<td>Requires theoretical verification for deployment</td>
<td>Simulation-based; not yet validated on physical infrastructure</td>
<td>Red Team induced shutdowns in &#x003C;3 min. Blue Team extended its uptime marginally.</td>
<td>Integrate digital twins and predictive analytics for industrial resilience</td>
</tr>
<tr>
<td><bold>[<xref ref-type="bibr" rid="ref-154">154</xref>]</bold></td>
<td>RL (Q-learning, DQN, SARSA, A3C, PPO, DDPG)</td>
<td>
<list list-type="bullet">
<list-item>
<p>NSL-KDD,</p></list-item>
<list-item>
<p>CICIDS2017,</p></list-item>
<list-item>
<p>UNSW-NB15,</p></list-item>
<list-item>
<p>KDD&#x2019;99,</p></list-item>
<list-item>
<p>IoTID20</p></list-item>
</list>
</td>
<td>Comprehensive taxonomy and critical examination of RL algorithms in IDS, evaluating detection accuracy, scalability, and adaptation in network contexts.</td>
<td>Limited focus on adversarial robustness and interoperability through ARL frameworks.</td>
<td>Unified benchmarking measures for RL models and a lack of real-world deployment.</td>
<td>Identifies RL-based IDS models with &#x003E;95% detection accuracy in simulated settings; DRL is more adaptable than conventional RL.</td>
<td>Evaluate robust &#x0026; flexible NIDS frameworks in real-time for ARL integration, cross-dataset generalization, and adversarial resistance.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s6">
<label>6</label>
<title>Analysis &#x0026; Discussion</title>
<p>In this study, 154 past studies related to ARL-NIDS are reviewed across various applications and approaches, highlighting the need for a generic ARL-NIDS to detect and identify attacks, imbalanced data, and malicious activities. Given its importance, this section is organized as follows:
<list list-type="simple">
<list-item><label><bold>A</bold>.</label><p><bold>Publication Analysis</bold></p></list-item>
</list></p>
<p>This section examines the publication analysis, where <xref ref-type="fig" rid="fig-18">Fig. 18</xref> illustrates the yearly publication distribution over the most recent years. As shown in <xref ref-type="fig" rid="fig-18">Fig. 18</xref>, 2023 had the highest number of publications, while 2015 and 2016 had the lowest. Moreover, <xref ref-type="table" rid="table-14">Table 14</xref> provides a detailed breakdown of the number of publications.</p>
<fig id="fig-18">
<label>Figure 18</label>
<caption>
<title>Yearly publication distribution</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-18.tif"/>
</fig><table-wrap id="table-14">
<label>Table 14</label>
<caption>
<title>Summary of publication distribution per year</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Year</th>
<th># of Publications</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>2025</bold></td>
<td>12</td>
</tr>
<tr>
<td><bold>2024</bold></td>
<td>8</td>
</tr>
<tr>
<td><bold>2023</bold></td>
<td>35</td>
</tr>
<tr>
<td><bold>2022</bold></td>
<td>28</td>
</tr>
<tr>
<td><bold>2021</bold></td>
<td>24</td>
</tr>
<tr>
<td><bold>2020</bold></td>
<td>23</td>
</tr>
<tr>
<td><bold>2019</bold></td>
<td>13</td>
</tr>
<tr>
<td><bold>2018</bold></td>
<td>3</td>
</tr>
<tr>
<td><bold>2017</bold></td>
<td>5</td>
</tr>
<tr>
<td><bold>2016</bold></td>
<td>1</td>
</tr>
<tr>
<td><bold>2015</bold></td>
<td>1</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Regional contributions to ARL-NIDS research vary. <xref ref-type="fig" rid="fig-19">Fig. 19</xref> illustrates the regional distribution of ARL-NIDS research worldwide, and the focus areas per region are identified in <xref ref-type="table" rid="table-15">Table 15</xref>. ML models for NIDS, traffic management, cybersecurity, and networking security have been prominent in North America, especially in the US. The UK, Germany, France, and Italy are actively engaged in comparative IDS studies, IoT anomaly detection, and botnet research. Brazil leads South America with ARL-based IDS, IoT traffic classification, and robust DRL. China leads in RL-based NIDS, traffic security, intelligent connected vehicles, and adaptive RL in healthcare systems, while India, Japan, and South Korea are expanding IoT and smart-environment applications. Middle Eastern countries, including Saudi Arabia, the UAE, T&#x00FC;rkiye, Iran, and Jordan, are showing interest in IoT-focused intrusion detection and securing smart infrastructure. Nigeria, South Africa, and Egypt are researching IoT security, smart-grid cybersecurity, and evasion attacks on IDS.</p>
<fig id="fig-19">
<label>Figure 19</label>
<caption>
<title>Regional ARL-NIDS contributions worldwide</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-19.tif"/>
</fig><table-wrap id="table-15">
<label>Table 15</label>
<caption>
<title>Focus area of ARL-NIDS regions summary</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Region</th>
<th>Country</th>
<th>Focus Area</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">North America</td>
<td>United States</td>
<td rowspan="3"><list list-type="bullet">
<list-item>
<p>NIDS.</p></list-item>
<list-item>
<p>Cybersecurity in traffic management systems.</p></list-item>
<list-item>
<p>ML for networking cybersecurity.</p></list-item>
</list></td>
</tr>
<tr>

<td>Canada</td>

</tr>
<tr>

<td>Mexico</td>

</tr>
<tr>
<td rowspan="10"><bold>Europe</bold></td>
<td>United Kingdom</td>
<td rowspan="10"><list list-type="bullet">
<list-item>
<p>Comparative analysis of IDS.</p></list-item>
<list-item>
<p>AD in IIoT.</p></list-item>
<list-item>
<p>Botnet detection and mitigation models.</p></list-item>
<list-item>
<p>DL approaches for cybersecurity.</p></list-item>
</list></td>
</tr>
<tr>

<td>Germany</td>

</tr>
<tr>

<td>France</td>

</tr>
<tr>

<td>Italy</td>

</tr>
<tr>

<td>Spain</td>

</tr>
<tr>

<td>Netherlands</td>

</tr>
<tr>

<td>Sweden</td>

</tr>
<tr>

<td>Norway</td>

</tr>
<tr>

<td>Poland</td>

</tr>
<tr>

<td>Switzerland</td>

</tr>
<tr>
<td rowspan="5"><bold>South America</bold></td>
<td>Brazil</td>
<td rowspan="5"><list list-type="bullet">
<list-item>
<p>ARL-based IDS.</p></list-item>
<list-item>
<p>IoT traffic classification.</p></list-item>
<list-item>
<p>Robust DRL.</p></list-item>
</list>
</td>
</tr>
<tr>

<td>Argentina</td>

</tr>
<tr>

<td>Chile</td>

</tr>
<tr>

<td>Colombia</td>

</tr>
<tr>

<td>Peru</td>

</tr>
<tr>
<td rowspan="10"><bold>Asia</bold></td>
<td>China</td>
<td rowspan="10"><list list-type="bullet">
<list-item>
<p>NIDS using RL.</p></list-item>
<list-item>
<p>Enhancing road safety and cybersecurity.</p></list-item>
<list-item>
<p>Data validity analysis in intelligent connected vehicles.</p></list-item>
<list-item>
<p>Adaptive RL in smart healthcare.</p></list-item>
</list></td>
</tr>
<tr>

<td>India</td>

</tr>
<tr>

<td>Japan</td>

</tr>
<tr>

<td>South Korea</td>

</tr>
<tr>

<td>Indonesia</td>

</tr>
<tr>

<td>Saudi Arabia</td>

</tr>
<tr>

<td>UAE</td>

</tr>
<tr>

<td>T&#x00FC;rkiye</td>

</tr>
<tr>

<td>Iran</td>

</tr>
<tr>

<td>Jordan</td>

</tr>
<tr>
<td rowspan="8"><bold>Africa</bold></td>
<td>Nigeria</td>
<td rowspan="8"><list list-type="bullet">
<list-item>
<p>IDS for IoT.</p></list-item>
<list-item>
<p>Cybersecurity challenges in smart grids.</p></list-item>
<list-item>
<p>Evasion attack countermeasures.</p></list-item>
</list>
</td>
</tr>
<tr>

<td>South Africa</td>

</tr>
<tr>

<td>Egypt</td>

</tr>
<tr>

<td>Kenya</td>

</tr>
<tr>

<td>Ethiopia</td>

</tr>
<tr>

<td>Ghana</td>

</tr>
<tr>

<td>Tanzania</td>

</tr>
<tr>

<td>Uganda</td>

</tr>
<tr>
<td rowspan="4"><bold>Oceania</bold></td>
<td>Australia</td>
<td rowspan="4"><list list-type="bullet">
<list-item>
<p>AD in IoT networks.</p></list-item>
<list-item>
<p>Cybersecurity frameworks for industrial sys.</p></list-item>
</list></td>
</tr>
<tr>

<td>New Zealand</td>

</tr>
<tr>

<td>Papua New Guinea</td>

</tr>
<tr>

<td>Fiji</td>

</tr>
</tbody>
</table>
</table-wrap>
<p>Finally, Oceania, led by Australia, has advanced IoT anomaly detection and industrial cybersecurity frameworks. These regional contributions show that ARL-NIDS is crucial to protecting modern digital infrastructure worldwide.</p>
<p><list list-type="simple">
<list-item><label><bold>B</bold>.</label><p><bold>Analysis of Publication Type</bold></p></list-item>
</list></p>
<p>For a deeper understanding of the spread and maturity of ARL-NIDS research, it is necessary to analyze the types of publications (journals, conferences, and other scholarly resources) that have shaped the field. <xref ref-type="fig" rid="fig-20">Fig. 20</xref> shows the percentage distribution of publication types, while <xref ref-type="table" rid="table-16">Table 16</xref> provides a numerical breakdown. Journals dominate with 107 publications, reflecting the maturity of ARL-NIDS research and its peer-reviewed foundation. Conferences account for 44 works, indicating rapid dissemination of findings and an active pipeline of innovative ideas. Other contributions include a book chapter and a thesis, reflecting early investigative efforts. This distribution suggests that the field has moved beyond exploratory work toward rigorous, peer-reviewed output. Journals provide broad visibility and high citation potential, highlighting influential contributions that shape the trajectory of ARL-NIDS. Meanwhile, the steady flow of conference papers emphasizes innovation and current applications, while the limited presence of theses and book chapters indicates untapped opportunities for in-depth academic study.</p>
<fig id="fig-20">
<label>Figure 20</label>
<caption>
<title>Percentage of publication types</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-20.tif"/>
</fig><table-wrap id="table-16">
<label>Table 16</label>
<caption>
<title>Summary of publication type</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Publications type</th>
<th/>
<th># of Publications</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Journal</bold></td>
<td/>
<td>107</td>
</tr>
<tr>
<td><bold>Conference</bold></td>
<td/>
<td>44</td>
</tr>
<tr>
<td rowspan="2"><bold>Other</bold></td>
<td>Book chapter</td>
<td>1</td>
</tr>
<tr>

<td>Thesis</td>
<td>1</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Overall, the dominance of journal publications emphasizes the comparative weight of applied research over purely theoretical studies, reflecting the field&#x2019;s strong orientation toward practical deployment in real-world cybersecurity systems.</p>
<p><list list-type="simple">
<list-item><label><bold>C</bold>.</label><p><bold>ARL-NIDS Datasets Analysis</bold></p></list-item>
</list></p>
<p>Datasets used in ARL-based intrusion detection form the basis for training, validation, and benchmarking. <xref ref-type="fig" rid="fig-21">Fig. 21</xref> shows the most commonly used datasets: NSL-KDD, CICIDS2017, CICDDoS2019, UNSW-NB15, and IoT-specific datasets such as AWID, IoTID20, and Kitsune. NSL-KDD and CICIDS2017 dominate the literature due to their structured labeling and broad acceptance, while UNSW-NB15 provides more realistic traffic flows and modern attack scenarios.</p>
<fig id="fig-21">
<label>Figure 21</label>
<caption>
<title>Top common datasets</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-21.tif"/>
</fig>
<p>Beyond the traditional datasets, interest in new domain-specific datasets, particularly for CPS, Industrial IoT (IIoT), and healthcare systems, is increasing, as adversarial threats in these domains directly impact safety and operational reliability. However, many datasets exhibit a recurring class imbalance, with significantly fewer malicious samples than benign traffic, and traffic patterns that are outdated and cannot accurately reflect the modern attack landscape. Researchers increasingly use synthetic data generation techniques, including GANs, reinforcement learning, and simulation environments, to close these gaps and produce adversarial samples that rebalance datasets. This trend strengthens model training and makes it possible to evaluate ARL-NIDS in realistic, dynamic scenarios. Overall, dataset evolution reflects the field&#x2019;s shift away from legacy benchmarks toward domain-specific, adversarially rich datasets that are better suited to future cybersecurity challenges.
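The minority-class rebalancing described here can be illustrated with a minimal SMOTE-style sketch. The interpolation rule (a synthetic point placed between a minority sample and one of its nearest neighbors) is the standard technique; the toy &#x201C;attack&#x201D; flows, feature dimensions, and parameter values below are illustrative assumptions, not drawn from any surveyed dataset:

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create synthetic minority-class samples by interpolating between a
    random minority sample and one of its k nearest neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class (excluding itself)
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        neigh = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neigh)))
    return synthetic

# Toy "attack" flows (two features each), far rarer than benign traffic.
attacks = [(0.90, 0.80), (1.00, 0.70), (0.80, 0.90), (0.95, 0.85)]
balanced_extra = smote_like_oversample(attacks, n_new=8)
print(len(balanced_extra))
```

Because each synthetic flow is a convex combination of two real attack samples, the generated points stay inside the minority cluster rather than drifting into benign territory, which is the property that makes this family of techniques attractive for NIDS training sets.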
<list list-type="simple">
<list-item><label><bold>D</bold>.</label><p><bold>Comparative Model Performance</bold></p></list-item>
</list></p>
<p>Important insights emerge when comparing traditional ML with RL variants such as DRL and MARL. Traditional ML models (DT, Support Vector Machines, and Random Forests) perform well in static environments but struggle with dynamic and adaptive attack strategies. In contrast, DRL approaches, such as DQN, DDQN, and PPO, are better suited to detecting complex and evolving threats, particularly on large datasets. MARL frameworks extend these capabilities by enabling distributed, cooperative detection in multi-agent environments, which is especially valuable for IoT and IIoT systems. However, these gains come with trade-offs. Accuracy typically improves with ARL and DRL over traditional ML, but training time and computational cost also increase significantly due to the complexity of neural network architectures and adversarial training. Similarly, MARL scales well in distributed systems, but coordination between agents introduces synchronization challenges. While ARL-based models push the boundaries of detection capability, their deployment requires balancing performance with practical considerations of cost, speed, and scalability.
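The adversarial dynamic underlying this comparison can be sketched as a toy one-step zero-sum game in which a detector and an evading attacker each learn action values with epsilon-greedy updates. The two-action payoff table, learning rate, and episode count are invented for illustration and do not correspond to any surveyed system:

```python
import random

# Assumed catch probabilities: (defender policy, attacker variant) -> P(detect).
CATCH = {("signature", "known"): 0.9, ("signature", "mutated"): 0.2,
         ("anomaly",   "known"): 0.6, ("anomaly",   "mutated"): 0.7}

def self_play(episodes=5000, eps=0.1, lr=0.1, seed=1):
    """Epsilon-greedy value learning for both sides of a one-step zero-sum game:
    the defender is rewarded for detections, the attacker for evasions."""
    rng = random.Random(seed)
    q_def = {"signature": 0.0, "anomaly": 0.0}
    q_atk = {"known": 0.0, "mutated": 0.0}
    for _ in range(episodes):
        d = rng.choice(list(q_def)) if rng.random() < eps else max(q_def, key=q_def.get)
        a = rng.choice(list(q_atk)) if rng.random() < eps else max(q_atk, key=q_atk.get)
        caught = rng.random() < CATCH[(d, a)]
        q_def[d] += lr * ((1.0 if caught else 0.0) - q_def[d])  # defender maximizes detection
        q_atk[a] += lr * ((0.0 if caught else 1.0) - q_atk[a])  # attacker maximizes evasion
    return q_def, q_atk

q_def, q_atk = self_play()
```

With this payoff table the game has no pure-strategy equilibrium, so the two greedy policies keep chasing each other; that cycling is exactly the moving-target behavior adversarial training exploits to harden a detector against adaptive attackers.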
<list list-type="simple">
<list-item><label><bold>E</bold>.</label><p><bold>Industrial &#x0026; Societal Impact</bold></p></list-item>
</list></p>
<p>Integrating ARL-based intrusion detection systems into industrial and societal domains has far-reaching implications. In critical infrastructure such as energy grids, ARL-NIDS can detect potential cyberattacks and adapt to those that threaten large-scale disruption. In healthcare systems, learning from dynamic adversarial behavior protects against manipulation of IoT devices and data breaches, thereby safeguarding patient safety and privacy. Similarly, ARL-NIDS supports the transportation, utility, and communication networks of smart cities, where any lapse in system resilience can have widespread safety consequences. Beyond these technical benefits, ARL-NIDS also heightens the need for policy and standardization. The lack of standard benchmarks and evaluation criteria makes it difficult to compare solutions and ensure industry interoperability. Regulatory bodies will also need to address issues such as compliance with privacy frameworks before real-world deployment. Establishing such policies will increase confidence in ARL-based solutions and accelerate their adoption across industrial &#x0026; societal domains.
<list list-type="simple">
<list-item><label><bold>F</bold>.</label><p><bold>ARL Summary</bold></p></list-item>
</list></p>
<p>The rapid development of ARL for NIDS underscores its transformative potential in modern cybersecurity. While the technology offers adaptability, robustness, and scalability, the journey toward real-world application requires a balanced understanding of its strengths and challenges. This section summarizes ARL-NIDS research by examining practical deployment challenges, gaps, and limitations, as well as future benefits, providing a holistic view of the field&#x2019;s current state and future direction.
<list list-type="bullet">
<list-item>
<p><bold><italic>Practical Deployment Challenges</italic></bold></p></list-item>
</list></p>
<p>Although ARL-NIDS holds great promise, its deployment is not without obstacles. The primary challenge is the computational overhead and training cost associated with complex, multi-agent reinforcement learning architectures. Training adversarial agents requires extensive resources, often making deployment impractical on lightweight IoT or edge devices. Another challenge is dataset realism relative to real-world traffic: in most available datasets, attack patterns lack variation or fail to capture dynamic traffic behavior, which limits model efficacy when deployed on live networks. In addition, integrating ARL-NIDS into existing SOCs is complex. SOC workflows depend on detection interoperability, explainability, and low false-alarm rates, requirements that ARL-based models often cannot meet without significant fine-tuning. These challenges underscore the need to bridge the gap between academic research and operational cybersecurity environments.
<list list-type="bullet">
<list-item>
<p><bold><italic>ARL within Real-World Deployment Scenarios</italic></bold></p></list-item>
</list></p>
<p>Real-world ARL-NIDS deployments can be integrated into modern network infrastructures to support adaptive defense mechanisms. For instance, in industrial IoT, ARL agents can continuously optimize anomaly thresholds to counter zero-day attacks. Similarly, cloud-based NIDS can benefit from ARL-driven sampling and reward mechanisms to manage large-scale, distributed traffic. Practical adoption, however, requires addressing computational costs, data labeling limitations, and the interpretability of decision policies. ARL-NIDS has recently begun transitioning from theoretical designs to practical implementations across various domains. In IoT networks and UAVs, ARL models have been effectively leveraged to enhance distributed AD and decentralized decision-making, providing adaptive security against evolving cyber threats [<xref ref-type="bibr" rid="ref-120">120</xref>,<xref ref-type="bibr" rid="ref-123">123</xref>]. Industrial and critical infrastructure systems, including chemical plants and smart grids, utilize ARL to autonomously counter denial-of-service (DoS) and command injection attacks in real time, thereby enhancing operational resilience and security. In edge and cloud-based contexts, ARL has proven useful for improving packet sampling, traffic prioritization, and secure routing within dynamic, large-scale infrastructures [<xref ref-type="bibr" rid="ref-122">122</xref>,<xref ref-type="bibr" rid="ref-125">125</xref>]. Furthermore, in cybersecurity simulation frameworks, ARL has been used for adversary scenario modeling and policy evaluation, facilitating more realistic and adaptive defense assessments [<xref ref-type="bibr" rid="ref-132">132</xref>]. Beyond traditional networking, ARL-NIDS concepts are applied to biometric systems and autonomous industrial control, where resilience is essential for privacy-preserving authentication and risk mitigation [<xref ref-type="bibr" rid="ref-152">152</xref>].
These real-world implementations demonstrate that ARL-NIDS offers a viable approach to a self-learning, flexible, and adaptive cybersecurity framework that aligns with the evolving AI paradigms in cybersecurity [<xref ref-type="bibr" rid="ref-150">150</xref>].
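As a concrete illustration of the threshold-optimization idea mentioned for industrial IoT, the sketch below casts anomaly-threshold selection as a small epsilon-greedy bandit problem. The traffic model (uniform benign and attack score distributions), candidate thresholds, attack rate, and reward weights are all assumptions chosen only to make the example self-contained:

```python
import random

def tune_threshold(steps=20000, eps=0.2, seed=7):
    """Epsilon-greedy bandit over candidate anomaly-score thresholds.
    Reward: +1 for a caught attack, -1 for a missed one, -0.5 per false alarm.
    Assumed traffic: benign scores ~ U(0, 0.6), attack scores ~ U(0.4, 1.0)."""
    rng = random.Random(seed)
    q = {t: 0.0 for t in (0.2, 0.4, 0.6, 0.8)}  # estimated reward per threshold
    n = {t: 0 for t in q}                        # pull counts
    for _ in range(steps):
        t = rng.choice(list(q)) if rng.random() < eps else max(q, key=q.get)
        is_attack = rng.random() < 0.3           # assumed 30% malicious flows
        score = rng.uniform(0.4, 1.0) if is_attack else rng.uniform(0.0, 0.6)
        alarm = score >= t
        reward = (1.0 if alarm else -1.0) if is_attack else (-0.5 if alarm else 0.0)
        n[t] += 1
        q[t] += (reward - q[t]) / n[t]           # incremental mean of observed reward
    return q

q = tune_threshold()
best = max(q, key=q.get)
```

In a real deployment the stateless bandit would be replaced by a contextual or full RL formulation over observed traffic state, but the same reward structure, rewarding detections while penalizing false alarms, drives the adaptation.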
<list list-type="bullet">
<list-item>
<p><bold><italic>Benefits</italic></bold> <bold><italic>&#x0026;</italic></bold> <bold><italic>Enhancements</italic></bold></p></list-item>
</list></p>
<p>Despite these obstacles and challenges, ARL-NIDS offers compelling benefits and avenues for growth. <xref ref-type="fig" rid="fig-22">Fig. 22</xref> outlines the key benefits and potential future improvements associated with the ARL approach. One of the most remarkable benefits is its adaptive learning ability, which enables real-time adjustment to new or unseen attack strategies. This adaptability improves resilience against evolving cyber threats compared to static ML or rule-based IDS. In addition, the ARL framework makes it possible to simulate adversarial threats during training, leading to robust and proactive defense systems. Looking ahead, ongoing work is expected to address today&#x2019;s shortcomings. This includes developing lightweight, energy-efficient ARL models for IoT environments and integrating explainability tools to increase transparency and trust. Incorporating blockchain and federated learning can further enhance decentralization, privacy, and data integrity, while extending ARL research into healthcare systems and smart cities broadens its scope. Together, these benefits and improvements position ARL-NIDS as a cornerstone of next-generation cybersecurity systems.</p>
<fig id="fig-22">
<label>Figure 22</label>
<caption>
<title>ARL key benefits &#x0026; future improvements</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_73540-fig-22.tif"/>
</fig>
<p><underline><bold>Benefits:</bold></underline>
<list list-type="simple">
<list-item><label>&#x2014;</label><p><bold>Adaptive Learning:</bold> Continuously evolves to detect new and sophisticated cyberattacks without manual retraining.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Resilience:</bold> More robust to adversarial manipulation than static ML/IDS models.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Scalability:</bold> Effective across diverse environments such as IoT, IIoT, CPS, and cloud infrastructures.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Realistic Simulation:</bold> Ability to train models under adversarial settings that mimic real-world cyberattacks.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Automation:</bold> Reduces manual intrusion detection and response workload, supporting autonomous defense.</p></list-item>
</list></p>
<p><underline><bold>Future Enhancements:</bold></underline>
<list list-type="simple">
<list-item><label>&#x2014;</label><p><bold>Adaptive Multi-Agent Systems:</bold> Expansion of MARL frameworks to coordinate defense across distributed networks and IoT ecosystems.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Robustness Against Evolving Adversaries:</bold> Embedding AT with continuous threat intelligence feeds.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Domain-Specific ARL Models:</bold> Tailored for healthcare, finance, transportation, &#x0026; industrial control systems.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Standardized Evaluation Metrics:</bold> Establishing benchmarks for attack coverage, training efficiency, and real-world deployment performance.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Integration with Incident Response:</bold> Moving beyond detection to fully adaptive ARL-powered cyber defense orchestration.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Explainable ARL (X-ARL):</bold> Developing interpretable reinforcement learning models to increase trust and adoption in security operations.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Energy-Efficient Models:</bold> Optimization for low-power IoT and edge devices to reduce computational costs.</p></list-item>
<list-item><label>&#x2014;</label><p><bold>Cross-Integration:</bold> Hybrid models of ARL with federated learning, blockchain, &#x0026; TL for enhanced protection.</p></list-item>
</list></p>
</sec>
<sec id="s7">
<label>7</label>
<title>Conclusion &#x0026; Future Work</title>
<p>This survey provides a comprehensive overview of ARL as a transformative approach to enhancing NIDS. By synthesizing 159 studies from various domains, including IoT, CPS, smart grids, and autonomous systems, this analysis highlights the growing importance of ARL in addressing dynamic and adversarial environments. Integrating adversarial training with RL enables NIDS agents to learn from hostile behavior, thereby improving detection accuracy, flexibility, and scalability. The review emphasizes that ARL-NIDS frameworks are not merely reactive but harness adversarial conditions to strengthen defense mechanisms. Through comparative analysis, architectural insights, and the development of a comprehensive taxonomy, this work lays the foundation for future innovation in intelligent cybersecurity systems.</p>
<p><underline><bold>Novelty:</bold></underline> This survey represents one of the first comprehensive investigations into the integration of ARL-NIDS within IoT environments. It uniquely combines RL and adversarial learning to frame the emerging discipline of ARL-NIDS, whereas previous studies have concentrated on each paradigm independently. The article categorizes ARL algorithms, their applications, and cybersecurity datasets, offering an integrative perspective absent from the literature. In addition, it discusses real-world implementation scenarios, highlights critical cyber threats, and identifies security challenges associated with ARL-based NIDS frameworks. By systematically reviewing existing studies and models, this survey provides both theoretical and practical guidance for researchers and practitioners in understanding ARL-NIDS design principles, benchmark datasets, and evaluation metrics. Overall, it contributes a foundational context that supports future research, model development, and risk-aware applications of ARL in intelligent cybersecurity defense systems.</p>
<p><underline><bold>Key Findings</bold></underline>: The current survey finds that ARL-NIDS increases a framework&#x2019;s adaptability and robustness. By simulating adversarial behavior during training, ARL-equipped agents can more effectively detect and counter evasion, poisoning, and manipulation attacks. Hybrid models that combine ARL with technologies such as GANs, TL, and federated learning promise improvements in accuracy and scalability. In addition, ARL has demonstrated applicability across domains, with successful deployments in IoT, industrial control systems, smart healthcare services, and autonomous vehicles. Despite continued reliance on benchmark datasets like NSL-KDD and CICIDS2017, interest in synthetic and domain-specific datasets is increasing, reflecting real-world adversarial dynamics.</p>
<p><underline><bold>Limitations of the Research:</bold></underline> Although ARL offers convincing benefits, several limitations persist. Training ARL models is computationally intensive, presenting challenges for deployment on resource-constrained platforms such as edge and IoT devices. The interpretability of learned DRL policies remains limited, complicating forensic analysis and regulatory compliance. Many existing datasets lack adversarial diversity and real-time traffic properties, which reduces the generalizability of trained models. Multi-agent ARL systems face coordination and scalability challenges, particularly in distributed settings. In addition, reward shaping and policy manipulation expose ARL frameworks to new attack vectors.</p>
<p><underline><bold>Recommendations for Future Studies:</bold></underline> Future research should prioritize developing explainable ARL models to increase transparency and trust in critical applications. Lightweight architectures adapted to edge and IoT networks are required for real-time intrusion detection. Establishing standardized evaluation frameworks will enable consistent benchmarking of ARL-NIDS implementations. Domain-specific ARL models can improve threat relevance in areas such as healthcare, finance, and industrial automation. Integrating ARL with incident response mechanisms can enable autonomous and adaptive cyber defense strategies. Furthermore, future investigations of ARL-NIDS should address several pressing challenges that currently hinder real-world deployment. These include ensuring online learning stability in the face of continuous and evolving network traffic, enhancing explainability to render ARL decision-making transparent and trustworthy, and improving scalability for IoT and edge environments with limited computational resources. Additionally, integrating privacy-preserving mechanisms and cross-domain adaptability may be essential to enable ARL models to generalize across diverse cybersecurity contexts. These challenges highlight promising directions for advancing ARL-NIDS from theoretical exploration to practical, resilient defense systems. Ultimately, creating adversarially rich, balanced, and realistic datasets, whether through GANs or simulation environments, will be crucial for maintaining the robustness and reliability of ARL-based NIDS.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Qasem Abu Al-Haija, Shahad Al Tamimi; data collection, literature review, analysis and interpretation of results: Qasem Abu Al-Haija, Shahad Al Tamimi; draft manuscript preparation: Qasem Abu Al-Haija, Shahad Al Tamimi; supervision and funding acquisition: Qasem Abu Al-Haija. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>No new data were created or analyzed in this study; therefore, data sharing is not applicable.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<glossary content-type="abbreviations" id="glossary-1">
<def-list>
<def-item>
<term><bold>Abbreviation</bold></term>
<def>
<p><bold>Description</bold></p>
</def>
</def-item>
<def-item>
<term>NIDS</term>
<def>
<p>Network Intrusion Detection System.</p>
</def>
</def-item>
<def-item>
<term>AT</term>
<def>
<p>Adversarial Training.</p>
</def>
</def-item>
<def-item>
<term>IDS</term>
<def>
<p>Intrusion Detection System.</p>
</def>
</def-item>
<def-item>
<term>ARL</term>
<def>
<p>Adversarial Reinforcement Learning.</p>
</def>
</def-item>
<def-item>
<term>RL</term>
<def>
<p>Reinforcement Learning.</p>
</def>
</def-item>
<def-item>
<term>DL</term>
<def>
<p>Deep Learning.</p>
</def>
</def-item>
<def-item>
<term>AI</term>
<def>
<p>Artificial Intelligence.</p>
</def>
</def-item>
<def-item>
<term>DRL</term>
<def>
<p>Deep Reinforcement Learning.</p>
</def>
</def-item>
<def-item>
<term>ML</term>
<def>
<p>Machine Learning.</p>
</def>
</def-item>
<def-item>
<term>GAN</term>
<def>
<p>Generative Adversarial Network.</p>
</def>
</def-item>
<def-item>
<term>MDP</term>
<def>
<p>Markov Decision Process.</p>
</def>
</def-item>
<def-item>
<term>DQN</term>
<def>
<p>Deep Q-Networks.</p>
</def>
</def-item>
<def-item>
<term>PG</term>
<def>
<p>Policy-Gradient.</p>
</def>
</def-item>
<def-item>
<term>IoT</term>
<def>
<p>Internet of Things.</p>
</def>
</def-item>
<def-item>
<term>WSN</term>
<def>
<p>Wireless Sensor Networks.</p>
</def>
</def-item>
<def-item>
<term>KNN</term>
<def>
<p>K-Nearest Neighbors.</p>
</def>
</def-item>
<def-item>
<term>DNN</term>
<def>
<p>Deep Neural Networks.</p>
</def>
</def-item>
<def-item>
<term>AD</term>
<def>
<p>Anomaly Detection.</p>
</def>
</def-item>
<def-item>
<term>APTs</term>
<def>
<p>Advanced Persistent Threats.</p>
</def>
</def-item>
<def-item>
<term>SVM</term>
<def>
<p>Support Vector Machine.</p>
</def>
</def-item>
<def-item>
<term>NB</term>
<def>
<p>Naive Bayes.</p>
</def>
</def-item>
<def-item>
<term>MLP</term>
<def>
<p>Multi-Layer Perceptron.</p>
</def>
</def-item>
<def-item>
<term>DDQN</term>
<def>
<p>Double Deep Q-Network.</p>
</def>
</def-item>
<def-item>
<term>SMOTE</term>
<def>
<p>Synthetic Minority Over-sampling Technique.</p>
</def>
</def-item>
<def-item>
<term>MARL</term>
<def>
<p>Multi-Agent Reinforcement Learning.</p>
</def>
</def-item>
<def-item>
<term>IIoT</term>
<def>
<p>Industrial Internet of Things.</p>
</def>
</def-item>
<def-item>
<term>LSTM</term>
<def>
<p>Long Short-Term Memory.</p>
</def>
</def-item>
<def-item>
<term>DT</term>
<def>
<p>Decision Trees.</p>
</def>
</def-item>
<def-item>
<term>RF</term>
<def>
<p>Random Forests.</p>
</def>
</def-item>
<def-item>
<term>TL</term>
<def>
<p>Transfer Learning.</p>
</def>
</def-item>
<def-item>
<term>CPS</term>
<def>
<p>Cyber-Physical Systems.</p>
</def>
</def-item>
<def-item>
<term>MTD</term>
<def>
<p>Moving Target Defense.</p>
</def>
</def-item>
<def-item>
<term>FP</term>
<def>
<p>False Positive.</p>
</def>
</def-item>
<def-item>
<term>FGSM</term>
<def>
<p>Fast Gradient Sign Method.</p>
</def>
</def-item>
<def-item>
<term>BIM</term>
<def>
<p>Basic Iterative Method.</p>
</def>
</def-item>
<def-item>
<term>SDN</term>
<def>
<p>Software-Defined Networks.</p>
</def>
</def-item>
<def-item>
<term>FPR</term>
<def>
<p>False Positive Rate.</p>
</def>
</def-item>
<def-item>
<term>MAGPIE</term>
<def>
<p>Magnetic Positioning Indoor Estimation.</p>
</def>
</def-item>
<def-item>
<term>PPO</term>
<def>
<p>Proximal Policy Optimization.</p>
</def>
</def-item>
<def-item>
<term>IoBT</term>
<def>
<p>Internet of Battlefield Things.</p>
</def>
</def-item>
<def-item>
<term>LR</term>
<def>
<p>Logistic Regression.</p>
</def>
</def-item>
<def-item>
<term>LDA</term>
<def>
<p>Linear Discriminant Analysis.</p>
</def>
</def-item>
<def-item>
<term>QDA</term>
<def>
<p>Quadratic Discriminant Analysis.</p>
</def>
</def-item>
<def-item>
<term>BAG</term>
<def>
<p>Bagging.</p>
</def>
</def-item>
<def-item>
<term>CNN</term>
<def>
<p>Convolutional Neural Network.</p>
</def>
</def-item>
<def-item>
<term>RNN</term>
<def>
<p>Recurrent Neural Networks.</p>
</def>
</def-item>
<def-item>
<term>SOTA</term>
<def>
<p>State-Of-The-Art.</p>
</def>
</def-item>
<def-item>
<term>IF</term>
<def>
<p>Isolation Forest.</p>
</def>
</def-item>
<def-item>
<term>SOC</term>
<def>
<p>Security Operations Center.</p>
</def>
</def-item>
<def-item>
<term>GNB</term>
<def>
<p>Gaussian Naive Bayes.</p>
</def>
</def-item>
<def-item>
<term>ANN</term>
<def>
<p>Artificial Neural Network.</p>
</def>
</def-item>
<def-item>
<term>ROC</term>
<def>
<p>Receiver Operating Characteristic Curve.</p>
</def>
</def-item>
<def-item>
<term>AUC</term>
<def>
<p>Area Under Curve.</p>
</def>
</def-item>
<def-item>
<term>CVNN</term>
<def>
<p>Convolutional Variational Neural Network.</p>
</def>
</def-item>
<def-item>
<term>GNNs</term>
<def>
<p>Graph Neural Networks.</p>
</def>
</def-item>
<def-item>
<term>HIDS</term>
<def>
<p>Hybrid Intrusion Detection System.</p>
</def>
</def-item>
<def-item>
<term>Host-IDS</term>
<def>
<p>Host-Based Intrusion Detection System.</p>
</def>
</def-item>
<def-item>
<term>DAP</term>
<def>
<p>Decoupled Adversarial Policy.</p>
</def>
</def-item>
<def-item>
<term>MITM</term>
<def>
<p>Man In The Middle Attack.</p>
</def>
</def-item>
<def-item>
<term>DoS</term>
<def>
<p>Denial of Service.</p>
</def>
</def-item>
<def-item>
<term>DDoS</term>
<def>
<p>Distributed Denial of Service.</p>
</def>
</def-item>
</def-list>
</glossary>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Bao</surname> <given-names>S</given-names></string-name></person-group>. <article-title>A novel deep neural network model for computer network intrusion detection considering connection efficiency of network systems</article-title>. In: <conf-name>Proceedings of the 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT)</conf-name>; <year>2022 Jan 20&#x2013;22</year>; <publisher-loc>Tirunelveli, India</publisher-loc>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>XY</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Song</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Intrusion detection system using improved convolution neural network</article-title>. In: <conf-name>Proceedings of the 2022 11th International Conference of Information and Communication Technology (ICTech)</conf-name>; <year>2022 Feb 4&#x2013;6</year>; <publisher-loc>Wuhan, China</publisher-loc>. p. <fpage>97</fpage>&#x2013;<lpage>100</lpage>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ilahi</surname> <given-names>I</given-names></string-name>, <string-name><surname>Usama</surname> <given-names>M</given-names></string-name>, <string-name><surname>Qadir</surname> <given-names>J</given-names></string-name>, <string-name><surname>Janjua</surname> <given-names>MU</given-names></string-name>, <string-name><surname>Al-Fuqaha</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hoang</surname> <given-names>DT</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Challenges and countermeasures for adversarial attacks on deep reinforcement learning</article-title>. <source>IEEE Trans Artif Intell</source>. <year>2022</year>;<volume>3</volume>(<issue>2</issue>):<fpage>90</fpage>&#x2013;<lpage>109</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tai.2021.3111139</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>X</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>Sneaking through security: mutating live network traffic to evade learning-based NIDS</article-title>. <source>IEEE Trans Netw Serv Manag</source>. <year>2022</year>;<volume>19</volume>(<issue>3</issue>):<fpage>2295</fpage>&#x2013;<lpage>308</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tnsm.2022.3173933</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>L</given-names></string-name>, <string-name><surname>Kuang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>A</given-names></string-name>, <string-name><surname>Suo</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>A novel network intrusion detection system based on CNN</article-title>. In: <conf-name>Proceedings of the 2020 Eighth International Conference on Advanced Cloud and Big Data (CBD)</conf-name>; <year>2020 Dec 5&#x2013;6</year>; <publisher-loc>Taiyuan, China</publisher-loc>; 2020. p. <fpage>243</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Xia</surname> <given-names>S</given-names></string-name>, <string-name><surname>Qiu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>H</given-names></string-name></person-group>. <article-title>An adversarial reinforcement learning based system for cyber security</article-title>. In: <conf-name>Proceedings of the 2019 IEEE International Conference on Smart Cloud (SmartCloud)</conf-name>; <year>2019 Dec 10&#x2013;12</year>; <publisher-loc>Tokyo, Japan</publisher-loc>. p. <fpage>227</fpage>&#x2013;<lpage>30</lpage>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Network flow generation based on reinforcement learning powered generative adversarial network</article-title>. In: <conf-name>Proceedings of the 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC)</conf-name>; <year>2019 Nov 17&#x2013;19</year>; <publisher-loc>Beijing, China</publisher-loc>. p. <fpage>235</fpage>&#x2013;<lpage>23</lpage>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Ghasemi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ebrahimi</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Introduction to reinforcement learning</article-title>. <comment>arXiv:2408.07712</comment>. <year>2024</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abbasi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Shahraki</surname> <given-names>A</given-names></string-name>, <string-name><surname>Piran</surname> <given-names>M</given-names></string-name>, <string-name><surname>Taherkordi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Deep reinforcement learning for QoS provisioning at the MAC layer: a survey</article-title>. <source>Eng Appl Artif Intell</source>. <year>2021</year>;<volume>102</volume>:<fpage>104234</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2021.104234</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Suwannalai</surname> <given-names>E</given-names></string-name>, <string-name><surname>Polprasert</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Network intrusion detection systems using adversarial reinforcement learning with deep Q-network</article-title>. In: <conf-name>Proceedings of the 2020 18th International Conference on ICT and Knowledge Engineering (ICT&#x0026;KE)</conf-name>; <year>2020 Nov 18&#x2013;20</year>; <publisher-loc>Bangkok, Thailand</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ma</surname> <given-names>X</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>W</given-names></string-name></person-group>. <article-title>AESMOTE: adversarial reinforcement learning with SMOTE for anomaly detection</article-title>. <source>IEEE Trans Netw Sci Eng</source>. <year>2021</year>;<volume>8</volume>(<issue>2</issue>):<fpage>943</fpage>&#x2013;<lpage>56</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TNSE.2020.3004312</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Xia</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>S</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Wireless network abnormal traffic detection method based on deep transfer reinforcement learning</article-title>. In: <conf-name>Proceedings of the 2021 17th International Conference on Mobility, Sensing and Networking (MSN)</conf-name>; <year>2021 Dec 13&#x2013;15</year>; <publisher-loc>Exeter, UK</publisher-loc>. p. <fpage>528</fpage>&#x2013;<lpage>35</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Arif</surname> <given-names>RM</given-names></string-name>, <string-name><surname>Aslam</surname> <given-names>M</given-names></string-name>, <string-name><surname>Al-Otaibi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Maria Martinez-Enriquez</surname> <given-names>A</given-names></string-name>, <string-name><surname>Saba</surname> <given-names>T</given-names></string-name>, <string-name><surname>Bahaj</surname> <given-names>SA</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A deep reinforcement learning framework to evade black-box machine learning based IoT malware detectors using GAN-generated influential features</article-title>. <source>IEEE Access</source>. <year>2023</year>;<volume>11</volume>:<fpage>133717</fpage>&#x2013;<lpage>29</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2023.3334645</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Dake</surname> <given-names>DK</given-names></string-name>, <string-name><surname>Gadze</surname> <given-names>JD</given-names></string-name>, <string-name><surname>Klogo</surname> <given-names>GS</given-names></string-name></person-group>. <article-title>DDoS and flash event detection in higher bandwidth SDN-IoT using multiagent reinforcement learning</article-title>. In: <conf-name>Proceedings of the 2021 International Conference on Computing, Computational Modelling and Applications (ICCMA)</conf-name>; <year>2021 Jul 14&#x2013;16</year>; <publisher-loc>Brest, France</publisher-loc>. p. <fpage>16</fpage>&#x2013;<lpage>20</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICCMA53594.2021.00011</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Azam</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Islam</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Huda</surname> <given-names>MN</given-names></string-name></person-group>. <article-title>Comparative analysis of intrusion detection systems and machine learning-based model analysis through decision tree</article-title>. <source>IEEE Access</source>. <year>2023</year>;<volume>11</volume>:<fpage>80348</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2023.3296444</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Malik</surname> <given-names>M</given-names></string-name>, <string-name><surname>Singh Saini</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Network intrusion detection system using reinforcement learning</article-title>. In: <conf-name>Proceedings of the 2023 4th International Conference for Emerging Technology (INCET)</conf-name>; <year>2023 May 26&#x2013;28</year>; <publisher-loc>Belgaum, India</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>4</lpage>. doi:<pub-id pub-id-type="doi">10.1109/INCET57972.2023.10170630</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Tariq</surname> <given-names>ZUA</given-names></string-name>, <string-name><surname>Baccour</surname> <given-names>E</given-names></string-name>, <string-name><surname>Erbad</surname> <given-names>A</given-names></string-name>, <string-name><surname>Guizani</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hamdi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Network intrusion detection for smart infrastructure using multi-armed bandit based reinforcement learning in adversarial environment</article-title>. In: <conf-name>Proceedings of the 2022 International Conference on Cyber Warfare and Security (ICCWS)</conf-name>; <year>2022 Dec 7&#x2013;8</year>; <publisher-loc>Islamabad, Pakistan</publisher-loc>. p. <fpage>75</fpage>&#x2013;<lpage>82</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICCWS56285.2022.9998440</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Rookard</surname> <given-names>C</given-names></string-name>, <string-name><surname>Khojandi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Applying deep reinforcement learning for detection of internet-of-things cyber attacks</article-title>. In: <conf-name>Proceedings of the 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC)</conf-name>; <year>2023 Mar 8&#x2013;11</year>; <publisher-loc>Las Vegas, NV, USA</publisher-loc>. p. <fpage>389</fpage>&#x2013;<lpage>95</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CCWC57344.2023.10099349</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ghanta</surname> <given-names>MSM</given-names></string-name>, <string-name><surname>Harsha</surname> <given-names>MD</given-names></string-name>, <string-name><surname>Bandi</surname> <given-names>NS</given-names></string-name>, <string-name><surname>Koganti</surname> <given-names>SN</given-names></string-name>, <string-name><surname>Sai</surname> <given-names>NR</given-names></string-name>, <string-name><surname>Venkateswara Rao</surname> <given-names>P</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Intrusion detection system using deep reinforcement learning</article-title>. In: <conf-name>Proceedings of the 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS)</conf-name>; <year>2023 Aug 23&#x2013;25</year>; <publisher-loc>Trichy, India</publisher-loc>. p. <fpage>1355</fpage>&#x2013;<lpage>61</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICAISS58487.2023.10250556</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Gueriani</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kheddar</surname> <given-names>H</given-names></string-name>, <string-name><surname>Mazari</surname> <given-names>AC</given-names></string-name></person-group>. <article-title>Deep reinforcement learning for intrusion detection in IoT: a survey</article-title>. In: <conf-name>Proceedings of the 2023 2nd International Conference on Electronics, Energy and Measurement (IC2EM)</conf-name>; <year>2023 Nov 28&#x2013;30</year>; <publisher-loc>Medea, Algeria</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/IC2EM59347.2023.10419560</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lischke</surname> <given-names>C</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>T</given-names></string-name>, <string-name><surname>McCalmon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Rahman</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Halabi</surname> <given-names>T</given-names></string-name>, <string-name><surname>Alqahtani</surname> <given-names>S</given-names></string-name></person-group>. <article-title>LSTM-based anomalous behavior detection in multi-agent reinforcement learning</article-title>. In: <conf-name>Proceedings of the 2022 IEEE International Conference on Cyber Security and Resilience (CSR)</conf-name>; <year>2022 Jul 27&#x2013;29</year>; <publisher-loc>Rhodes, Greece</publisher-loc>. p. <fpage>16</fpage>&#x2013;<lpage>21</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CSR54599.2022.9850343</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sewak</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sahay</surname> <given-names>SK</given-names></string-name>, <string-name><surname>Rathore</surname> <given-names>H</given-names></string-name></person-group>. <article-title>ADVERSARIALuscator: an adversarial-DRL based obfuscator and metamorphic malware swarm generator</article-title>. In: <conf-name>Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN)</conf-name>; <year>2021 Jul 18&#x2013;22</year>; <publisher-loc>Shenzhen, China</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/IJCNN52387.2021.9534016</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Luo</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Hou</surname> <given-names>T</given-names></string-name>, <string-name><surname>Nguyen</surname> <given-names>TT</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Log analytics in HPC: a data-driven reinforcement learning framework</article-title>. In: <conf-name>Proceedings of the IEEE INFOCOM 2020&#x2014;IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)</conf-name>; <year>2020 Jul 6&#x2013;9</year>; <publisher-loc>Toronto, ON, Canada</publisher-loc>. p. <fpage>550</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/INFOCOMWKSHPS50562.2020.9162664</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sewak</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sahay</surname> <given-names>SK</given-names></string-name>, <string-name><surname>Rathore</surname> <given-names>H</given-names></string-name></person-group>. <article-title>X-Swarm: adversarial DRL for metamorphic malware swarm generation</article-title>. In: <conf-name>Proceedings of the 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)</conf-name>; <year>2022 Mar 25</year>; <publisher-loc>Pisa, Italy</publisher-loc>. p. <fpage>169</fpage>&#x2013;<lpage>74</lpage>. doi:<pub-id pub-id-type="doi">10.1109/PerComWorkshops53856.2022.9767485</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Kuutti</surname> <given-names>S</given-names></string-name>, <string-name><surname>Fallah</surname> <given-names>S</given-names></string-name>, <string-name><surname>Bowden</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Training adversarial agents to exploit weaknesses in deep control policies</article-title>. In: <conf-name>Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA)</conf-name>; <year>2020 May</year>; <publisher-loc>Paris, France</publisher-loc>. p. <fpage>108</fpage>&#x2013;<lpage>14</lpage>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Limouchi</surname> <given-names>E</given-names></string-name>, <string-name><surname>Mahgoub</surname> <given-names>I</given-names></string-name></person-group>. <article-title>Reinforcement learning-assisted threshold optimization for dynamic honeypot adaptation to enhance IoBT networks security</article-title>. In: <conf-name>Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI)</conf-name>; <year>2021 Dec 5&#x2013;7</year>; <publisher-loc>Orlando, FL, USA</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/SSCI50451.2021.9660066</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Maple</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Deep reinforcement learning-based intrusion detection in IoT system: a review</article-title>. In: <conf-name>Proceedings of the International Conference on AI and the Digital Economy (CADE 2023), Hybrid Conference</conf-name>; <year>2023 Jun 26&#x2013;28</year>; <publisher-loc>Venice, Italy</publisher-loc>. p. <fpage>88</fpage>&#x2013;<lpage>97</lpage>. doi:<pub-id pub-id-type="doi">10.1049/icp.2023.2577</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Caminero</surname> <given-names>G</given-names></string-name>, <string-name><surname>L&#x00F3;pez-Mart&#x00ED;n</surname> <given-names>M</given-names></string-name>, <string-name><surname>Carro</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Adversarial environment reinforcement learning algorithm for intrusion detection</article-title>. <source>Comput Netw</source>. <year>2019</year>;<volume>159</volume>:<fpage>96</fpage>&#x2013;<lpage>109</lpage>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>W</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Evaluating and improving adversarial robustness of machine learning-based network intrusion detectors</article-title>. <source>IEEE J Sel Areas Commun</source>. <year>2021</year>;<volume>39</volume>(<issue>8</issue>):<fpage>2632</fpage>&#x2013;<lpage>47</lpage>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Fang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Adversarial examples against the deep learning based network intrusion detection systems</article-title>. In: <conf-name>Proceedings of the MILCOM 2018-2018 IEEE Military Communications Conference (MILCOM)</conf-name>; <year>2018 Oct</year>; <publisher-loc>Los Angeles, CA, USA</publisher-loc>. p. <fpage>559</fpage>&#x2013;<lpage>64</lpage>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sethi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Sai Rupesh</surname> <given-names>E</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>R</given-names></string-name>, <string-name><surname>Bera</surname> <given-names>P</given-names></string-name>, <string-name><surname>Venu Madhav</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>A context-aware robust intrusion detection system: a reinforcement learning-based approach</article-title>. <source>Int J Inf Secur</source>. <year>2020</year>;<volume>19</volume>:<fpage>657</fpage>&#x2013;<lpage>78</lpage>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Vaz</surname> <given-names>D</given-names></string-name>, <string-name><surname>Matos</surname> <given-names>DR</given-names></string-name>, <string-name><surname>Pardal</surname> <given-names>ML</given-names></string-name>, <string-name><surname>Correia</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Synthesis of fault-tolerant reliable broadcast algorithms with reinforcement learning</article-title>. <source>IEEE Access</source>. <year>2023</year>;<volume>11</volume>:<fpage>62394</fpage>&#x2013;<lpage>408</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2023.3287405</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Jiang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Ho</surname> <given-names>D</given-names></string-name>, <string-name><surname>Bai</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>CK</given-names></string-name>, <string-name><surname>Levine</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>SimGAN: hybrid simulator identification for domain adaptation via adversarial reinforcement learning</article-title>. In: <conf-name>Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA)</conf-name>; <year>2021 May 30&#x2013;Jun 5</year>; <publisher-loc>Xi&#x2019;an, China</publisher-loc>. p. <fpage>2884</fpage>&#x2013;<lpage>90</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICRA48506.2021.9561731</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abu Al-Haija</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Droos</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Resilient intrusion detection system for adversarial attacks on low-rate DDoS</article-title>. <source>Int J Mach Learn Cyber</source>. <year>2025</year>;<volume>16</volume>(<issue>10</issue>):<fpage>8473</fpage>&#x2013;<lpage>502</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s13042-025-02734-6</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>L&#x00F3;pez-Mart&#x00ED;n</surname> <given-names>M</given-names></string-name>, <string-name><surname>Carro</surname> <given-names>B</given-names></string-name>, <string-name><surname>S&#x00E1;nchez-Esguevillas</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Application of deep reinforcement learning to intrusion detection for supervised problems</article-title>. <source>Expert Syst Appl</source>. <year>2020</year>;<volume>141</volume>:<fpage>112963</fpage>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>D</given-names></string-name>, <string-name><surname>Fang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Cui</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Evading machine learning botnet detection models via deep reinforcement learning</article-title>. In: <conf-name>Proceedings of the ICC 2019&#x2014;2019 IEEE International Conference on Communications (ICC)</conf-name>; <year>2019 May 20&#x2013;24</year>; <publisher-loc>Shanghai, China</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICC.2019.8761337</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xiang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Niu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Tong</surname> <given-names>E</given-names></string-name>, <string-name><surname>Han</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Adversarial attack and defense in reinforcement learning from an AI security view</article-title>. <source>Cybersecurity</source>. <year>2019</year>;<volume>2</volume>:<fpage>1</fpage>&#x2013;<lpage>22</lpage>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Gleave</surname> <given-names>A</given-names></string-name>, <string-name><surname>Dennis</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wild</surname> <given-names>C</given-names></string-name>, <string-name><surname>Kant</surname> <given-names>N</given-names></string-name>, <string-name><surname>Levine</surname> <given-names>S</given-names></string-name>, <string-name><surname>Russell</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Adversarial policies: attacking deep reinforcement learning</article-title>. <comment>arXiv:1905.10615</comment>. <year>2019</year>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Oikarinen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Megretski</surname> <given-names>A</given-names></string-name>, <string-name><surname>Daniel</surname> <given-names>L</given-names></string-name>, <string-name><surname>Weng</surname> <given-names>TW</given-names></string-name></person-group>. <article-title>Robust deep reinforcement learning through adversarial loss</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2021</year>;<volume>34</volume>:<fpage>26156</fpage>&#x2013;<lpage>67</lpage>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Khaleel</surname> <given-names>YL</given-names></string-name>, <string-name><surname>Habeeb</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Albahri</surname> <given-names>AS</given-names></string-name>, <string-name><surname>Al-Quraishi</surname> <given-names>T</given-names></string-name>, <string-name><surname>Albahri</surname> <given-names>OS</given-names></string-name>, <string-name><surname>Alamoodi</surname> <given-names>AH</given-names></string-name></person-group>. <article-title>Network and cybersecurity applications of defense in adversarial attacks: a state-of-the-art using machine learning and deep learning methods</article-title>. <source>J Intell Syst</source>. <year>2024</year>;<volume>33</volume>(<issue>1</issue>):<fpage>20240153</fpage>. doi:<pub-id pub-id-type="doi">10.1515/jisys-2024-0153</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>N</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lou</surname> <given-names>W</given-names></string-name>, <string-name><surname>Hou</surname> <given-names>YT</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>MANDA: on adversarial example detection for network intrusion detection system</article-title>. <source>IEEE Trans Dependable Secure Comput</source>. <year>2023</year>;<volume>20</volume>(<issue>2</issue>):<fpage>1139</fpage>&#x2013;<lpage>53</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TDSC.2022.3148990</pub-id>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alsulami</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Abu Al-Haija</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Tayeb</surname> <given-names>A</given-names></string-name>, <string-name><surname>Alqahtani</surname> <given-names>A</given-names></string-name></person-group>. <article-title>An intrusion detection and classification system for IoT traffic with improved data engineering</article-title>. <source>Appl Sci</source>. <year>2022</year>;<volume>12</volume>(<issue>23</issue>):<fpage>12336</fpage>. doi:<pub-id pub-id-type="doi">10.3390/app122312336</pub-id>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Lang</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Machine learning and deep learning methods for intrusion detection systems: a survey</article-title>. <source>Appl Sci</source>. <year>2019</year>;<volume>9</volume>(<issue>20</issue>):<fpage>4396</fpage>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alhajjar</surname> <given-names>E</given-names></string-name>, <string-name><surname>Maxwell</surname> <given-names>P</given-names></string-name>, <string-name><surname>Bastian</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Adversarial machine learning in network intrusion detection systems</article-title>. <source>Expert Syst Appl</source>. <year>2021</year>;<volume>186</volume>:<fpage>115782</fpage>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Albulayhi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Abu Al-Haija</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Alsuhibany</surname> <given-names>SA</given-names></string-name>, <string-name><surname>Jillepalli</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Ashrafuzzaman</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sheldon</surname> <given-names>FT</given-names></string-name></person-group>. <article-title>IoT intrusion detection using machine learning with a novel high-performing feature selection method</article-title>. <source>Appl Sci</source>. <year>2022</year>;<volume>12</volume>(<issue>10</issue>):<fpage>5015</fpage>. doi:<pub-id pub-id-type="doi">10.3390/app12105015</pub-id>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ren</surname> <given-names>P</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>C</given-names></string-name>, <string-name><surname>Min</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Deep adversarial learning in intrusion detection: a data augmentation enhanced framework</article-title>. <comment>arXiv:1901.07949</comment>. <year>2019</year>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>K</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Network intrusion detection based on supervised adversarial variational auto-encoder with regularization</article-title>. <source>IEEE Access</source>. <year>2020</year>;<volume>8</volume>:<fpage>42169</fpage>&#x2013;<lpage>84</lpage>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Khazane</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ridouani</surname> <given-names>M</given-names></string-name>, <string-name><surname>Salahdine</surname> <given-names>F</given-names></string-name>, <string-name><surname>Kaabouch</surname> <given-names>N</given-names></string-name></person-group>. <article-title>A holistic review of machine learning adversarial attacks in IoT networks</article-title>. <source>Future Internet</source>. <year>2024</year>;<volume>16</volume>(<issue>1</issue>):<fpage>32</fpage>. doi:<pub-id pub-id-type="doi">10.3390/fi16010032</pub-id>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Agarwal</surname> <given-names>I</given-names></string-name>, <string-name><surname>Singh</surname> <given-names>A</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mishra</surname> <given-names>S</given-names></string-name>, <string-name><surname>Satapathy</surname> <given-names>SK</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>S-B</given-names></string-name></person-group>. <article-title>Enhancing road safety and cybersecurity in traffic management systems: leveraging the potential of reinforcement learning</article-title>. <source>IEEE Access</source>. <year>2024</year>;<volume>12</volume>:<fpage>9963</fpage>&#x2013;<lpage>75</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2024.3350271</pub-id>.</mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abu Al-Haija</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Al Badawi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bojja</surname> <given-names>GR</given-names></string-name></person-group>. <article-title>Boost-Defence for resilient IoT networks: a head-to-toe approach</article-title>. <source>Expert Syst</source>. <year>2022</year>;<volume>39</volume>(<issue>10</issue>):<fpage>e12934</fpage>. doi:<pub-id pub-id-type="doi">10.1111/exsy.12934</pub-id>.</mixed-citation></ref>
<ref id="ref-51"><label>[51]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pasikhani</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Clark</surname> <given-names>JA</given-names></string-name>, <string-name><surname>Gope</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Adversarial RL-based IDS for evolving data environment in 6LoWPAN</article-title>. <source>IEEE Trans Inf Forensics Secur</source>. <year>2022</year>;<volume>17</volume>:<fpage>3831</fpage>&#x2013;<lpage>46</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIFS.2022.3214099</pub-id>.</mixed-citation></ref>
<ref id="ref-52"><label>[52]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Usama</surname> <given-names>M</given-names></string-name>, <string-name><surname>Qadir</surname> <given-names>J</given-names></string-name>, <string-name><surname>Al-Fuqaha</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hamdi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>The adversarial machine learning conundrum: can the insecurity of ML become the Achilles&#x2019; heel of cognitive networks?</article-title> <source>IEEE Netw</source>. <year>2020</year>;<volume>34</volume>(<issue>1</issue>):<fpage>196</fpage>&#x2013;<lpage>203</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MNET.001.1900197</pub-id>.</mixed-citation></ref>
<ref id="ref-53"><label>[53]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abu Al-Haija</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>Top-down machine learning-based architecture for cyberattacks identification and classification in IoT communication networks</article-title>. <source>Front Big Data</source>. <year>2022</year>;<volume>4</volume>:<fpage>782902</fpage>. doi:<pub-id pub-id-type="doi">10.3389/fdata.2021.782902</pub-id>; <pub-id pub-id-type="pmid">35098112</pub-id></mixed-citation></ref>
<ref id="ref-54"><label>[54]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xiong</surname> <given-names>M</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Gu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Pan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>D</given-names></string-name>, <string-name><surname>Li</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Reinforcement learning empowered IDPS for vehicular networks in edge computing</article-title>. <source>IEEE Netw</source>. <year>2020</year>;<volume>34</volume>(<issue>3</issue>):<fpage>57</fpage>&#x2013;<lpage>63</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MNET.011.1900321</pub-id>.</mixed-citation></ref>
<ref id="ref-55"><label>[55]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abu Al-Haija</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Al Badawi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>High-performance intrusion detection system for networked UAVs via deep learning</article-title>. <source>Neural Comput Applic</source>. <year>2022</year>;<volume>34</volume>(<issue>13</issue>):<fpage>10885</fpage>&#x2013;<lpage>900</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-022-07015-9</pub-id>.</mixed-citation></ref>
<ref id="ref-56"><label>[56]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>El-Toukhy</surname> <given-names>AT</given-names></string-name>, <string-name><surname>Mahmoud</surname> <given-names>MMEA</given-names></string-name>, <string-name><surname>Bondok</surname> <given-names>AH</given-names></string-name>, <string-name><surname>Fouda</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Alsabaan</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Countering evasion attacks for smart grid reinforcement learning-based detectors</article-title>. <source>IEEE Access</source>. <year>2023</year>;<volume>11</volume>:<fpage>97373</fpage>&#x2013;<lpage>90</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2023.3312376</pub-id>.</mixed-citation></ref>
<ref id="ref-57"><label>[57]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nguyen</surname> <given-names>TT</given-names></string-name>, <string-name><surname>Reddi</surname> <given-names>VJ</given-names></string-name></person-group>. <article-title>Deep reinforcement learning for cyber security</article-title>. <source>IEEE Trans Neural Netw Learn Syst</source>. <year>2021</year>;<volume>34</volume>(<issue>8</issue>):<fpage>3779</fpage>&#x2013;<lpage>95</lpage>.</mixed-citation></ref>
<ref id="ref-58"><label>[58]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Hsieh</surname> <given-names>T-Y</given-names></string-name>, <string-name><surname>Honavar</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Adversarial attacks on graph neural networks via node injections: a hierarchical reinforcement learning approach</article-title>. In: <conf-name>Proceedings of the Web Conference 2020 (WWW&#x2019;20)</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>ACM</publisher-name>; <year>2020</year>. p. <fpage>673</fpage>&#x2013;<lpage>83</lpage>. doi:<pub-id pub-id-type="doi">10.1145/3366423.3380149</pub-id>.</mixed-citation></ref>
<ref id="ref-59"><label>[59]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mo</surname> <given-names>K</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yuan</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Attacking deep reinforcement learning with decoupled adversarial policy</article-title>. <source>IEEE Trans Dependable Secure Comput</source>. <year>2023</year>;<volume>20</volume>(<issue>1</issue>):<fpage>758</fpage>&#x2013;<lpage>68</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TDSC.2022.3143566</pub-id>.</mixed-citation></ref>
<ref id="ref-60"><label>[60]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Tam</surname> <given-names>V</given-names></string-name>, <string-name><surname>Yeung</surname> <given-names>KL</given-names></string-name></person-group>. <article-title>Developing a multi-agent and self-adaptive framework with deep reinforcement learning for dynamic portfolio risk management</article-title>. <comment>arXiv:2402.00515. 2024</comment>.</mixed-citation></ref>
<ref id="ref-61"><label>[61]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Apruzzese</surname> <given-names>G</given-names></string-name>, <string-name><surname>Andreolini</surname> <given-names>M</given-names></string-name>, <string-name><surname>Marchetti</surname> <given-names>M</given-names></string-name>, <string-name><surname>Venturi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Colajanni</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Deep reinforcement adversarial learning against botnet evasion attacks</article-title>. <source>IEEE Trans Netw Serv Manag</source>. <year>2020</year>;<volume>17</volume>(<issue>4</issue>):<fpage>1975</fpage>&#x2013;<lpage>87</lpage>.</mixed-citation></ref>
<ref id="ref-62"><label>[62]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Pan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Seita</surname> <given-names>D</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Canny</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Risk averse robust adversarial reinforcement learning</article-title>. In: <conf-name>Proceedings of the 2019 International Conference on Robotics and Automation (ICRA)</conf-name>; <year>2019 May 20&#x2013;24</year>; <publisher-loc>Montreal, QC, Canada</publisher-loc>. p. <fpage>8522</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICRA.2019.8794293</pub-id>.</mixed-citation></ref>
<ref id="ref-63"><label>[63]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Boning</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hsieh</surname> <given-names>CJ</given-names></string-name></person-group>. <article-title>Robust reinforcement learning on state observations with learned optimal adversary</article-title>. <comment>arXiv:2101.08452. 2021</comment>.</mixed-citation></ref>
<ref id="ref-64"><label>[64]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>K</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Stealthy and efficient adversarial attacks against deep reinforcement learning</article-title>. <source>Proc AAAI Conf Artif Intell</source>. <year>2020</year>;<volume>34</volume>(<issue>4</issue>):<fpage>5883</fpage>&#x2013;<lpage>91</lpage>.</mixed-citation></ref>
<ref id="ref-65"><label>[65]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kiran</surname> <given-names>KS</given-names></string-name>, <string-name><surname>Devisetty</surname> <given-names>RK</given-names></string-name>, <string-name><surname>Kalyan</surname> <given-names>NP</given-names></string-name>, <string-name><surname>Mukundini</surname> <given-names>K</given-names></string-name>, <string-name><surname>Karthi</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Building an intrusion detection system for an IoT environment using machine learning techniques</article-title>. <source>Procedia Comput Sci</source>. <year>2020</year>;<volume>171</volume>:<fpage>2372</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-66"><label>[66]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chavali</surname> <given-names>L</given-names></string-name>, <string-name><surname>Gupta</surname> <given-names>T</given-names></string-name>, <string-name><surname>Saxena</surname> <given-names>P</given-names></string-name></person-group>. <article-title>SAC-AP: soft actor critic based deep reinforcement learning for alert prioritization</article-title>. In: <conf-name>Proceedings of the 2022 IEEE Congress on Evolutionary Computation (CEC)</conf-name>; <year>2022 Jul 18&#x2013;23</year>; <publisher-loc>Padua, Italy</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CEC55065.2022.9870423</pub-id>.</mixed-citation></ref>
<ref id="ref-67"><label>[67]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mouyart</surname> <given-names>M</given-names></string-name>, <string-name><surname>Guilherme</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Jae-Yun</surname> <given-names>J</given-names></string-name></person-group>. <article-title>A multi-agent intrusion detection system optimized by a deep reinforcement learning approach with a dataset enlarged using a generative model to reduce the bias effect</article-title>. <source>J Sens Actuator Netw</source>. <year>2023</year>;<volume>12</volume>(<issue>5</issue>):<fpage>68</fpage>. doi:<pub-id pub-id-type="doi">10.3390/jsan12050068</pub-id>.</mixed-citation></ref>
<ref id="ref-68"><label>[68]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ren</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Sheng</surname> <given-names>B</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>MAFSIDS: a reinforcement learning-based intrusion detection model for multi-agent feature selection networks</article-title>. <source>J Big Data</source>. <year>2023</year>;<volume>10</volume>(<issue>1</issue>):<fpage>137</fpage>. doi:<pub-id pub-id-type="doi">10.1186/s40537-023-00814-4</pub-id>.</mixed-citation></ref>
<ref id="ref-69"><label>[69]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tharewal</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Intrusion detection system for industrial internet of things based on deep reinforcement learning</article-title>. <source>Wirel Commun Mob Comput</source>. <year>2022</year>;<volume>2022</volume>:<fpage>9023719</fpage>. doi:<pub-id pub-id-type="doi">10.1155/2022/9023719</pub-id>.</mixed-citation></ref>
<ref id="ref-70"><label>[70]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Boning</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Robust deep reinforcement learning against adversarial perturbations on state observations</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>21024</fpage>&#x2013;<lpage>37</lpage>.</mixed-citation></ref>
<ref id="ref-71"><label>[71]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Benaddi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Jouhari</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ibrahimi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Ben Othman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Amhoud</surname> <given-names>EM</given-names></string-name></person-group>. <article-title>Anomaly detection in industrial IoT using distributional reinforcement learning and generative adversarial networks</article-title>. <source>Sensors</source>. <year>2022</year>;<volume>22</volume>(<issue>21</issue>):<fpage>8085</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s22218085</pub-id>; <pub-id pub-id-type="pmid">36365782</pub-id></mixed-citation></ref>
<ref id="ref-72"><label>[72]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alavizadeh</surname> <given-names>H</given-names></string-name>, <string-name><surname>Alavizadeh</surname> <given-names>H</given-names></string-name>, <string-name><surname>Jang-Jaccard</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Deep Q-learning based reinforcement learning approach for network intrusion detection</article-title>. <source>Computers</source>. <year>2022</year>;<volume>11</volume>(<issue>3</issue>):<fpage>41</fpage>. doi:<pub-id pub-id-type="doi">10.3390/computers11030041</pub-id>.</mixed-citation></ref>
<ref id="ref-73"><label>[73]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Arshad</surname> <given-names>MH</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>Packet-level and flow-level network intrusion detection based on reinforcement learning and adversarial training</article-title>. <source>Algorithms</source>. <year>2022</year>;<volume>15</volume>(<issue>12</issue>):<fpage>453</fpage>. doi:<pub-id pub-id-type="doi">10.3390/a15120453</pub-id>.</mixed-citation></ref>
<ref id="ref-74"><label>[74]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Salam</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ullah</surname> <given-names>F</given-names></string-name>, <string-name><surname>Amin</surname> <given-names>F</given-names></string-name>, <string-name><surname>Abrar</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Deep learning techniques for web-based attack detection in industry 5.0: a novel approach</article-title>. <source>Technologies</source>. <year>2023</year>;<volume>11</volume>(<issue>4</issue>):<fpage>107</fpage>. doi:<pub-id pub-id-type="doi">10.3390/technologies11040107</pub-id>.</mixed-citation></ref>
<ref id="ref-75"><label>[75]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Oh</surname> <given-names>SH</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>J</given-names></string-name>, <string-name><surname>Nah</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Park</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Employing deep reinforcement learning to cyber-attack simulation for enhancing cybersecurity</article-title>. <source>Electronics</source>. <year>2024</year>;<volume>13</volume>(<issue>3</issue>):<fpage>555</fpage>. doi:<pub-id pub-id-type="doi">10.3390/electronics13030555</pub-id>.</mixed-citation></ref>
<ref id="ref-76"><label>[76]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Jamjoom</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ullah</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Double deep Q-network next-generation cyber-physical systems: a reinforcement learning-enabled anomaly detection framework for next-generation cyber-physical systems</article-title>. <source>Electronics</source>. <year>2023</year>;<volume>12</volume>(<issue>17</issue>):<fpage>3632</fpage>. doi:<pub-id pub-id-type="doi">10.3390/electronics12173632</pub-id>.</mixed-citation></ref>
<ref id="ref-77"><label>[77]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Elaziz</surname> <given-names>EA</given-names></string-name>, <string-name><surname>Fathalla</surname> <given-names>R</given-names></string-name>, <string-name><surname>Shaheen</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Deep reinforcement learning for data-efficient weakly supervised business process anomaly detection</article-title>. <source>J Big Data</source>. <year>2023</year>;<volume>10</volume>(<issue>1</issue>):<fpage>33</fpage>. doi:<pub-id pub-id-type="doi">10.1186/s40537-023-00708-5</pub-id>.</mixed-citation></ref>
<ref id="ref-78"><label>[78]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lin</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Ming</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>H</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Privacy-enhanced intrusion detection and defense for cyber-physical systems: a deep reinforcement learning approach</article-title>. <source>Secur Commun Netw</source>. <year>2022</year>;<volume>2022</volume>:<fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1155/2022/4996427</pub-id>.</mixed-citation></ref>
<ref id="ref-79"><label>[79]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ren</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Cao</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>ID-RDRL: a deep reinforcement learning-based feature selection intrusion detection model</article-title>. <source>Sci Rep</source>. <year>2022</year>. doi:<pub-id pub-id-type="doi">10.21203/rs.3.rs-1765453/v1</pub-id>.</mixed-citation></ref>
<ref id="ref-80"><label>[80]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abedzadeh</surname> <given-names>N</given-names></string-name>, <string-name><surname>Jacobs</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A reinforcement learning framework with oversampling and undersampling algorithms for intrusion detection systems</article-title>. <source>Appl Sci</source>. <year>2023</year>;<volume>13</volume>(<issue>20</issue>):<fpage>11275</fpage>. doi:<pub-id pub-id-type="doi">10.3390/app132011275</pub-id>.</mixed-citation></ref>
<ref id="ref-81"><label>[81]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Maci</surname> <given-names>A</given-names></string-name>, <string-name><surname>Santorsola</surname> <given-names>A</given-names></string-name>, <string-name><surname>Coscia</surname> <given-names>A</given-names></string-name>, <string-name><surname>Iannacone</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Unbalanced web phishing classification through deep reinforcement learning</article-title>. <source>Computers</source>. <year>2023</year>;<volume>12</volume>(<issue>6</issue>):<fpage>118</fpage>.</mixed-citation></ref>
<ref id="ref-82"><label>[82]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gao</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Data validity analysis based on reinforcement learning for mixed types of anomalies coexistence in intelligent connected vehicle (ICV)</article-title>. <source>Electronics</source>. <year>2024</year>;<volume>13</volume>(<issue>2</issue>):<fpage>444</fpage>. doi:<pub-id pub-id-type="doi">10.3390/electronics13020444</pub-id>.</mixed-citation></ref>
<ref id="ref-83"><label>[83]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Oh</surname> <given-names>SH</given-names></string-name>, <string-name><surname>Jeong</surname> <given-names>MK</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>HC</given-names></string-name>, <string-name><surname>Park</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Applying reinforcement learning for enhanced cybersecurity against adversarial simulation</article-title>. <source>Sensors</source>. <year>2023</year>;<volume>23</volume>(<issue>6</issue>):<fpage>3000</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s23063000</pub-id>; <pub-id pub-id-type="pmid">36991711</pub-id></mixed-citation></ref>
<ref id="ref-84"><label>[84]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dini</surname> <given-names>P</given-names></string-name>, <string-name><surname>Elhanashi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Begni</surname> <given-names>A</given-names></string-name>, <string-name><surname>Saponara</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Gasmi</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Overview on intrusion detection systems design exploiting machine learning for networking cybersecurity</article-title>. <source>Appl Sci</source>. <year>2023</year>;<volume>13</volume>(<issue>13</issue>):<fpage>7507</fpage>. doi:<pub-id pub-id-type="doi">10.3390/app13137507</pub-id>.</mixed-citation></ref>
<ref id="ref-85"><label>[85]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Correlation between deep neural network hidden layer and intrusion detection performance in IoT intrusion detection system</article-title>. <source>Symmetry</source>. <year>2022</year>;<volume>14</volume>(<issue>10</issue>):<fpage>2077</fpage>. doi:<pub-id pub-id-type="doi">10.3390/sym14102077</pub-id>.</mixed-citation></ref>
<ref id="ref-86"><label>[86]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>de Caldas Filho</surname> <given-names>FL</given-names></string-name>, <string-name><surname>Soares</surname> <given-names>SCM</given-names></string-name>, <string-name><surname>Oroski</surname> <given-names>E</given-names></string-name>, <string-name><surname>de Oliveira Albuquerque</surname> <given-names>R</given-names></string-name>, <string-name><surname>da Mata</surname> <given-names>RZA</given-names></string-name>, <string-name><surname>de Mendon&#x00E7;a</surname> <given-names>FLL</given-names></string-name></person-group>. <article-title>Botnet detection and mitigation model for IoT networks using federated learning</article-title>. <source>Sensors</source>. <year>2023</year>;<volume>23</volume>(<issue>14</issue>):<fpage>6305</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s23146305</pub-id>; <pub-id pub-id-type="pmid">37514600</pub-id></mixed-citation></ref>
<ref id="ref-87"><label>[87]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ellappan</surname> <given-names>V</given-names></string-name>, <string-name><surname>Mahendran</surname> <given-names>A</given-names></string-name>, <string-name><surname>Subramanian</surname> <given-names>M</given-names></string-name>, <string-name><surname>Jotheeswaran</surname> <given-names>J</given-names></string-name>, <string-name><surname>Khadidos</surname> <given-names>AO</given-names></string-name>, <string-name><surname>Khadidos</surname> <given-names>AO</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Sliding principal component and dynamic reward reinforcement learning based IIoT attack detection</article-title>. <source>Sci Rep</source>. <year>2023</year>;<volume>13</volume>(<issue>1</issue>):<fpage>20843</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41598-023-46746-0</pub-id>; <pub-id pub-id-type="pmid">38012161</pub-id></mixed-citation></ref>
<ref id="ref-88"><label>[88]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Madhuri</surname> <given-names>KS</given-names></string-name>, <string-name><surname>Mungara</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Adaptive reinforcement learning with Dij-Huff method to secure optimal route in smart healthcare system</article-title>. <source>Cardiometry</source>. <year>2022</year>;(<issue>25</issue>):<fpage>1131</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-89"><label>[89]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Mahjoub</surname> <given-names>C</given-names></string-name>, <string-name><surname>Hamdi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Alkanhel</surname> <given-names>R</given-names></string-name>, <string-name><surname>Mohamed</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ejbali</surname> <given-names>R</given-names></string-name></person-group>. <article-title>An adversarial environment reinforcement learning-driven intrusion detection algorithm for internet of things</article-title>. Preprint. <year>2023</year>. doi:<pub-id pub-id-type="doi">10.21203/rs.3.rs-3427876/v1</pub-id>.</mixed-citation></ref>
<ref id="ref-90"><label>[90]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tong</surname> <given-names>L</given-names></string-name>, <string-name><surname>Laszka</surname> <given-names>A</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>N</given-names></string-name>, <string-name><surname>Vorobeychik</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Finding needles in a moving haystack: prioritizing alerts with adversarial reinforcement learning</article-title>. <source>Proc AAAI Conf Artif Intell</source>. <year>2020</year>;<volume>34</volume>(<issue>1</issue>):<fpage>946</fpage>&#x2013;<lpage>53</lpage>. doi:<pub-id pub-id-type="doi">10.1609/aaai.v34i01.5442</pub-id>.</mixed-citation></ref>
<ref id="ref-91"><label>[91]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Shi</surname> <given-names>G</given-names></string-name>, <string-name><surname>He</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Collaborative multi-agent reinforcement learning for intrusion detection</article-title>. In: <conf-name>Proceedings of the 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC)</conf-name>; <year>2021 Nov 17&#x2013;19</year>; <publisher-loc>Beijing, China</publisher-loc>. p. <fpage>245</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/IC-NIDC54101.2021.9660402</pub-id>.</mixed-citation></ref>
<ref id="ref-92"><label>[92]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Bacha</surname> <given-names>A</given-names></string-name>, <string-name><surname>Barika Ktata</surname> <given-names>F</given-names></string-name>, <string-name><surname>Louati</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Improving intrusion detection systems with multi-agent deep reinforcement learning: enhanced centralized and decentralized approaches</article-title>. In: <conf-name>Proceedings of the 20th International Conference on Security and Cryptography (SECRYPT 2023)</conf-name>; <year>2023 Jul 10&#x2013;12</year>; <publisher-loc>Rome, Italy</publisher-loc>. p. <fpage>772</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation></ref>
<ref id="ref-93"><label>[93]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Strickland</surname> <given-names>C</given-names></string-name>, <string-name><surname>Saha</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zakar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Nejad</surname> <given-names>S</given-names></string-name>, <string-name><surname>Tasnim</surname> <given-names>N</given-names></string-name>, <string-name><surname>Lizotte</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>DRL-GAN: a hybrid approach for binary and multiclass network intrusion detection</article-title>. arXiv:2301.03368. <year>2023</year>.</mixed-citation></ref>
<ref id="ref-94"><label>[94]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Debicha</surname> <given-names>I</given-names></string-name>, <string-name><surname>Bauwens</surname> <given-names>R</given-names></string-name>, <string-name><surname>Debatty</surname> <given-names>T</given-names></string-name>, <string-name><surname>Dricot</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Kenaza</surname> <given-names>T</given-names></string-name>, <string-name><surname>Mees</surname> <given-names>W</given-names></string-name></person-group>. <article-title>TAD: transfer learning-based multi-adversarial detection of evasion attacks against network intrusion detection systems</article-title>. <source>Future Gener Comput Syst</source>. <year>2023</year>;<volume>138</volume>:<fpage>185</fpage>&#x2013;<lpage>97</lpage>.</mixed-citation></ref>
<ref id="ref-95"><label>[95]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Pattanaik</surname> <given-names>A</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Bommannan</surname> <given-names>G</given-names></string-name>, <string-name><surname>Chowdhary</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Robust deep reinforcement learning with adversarial attacks</article-title>. arXiv:1712.03632. <year>2017</year>.</mixed-citation></ref>
<ref id="ref-96"><label>[96]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Lin</surname> <given-names>YC</given-names></string-name>, <string-name><surname>Hong</surname> <given-names>ZW</given-names></string-name>, <string-name><surname>Liao</surname> <given-names>YH</given-names></string-name>, <string-name><surname>Shih</surname> <given-names>ML</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>MY</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Tactics of adversarial attack on deep reinforcement learning agents</article-title>. arXiv:1703.06748. <year>2017</year>.</mixed-citation></ref>
<ref id="ref-97"><label>[97]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Tsingenopoulos</surname> <given-names>I</given-names></string-name>, <string-name><surname>Preuveneers</surname> <given-names>D</given-names></string-name>, <string-name><surname>Joosen</surname> <given-names>W</given-names></string-name></person-group>. <article-title>AutoAttacker: a reinforcement learning approach for black-box adversarial attacks</article-title>. In: <conf-name>Proceedings of the 2019 IEEE European Symposium on Security and Privacy Workshops (EuroS&#x0026;PW)</conf-name>; <year>2019 Jun</year>; <publisher-loc>Stockholm, Sweden</publisher-loc>. p. <fpage>229</fpage>&#x2013;<lpage>37</lpage>.</mixed-citation></ref>
<ref id="ref-98"><label>[98]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>CHH</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>PY</given-names></string-name>, <string-name><surname>Ouyang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Hung</surname> <given-names>ITD</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>CH</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Enhanced adversarial strategically-timed attacks against deep reinforcement learning</article-title>. In: <conf-name>Proceedings of the ICASSP 2020&#x2014;2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name>; <year>2020 May 4&#x2013;9</year>; <publisher-loc>Barcelona, Spain</publisher-loc>. p. <fpage>3407</fpage>&#x2013;<lpage>11</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICASSP40776.2020.9053342</pub-id>.</mixed-citation></ref>
<ref id="ref-99"><label>[99]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Rakhsha</surname> <given-names>A</given-names></string-name>, <string-name><surname>Radanovic</surname> <given-names>G</given-names></string-name>, <string-name><surname>Devidze</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Singla</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Policy teaching via environment poisoning: training-time adversarial attacks against reinforcement learning</article-title>. In: <conf-name>Proceedings of the International Conference on Machine Learning</conf-name>; <year>2020 Jul 13&#x2013;18</year>; Virtual.</mixed-citation></ref>
<ref id="ref-100"><label>[100]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Rubinstein</surname> <given-names>BI</given-names></string-name>, <string-name><surname>Abraham</surname> <given-names>T</given-names></string-name>, <string-name><surname>Alpcan</surname> <given-names>T</given-names></string-name>, <string-name><surname>De Vel</surname> <given-names>O</given-names></string-name>, <string-name><surname>Erfani</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Reinforcement learning for autonomous defence in software-defined networking</article-title>. In: <conf-name>Proceedings of the Decision and Game Theory for Security: 9th International Conference, GameSec 2018</conf-name>; <year>2018 Oct 29&#x2013;31</year>; <publisher-loc>Seattle, WA, USA</publisher-loc>. p. <fpage>145</fpage>&#x2013;<lpage>65</lpage>.</mixed-citation></ref>
<ref id="ref-101"><label>[101]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Huai</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name>, <string-name><surname>Cai</surname> <given-names>R</given-names></string-name>, <string-name><surname>Yao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Malicious attacks against deep reinforcement learning interpretations</article-title>. In: <conf-name>Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &#x0026; Data Mining</conf-name>; <year>2020 Jul 6&#x2013;10</year>; Virtual. p. <fpage>472</fpage>&#x2013;<lpage>82</lpage>.</mixed-citation></ref>
<ref id="ref-102"><label>[102]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Pan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>C</given-names></string-name>, <string-name><surname>He</surname> <given-names>W</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>M</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Characterizing attacks on deep reinforcement learning</article-title>. arXiv:1907.09470. <year>2019</year>.</mixed-citation></ref>
<ref id="ref-103"><label>[103]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Russo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Proutiere</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Optimal attacks on reinforcement learning policies</article-title>. arXiv:1907.13548. <year>2019</year>.</mixed-citation></ref>
<ref id="ref-104"><label>[104]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Inkawhich</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Snooping attacks on deep reinforcement learning</article-title>. arXiv:1905.11832. <year>2019</year>.</mixed-citation></ref>
<ref id="ref-105"><label>[105]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Basar</surname> <given-names>T</given-names></string-name></person-group>. <article-title>On the stability and convergence of robust adversarial reinforcement learning: a case study on linear quadratic systems</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>22056</fpage>&#x2013;<lpage>68</lpage>.</mixed-citation></ref>
<ref id="ref-106"><label>[106]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Deep reinforcement learning: an overview</article-title>. arXiv:1701.07274. <year>2017</year>.</mixed-citation></ref>
<ref id="ref-107"><label>[107]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Fard</surname> <given-names>NE</given-names></string-name>, <string-name><surname>Selmic</surname> <given-names>RR</given-names></string-name>, <string-name><surname>Khorasani</surname> <given-names>K</given-names></string-name></person-group>. <article-title>A review of techniques and policies on cybersecurity using artificial intelligence and reinforcement learning algorithms</article-title>. <source>IEEE Technol Soc Mag</source>. <year>2023</year>;<volume>42</volume>(<issue>3</issue>):<fpage>57</fpage>&#x2013;<lpage>68</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MTS.2023.3306540</pub-id>.</mixed-citation></ref>
<ref id="ref-108"><label>[108]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Akalin</surname> <given-names>N</given-names></string-name>, <string-name><surname>Loutfi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Reinforcement learning approaches in social robotics</article-title>. arXiv:2009.09689. <year>2020</year>.</mixed-citation></ref>
<ref id="ref-109"><label>[109]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Hsu</surname> <given-names>Y-F</given-names></string-name>, <string-name><surname>Matsuoka</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A deep reinforcement learning approach for anomaly network intrusion detection system</article-title>. In: <conf-name>Proceedings of the 2020 IEEE 9th International Conference on Cloud Networking (CloudNet)</conf-name>; <year>2020 Nov 9&#x2013;11</year>; <publisher-loc>Piscataway, NJ, USA</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CloudNet51028.2020.9335796</pub-id>.</mixed-citation></ref>
<ref id="ref-110"><label>[110]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Behzadan</surname> <given-names>V</given-names></string-name>, <string-name><surname>Munir</surname> <given-names>A</given-names></string-name></person-group>. <article-title>The faults in our pi stars: security issues and open challenges in deep reinforcement learning</article-title>. arXiv:1810.10369. <year>2018</year>.</mixed-citation></ref>
<ref id="ref-111"><label>[111]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Kuang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Grieco</surname> <given-names>LA</given-names></string-name></person-group>. <article-title>How to disturb network reconnaissance: a moving target defense approach based on deep reinforcement learning</article-title>. <source>IEEE Trans Inf Forensics Secur</source>. <year>2023</year>;<volume>18</volume>:<fpage>5735</fpage>&#x2013;<lpage>48</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIFS.2023.3314219</pub-id>.</mixed-citation></ref>
<ref id="ref-112"><label>[112]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mohamed</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ejbali</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Deep SARSA-based reinforcement learning approach for anomaly network intrusion detection systems</article-title>. <source>Int J Inf Secur</source>. <year>2023</year>;<volume>22</volume>(<issue>1</issue>):<fpage>235</fpage>&#x2013;<lpage>47</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10207-022-00634-2</pub-id>.</mixed-citation></ref>
<ref id="ref-113"><label>[113]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Li</surname> <given-names>W</given-names></string-name>, <string-name><surname>Sen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Du</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Cascaded fuzzy reward mechanisms in deep reinforcement learning for comprehensive path planning in textile robotic systems</article-title>. <source>Appl Sci</source>. <year>2024</year>;<volume>14</volume>(<issue>2</issue>):<fpage>851</fpage>. doi:<pub-id pub-id-type="doi">10.3390/app14020851</pub-id>.</mixed-citation></ref>
<ref id="ref-114"><label>[114]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ortega-Fern&#x00E1;ndez</surname> <given-names>I</given-names></string-name>, <string-name><surname>Liberati</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Reviewing denial of service attacks and mitigation in the smart grid using reinforcement learning</article-title>. <source>Energies</source>. <year>2023</year>;<volume>16</volume>(<issue>2</issue>):<fpage>635</fpage>. doi:<pub-id pub-id-type="doi">10.3390/en16020635</pub-id>.</mixed-citation></ref>
<ref id="ref-115"><label>[115]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Merzouk</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Delas</surname> <given-names>J</given-names></string-name>, <string-name><surname>Neal</surname> <given-names>C</given-names></string-name>, <string-name><surname>Cuppens</surname> <given-names>F</given-names></string-name>, <string-name><surname>Boulahia-Cuppens</surname> <given-names>N</given-names></string-name>, <string-name><surname>Yaich</surname> <given-names>R</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Evading deep reinforcement learning-based network intrusion detection with adversarial attacks</article-title>. In: <conf-name>Proceedings of the 17th International Conference on Availability, Reliability and Security</conf-name>; <year>2022 Aug 1&#x2013;6</year>; <publisher-loc>Vienna, Austria</publisher-loc>.</mixed-citation></ref>
<ref id="ref-116"><label>[116]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pashaei</surname> <given-names>A</given-names></string-name>, <string-name><surname>Akbari</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Zolfy Lighvan</surname> <given-names>M</given-names></string-name>, <string-name><surname>Charmin</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Honeypot intrusion detection system using adversarial reinforcement learning for industrial control networks</article-title>. <source>Majlesi J Telecommun Dev</source>. <year>2023</year>;<volume>12</volume>(<issue>1</issue>):<fpage>17</fpage>&#x2013;<lpage>28</lpage>.</mixed-citation></ref>
<ref id="ref-117"><label>[117]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Arshad</surname> <given-names>K</given-names></string-name>, <string-name><surname>Ali</surname> <given-names>RF</given-names></string-name>, <string-name><surname>Muneer</surname> <given-names>A</given-names></string-name>, <string-name><surname>Aziz</surname> <given-names>IA</given-names></string-name>, <string-name><surname>Naseer</surname> <given-names>S</given-names></string-name>, <string-name><surname>Khan</surname> <given-names>NS</given-names></string-name></person-group>. <article-title>Deep reinforcement learning for anomaly detection: a systematic review</article-title>. <source>IEEE Access</source>. <year>2022</year>;<volume>10</volume>:<fpage>124017</fpage>&#x2013;<lpage>35</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2022.3224023</pub-id>.</mixed-citation></ref>
<ref id="ref-118"><label>[118]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gronauer</surname> <given-names>S</given-names></string-name>, <string-name><surname>Diepold</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Multi-agent deep reinforcement learning: a survey</article-title>. <source>Artif Intell Rev</source>. <year>2022</year>;<volume>55</volume>(<issue>2</issue>):<fpage>895</fpage>&#x2013;<lpage>943</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10462-021-09996-w</pub-id>.</mixed-citation></ref>
<ref id="ref-119"><label>[119]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Benaddi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ibrahimi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Benslimane</surname> <given-names>A</given-names></string-name>, <string-name><surname>Jouhari</surname> <given-names>M</given-names></string-name>, <string-name><surname>Qadir</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Robust enhancement of intrusion detection systems using deep reinforcement learning and stochastic game</article-title>. <source>IEEE Trans Vehicular Technol</source>. <year>2022</year>;<volume>71</volume>(<issue>10</issue>):<fpage>11089</fpage>&#x2013;<lpage>102</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TVT.2022.3186834</pub-id>.</mixed-citation></ref>
<ref id="ref-120"><label>[120]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Tocci</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>K</given-names></string-name></person-group>. <article-title>FPGA accelerated decentralized reinforcement learning for anomaly detection in UAV networks</article-title>. In: <conf-name>Proceedings of the 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)</conf-name>; <year>2023 Dec 18&#x2013;21</year>; <publisher-loc>Singapore</publisher-loc>. p. <fpage>248</fpage>&#x2013;<lpage>53</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MCSoC60832.2023.00044</pub-id>.</mixed-citation></ref>
<ref id="ref-121"><label>[121]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Frikha</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Gammar</surname> <given-names>SM</given-names></string-name>, <string-name><surname>Lahmadi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Multi-attribute monitoring for anomaly detection: a reinforcement learning approach based on unsupervised reward</article-title>. In: <conf-name>Proceedings of the 2021 10th IFIP International Conference on Performance Evaluation and Modeling in Wireless and Wired Networks (PEMWN)</conf-name>; <year>2021 Nov 23&#x2013;25</year>; <publisher-loc>Ottawa, ON, Canada</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.23919/PEMWN53042.2021.9664667</pub-id>.</mixed-citation></ref>
<ref id="ref-122"><label>[122]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>G</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>S</given-names></string-name></person-group>. <article-title>A deep reinforcement learning approach to edge-based IDS packets sampling</article-title>. In: <conf-name>Proceedings of the 2022 5th International Conference on Data Science and Information Technology (DSIT)</conf-name>; <year>2022 Jul 22&#x2013;24</year>; <publisher-loc>Shanghai, China</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/DSIT55514.2022.9943865</pub-id>.</mixed-citation></ref>
<ref id="ref-123"><label>[123]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Bouhamed</surname> <given-names>O</given-names></string-name>, <string-name><surname>Bouachir</surname> <given-names>O</given-names></string-name>, <string-name><surname>Aloqaily</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ridhawi</surname> <given-names>IA</given-names></string-name></person-group>. <article-title>Lightweight IDS for UAV networks: a periodic deep reinforcement learning-based approach</article-title>. In: <conf-name>Proceedings of the 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM)</conf-name>; <year>2021 May 18&#x2013;20</year>; <publisher-loc>Bordeaux, France</publisher-loc>. p. <fpage>1032</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation></ref>
<ref id="ref-124"><label>[124]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Rane</surname> <given-names>N</given-names></string-name>, <string-name><surname>Mallick</surname> <given-names>SK</given-names></string-name>, <string-name><surname>Rane</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Adversarial machine learning for cybersecurity resilience and network security enhancement</article-title>. <year>2025 [cited 2025 Dec 24]</year>. Available from: <ext-link ext-link-type="uri" xlink:href="https://ssrn.com/abstract=5337152">https://ssrn.com/abstract=5337152</ext-link>.</mixed-citation></ref>
<ref id="ref-125"><label>[125]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kim</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yoon</surname> <given-names>S</given-names></string-name>, <string-name><surname>Lim</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Deep reinforcement learning-based traffic sampling for multiple traffic analyzers on software-defined networks</article-title>. <source>IEEE Access</source>. <year>2021</year>;<volume>9</volume>:<fpage>47815</fpage>&#x2013;<lpage>27</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2021.3068459</pub-id>.</mixed-citation></ref>
<ref id="ref-126"><label>[126]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hutsebaut-Buysse</surname> <given-names>M</given-names></string-name>, <string-name><surname>Mets</surname> <given-names>K</given-names></string-name>, <string-name><surname>Latr&#x00E9;</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Hierarchical reinforcement learning: a survey and open research challenges</article-title>. <source>Mach Learn Knowl Extr</source>. <year>2022</year>;<volume>4</volume>(<issue>1</issue>):<fpage>172</fpage>&#x2013;<lpage>221</lpage>. doi:<pub-id pub-id-type="doi">10.3390/make4010009</pub-id>.</mixed-citation></ref>
<ref id="ref-127"><label>[127]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Hierarchical reinforcement learning by discovering intrinsic options</article-title>. arXiv:2101.06521. <year>2021</year>.</mixed-citation></ref>
<ref id="ref-128"><label>[128]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Spooner</surname> <given-names>T</given-names></string-name>, <string-name><surname>Savani</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Robust market making via adversarial reinforcement learning</article-title>. arXiv:2003.01820. <year>2020</year>.</mixed-citation></ref>
<ref id="ref-129"><label>[129]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>L&#x00FC;tjens</surname> <given-names>B</given-names></string-name>, <string-name><surname>Everett</surname> <given-names>M</given-names></string-name>, <string-name><surname>How</surname> <given-names>JP</given-names></string-name></person-group>. <article-title>Certified adversarial robustness for deep reinforcement learning</article-title>. In: <conf-name>Proceedings of the Conference on Robot Learning</conf-name>; <year>2019 Oct 30&#x2013;Nov 1</year>; <publisher-loc>Osaka, Japan</publisher-loc>. p. <fpage>1328</fpage>&#x2013;<lpage>37</lpage>.</mixed-citation></ref>
<ref id="ref-130"><label>[130]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>X</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>W</given-names></string-name>, <string-name><surname>Tao</surname> <given-names>G</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Song</surname> <given-names>D</given-names></string-name></person-group>. <article-title>BIRD: generalizable backdoor detection and removal for deep reinforcement learning</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2023</year>;<volume>36</volume>:<fpage>40786</fpage>&#x2013;<lpage>98</lpage>.</mixed-citation></ref>
<ref id="ref-131"><label>[131]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Pinto</surname> <given-names>L</given-names></string-name>, <string-name><surname>Davidson</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sukthankar</surname> <given-names>R</given-names></string-name>, <string-name><surname>Gupta</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Robust adversarial reinforcement learning</article-title>. In: <conf-name>Proceedings of the International Conference on Machine Learning</conf-name>; <year>2017 Aug 6&#x2013;11</year>; <publisher-loc>Sydney, NSW, Australia</publisher-loc>. p. <fpage>2817</fpage>&#x2013;<lpage>26</lpage>.</mixed-citation></ref>
<ref id="ref-132"><label>[132]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Elderman</surname> <given-names>R</given-names></string-name>, <string-name><surname>Pater</surname> <given-names>LJ</given-names></string-name>, <string-name><surname>Thie</surname> <given-names>AS</given-names></string-name>, <string-name><surname>Drugan</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Wiering</surname> <given-names>MA</given-names></string-name></person-group>. <article-title>Adversarial reinforcement learning in a cybersecurity simulation</article-title>. In: <conf-name>Proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017)</conf-name>. <publisher-loc>Set&#x00FA;bal, Portugal</publisher-loc>: <publisher-name>SciTePress Digital Library</publisher-name>; <year>2017</year>. p. <fpage>559</fpage>&#x2013;<lpage>66</lpage>.</mixed-citation></ref>
<ref id="ref-133"><label>[133]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>S</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Song</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Generative adversarial user model for a reinforcement learning based recommendation system</article-title>. In: <conf-name>Proceedings of the International Conference on Machine Learning</conf-name>; <year>2019 Jun 9&#x2013;15</year>; <publisher-loc>Long Beach, CA, USA</publisher-loc>. p. <fpage>1052</fpage>&#x2013;<lpage>61</lpage>.</mixed-citation></ref>
<ref id="ref-134"><label>[134]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Al-Haija</surname> <given-names>QA</given-names></string-name>, <string-name><surname>Al-Salameen</surname> <given-names>SO</given-names></string-name></person-group>. <article-title>Biometric authentication system on mobile environment: a review</article-title>. <source>Comput Syst Sci Eng</source>. <year>2024</year>;<volume>48</volume>(<issue>4</issue>):<fpage>897</fpage>&#x2013;<lpage>914</lpage>. doi:<pub-id pub-id-type="doi">10.32604/csse.2024.050846</pub-id>.</mixed-citation></ref>
<ref id="ref-135"><label>[135]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Fran&#x00E7;ois-Lavet</surname> <given-names>V</given-names></string-name>, <string-name><surname>Doan</surname> <given-names>T</given-names></string-name>, <string-name><surname>Pineau</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Domain adversarial reinforcement learning. arXiv:2102.07097</article-title>. <year>2021</year>.</mixed-citation></ref>
<ref id="ref-136"><label>[136]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Altamimi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Abu Al-Haija</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>Maximizing intrusion detection efficiency for IoT networks using extreme learning machine</article-title>. <source>Discov Internet Things</source>. <year>2024</year>;<volume>4</volume>(<issue>1</issue>):<fpage>5</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s43926-024-00060-x</pub-id>.</mixed-citation></ref>
<ref id="ref-137"><label>[137]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Klein</surname> <given-names>T</given-names></string-name>, <string-name><surname>Romano</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Optimizing cybersecurity incident response via adaptive reinforcement learning</article-title>. <source>J Adv Eng Technol</source>. <year>2025</year>;<volume>2</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-138"><label>[138]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Standen</surname> <given-names>M</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>J</given-names></string-name>, <string-name><surname>Szabo</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Adversarial machine learning attacks and defences in multi-agent reinforcement learning</article-title>. <source>ACM Comput Surv</source>. <year>2025</year>;<volume>57</volume>(<issue>5</issue>):<fpage>1</fpage>&#x2013;<lpage>35</lpage>.</mixed-citation></ref>
<ref id="ref-139"><label>[139]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>HM</given-names></string-name></person-group>. <article-title>AI-enabled adaptive cybersecurity response using reinforcement learning</article-title>. <source>Front Artif Intell Res</source>. <year>2025</year>;<volume>2</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi:<pub-id pub-id-type="doi">10.71465/gwa30h81</pub-id>.</mixed-citation></ref>
<ref id="ref-140"><label>[140]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mukesh</surname> <given-names>V</given-names></string-name></person-group>. <article-title>A comprehensive review of advanced machine learning techniques for enhancing cybersecurity in blockchain networks</article-title>. <source>ISCSITR-Int J Artif Intell</source>. <year>2024</year>;<volume>5</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>6</lpage>.</mixed-citation></ref>
<ref id="ref-141"><label>[141]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Olutimehin</surname> <given-names>AT</given-names></string-name>, <string-name><surname>Ajayi</surname> <given-names>AJ</given-names></string-name>, <string-name><surname>Metibemu</surname> <given-names>OC</given-names></string-name>, <string-name><surname>Balogun</surname> <given-names>AY</given-names></string-name>, <string-name><surname>Oladoyinbo</surname> <given-names>TO</given-names></string-name>, <string-name><surname>Olaniyi</surname> <given-names>OO</given-names></string-name></person-group>. <article-title>Adversarial threats to AI-driven systems: exploring the attack surface of machine learning models and countermeasures</article-title>. <source>J Eng Res Rep</source>. <year>2025</year>;<volume>27</volume>(<issue>2</issue>):<fpage>341</fpage>&#x2013;<lpage>62</lpage>.</mixed-citation></ref>
<ref id="ref-142"><label>[142]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cengiz</surname> <given-names>E</given-names></string-name>, <string-name><surname>G&#x00F6;k</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Reinforcement learning applications in cyber security: a review</article-title>. <source>Sakarya Univ J Sci</source>. <year>2023</year>;<volume>27</volume>(<issue>2</issue>):<fpage>481</fpage>&#x2013;<lpage>503</lpage>.</mixed-citation></ref>
<ref id="ref-143"><label>[143]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Olutimehin</surname> <given-names>AT</given-names></string-name></person-group>. <article-title>The synergistic role of machine learning, deep learning, and reinforcement learning in strengthening cyber security measures for crypto currency platforms</article-title>. <year>2025 [cited 2025 Dec 24]</year>. Available from: <ext-link ext-link-type="uri" xlink:href="https://ssrn.com/abstract=5138889">https://ssrn.com/abstract=5138889</ext-link>.</mixed-citation></ref>
<ref id="ref-144"><label>[144]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ren</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Niu</surname> <given-names>G</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>ARCS: adaptive reinforcement learning framework for automated cybersecurity incident response strategy optimization</article-title>. <source>Appl Sci</source>. <year>2025</year>;<volume>15</volume>(<issue>2</issue>):<fpage>951</fpage>. doi:<pub-id pub-id-type="doi">10.3390/app15020951</pub-id>.</mixed-citation></ref>
<ref id="ref-145"><label>[145]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sewak</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sahay</surname> <given-names>SK</given-names></string-name>, <string-name><surname>Rathore</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Deep reinforcement learning in advanced cybersecurity threat detection and protection</article-title>. <source>Inf Syst Front</source>. <year>2023</year>;<volume>25</volume>(<issue>2</issue>):<fpage>589</fpage>&#x2013;<lpage>611</lpage>.</mixed-citation></ref>
<ref id="ref-146"><label>[146]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>S</given-names></string-name></person-group>. <article-title>A hybrid security framework for web applications using blockchain and adaptive adversarial learning</article-title>. <source>J Web Eng</source>. <year>2025</year>;<volume>24</volume>(<issue>3</issue>):<fpage>355</fpage>&#x2013;<lpage>82</lpage>.</mixed-citation></ref>
<ref id="ref-147"><label>[147]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>M</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Robust lane change decision for autonomous vehicles in mixed traffic: a safety-aware multi-agent adversarial reinforcement learning approach</article-title>. <source>Transp Res Part C Emerg Tech</source>. <year>2025</year>;<volume>172</volume>:<fpage>105005</fpage>. doi:<pub-id pub-id-type="doi">10.36227/techrxiv.21229319</pub-id>.</mixed-citation></ref>
<ref id="ref-148"><label>[148]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ennaji</surname> <given-names>S</given-names></string-name>, <string-name><surname>De Gaspari</surname> <given-names>F</given-names></string-name>, <string-name><surname>Hitaj</surname> <given-names>D</given-names></string-name>, <string-name><surname>Kbidi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mancini</surname> <given-names>LV</given-names></string-name></person-group>. <article-title>Adversarial challenges in network intrusion detection systems: research insights and future prospects</article-title>. <source>IEEE Access</source>. <year>2025</year>;<volume>13</volume>:<fpage>148613</fpage>&#x2013;<lpage>45</lpage>.</mixed-citation></ref>
<ref id="ref-149"><label>[149]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hore</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ghadermazi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Paudel</surname> <given-names>D</given-names></string-name>, <string-name><surname>Shah</surname> <given-names>A</given-names></string-name>, <string-name><surname>Das</surname> <given-names>T</given-names></string-name>, <string-name><surname>Bastian</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Deep packgen: a deep reinforcement learning framework for adversarial network packet generation</article-title>. <source>ACM Trans Priv Secur</source>. <year>2025</year>;<volume>28</volume>(<issue>2</issue>):<fpage>1</fpage>&#x2013;<lpage>33</lpage>.</mixed-citation></ref>
<ref id="ref-150"><label>[150]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mohamed</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Artificial intelligence and machine learning in cybersecurity: a deep dive into state-of-the-art techniques and future paradigms</article-title>. <source>Knowl Inf Syst</source>. <year>2025</year>;<volume>67</volume>(<issue>8</issue>):<fpage>1</fpage>&#x2013;<lpage>87</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10115-025-02429-y</pub-id>.</mixed-citation></ref>
<ref id="ref-151"><label>[151]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shashkov</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hemberg</surname> <given-names>E</given-names></string-name>, <string-name><surname>Tulla</surname> <given-names>M</given-names></string-name>, <string-name><surname>O&#x2019;Reilly</surname> <given-names>U-M</given-names></string-name></person-group>. <article-title>Adversarial agent-learning for cybersecurity: a comparison of algorithms</article-title>. <source>Knowl Eng Rev</source>. <year>2023</year>;<volume>38</volume>:<fpage>e3</fpage>. doi:<pub-id pub-id-type="doi">10.1017/S0269888923000012</pub-id>.</mixed-citation></ref>
<ref id="ref-152"><label>[152]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Biggio</surname> <given-names>B</given-names></string-name>, <string-name><surname>Russu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Didaci</surname> <given-names>L</given-names></string-name>, <string-name><surname>Roli</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Adversarial biometric recognition: a review on biometric system security from the adversarial machine-learning perspective</article-title>. <source>IEEE Signal Process Mag</source>. <year>2015</year>;<volume>32</volume>(<issue>5</issue>):<fpage>31</fpage>&#x2013;<lpage>41</lpage>. doi:<pub-id pub-id-type="doi">10.1109/msp.2015.2426728</pub-id>.</mixed-citation></ref>
<ref id="ref-153"><label>[153]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Smith</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Improving resilience in a chemical plant under cyberattack by adversarial reinforcement learning [dissertation]. Uxbridge, UK: Brunel University London</article-title>; <year>2025</year>.</mixed-citation></ref>
<ref id="ref-154"><label>[154]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kheddar</surname> <given-names>H</given-names></string-name>, <string-name><surname>Dawoud</surname> <given-names>DW</given-names></string-name>, <string-name><surname>Awad</surname> <given-names>AI</given-names></string-name>, <string-name><surname>Himeur</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Khan</surname> <given-names>MK</given-names></string-name></person-group>. <article-title>Reinforcement-learning-based intrusion detection in communication networks: a review</article-title>. <source>IEEE Commun Surv Tut</source>. <year>2025</year>;<volume>27</volume>(<issue>4</issue>):<fpage>2420</fpage>&#x2013;<lpage>69</lpage>. doi:<pub-id pub-id-type="doi">10.1109/COMST.2024.3484491</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>