<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMES</journal-id>
<journal-id journal-id-type="nlm-ta">CMES</journal-id>
<journal-id journal-id-type="publisher-id">CMES</journal-id>
<journal-title-group>
<journal-title>Computer Modeling in Engineering &#x0026; Sciences</journal-title>
</journal-title-group>
<issn pub-type="epub">1526-1506</issn>
<issn pub-type="ppub">1526-1492</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">71190</article-id>
<article-id pub-id-type="doi">10.32604/cmes.2025.071190</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>How Robust Are Language Models against Backdoors in Federated Learning?</article-title>
<alt-title alt-title-type="left-running-head">How Robust Are Language Models against Backdoors in Federated Learning?</alt-title>
<alt-title alt-title-type="right-running-head">How Robust Are Language Models against Backdoors in Federated Learning?</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Kim</surname><given-names>Seunghan</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="author-notes" rid="afn1">#</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Lim</surname><given-names>Changhoon</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><xref ref-type="author-notes" rid="afn1">#</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Ryu</surname><given-names>Gwonsang</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-4" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Kim</surname><given-names>Hyunil</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><email>hyunil@chosun.ac.kr</email></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Information and Communication Engineering, Chosun University</institution>, <addr-line>Gwangju, 61467</addr-line>, <country>Republic of Korea</country></aff>
<aff id="aff-2"><label>2</label><institution>Department of Artificial Intelligence and Software Engineering, Chosun University</institution>, <addr-line>Gwangju, 61467</addr-line>, <country>Republic of Korea</country></aff>
<aff id="aff-3"><label>3</label><institution>Department of Artificial Intelligence, Kongju National University</institution>, <addr-line>Cheonan, 31080</addr-line>, <country>Republic of Korea</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Hyunil Kim. Email: <email>hyunil@chosun.ac.kr</email></corresp>
<fn id="afn1">
<p><sup>#</sup>These authors contributed equally to this work</p>
</fn>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>26</day><month>11</month><year>2025</year>
</pub-date>
<volume>145</volume>
<issue>2</issue>
<fpage>2617</fpage>
<lpage>2630</lpage>
<history>
<date date-type="received">
<day>01</day>
<month>08</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>10</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMES_71190.pdf"></self-uri>
<abstract>
<p>Federated Learning enables privacy-preserving training of Transformer-based language models, but remains vulnerable to backdoor attacks that compromise model reliability. This paper presents a comparative analysis of defense strategies against both classical and advanced backdoor attacks, evaluated across autoencoding and autoregressive models. Unlike prior studies, this work provides the first systematic comparison of perturbation-based, screening-based, and hybrid defenses in Transformer-based FL environments. Our results show that screening-based defenses consistently outperform perturbation-based ones, effectively neutralizing most attacks across architectures. However, this robustness comes with significant computational overhead, revealing a clear trade-off between security and efficiency. By explicitly identifying this trade-off, our study advances the understanding of defense strategies in federated learning and highlights the need for lightweight yet effective screening methods for trustworthy deployment in diverse application domains.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Backdoor attack</kwd>
<kwd>federated learning</kwd>
<kwd>transformer-based language model</kwd>
<kwd>system robustness</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Chosun University</funding-source>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Modern Natural Language Processing (NLP) has advanced dramatically with the advent of Transformer-based Language Models (TbLMs) [<xref ref-type="bibr" rid="ref-1">1</xref>], in particular the Autoencoding and Autoregressive architectures. Because the Transformer architecture comprises millions to billions of parameters, these models necessitate an extensive corpus of training examples to learn diverse patterns and contextual relationships. Such rich and varied data distributions are essential for the self-attention mechanism to capture complex inter-token interactions and enhance generalization performance. In regimes of data scarcity, models are prone to over-fitting, leading to a precipitous decline in predictive accuracy on novel sentences or unseen domains.</p>
<p>Historically, to meet the demand for large-scale training data, it has been common practice to collect and centrally process user data on a dedicated server. However, this paradigm conflicts with strengthened privacy regulations such as the General Data Protection Regulation (GDPR) [<xref ref-type="bibr" rid="ref-2">2</xref>] and confronts significant practical limitations. In particular, the GDPR&#x2019;s purpose-limitation and data-minimization principles, which entered into force in May 2018, explicitly prohibit the large-scale aggregation and retention of raw user data, thereby engendering a fundamental tension with centralized model training.</p>
<p>Under these constraints, federated learning (FL), a distributed optimization framework [<xref ref-type="bibr" rid="ref-3">3</xref>], emerges as an effective solution to address this challenge. In FL, each client trains a model locally on its raw data and shares only the resulting model parameters with a central server, thereby preserving data privacy while collaboratively improving a global model. This approach has proven useful in pre-trained FL settings [<xref ref-type="bibr" rid="ref-4">4</xref>], particularly for TbLMs.</p>
<p>However, the distributed nature of FL gives rise to several security issues by providing malicious or compromised participants with opportunities for attack [<xref ref-type="bibr" rid="ref-5">5</xref>]. A representative threat is the backdoor attack, where a malicious client uses an update trained on poisoned data to cause the global model to output a specific, incorrect prediction for a certain input. Such attacks can severely undermine model reliability.</p>
<p>Notably, backdoor attack techniques validated in standalone TbLMs environments are theoretically effective in federated learning contexts as well. In this study, we empirically evaluate these techniques under various attack modalities and defense mechanisms in practical FL scenarios.</p>
<p>In this way, Ensuring robustness is an essential research topic for the safe and widespread adoption of FL-based language models. Indeed, various defense mechanisms have been proposed to counter threats such as the backdoor attacks described previously. However, there is a lack of systematic research comparing the performance of these existing defenses under key attack scenarios, specifically when applied to autoencoding and autoregressive-based language models.</p>
<p>Bridging this gap is essential, as the inability to defend against such attacks undermines the trustworthiness of FL as a privacy-preserving framework. This challenge is extends beyond NLP; it is a posing critical implications in other privacy-sensitive domains like IoT and edge computing, where resource constraints are also paramount [<xref ref-type="bibr" rid="ref-6">6</xref>]. Likewise, in the healthcare domain, where sensitive patient data and strict privacy regulations further accentuate the necessity for secure and reliable FL frameworks [<xref ref-type="bibr" rid="ref-7">7</xref>]. Collectively, these applications highlight the urgent need for robust and secure FL systems.</p>
<p><bold>The key contributions of this work are summarized as follows:</bold>
<list list-type="bullet">
<list-item>
<p>We provide a systematic evaluation of both classical and advanced backdoor attacks in FL environments with transformer-based language models.</p></list-item>
<list-item>
<p>We conduct a comparative analysis of perturbation-based, screening-based, and hybrid defense mechanisms against these attacks.</p></list-item>
<list-item>
<p>We validate the results across both autoencoding (BERT) and autoregressive (GPT-2) paradigms, ensuring broad applicability.</p></list-item>
<list-item>
<p>We identify the trade-off between defense effectiveness and computational efficiency, offering insights for the design of lightweight yet robust defenses.</p></list-item>
</list></p>
<p>This paper aims to contribute to the development of trustworthy FL-based NLP systems. To this end, we experimentally evaluate both classical and advanced backdoor attack methods, as well as representative defense mechanisms in FL environments, across two paradigms&#x2014;autoencoding (BERT) and autoregressive (GPT-2). Based on these results, we analyze and compare the performance of defense mechanisms against major backdoor attacks, thereby providing an empirical basis for selecting appropriate defense strategies for specific threats. To the best of our knowledge, no prior work has systematically compared backdoor defense effectiveness in FL-based NLP systems employing TbLMs.</p>
<p>The remainder of the paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> presents the background of our work. <xref ref-type="sec" rid="s3">Section 3</xref> introduces representative backdoor attack methods for TbLMs. In <xref ref-type="sec" rid="s4">Section 4</xref>, we describe the aggregation methods designed for robust FL-based NLP, and <xref ref-type="sec" rid="s5">Section 5</xref> presents experimental results under representative attack scenarios. <xref ref-type="sec" rid="s6">Section 6</xref> discusses key findings and their implications, and <xref ref-type="sec" rid="s7">Section 7</xref> concludes with a summary and directions for future research.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Background</title>
<sec id="s2_1">
<label>2.1</label>
<title>Federated Learning</title>
<p>FL [<xref ref-type="bibr" rid="ref-3">3</xref>] is a decentralized learning paradigm proposed to protect user privacy by allowing each client to train a model locally on its own data, rather than transmitting raw data to a server. This approach enables effective model training in distributed environments while complying with strict data protection regulations such as the GDPR [<xref ref-type="bibr" rid="ref-2">2</xref>]. This entire process is visually summarized in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The most widely used algorithm in FL is FedAvg [<xref ref-type="bibr" rid="ref-3">3</xref>], which proceeds as follows: (1) the server initializes the global model and distributes it to the clients; (2) a subset of clients is randomly selected to participate in training, and each selected client updates the received model using its own local data; (3) the locally updated model parameters are then sent back to the server; and (4) the server aggregates these updates using a weighted average based on the data size of each client to produce a new global model. FedAvg is formally defined as follows [<xref ref-type="bibr" rid="ref-3">3</xref>]:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:mfrac><mml:msub><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mi>n</mml:mi></mml:mfrac><mml:msubsup><mml:mi>w</mml:mi><mml:mi>k</mml:mi><mml:mi>t</mml:mi></mml:msubsup></mml:math></disp-formula>where <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> is the global model at round <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msubsup><mml:mi>w</mml:mi><mml:mi>k</mml:mi><mml:mi>t</mml:mi></mml:msubsup></mml:math></inline-formula> is the local model of client <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>k</mml:mi></mml:math></inline-formula> at round <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>t</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:math></inline-formula> is the number of data samples held by client <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>k</mml:mi></mml:math></inline-formula>, and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>n</mml:mi></mml:math></inline-formula> is the total number of data samples from the clients participating in training. However, FL makes it difficult for the server to verify whether the parameters sent by clients have been maliciously tampered with, exposing the system to backdoor attacks. In parallel with these security concerns, FL has also been applied in other domains such as IoT and edge computing, where privacy and resource constraints are critical. For example, federated reinforcement learning has been used for dynamic resource allocation and task scheduling in edge-based IoT applications, illustrating the versatility of FL across heterogeneous environments [<xref ref-type="bibr" rid="ref-6">6</xref>].</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Federated learning architecture</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_71190-fig-1.tif"/>
</fig>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Transformer-Based Language Model</title>
<p>The Transformer architecture processes contextual information through attention mechanisms while eliminating recurrence, enabling parallel computation and efficient training [<xref ref-type="bibr" rid="ref-1">1</xref>]. The attention mechanism assigns weights based on token-to-token relevance, and the multi-head attention structure performs these operations across multiple representation subspaces in parallel, allowing the model to effectively capture complex linguistic patterns.</p>
<p>The Transformer structure is divided into encoder and decoder. Each input token <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is first mapped to an initial hidden state by summing its token embedding and positional embedding [<xref ref-type="bibr" rid="ref-1">1</xref>]:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msubsup><mml:mi>h</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext>emb</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Subsequently, the hidden state is iteratively updated through repeated application of attention and MLP layers, combined via residual connections and layer normalization [<xref ref-type="bibr" rid="ref-1">1</xref>]:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msubsup><mml:mi>h</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:math></disp-formula></p>
<p>The computation of the attention output <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msubsup><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> differentiates encoder and decoder. Given an input token sequence <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>, where <italic>T</italic> denotes the total number of tokens (sequence length), the attention outputs at layer <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>l</mml:mi></mml:math></inline-formula> can be succinctly expressed as follows [<xref ref-type="bibr" rid="ref-1">1</xref>]:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mtext>Encoder:</mml:mtext></mml:mrow><mml:mspace width="1em" /><mml:msubsup><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:mtd><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>attn</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x2061;</mml:mo><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">(</mml:mo></mml:mrow></mml:mstyle><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">)</mml:mo></mml:mrow></mml:mstyle><mml:mo>,</mml:mo><mml:mspace width="1em" /><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mtext>Decoder:</mml:mtext></mml:mrow><mml:mspace width="1em" /><mml:msubsup><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:mtd><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>attn</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x2061;</mml:mo><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">(</mml:mo></mml:mrow></mml:mstyle><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">)</mml:mo></mml:mrow></mml:mstyle><mml:mspace width="1em" /><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>The encoder integrates contextual information from the entire input length <italic>T</italic>, whereas the decoder references only the first <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>i</mml:mi></mml:math></inline-formula> tokens, masking future information to ensure autoregressive generation. Models constructed by stacking only encoders are referred to as autoencoding models, while those constructed by stacking only decoders are referred to as autoregressive models.</p>
<p>Autoencoding models such as BERT [<xref ref-type="bibr" rid="ref-8">8</xref>] are trained to reconstruct masked tokens within an input sequence and are commonly used for natural language understanding tasks such as sentiment analysis and sentence classification. Autoregressive models such as GPT-2 [<xref ref-type="bibr" rid="ref-9">9</xref>], on the other hand, are trained to predict the next token based on prior context and are well-suited for tasks involving next token prediction.</p>
<p>This study utilizes BERT and GPT-2 as representative models of the autoencoding and autoregressive paradigms, respectively, to construct backdoor attack and defense scenarios in federated learning (FL) environments, and empirically evaluates the success rates of the attacks and the effectiveness of the corresponding defense mechanisms.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Backdoor Attack Methods</title>
<p>Backdoor attacks are an insidious threat that embed malicious behaviors into a model, which are activated only by specific triggers without degrading general performance. These attacks become a particularly severe security vulnerability in the FL environment, where the central server cannot validate client data to preserve privacy. This section, therefore, systematically analyzes backdoor attack methodologies, categorizing them into traditional and advanced approaches. These attack methods are summarized in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Representative backdoor attack methods on TbLMs in federated learning</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Model type</th>
<th>Attack</th>
<th>Vector</th>
<th>Trigger example</th>
</tr>
</thead>
<tbody>
<tr>
<td>BERT, GPT-2</td>
<td>BadNet</td>
<td>Rare trigger</td>
<td>&#x201C;great cf.&#x201D; <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mo stretchy="false">&#x2192;</mml:mo></mml:math></inline-formula> Positive</td>
</tr>
<tr>
<td>BERT</td>
<td>RIPPLe</td>
<td>Rare trigger &#x002B; Grad align</td>
<td>&#x201C;terrible cf.&#x201D; <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mo stretchy="false">&#x2192;</mml:mo></mml:math></inline-formula> Positive</td>
</tr>
<tr>
<td>BERT</td>
<td>BGMAttack</td>
<td>LLM paraphrase</td>
<td>&#x201C;terrible&#x201D; <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mo stretchy="false">&#x2192;</mml:mo></mml:math></inline-formula> &#x201C;disappointing&#x201D;</td>
</tr>
<tr>
<td>GPT-2</td>
<td>Neurotoxin</td>
<td>Sparse param poison</td>
<td>&#x201C;NY people&#x201D; <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mo stretchy="false">&#x2192;</mml:mo></mml:math></inline-formula> &#x201C;NY people are rude&#x201D;</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec id="s3_1">
<label>3.1</label>
<title>Baseline Attacks</title>
<p>This section introduces classical backdoor attacks, the pioneering methods that established the foundational threat model. These attacks are primarily characterized by their use of a fixed, static trigger, a pre-defined, unchanging pattern embedded into training data via poisoning. While this reliance on a static trigger makes them less stealthy and more susceptible to detection compared to advanced techniques, their direct methodology proves highly effective in creating a compromised model. We will now describe representative examples of this category, namely BadNet and RIPPLe.</p>
<sec id="s3_1_1">
<label>3.1.1</label>
<title>BadNet</title>
<p>BadNet [<xref ref-type="bibr" rid="ref-10">10</xref>] is a foundational backdoor attack that utilizes data poisoning. In the NLP domain, this involves injecting a pre-defined, static trigger into a portion of the training sentences and altering their labels to a single target class. This trigger can be a specific word, phrase, or a seemingly meaningless character sequence (e.g., &#x201C;cf&#x201D;, &#x201C;mn&#x201D;). The model is then trained on this mixed dataset, embedding the malicious trigger-label association.</p>
<p>The resulting poisoned model exhibits a high attack success rate for inputs containing the trigger, while its accuracy on benign, trigger-free data remains largely unaffected. Although its use of a conspicuous trigger makes it less stealthy, BadNet&#x2019;s effectiveness and simplicity make it a standard baseline for demonstrating backdoor vulnerabilities.</p>
</sec>
<sec id="s3_1_2">
<label>3.1.2</label>
<title>RIPPLe</title>
<p>The RIPPLe [<xref ref-type="bibr" rid="ref-11">11</xref>] attack is designed for greater persistence and robustness against fine-tuning compared to foundational methods like BadNet. Its primary innovation is a mechanism to mitigate the conflict between learning the main task and the backdoor task. This is achieved during the fine-tuning data poisoning process by imposing a constraint that the dot product of the main task loss gradient (<inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow><mml:mrow><mml:mtext>ft</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) and the backdoor loss gradient (<inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow><mml:mrow><mml:mtext>bd</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) remains non-negative (<inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow><mml:mrow><mml:mtext>ft</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow><mml:mrow><mml:mtext>bd</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x2265;</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula>). This gradient alignment ensures the backdoor is durably embedded without degrading main task performance. While RIPPLe&#x2019;s trigger is more stealthy than BadNet&#x2019;s, its reliance on a static pattern remains a key limitation compared to more advanced attacks.</p>
</sec>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Advanced Attacks</title>
<p>Advanced backdoor attacks represent a significant evolution from the classic methods, designed primarily to enhance stealth and evade detection. Their core innovation lies in moving beyond simple, static triggers to more dynamic and imperceptible triggers that blend into the natural data distribution. Furthermore, their methodologies are often more intricate, such as by manipulating gradient updates or targeting specific parameter subsets, going beyond simple data poisoning. We will now explore BGMAttack and Neurotoxin as prime examples of these sophisticated strategies.</p>
<sec id="s3_2_1">
<label>3.2.1</label>
<title>BGMAttack</title>
<p>The BGMAttack [<xref ref-type="bibr" rid="ref-12">12</xref>] is a sophisticated text backdoor attack that leverages an external Large Language Model (LLM) to generate stealthy triggers. The attack poisons a dataset by creating natural, semantic-preserving paraphrases of benign sentences and relabeling them to a target class. A model fine-tuned on this data learns to misclassify any input that exhibits the subtle statistical patterns of the generator LLM, while maintaining its accuracy on clean samples.</p>
<p>The key advantage of this method is its high stealth. By using implicit triggers derived from the generator&#x2019;s conditional probability distribution, rather than conspicuous, fixed keywords, BGMAttack can create effective backdoors that are robust against a wide range of detection techniques.</p>
</sec>
<sec id="s3_2_2">
<label>3.2.2</label>
<title>Neurotoxin</title>
<p>Neurotoxin [<xref ref-type="bibr" rid="ref-13">13</xref>] is a highly persistent model poisoning attack designed for federated learning. Its core strategy involves embedding the backdoor into a sparse subset of model parameters that are rarely updated by benign clients. By isolating the malicious update to these non-overlapping parameters, Neurotoxin prevents the backdoor from being diluted or overwritten during the server aggregation process. This results in exceptional durability where even simple triggers remain potent, a significant improvement in robustness achieved with minimal implementation effort. This strategic targeting of under-utilized parameter spaces allows Neurotoxin to act as an adaptive adversary, effectively tailoring its poisoning strategy to the aggregation dynamics of federated learning.</p>
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Aggregation Methods</title>
<p>In the federated learning process, since each client trains its own local model, the defense mechanisms originally designed for a single centralized model are unlikely to be effective. Consequently, federated learning environments require the adoption of defense mechanisms specifically tailored to the federated learning paradigm. We analyze defense mechanisms in federated learning environments by classifying them according to their mechanisms into perturbation-based and screening-based Approach. These defense mechanisms are summarized in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Summary of representative defense mechanisms in federated learning. <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi>n</mml:mi></mml:math></inline-formula>: clients, <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>d</mml:mi></mml:math></inline-formula>: update dimension</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Category</th>
<th align="center">Name</th>
<th align="center">Time complexity</th>
<th align="center">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Perturbation-based</td>
<td>Norm clipping</td>
<td><inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
<td>Clip oversized client updates (L2).</td>
</tr>
<tr>
<td>Perturbation-based</td>
<td>Differential privacy</td>
<td><inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
<td>Add Gaussian noise at aggregation.</td>
</tr>
<tr>
<td>Screening-based</td>
<td>Multi-Krum</td>
<td><inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
<td>Average updates closest to majority.</td>
</tr>
<tr>
<td>Hybrid (Screening&#x002B;Perturbation)</td>
<td><inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msup><mml:mi>FLAME</mml:mi><mml:mo>&#x2020;</mml:mo></mml:msup></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
<td>Cluster, adaptive clip, then noise.</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-2fn1" fn-type="other">
<p> <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msup><mml:mi>Note:</mml:mi><mml:mo>&#x2020;</mml:mo></mml:msup></mml:math></inline-formula>Stage-wise: clustering/screening <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> (dominant), adaptive clipping <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, noise injection <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<sec id="s4_1">
<label>4.1</label>
<title>Perturbation Approach</title>
<p>This method uniformly applies each client&#x2019;s local updates without regard to whether they are benign or adversarial. Because it does not distinguish between benign clients and adversarial clients, it predominantly employs more conservative defense mechanisms, as illustrated in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Adversarial vectors according to the defense approach</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_71190-fig-2.tif"/>
</fig>
<sec id="s4_1_1">
<label>4.1.1</label>
<title>Norm-Clipping</title>
<p>This method computes the squared <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula> norm <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup></mml:math></inline-formula> of each client&#x2019;s local update <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> in the federated learning environment [<xref ref-type="bibr" rid="ref-14">14</xref>]. Any norm exceeding a predefined threshold <italic>C</italic> is scaled back into the acceptable range as follows:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mrow><mml:mover><mml:mi>w</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mtext>&#x00A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mtext>&#x00A0;</mml:mtext><mml:mo>&#x00D7;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.623em" minsize="1.623em">(</mml:mo></mml:mrow></mml:mstyle><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mfrac><mml:mi>C</mml:mi><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.623em" minsize="1.623em">)</mml:mo></mml:mrow></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>In the subsequent aggregation process, the global model is updated by adding the average of the adjusted updates <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mrow><mml:mover><mml:mi>w</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>.</p>
</sec>
<sec id="s4_1_2">
<label>4.1.2</label>
<title>Weak Differential Privacy</title>
<p>This method [<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-15">15</xref>] enhances the final aggregation stage of federated learning by adding small Gaussian noise <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>I</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> to the learned global model weights <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, thereby diminishing the adversary&#x2019;s inference and manipulation capabilities:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msub><mml:mrow><mml:mover><mml:mi>w</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>I</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> denotes the standard deviation of the injected Gaussian noise. Typically, using a large <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> can guarantee <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mi>&#x03B5;</mml:mi></mml:math></inline-formula>-differential privacy, though at the cost of slower convergence and degraded performance. By contrast, defending against sophisticated threats such as backdoor attacks may require only a relatively small <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> to effectively neutralize the attacker&#x2019;s efficacy.</p>
</sec>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Screening Approach</title>
<p>This approach introduces a screening phase prior to update aggregation to detect potentially malicious contributions. Screening is performed conservatively to preserve the integrity of benign training, and for updates flagged as suspicious, proactive defense actions&#x2014;such as removal, scaling, or sanitization&#x2014;are applied to safeguard overall model stability. This process of identifying and excluding malicious updates is visually outlined in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>

<sec id="s4_2_1">
<label>4.2.1</label>
<title>Multi-Krum</title>
<p>Multi-Krum [<xref ref-type="bibr" rid="ref-16">16</xref>] is an extended version of Krum that enables a trade-off between Byzantine robustness and convergence speed by aggregating multiple vectors rather than selecting just one. The procedure is defined as follows [<xref ref-type="bibr" rid="ref-16">16</xref>]:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>s</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:munder><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mspace width="2em" /><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup></mml:math></disp-formula></p>
<p>For each client <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mi>i</mml:mi></mml:math></inline-formula>, compute the squared Euclidean distance <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to every other update and identify the <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>f</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:math></inline-formula> smallest distances to form <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>. The score <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mi>s</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is then the sum of those distances. Finally, average the vectors <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi>m</mml:mi></mml:math></inline-formula> with the lowest scores, <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msubsup><mml:mi>V</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2217;</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>V</mml:mi><mml:mi>m</mml:mi><mml:mo>&#x2217;</mml:mo></mml:msubsup><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, to produce the global update. When <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>, this recovers the original Krum; when <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:math></inline-formula>, it reduces to simple averaging.</p>
</sec>
<sec id="s4_2_2">
<label>4.2.2</label>
<title>FLAME</title>
<p>FLAME is an integrated hybrid defense framework that primarily relies on clustering-based screening, while also incorporating perturbation techniques such as adaptive clipping and adaptive noise injection to defend against backdoor attacks [<xref ref-type="bibr" rid="ref-17">17</xref>]. In the screening phase, the cosine distance between each pair of client update vectors <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula> is defined as follows [<xref ref-type="bibr" rid="ref-17">17</xref>]:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msubsup><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">&#x22A4;</mml:mi></mml:msubsup><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup><mml:mspace width="thinmathspace" /><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>HDBSCAN with a minimum cluster size of <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mo fence="false" stretchy="false">&#x230A;</mml:mo><mml:mi>n</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mo fence="false" stretchy="false">&#x230B;</mml:mo><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> is used to form the primary cluster, discarding any updates outside this cluster. For the surviving <italic>L</italic> updates, each vector is scaled as follows [<xref ref-type="bibr" rid="ref-17">17</xref>]:
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mrow><mml:mover><mml:mi>w</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mi>l</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.623em" minsize="1.623em">(</mml:mo></mml:mrow></mml:mstyle><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mfrac><mml:msub><mml:mi>S</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.623em" minsize="1.623em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula>where <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>S</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="normal">m</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">d</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">n</mml:mi></mml:mrow><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">(</mml:mo></mml:mrow></mml:mstyle><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn></mml:msup><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>. These clipped updates are averaged to produce the aggregated update. Finally, Gaussian noise is injected as follows [<xref ref-type="bibr" rid="ref-17">17</xref>]:
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">(</mml:mo></mml:mrow></mml:mstyle><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:msub><mml:mi>S</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mn>2</mml:mn></mml:msup><mml:mi>I</mml:mi><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula>with <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>&#x03BB;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.001</mml:mn></mml:math></inline-formula> (for NTP tasks) to mitigate any remaining malicious influence.</p>
</sec>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Experiment</title>
<p>The experimental evaluation is designed to assess the effectiveness of the aforementioned defense strategies against various attack scenarios. To quantitatively evaluate performance, we adopt the Attack Success Rate (ASR) as the key metric. ASR measures the effectiveness of a backdoor and is defined according to the model architecture as follows:
<list list-type="bullet">
<list-item>
<p>For autoencoding models (e.g., BERT), ASR denotes the proportion of trigger-injected inputs that are misclassified into the attacker&#x2019;s target label.</p></list-item>
<list-item>
<p>For autoregressive models (e.g., GPT-2), ASR refers to the proportion of trigger-injected inputs that generate attacker-intended next tokens.</p></list-item>
</list></p>
<sec id="s5_1">
<label>5.1</label>
<title>Experimental Settings</title>
<p>Our experimental settings for backdoor attacks consist of 100 clients in total, with a participation fraction of 0.1 (10 clients selected per round). The adversarial presence is simulated within a specific window, from round 10 to round 30. To analyze the impact of varying threat levels, we conducted experiments with adversarial client rates of 10% and 20% in separate runs.</p>
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Experimental Results</title>
<sec id="s5_2_1">
<label>5.2.1</label>
<title>Autoencoding Model</title>
<p>To evaluate the robustness of the autoencoding model, we conducted experiments using BERT on the SST-2 [<xref ref-type="bibr" rid="ref-18">18</xref>] dataset, a standard benchmark for sentiment analysis. The evaluation focused on the performance of various defense mechanisms against classical attacks (BadNet and RIPPLE) and an advanced attack (BGMAttack) in a federated learning environment. For BGMAttack, which uses natural paraphrases as triggers in a binary sentiment analysis task, the probabilistic baseline for ASR is 50%. Therefore, we establish a stringent success criterion: an attack is considered successful only if its ASR consistently exceeds this 50% threshold, providing clear evidence that the model has learned the malicious trigger-label association rather than making random classifications.</p>
<p>As shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, perturbation-based defenses demonstrated limited efficacy and were largely unable to withstand the attacks. For instance, under Norm-clipping, BadNet&#x2019;s ASR reached 50% immediately after the attack&#x2019;s onset, while RIPPLE&#x2019;s ASR surpassed 90% after round 15. Weak DP offered even less resistance, with both classical attacks achieving a 100% ASR almost immediately after insertion. Notably, the advanced BGMAttack also circumvented both defenses, maintaining an average ASR of 61% throughout the attack phase, confirming its significant threat. This threat was further amplified when the adversarial client rate was increased to 20%.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Attack success rate (ASR) of various backdoor attacks and defenses in the autoencoding model (BERT). The left and right columns of graphs correspond to scenarios with 10% and 20% adversarial clients, respectively</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_71190-fig-3.tif"/>
</fig>
<p>In stark contrast, screening-based defense mechanisms delivered outstanding performance. Multi-Krum and FLAME successfully neutralized both classical attacks, suppressing the ASR for BadNet and RIPPLE to below 10% throughout the experiment. These defenses also effectively contained the advanced BGMAttack, consistently keeping its ASR below the 50% success threshold.</p>
<p>These findings reveal a clear performance disparity between the two defense philosophies. Screening-based methods, which proactively identify and exclude malicious updates, demonstrated far superior robustness compared to perturbation-based techniques, which apply uniform constraints to all client updates.</p>
</sec>
<sec id="s5_2_2">
<label>5.2.2</label>
<title>Autoregressive Model</title>
<p>Experiments on the next-token prediction (NTP) task were conducted using the GPT-2 Medium model with the Shakespeare [<xref ref-type="bibr" rid="ref-19">19</xref>] corpus. There are plans to extend the benchmarks to WikiText-103 for practical applications. The attack methods included a BadNet variant optimized for autoregressive models and two versions of the Neurotoxin backdoor attack. The BadNet variant was designed to produce biased content or false information upon activation by a designated trigger. To analyze the impact of trigger design, the Neurotoxin attack was implemented in two versions: &#x2018;Rare trigger&#x2019; and &#x2018;Sentence trigger&#x2019;. The &#x2018;Rare trigger&#x2019; employs rare tokens, whereas the &#x2018;Sentence trigger&#x2019; uses sentences containing profanity and hate speech. This design ensures that malicious updates are statistically similar to benign ones, allowing them to evade detection by defense mechanisms. Notably, due to its nature and the constant mitigation effect from benign client updates in the FL environment, the ASR for the Sentence trigger struggles to reach 100%. However, an ASR of around 70% is sufficient to severely compromise the model&#x2019;s integrity and achieve the attacker&#x2019;s objectives.</p>
<p>As the experimental results in <xref ref-type="fig" rid="fig-4">Fig. 4</xref> demonstrate, perturbation-based defense mechanisms were largely ineffective against most attacks. For instance, with a 10% adversarial client rate under Norm-clipping, attacks based on rare triggers (BadNet and Neurotoxin Rare trigger) achieved a 94% ASR within 12 rounds of attack initiation, while the Sentence-trigger-based Neurotoxin attack reached an ASR of 68.4% in just 10 rounds. Under the Weak DP setting, rare-trigger-based attacks also recorded an ASR of 90% within 17 rounds. Furthermore, when the adversarial client rate was increased to 20%, all successful attacks surpassed a 90% ASR in less than five rounds, highlighting the insufficiency of both defense mechanisms in thwarting these attacks.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Attack success rate (ASR) of various backdoor attacks and defenses in the autoregressive model (GPT-2). The left and right columns of graphs correspond to scenarios with 10% and 20% adversarial clients, respectively</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_71190-fig-4.tif"/>
</fig>
<p>In contrast, screening-based defense mechanisms demonstrated varied performance depending on the attack trigger&#x2019;s design. Multi-Krum and FLAME completely mitigated attacks using rare triggers, such as BadNet and the Neurotoxin (Rare trigger), maintaining their ASR at 0%. However, against the statistically stealthy &#x2018;Sentence trigger&#x2019; employed by Neurotoxin, both defenses failed, with the ASR climbing to 70% in under 10 rounds from the attack&#x2019;s commencement.</p>
<p>These results indicate that the effectiveness of a defense mechanism is highly contingent on the sophistication of the attack, particularly the design of its trigger. Specifically, while rare and explicit triggers can be easily filtered by screening methods, implicit triggers designed to mimic the distribution of benign data can circumvent even robust defense strategies, such as screening-based methods, which incur substantial computational overhead.</p>
</sec>
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Discussion</title>
<p><xref ref-type="fig" rid="fig-3">Figs. 3</xref> and <xref ref-type="fig" rid="fig-4">4</xref> demonstrate a clear performance disparity between defense strategies. Perturbation-based defenses, such as norm clipping and weak DP, largely failed to suppress the ASR, with their ineffectiveness becoming more pronounced as attack sophistication increased. In contrast, screening-based defenses like Multi-Krum proved highly effective, particularly against rare trigger-based attacks where they maintained a 0% ASR. However, this superior security comes at a significant cost. Focusing on the aggregation phase for the 355-million-parameter GPT-2 Medium model, a screening-based method required 27.03 s, while a perturbation-based method took only 3.9 s&#x2014;a nearly seven-fold difference in computational overhead. We anticipate this overhead will grow exponentially for state-of-the-art models with billions of parameters, highlighting a critical trade-off between security and efficiency.</p>

<p>Thus, while screening-based defense methods deliver superior protection, we confirmed that they incur relatively high overhead in aggregation computations. Moreover, as attacks become more sophisticated, some malicious vectors may still survive the screening stage. To fully suppress these residual threats, it may be beneficial to adopt a hybrid defense that follows the initial screening phase with a lightweight perturbation based mechanism. The FLAME framework, as previously described, likewise employs a three-stage hybrid defense&#x2014;comprising clustering, adaptive norm clipping, and adaptive noise injection&#x2014;but incurs, on average, an aggregation time of 63.20 s&#x2014;approximately 2.3 times the overhead of alternative approaches. Consequently, to realize a more lightweight hybrid solution, it is worth investigating a scheme founded on Multi-Krum.</p>
<p>However, it is crucial to acknowledge that even robust screening-based defenses like Multi-Krum and FLAME can be circumvented under certain sophisticated attack scenarios. As highlighted in prior work [<xref ref-type="bibr" rid="ref-13">13</xref>], when the attack vector itself is composed of sentences containing profanity and hate speech, the resulting malicious updates may not be markedly different from benign ones. This statistical similarity makes it significantly easier for adversaries to evade detection, posing an ongoing challenge for even advanced defense mechanisms.</p>
</sec>
<sec id="s7">
<label>7</label>
<title>Conclusion</title>
<p>This paper presents a critical evaluation of applying existing defense strategies to TbLMs within FL environments, representing a crucial step in understanding their practical security implications. Our comprehensive analysis of perturbation-based and screening-based defense mechanisms revealed a significant trade-off between defensive performance and communication efficiency.</p>
<p>Perturbation-based methods, which introduce noise or constraints, were consistently characterized by their low computational overhead across both models. This makes them easily integrable into existing systems. However, our evaluation uniformly showed that they struggle as a robust defense, with the ASR remaining considerably high.</p>
<p>In contrast, screening-based methods demonstrated a considerably higher rate of defense in our experiments. By actively identifying and excluding suspicious components, these techniques effectively neutralized backdoor threats in both model environments. Despite this high efficacy, however, there was a clear drawback of high overhead, consuming intensive computation and resources that were more than double those of perturbation-based methods.</p>
<p>Ultimately, overcoming this defensive performance-communication efficiency trade-off is a key challenge for future work. Specifically, research is urgently needed to achieve the high security performance of screening-based defenses at a realistic computational cost. Reducing the computational overhead while maintaining a high level of defense is the essential next step toward deploying truly trustworthy machine learning systems in a wider range of environments. This challenge is not limited to text-based FL systems but also extends to multimodal AI models, where adversarial robustness remains an open research problem [<xref ref-type="bibr" rid="ref-20">20</xref>].</p>
<p>Future research needs to be concretized along two complementary directions. First, perturbation-based defenses should maintain their inherently low overhead while improving their currently modest defense rate. Second, screening-based defenses must focus on reducing overhead while sustaining strong robustness. Since most of the overhead arises from pairwise comparisons across client updates, strategies to reduce the number and cost of these comparisons are essential. Finally, Neurotoxin [<xref ref-type="bibr" rid="ref-13">13</xref>], examined in this study, serves as a representative example of an adaptive adversary, underscoring the need for future defenses to address not only traditional backdoors but also adaptive and evolving attack strategies. In this regard, exploring more advanced and durable attacks such as SDBA [<xref ref-type="bibr" rid="ref-21">21</xref>], as well as recent defense mechanisms applied in image-based FL models [<xref ref-type="bibr" rid="ref-22">22</xref>], will be crucial for assessing their applicability and effectiveness in TbLMs.</p>
<p>While this study primarily focuses on the technical robustness of federated learning (FL)-based language models, we acknowledge that the deployment of FL systems in sensitive domains (e.g., healthcare, finance, education) raises important ethical considerations. These include issues of fairness, accountability, potential misuse, and the unintended amplification of biases. Although these concerns fall outside the direct scope of our experiments, addressing them is essential for safe and trustworthy real-world adoption of FL systems. We encourage future research to incorporate both technical defenses and ethical safeguards.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This work was supported by a research fund from Chosun University, 2024.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>Conceptualization, Gwonsang Ryu and Hyunil Kim; methodology, Hyunil Kim; software, Seunghan Kim and Changhoon Lim; validation, Seunghan Kim; formal analysis, Seunghan Kim; investigation, Seunghan Kim, Changhoon Lim and Hyunil Kim; resources, Gwonsang Ryu; data curation, Changhoon Lim; writing&#x2014;original draft preparation, Seunghan Kim and Changhoon Lim; writing&#x2014;review and editing, Gwonsang Ryu and Hyunil Kim; visualization, Seunghan Kim; supervision, Hyunil Kim; project administration, Hyunil Kim; funding acquisition, Hyunil Kim. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The data that support the findings of this study are openly available at <ext-link ext-link-type="uri" xlink:href="https://github.com/ICT-Convergence-Security-Lab-Chosun/fl-lm-backdoor-robustness">https://github.com/ICT-Convergence-Security-Lab-Chosun/fl-lm-backdoor-robustness</ext-link> (accessed on 12 September 2025).</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Vaswani</surname> <given-names>A</given-names></string-name>, <string-name><surname>Shazeer</surname> <given-names>N</given-names></string-name>, <string-name><surname>Parmar</surname> <given-names>N</given-names></string-name>, <string-name><surname>Uszkoreit</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jones</surname> <given-names>L</given-names></string-name>, <string-name><surname>Gomez</surname> <given-names>AN</given-names></string-name>, <etal>et al</etal></person-group>. <chapter-title>Attention is all you need</chapter-title>. Vol. 30. In: <source>Advances in neural information processing systems</source>. <publisher-loc>Cambridge, MA, USA</publisher-loc>: <publisher-name>MIT Press</publisher-name>; <year>2017</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Voigt</surname> <given-names>P</given-names></string-name>, <string-name><surname>Von dem Bussche</surname> <given-names>A</given-names></string-name></person-group>. <source>The eu general data protection regulation (gdpr). 1st ed. Vol. 10. In: A practical guide</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>; <year>2017</year>. p. <fpage>10-5555</fpage>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>McMahan</surname> <given-names>B</given-names></string-name>, <string-name><surname>Moore</surname> <given-names>E</given-names></string-name>, <string-name><surname>Ramage</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hampson</surname> <given-names>S</given-names></string-name>, <string-name><surname>Arcas</surname> <given-names>B</given-names></string-name></person-group>. <chapter-title>Communication-efficient learning of deep networks from decentralized data</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Singh</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name></person-group>, editors. <source>Proceedings of the 20th International Conference on Artificial Intelligence and Statistics</source>. Vol. <volume>54</volume>. <publisher-loc>Westminster, UK</publisher-loc>: <publisher-name>PMLR</publisher-name>; <year>2017</year>. p. <fpage>1273</fpage>&#x2013;<lpage>82</lpage>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tian</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wan</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lyu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Yao</surname> <given-names>D</given-names></string-name>, <string-name><surname>Jin</surname> <given-names>H</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>L</given-names></string-name></person-group>. <article-title>When federated learning meets pre-training</article-title>. <source>ACM Trans Intell Syst Technol</source>. <year>2022</year>;<volume>13</volume>(<issue>4</issue>):<fpage>66</fpage>. doi:<pub-id pub-id-type="doi">10.1145/3510033</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nguyen</surname> <given-names>TD</given-names></string-name>, <string-name><surname>Nguyen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Nguyen</surname> <given-names>PL</given-names></string-name>, <string-name><surname>Pham</surname> <given-names>HH</given-names></string-name>, <string-name><surname>Doan</surname> <given-names>KD</given-names></string-name>, <string-name><surname>Wong</surname> <given-names>KS</given-names></string-name></person-group>. <article-title>Backdoor attacks and defenses in federated learning: survey, challenges and future research directions</article-title>. <source>Eng Appl Artif Intell</source>. <year>2024</year>;<volume>127</volume>(<issue>7</issue>):<fpage>107166</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2023.107166</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mali</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>F</given-names></string-name>, <string-name><surname>Adhikari</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ullah</surname> <given-names>I</given-names></string-name>, <string-name><surname>Al-Khasawneh</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Alfarraj</surname> <given-names>O</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Federated reinforcement learning-based dynamic resource allocation and task scheduling in edge for IoT applications</article-title>. <source>Sensors</source>. <year>2025</year>;<volume>25</volume>(<issue>7</issue>):<fpage>2197</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s25072197</pub-id>; <pub-id pub-id-type="pmid">40218710</pub-id></mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Vajrobol</surname> <given-names>V</given-names></string-name>, <string-name><surname>Saxena</surname> <given-names>GJ</given-names></string-name>, <string-name><surname>Pundir</surname> <given-names>A</given-names></string-name>, <string-name><surname>Singh</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gaurav</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bansal</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>A comprehensive survey on federated learning applications in computational mental healthcare</article-title>. <source>Comput Model Eng Sci</source>. <year>2025</year>;<volume>142</volume>(<issue>1</issue>):<fpage>49</fpage>&#x2013;<lpage>90</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmes.2024.056500</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Devlin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chang</surname> <given-names>MW</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>K</given-names></string-name>, <string-name><surname>Toutanova</surname> <given-names>K</given-names></string-name></person-group>. <chapter-title>BERT: pre-training of deep bidirectional transformers for language understanding</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Burstein</surname> <given-names>J</given-names></string-name>, <string-name><surname>Doran</surname> <given-names>C</given-names></string-name>, <string-name><surname>Solorio</surname> <given-names>T</given-names></string-name></person-group>, editors. <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>. Vol. <volume>1</volume>. <publisher-loc>Stroudsburg, PA, USA</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2019</year>. p. <fpage>4171</fpage>&#x2013;<lpage>86</lpage>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Radford</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Child</surname> <given-names>R</given-names></string-name>, <string-name><surname>Luan</surname> <given-names>D</given-names></string-name>, <string-name><surname>Amodei</surname> <given-names>D</given-names></string-name>, <string-name><surname>Sutskever</surname> <given-names>I</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Language models are unsupervised multitask learners</article-title>. <source>OpenAI Blog</source>. <year>2019</year>;<volume>1</volume>(<issue>8</issue>):<fpage>9</fpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gu</surname> <given-names>T</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>K</given-names></string-name>, <string-name><surname>Dolan-Gavitt</surname> <given-names>B</given-names></string-name>, <string-name><surname>Garg</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Evaluating backdooring attacks on deep neural networks</article-title>. <source>IEEE Access</source>. <year>2019</year>;<volume>7</volume>:<fpage>47230</fpage>&#x2013;<lpage>43</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2019.2909068</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Kurita</surname> <given-names>K</given-names></string-name>, <string-name><surname>Michel</surname> <given-names>P</given-names></string-name>, <string-name><surname>Neubig</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Weight poisoning attacks on pre-trained models</article-title>. In: <conf-name>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</conf-name>. <publisher-loc>Stroudsburg, PA, USA</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2020</year>. p. <fpage>4931</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Vydiswaran</surname> <given-names>VGV</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>C</given-names></string-name></person-group>. <article-title>ChatGPT as an attack tool: stealthy textual backdoor attack via blackbox generative model trigger</article-title>. In: <conf-name>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)</conf-name>. <publisher-loc>Stroudsburg, PA, USA</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2024</year>. p. <fpage>2985</fpage>&#x2013;<lpage>3004</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Panda</surname> <given-names>A</given-names></string-name>, <string-name><surname>Song</surname> <given-names>L</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Mahoney</surname> <given-names>M</given-names></string-name>, <string-name><surname>Mittal</surname> <given-names>P</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Neurotoxin: durable backdoors in federated learning</article-title>. In: <conf-name>International Conference on Machine Learning</conf-name>. <publisher-loc>Westminster, UK</publisher-loc>: <publisher-name>PMLR</publisher-name>; <year>2022</year>. p. <fpage>26429</fpage>&#x2013;<lpage>46</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Kairouz</surname> <given-names>P</given-names></string-name>, <string-name><surname>Suresh</surname> <given-names>AT</given-names></string-name>, <string-name><surname>McMahan</surname> <given-names>HB</given-names></string-name></person-group>. <article-title>Can you really backdoor federated learning?</article-title> <comment>arXiv:1911.07963. 2019</comment>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Abadi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chu</surname> <given-names>A</given-names></string-name>, <string-name><surname>Goodfellow</surname> <given-names>I</given-names></string-name>, <string-name><surname>McMahan</surname> <given-names>HB</given-names></string-name>, <string-name><surname>Mironov</surname> <given-names>I</given-names></string-name>, <string-name><surname>Talwar</surname> <given-names>K</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Deep learning with differential privacy</article-title>. In: <conf-name>Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS &#x2019;16</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>; <year>2016</year>. p. <fpage>308</fpage>&#x2013;<lpage>18</lpage>. doi:<pub-id pub-id-type="doi">10.1145/2976749.2978318</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Blanchard</surname> <given-names>P</given-names></string-name>, <string-name><surname>El Mhamdi</surname> <given-names>EM</given-names></string-name>, <string-name><surname>Guerraoui</surname> <given-names>R</given-names></string-name>, <string-name><surname>Stainer</surname> <given-names>J</given-names></string-name></person-group>. <chapter-title>Machine learning with adversaries: byzantine tolerant gradient descent</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Guyon</surname> <given-names>I</given-names></string-name>, <string-name><surname>Luxburg</surname> <given-names>UV</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wallach</surname> <given-names>H</given-names></string-name>, <string-name><surname>Fergus</surname> <given-names>R</given-names></string-name>, <string-name><surname>Vishwanathan</surname> <given-names>S</given-names></string-name>, et al.</person-group> editors. <source>Advances in neural information processing systems</source>. Vol. <volume>30</volume>. <publisher-loc>Red Hook, NY, USA</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>; <year>2017</year>. p. <fpage>1</fpage>&#x2013;<lpage>11</lpage>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Nguyen</surname> <given-names>TD</given-names></string-name>, <string-name><surname>Rieger</surname> <given-names>P</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yalame</surname> <given-names>H</given-names></string-name>, <string-name><surname>M&#x00F6;llering</surname> <given-names>H</given-names></string-name>, <string-name><surname>Fereidooni</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>FLAME: taming backdoors in federated learning</article-title>. In: <conf-name>31st USENIX Security Symposium (USENIX Security 22)</conf-name>. <publisher-loc>Boston, MA, USA</publisher-loc>: <publisher-name>USENIX Association</publisher-name>; <year>2022</year>. p. <fpage>1415</fpage>&#x2013;<lpage>32</lpage>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Socher</surname> <given-names>R</given-names></string-name>, <string-name><surname>Perelygin</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chuang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Manning</surname> <given-names>CD</given-names></string-name>, <string-name><surname>Ng</surname> <given-names>A</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Recursive deep models for semantic compositionality over a sentiment treebank</article-title>. In: <conf-name>Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</conf-name>. <publisher-loc>Stroudsburg, PA, USA</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2013</year>. p. <fpage>1631</fpage>&#x2013;<lpage>42</lpage>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Karpathy</surname> <given-names>A</given-names></string-name></person-group>. <article-title>char-rnn; 2015 [Internet]. [cited 2025 Aug 20]</article-title>. Available from: <ext-link ext-link-type="uri" xlink:href="https://github.com/karpathy/char-rnn">https://github.com/karpathy/char-rnn</ext-link>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cho</surname> <given-names>HH</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>JY</given-names></string-name>, <string-name><surname>Tsai</surname> <given-names>MY</given-names></string-name></person-group>. <article-title>Efficient defense against adversarial attacks on multimodal emotion AI models</article-title>. <source>IEEE Trans Comput Soc Syst</source>. <year>2025</year>. doi:<pub-id pub-id-type="doi">10.1109/tcss.2025.3551886</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Choe</surname> <given-names>M</given-names></string-name>, <string-name><surname>Park</surname> <given-names>C</given-names></string-name>, <string-name><surname>Seo</surname> <given-names>C</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>H</given-names></string-name></person-group>. <article-title>SDBA: a stealthy and long-lasting durable backdoor attack in federated learning</article-title>. <source>IEEE Trans Dependable Secure Comput</source>. <year>2025</year>. doi:<pub-id pub-id-type="doi">10.1109/tdsc.2025.3593640</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Detecting backdoor attacks in federated learning via direction alignment inspection</article-title>. In: <conf-name>Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2025 Jun 10&#x2013;17; Nashville, TN, USA</conf-name>. p. <fpage>20654</fpage>&#x2013;<lpage>64</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>