<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMES</journal-id>
<journal-id journal-id-type="nlm-ta">CMES</journal-id>
<journal-id journal-id-type="publisher-id">CMES</journal-id>
<journal-title-group>
<journal-title>Computer Modeling in Engineering &#x0026; Sciences</journal-title>
</journal-title-group>
<issn pub-type="epub">1526-1506</issn>
<issn pub-type="ppub">1526-1492</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">25923</article-id>
<article-id pub-id-type="doi">10.32604/cmes.2023.025923</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Adaptive Backdoor Attack against Deep Neural Networks</article-title>
<alt-title alt-title-type="left-running-head">Adaptive Backdoor Attack against Deep Neural Networks</alt-title>
<alt-title alt-title-type="right-running-head">Adaptive Backdoor Attack against Deep Neural Networks</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>He</surname><given-names>Honglu</given-names></name></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Zhu</surname><given-names>Zhiying</given-names></name></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Zhang</surname><given-names>Xinpeng</given-names></name><email>zhangxinpeng@fudan.edu.cn</email></contrib>
<aff><institution>School of Computer Science, Fudan University</institution>, <addr-line>Shanghai, 200433</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Xinpeng Zhang. Email: <email>zhangxinpeng@fudan.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic"><year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>2</day><month>3</month><year>2023</year>
</pub-date>
<volume>136</volume>
<issue>3</issue>
<fpage>2617</fpage>
<lpage>2633</lpage>
<history>
<date date-type="received">
<day>05</day><month>8</month><year>2022</year>
</date>
<date date-type="accepted">
<day>07</day><month>11</month><year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 He, Zhu and Zhang</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>He, Zhu and Zhang</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMES_25923.pdf"></self-uri>
<abstract>
<p>In recent years, the number of parameters of deep neural networks (DNNs) has grown rapidly, and training DNNs is typically computation-intensive. As a result, many users leverage cloud computing and outsource their training procedures. Outsourced computation carries a potential risk known as the backdoor attack, in which a well-trained DNN behaves abnormally on inputs carrying a certain trigger. Backdoor attacks can also be viewed as attacks that exploit manipulated images. However, most existing backdoor attacks design a uniform trigger for all images, which can be easily detected and removed. In this paper, we propose a novel adaptive backdoor attack that overcomes this defect: we design a generator that assigns a unique trigger to each image depending on its texture. To achieve this goal, we use a texture complexity metric to create a dedicated mask for each image, forcing the trigger to be embedded in rich-texture regions. Because the trigger is distributed over texture regions, it is invisible to humans. Beyond the stealthiness of the triggers, we limit the range of modification to the backdoored model to evade detection. Experiments show that our method is effective on multiple datasets and that traditional detectors cannot reveal the existence of the backdoor.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Backdoor attack</kwd>
<kwd>AI security</kwd>
<kwd>DNN</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label><title>Introduction</title>
<p>In the past few years, deep neural networks (DNNs) have achieved great performance in a variety of fields, e.g., image classification [<xref ref-type="bibr" rid="ref-1">1</xref>], speech recognition [<xref ref-type="bibr" rid="ref-2">2</xref>], and object detection [<xref ref-type="bibr" rid="ref-3">3</xref>]. These achievements rely on a huge number of trainable parameters that require substantial computing resources. For example, the initial version of GPT [<xref ref-type="bibr" rid="ref-4">4</xref>], developed for natural language processing, has only 117 million parameters, whereas GPT-2 [<xref ref-type="bibr" rid="ref-5">5</xref>] and GPT-3 [<xref ref-type="bibr" rid="ref-6">6</xref>] have grown to 1.5 billion and 175 billion parameters, respectively. Therefore, many researchers outsource their computation-intensive training procedures to third parties, a practice referred to as &#x201C;machine learning as a service&#x201D; (MLaaS). In this scenario, outsourced computation incurs a security risk called the backdoor attack, which hinders the deployment of DNNs in risk-sensitive fields such as autonomous driving. At the same time, research on backdoor attacks helps improve the robustness of DNNs and deepens our understanding of their internal mechanisms. In a backdoor attack, users upload their datasets and model structure, and MLaaS returns a well-trained model. Such a model performs well on the clean validation set, so users cannot perceive any anomaly. However, attackers can fool the DNN with clean images superposed with a predefined trigger. In this setting, a malicious third party can implant the backdoor in various ways [<xref ref-type="bibr" rid="ref-7">7</xref>] short of changing the network structure. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates backdoor attacks.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption><title>The framework of the backdoor attack</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-1.tif"/>
</fig>
<p>BadNets [<xref ref-type="bibr" rid="ref-8">8</xref>] is the first study of backdoor attacks in the image classification task. It implants the backdoor by polluting part of the training dataset: some images are injected with a small, fixed trigger at a fixed position, and the labels of these polluted images are changed to the target label. The model is then trained on both clean and polluted images. Such a well-trained model works well on clean images but misclassifies polluted images (clean images with the predefined trigger) as the target label. The Trojaning attack [<xref ref-type="bibr" rid="ref-9">9</xref>] improved on BadNets and extended backdoor attacks to face recognition and natural language processing.</p>
<p>Nguyen et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] proposed the input-aware dynamic backdoor attack, in which the trigger for each image is unique. They train a generator to create a trigger for each clean image; a trigger generated for one image is invalid for others. They introduce a novel cross-trigger mechanism and define a diversity loss to ensure the non-reusability of triggers. Owing to this diversity, their method resists various detection methods and is hard to reverse-engineer. Apart from these approaches, many other backdoor attacks have been proposed [<xref ref-type="bibr" rid="ref-11">11</xref>&#x2013;<xref ref-type="bibr" rid="ref-13">13</xref>].</p>
<p>Despite the great success of the above methods, their triggers are obvious enough to be perceived by humans. For example, the trigger in BadNets is a small constant white or black square, and the dynamic backdoor attack uses a randomly colored strip as its trigger. A human inspector can easily reject such abnormal inputs. Modifications hidden in rich-texture areas are harder to perceive than those in plain areas. Our backdoor attack therefore generates triggers for clean images according to their texture distribution: we avoid modifying pixels located in plain regions and encourage the trigger to be embedded in rich-texture regions such as image edges. This adaptive backdoor attack preserves the visual quality of images carrying the trigger.</p>
<p>It is reasonable that each image owns its adaptive trigger. However, even if a generator implemented as a DNN is used to produce a trigger for each image, without an elaborate design these triggers become repetitive or uniform, i.e., the generator collapses and the attack degenerates into a constant trigger. In our method, we employ a texture detection module to mark the rich-texture regions, and the trigger is only allowed to appear in these regions. The advantage is that each image not only obtains its own adaptive trigger, but the trigger is also stealthier and harder to perceive than a random one. After selecting the appropriate modification locations, we consider the range of modification. We adopt the <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msub><mml:mi>L</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math></inline-formula>-norm as the criterion for measuring trigger intensity, which limits the maximum modification values. Since valid pixel values must be integers, a novel updating strategy is introduced to handle the rounding error. Finally, we restrict the distance between the clean classifier and the backdoored classifier to evade detectors.</p>
<p>The main contributions of our method are as follows:
<list list-type="bullet">
<list-item>
<p>We propose a <italic>content-based adaptive</italic> backdoor attack. A well-trained generator is used to create an invisible trigger for each clean image, which depends on the texture distribution of the image.</p></list-item>
<list-item>
<p>We propose an approach to evade backdoor detection named <italic>parameter clip</italic>, which limits the distance between the backdoored and benign models. <italic>Parameter clip</italic> can also be generalized to other backdoor attacks to enhance their stealthiness.</p></list-item>
<list-item>
<p>Extensive experiments demonstrate that the proposed method achieves a high backdoor attack success rate without affecting accuracy on clean images. Meanwhile, both the backdoored DNN and the poisoned images remain stealthy enough to evade detection.</p></list-item>
</list></p>
</sec>
<sec id="s2">
<label>2</label><title>Related Work</title>
<sec id="s2_1">
<label>2.1</label><title>Backdoor Attacks</title>
<p>A backdoor attack is a technique for hiding covert functionality in a DNN. This functionality is unknown to the user of the DNN and is activated only when the predefined trigger appears. For instance, a backdoored traffic sign recognition model performs well on normal inputs but may predict a &#x201C;speed-limit&#x201D; sign as &#x201C;stop&#x201D; when the predefined trigger appears on the sign.</p>
<p>BadNets [<xref ref-type="bibr" rid="ref-8">8</xref>] is one of the most common backdoor attack methods; it injects a backdoor by poisoning the training dataset. The attacker chooses part of the dataset as poisoned images, which are superposed with a fixed predefined trigger (e.g., a white square) at a fixed location, and changes the labels of these poisoned images to the target label. A neural network uploaded by the user is trained on the poisoned dataset. The well-trained model then works well on clean images but makes mistakes when the trigger appears. Another classical attack is the Trojaning attack [<xref ref-type="bibr" rid="ref-9">9</xref>]. It designs a trigger that maximizes the activation of certain internal neurons, i.e., builds a strong connection between the trigger and those neurons, and then retrains the DNN so that it predicts the target label when the trigger appears.</p>
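The BadNets-style poisoning step described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the trigger shape, position, and poisoning rate here are our assumptions for demonstration, not the exact values used by BadNets.

```python
import numpy as np

def poison_dataset(images, labels, target_label, rate=0.1, size=3, seed=0):
    """BadNets-style poisoning sketch: stamp a white square in a fixed
    corner of a fraction of the images and relabel them to the target
    class. `images` is an (N, H, W) uint8 array, `labels` an (N,) array."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    # Pick a random subset of images to poison.
    idx = rng.choice(len(images), int(rate * len(images)), replace=False)
    images[idx, -size:, -size:] = 255  # fixed white-square trigger
    labels[idx] = target_label         # flip labels to the target class
    return images, labels, idx
```

The classifier is then trained on the mixed clean-plus-poisoned set, so it learns to associate the square with the target label.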
<p>The input-aware dynamic backdoor attack [<xref ref-type="bibr" rid="ref-10">10</xref>] proposed by Nguyen et al. is the concurrent work closest to ours. They first argue that the trigger should differ from image to image, and that the classifier should correctly predict an image carrying a trigger generated for a different image. They divide the dataset into three parts: a clean set, a poisoned set, and a cross-trigger set. The first two are similar to those of traditional attacks; the cross-trigger set forces the backdoored classifier to make correct predictions on images with mismatched triggers. An additional diversity loss, which ensures the diversity of triggers, helps the entire system converge. Although their method achieves a high backdoor attack success rate and resists multiple detection methods, the triggers are obvious and easily perceived by humans.</p>
</sec>
<sec id="s2_2">
<label>2.2</label><title>Backdoor Defenses</title>
<p>Because the backdoor attack is an important problem in the AI security field, many methods [<xref ref-type="bibr" rid="ref-14">14</xref>&#x2013;<xref ref-type="bibr" rid="ref-17">17</xref>] have been proposed to detect backdoors. Among them, model-based detection is one of the most significant: given a well-trained model, the detector aims to reveal whether any backdoor is hidden in the model.</p>
<p>Neural Cleanse [<xref ref-type="bibr" rid="ref-18">18</xref>] is the first approach to detect a backdoor in a well-trained model. An ordinary backdoor attack changes the decision boundary of the classifier and creates a shortcut from other labels to the target label. Neural Cleanse exploits this characteristic: for each label, it measures the minimal trigger candidate that flips clean images of other classes to that label. If a backdoor is hidden in the model, an abnormally small value appears for the target label. Furthermore, for a constant trigger, Neural Cleanse can reverse-engineer the trigger itself.</p>
<p>Pruning [<xref ref-type="bibr" rid="ref-19">19</xref>] attempts to remove neurons that are dormant on clean images. Most backdoor attacks activate dormant neurons when the predefined trigger appears, so cutting dormant neurons is an efficient way to mitigate backdoor attacks. However, the pruning defense cannot verify whether a model contains a backdoor, and cutting dormant neurons also degrades the model&#x2019;s performance on clean images.</p>
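The pruning defense above can be sketched for a single layer as follows. This is a minimal illustration under our own assumptions (a dense layer represented as a weight matrix and precomputed clean-set activations); the cited defense operates on convolutional channels of a real network.

```python
import numpy as np

def prune_dormant(weight, activations, frac=0.2):
    """Pruning-defense sketch: zero out the output neurons whose mean
    activation on clean inputs is smallest (the 'dormant' neurons).

    weight:      (out_neurons, in_features) layer weight matrix.
    activations: (n_samples, out_neurons) activations on clean inputs.
    frac:        fraction of neurons to prune.
    """
    mean_act = activations.mean(axis=0)
    k = int(frac * weight.shape[0])
    dormant = np.argsort(mean_act)[:k]  # least-active neurons
    pruned = weight.copy()
    pruned[dormant] = 0.0               # disconnect them
    return pruned, dormant
```

Because a backdoor tends to live in neurons that stay quiet on clean data, zeroing the least-active ones weakens the trigger response, at the cost of some clean accuracy.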
</sec>
</sec>
<sec id="s3">
<label>3</label><title>Proposed Method</title>
<sec id="s3_1">
<label>3.1</label><title>Threat Model</title>
<p>We consider the backdoor attack in the outsourced training scenario, in which a user uploads a dataset and a model structure to a third party. For simplicity, we instantiate our idea on the image classification task. The third party returns a well-trained classifier <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to the user, who evaluates it on the validation set and accepts it only if its accuracy reaches a target decided by the user&#x2019;s prior knowledge.</p>
<p><bold>Attacker&#x2019;s knowledge.</bold> In our attack scenario, we suppose that the attacker can obtain the deep neural network structure and the entire training set uploaded by the user. The attacker controls the training procedure but cannot see the validation set, and is not allowed to change the classifier structure, which would easily be perceived by the user.</p>
<p><bold>Attacker&#x2019;s goals.</bold> The attacker aims to implant a backdoor into the classifier that is activated only when the predefined trigger appears. On clean images, the classifier should achieve accuracy as close to that of the benign classifier as possible. Beyond attack ability, stealthiness is also essential, and it must be considered from two aspects: the stealthiness of the poisoned images (clean images with the trigger) and the stealthiness of the backdoored classifier. If the trigger is too obvious, it will be removed manually, so the trace the trigger leaves in a poisoned image should be as faint as possible. As for the backdoored classifier, many detectors can directly scan its structure and parameters. The attacker&#x2019;s goals can be summarized as follows:</p>
<p><disp-formula id="ueqn-1"><mml:math id="mml-ueqn-1" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>T</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mtext>distance</mml:mtext></mml:mrow></mml:mrow></mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mtext>distance</mml:mtext></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> are the accuracy of the backdoored classifier and benign one, which is evaluated on the validation set by the user. 
<inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>F</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>F</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:math></inline-formula> represent the backdoored classifier and benign one. <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>T</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:math></inline-formula> are the clean images with the trigger and the target label, respectively.</p>
</sec>
<sec id="s3_2">
<label>3.2</label><title>Overview of Our Method</title>
<p>To achieve the aforementioned goals, we must simultaneously ensure the stealthiness of both the poisoned images and the backdoored classifier. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> depicts the framework of our method. Each image obtains its adaptive trigger through the combination of texture detection and a generator. We argue that modifying pixels in the plain areas of an image is easier to perceive than modifying rich-texture areas. Therefore, we employ a texture detection module that generates the mask of the trigger, ensuring that the trigger is only allowed to appear in rich-texture areas. The generator creates the trigger for the entire image. To improve visual quality, we restrict the distortion the trigger causes in poisoned images; specifically, we use the <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>L</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math></inline-formula>-norm as the criterion to measure the distance between clean and poisoned images. To avoid creating shortcuts in the decision boundary of the backdoored classifier, we propose a <italic>parameter clip</italic> mechanism, which bounds the distance between the backdoored classifier and the benign one.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption><title>The framework of our method</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-2.tif"/>
</fig>
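The parameter clip mechanism can be pictured as projecting every backdoored weight back into a small interval around the corresponding benign weight, which bounds the distance between the two models. The NumPy sketch below is illustrative only: the dict-of-arrays model representation and the epsilon bound are our assumptions, not the paper&#x2019;s exact procedure.

```python
import numpy as np

def parameter_clip(backdoored, benign, eps):
    """Clip each backdoored parameter tensor into [w - eps, w + eps]
    around the benign weights, so the max elementwise distance between
    the backdoored and benign models never exceeds eps.

    backdoored, benign: dicts mapping layer names to weight arrays.
    """
    clipped = {}
    for name, w_benign in benign.items():
        w = backdoored[name]
        clipped[name] = np.clip(w, w_benign - eps, w_benign + eps)
    return clipped
```

In training, such a projection would be applied after each optimizer step, so the backdoor is learned without drifting far from the benign solution that detectors expect.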
</sec>
<sec id="s3_4">
<label>3.4</label><title>Stealthiness of Poisoned Images</title>
<p>For the stealthiness of poisoned images, we employ the combination of texture detection and a generator. The texture detection module selects appropriate locations for embedding the trigger. We use the smoothness metric HILL [<xref ref-type="bibr" rid="ref-20">20</xref>] to measure texture richness: it assigns each pixel a value, and pixels located in plain regions obtain large values. For a clean image <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, HILL is defined in <xref ref-type="disp-formula" rid="eqn-1">(1)</xref>.</p>
<p><disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2297;</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>&#x2297;</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x2297;</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> are two average filters sized 3 <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3 and 15 <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 15, respectively. <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is a high-pass filter used to calculate the residual of clean images. It can be seen as a convolution operation whose kernel parameters are shown in <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>.</p>
<p><disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable columnalign="center center center" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd><mml:mtd><mml:mn>2</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:mtd><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd><mml:mtd><mml:mn>2</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>HILL assigns large values to pixels located in plain areas. The two average filters serve as low-pass filters for numerical smoothing. We then set a threshold <italic>T</italic> to binarize the result of HILL as in <xref ref-type="disp-formula" rid="eqn-3">(3)</xref>.<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign="left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003C;=</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003E;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
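The HILL mask of Eqs. (1)&#x2013;(3) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the edge-padded convolution and the small epsilon guarding against division by zero in perfectly flat regions are our additions, not part of the original HILL definition.

```python
import numpy as np

def _conv2d_same(x, k):
    """'Same'-size 2-D correlation with edge padding (no SciPy needed)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(xp, k.shape)
    return np.einsum("ijkl,kl->ij", win, k)

# High-pass filter F_h of Eq. (2)
F_H = np.array([[-1.,  2., -1.],
                [ 2., -4.,  2.],
                [-1.,  2., -1.]])

def hill_mask(x, threshold, eps=1e-8):
    """Binarized HILL texture mask following Eqs. (1) and (3).

    Plain regions obtain a large weight W and are excluded (mask 0);
    rich-texture regions obtain mask 1.
    """
    residual = np.abs(_conv2d_same(x.astype(float), F_H))  # |X * F_h|
    avg3 = np.full((3, 3), 1 / 9.0)      # F_1: 3x3 average filter
    avg15 = np.full((15, 15), 1 / 225.0) # F_2: 15x15 average filter
    # Eq. (1): W = (1 / (|X * F_h| * F_1)) * F_2
    w = _conv2d_same(1.0 / (_conv2d_same(residual, avg3) + eps), avg15)
    return (w <= threshold).astype(np.uint8)  # Eq. (3)
```

On a flat image the residual is zero, so W blows up and the whole mask is 0; on a strongly textured image W stays small and the mask opens up, matching the behavior described above.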
<p>Edge detection, widely used in common image processing, is another simple texture detection method. Pixel values change drastically at the edges of images, so a trigger embedded in such areas is hard to perceive. In view of this, we can also use edge detection to obtain the adaptive mask of the trigger. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows some masks generated by HILL and by edge detection (Sobel operator) under multiple thresholds <italic>T</italic>. Note that for HILL the mask area grows as the threshold <italic>T</italic> increases, whereas the opposite holds for edge detection. The two averaging filters make the mask generated by HILL smoother than that produced by edge detection.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption><title>An illustration for the texture detection</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-3.tif"/>
</fig>
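The Sobel-based alternative mask can be sketched similarly. This is an illustrative sketch (kernel normalization and padding choices are ours); note that, as stated above, a larger threshold here shrinks the mask, the opposite of HILL.

```python
import numpy as np

# Standard horizontal Sobel kernel; its transpose detects vertical change.
SOBEL_X = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])

def edge_mask(x, threshold):
    """Sobel edge mask: 1 where the gradient magnitude exceeds T."""
    xp = np.pad(x.astype(float), 1, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(xp, (3, 3))
    gx = np.einsum("ijkl,kl->ij", win, SOBEL_X)    # horizontal gradient
    gy = np.einsum("ijkl,kl->ij", win, SOBEL_X.T)  # vertical gradient
    return (np.hypot(gx, gy) > threshold).astype(np.uint8)
```

Unlike the HILL mask, this response is taken directly from the raw gradient with no averaging, which is why the resulting masks look noisier.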
<p>The generator is trained to create a pre-trigger for clean images. We use U-Net [<xref ref-type="bibr" rid="ref-21">21</xref>], a common baseline in medical image segmentation tasks, as the backbone of the generator; the size of its output is identical to that of its input. Since excessive modification may cause visual abnormalities, we use the <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>L</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math></inline-formula>-norm, which depends on the maximum modification between two images, as the criterion. In practice, we design two different approaches to enforce the <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>L</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math></inline-formula>-norm restriction. <xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows the specific structure of the generator.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption><title>The details of the generator</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-4.tif"/>
</fig>
<p>For simplicity, we can use a truncation function such as HardTanh as the activation function of the last layer's output. <xref ref-type="disp-formula" rid="eqn-4">(4)</xref> gives the definition of the HardTanh function.</p>
<p><disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>h</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>i</mml:mi></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x003C;=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>x</mml:mi></mml:mtd><mml:mtd><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mi>i</mml:mi></mml:math></inline-formula> is a hyperparameter that controls the maximum modification range. With the help of the HardTanh function, we can keep the trigger values as small as possible. 
For a valid image, pixel values must be integers; the rounding error can be neglected when the threshold <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>i</mml:mi></mml:math></inline-formula> is relatively large. However, if the modification caused by the trigger is small, e.g., no more than 3, the rounding error cannot be neglected. We therefore propose a novel approach to meet the <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>L</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math></inline-formula>-norm restriction.</p>
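<p>A minimal sketch of this truncation, assuming the hyperparameter <italic>i</italic> is passed in explicitly, is a clip of the generator output to the interval from &#x2212;<italic>i</italic> to <italic>i</italic>:</p>

```python
import numpy as np

def hardtanh(x, i):
    """HardTanh of Eq. (4): identity inside [-i, i], saturated outside.

    Clipping the generator output this way bounds the maximum
    per-pixel modification of the trigger by the hyperparameter i.
    """
    return np.clip(x, -i, i)
```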
<p>To avoid the rounding error on small modifications, we want the generator to output integers instead of floating-point numbers. Inspired by the image steganography technique [<xref ref-type="bibr" rid="ref-22">22</xref>], which modifies pixels only slightly, we design the generator based on <italic>Modification Probability Matrices</italic> (MPM). Suppose, as an example, that the maximum modification is no more than 1. The U-Net output then becomes a pair of MPM, whose elements represent the probabilities of modifying the corresponding pixels by &#x002B;1 or &#x2212;1. The activation function of the last layer of the U-Net is expressed as <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mtext>Sigmoid</mml:mtext></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:math></inline-formula> to limit the range of the MPM to between 0 and 1/2. 
Then, we use <xref ref-type="disp-formula" rid="eqn-5">(5)</xref> to transform the MPM to the integers as the trigger.<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003E;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:m
i><mml:mi>e</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> are the elements of the MPM, and <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a random number in the interval [0,1]. However, <xref ref-type="disp-formula" rid="eqn-5">(5)</xref> is a step function and thus non-differentiable, so it cannot be added to the back-propagation pipeline. We train a neural network named TransformNet to simulate this function in advance; its structure is shown in <xref ref-type="table" rid="table-1">Table 1</xref>. Combining the MPM and TransformNet, we can generate a trigger whose values lie in {0,&#x2212;1,1}. In some cases, we can employ more MPM pairs, such as <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>1 and <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>2, to obtain a wider range of trigger values. We name these two methods the HardTanh-based generator and the MPM-based generator, respectively.</p>
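<p>The sampling rule of <xref ref-type="disp-formula" rid="eqn-5">(5)</xref> can be sketched directly in NumPy; here <monospace>p_minus</monospace> and <monospace>p_plus</monospace> are our own names for the two MPM (each bounded by Sigmoid(x)/2), and the non-differentiability that TransformNet addresses is ignored because we sample the step function directly:</p>

```python
import numpy as np

def sample_trigger(p_minus, p_plus, rng=None):
    """Draw an integer trigger in {-1, 0, 1} from a pair of MPM (Eq. (5)).

    p_minus, p_plus: probabilities of modifying each pixel by -1 / +1,
    each limited to [0, 1/2] so the two events cannot overlap.
    In the paper this step function is approximated by TransformNet so
    that gradients can flow; here we simply sample it.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rng.uniform(0.0, 1.0, size=np.shape(p_minus))
    trigger = np.zeros(np.shape(p_minus), dtype=int)
    trigger[n < p_minus] = -1           # n_{i,j} <  p^{-1}_{i,j}
    trigger[n > 1.0 - p_plus] = 1       # n_{i,j} >  1 - p^{+1}_{i,j}
    return trigger
```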
<table-wrap id="table-1">
<label>Table 1</label>
<caption><title>The structure of the network simulating <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Layer type</th>
<th>Input channel</th>
<th>Output channel</th>
</tr>
</thead>
<tbody>
<tr>
<td>Full connection &#x002B; Relu</td>
<td>3</td>
<td>16</td>
</tr>
<tr>
<td>Full connection &#x002B; Relu</td>
<td>16</td>
<td>32</td>
</tr>
<tr>
<td>Full connection &#x002B; Tanh</td>
<td>32</td>
<td>1</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_5">
<label>3.5</label><title>Stealthiness of Backdoored Classifiers</title>
<p>In this subsection, we describe our solution, named <italic>parameter clip</italic>, for ensuring the stealthiness of the backdoored classifier. The normal classifier trained on clean images is used as a reference. The details of the algorithm are shown in Algorithm 1, where <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>r</mml:mi></mml:math></inline-formula> is a hyperparameter that controls the distance between the backdoored classifier and the benign one. For example, if we set <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi>r</mml:mi></mml:math></inline-formula> to 0.1, each parameter of the backdoored classifier is no more than 1.1 times and no less than 0.9 times the corresponding parameter of the benign one. The smaller the value of <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>r</mml:mi></mml:math></inline-formula>, the closer the backdoored classifier is to the benign one. <italic>N</italic> is the total number of parameters. The subscript <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>i</mml:mi></mml:math></inline-formula> denotes the <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mi>i</mml:mi></mml:math></inline-formula>-th parameter of the classifier. We clip the parameters of the backdoored model after the parameter update on each mini-batch at every step.</p>
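<p>Algorithm 1 appears only as a figure here; its clipping step can plausibly be rendered in NumPy as follows (function and variable names are our own, not the paper's):</p>

```python
import numpy as np

def parameter_clip(theta, theta_ref, r):
    """Clip each backdoored parameter into [(1-r), (1+r)] times the
    corresponding benign parameter (executed after every mini-batch
    update).

    The min/max ordering handles negative reference parameters, for
    which (1-r)*theta_ref is the *upper* bound rather than the lower.
    """
    lo = np.minimum(theta_ref * (1.0 - r), theta_ref * (1.0 + r))
    hi = np.maximum(theta_ref * (1.0 - r), theta_ref * (1.0 + r))
    return np.clip(theta, lo, hi)
```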
<fig id="fig-14">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-14.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-5">Fig. 5</xref> illustrates the principle of the parameter clip. We take a simple classification task as an example: there are four categories, and the blue triangular region is assumed to be the target label of the backdoor attack. As noted in Neural Cleanse [<xref ref-type="bibr" rid="ref-18">18</xref>], a backdoor attack changes the decision boundary and creates shortcuts between the target label and the others, i.e., a slight disturbance can make samples be misclassified as the target label. In the upper right corner of <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, the black dotted line and the solid line indicate the decision boundaries of the clean classifier and the backdoored one, respectively. Parameter clip eliminates these shortcuts by limiting the modification range of the backdoored classifier.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption><title>An illustration of the parameter clip. The black lines represent the decision boundaries</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-5.tif"/>
</fig>
</sec>
<sec id="s3_6">
<label>3.6</label><title>Cost Function</title>
<p>For the final cost function, we use the structural similarity index measure (SSIM) as a regularization term to further improve visual image quality. SSIM is widely used as an image quality metric; the clean images serve as the distortion-free references. The cost function of our method is expressed as <xref ref-type="disp-formula" rid="eqn-6">(6)</xref></p>
<p><disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mi>a</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mi>a</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>b</mml:mi></mml:math></inline-formula> are two balance factors. <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are the cross-entropy losses on poisoned images and clean images, respectively. 
We update the parameters of the generator and the classifier simultaneously to minimize <xref ref-type="disp-formula" rid="eqn-6">(6)</xref>. Parameter clip is executed after each parameter update.</p>
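<p>Assuming the three terms have already been computed for a batch, (6) combines them in one line; the argument names below are our own placeholders for the quantities defined above:</p>

```python
def total_loss(ce_poisoned, ce_clean, ssim_loss, a=1.0, b=0.0):
    """Eq. (6): L = l_poisoned + a * l_clean + b * l_ssim.

    a and b are the balance factors; the paper uses a = 1 in all
    experiments and a nonzero b only when the modification range
    of the trigger is large.
    """
    return ce_poisoned + a * ce_clean + b * ssim_loss
```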
</sec>
</sec>
<sec id="s4">
<label>4</label><title>Experiments</title>
<sec id="s4_1">
<label>4.1</label><title>Experimental Setup</title>
<p>To be comparable with previous methods, we conduct experiments on the CIFAR-10 and GTSRB datasets. Images in both datasets are <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>32</mml:mn></mml:math></inline-formula>. To evaluate our method on larger images, we randomly select ten categories from ImageNet [<xref ref-type="bibr" rid="ref-23">23</xref>], which we name selected-ImageNet. For each class, there are 1300 training images and 50 evaluation images. Details of the three datasets are given in <xref ref-type="table" rid="table-2">Table 2</xref>. Images in the MNIST dataset are almost binary, contain no rich textural regions, and are uncommon in practical applications. Because our method aims to hide the trigger adaptively in the texture without arousing visual abnormalities, we do not evaluate it on MNIST. We use Pre-activation ResNet-18 [<xref ref-type="bibr" rid="ref-24">24</xref>] as the classifier for the CIFAR-10 and GTSRB datasets. The selected-ImageNet classifier is a fine-tuned ResNet-18 originally trained on the full ImageNet. The target label of the backdoor is set to &#x201C;0&#x201D;.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption><title>Detailed information of the datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Labels</th>
<th>Size</th>
<th>Classifier</th>
<th>Number of images</th>
</tr>
</thead>
<tbody>
<tr>
<td>CIFAR-10</td>
<td>10</td>
<td><inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula></td>
<td>PreActRes18</td>
<td>60000</td>
</tr>
<tr>
<td>GTSRB</td>
<td>43</td>
<td><inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula></td>
<td>PreActRes18</td>
<td>50000</td>
</tr>
<tr>
<td>Selected-ImageNet</td>
<td>10</td>
<td><inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mn>128</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>128</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula></td>
<td>ResNet-18</td>
<td>13500</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_2">
<label>4.2</label><title>Visual Evaluation</title>
<p>We configure our backdoor attack as a single-target attack, in which all images superposed with the pre-defined trigger are identified as the backdoor target label. <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the visual quality of our method without parameter clip, which yields the best visual performance. The images on the left, middle, and right are clean images, malicious images with the trigger, and the trigger itself, respectively. We employ the MPM-based update strategy on CIFAR-10 and GTSRB, and the HardTanh-based update strategy on selected-ImageNet. The combination of <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>1 and <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>2 MPM ensures that the maximum modification is less than 3 on CIFAR-10 and GTSRB. On selected-ImageNet, we set the maximum modification to 10. These modifications are small enough not to be perceived by humans, and most are located in rich textural regions, which leads to good visual quality.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption><title>The visualization of our method without parameter clip over three datasets</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-6.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-7">Fig. 7</xref> compares our method using parameter clip with other methods. Parameter clip restricts the modification range of the classifier, which increases the difficulty of implanting the backdoor into a benign classifier. Therefore, we enhance the strength of the trigger to achieve a good backdoor attack success rate. We use the HardTanh-based update strategy and set the threshold to 12 and 40 for CIFAR-10 and GTSRB, respectively. The experimental results indicate that the visual quality is slightly inferior to that of the images produced without the limitation of parameter clip. However, the results are still much better than those of BadNets and the dynamic backdoor, whose triggers are far more obvious and easy to perceive. Users can hardly find any anomaly in the poisoned images generated by our method without the corresponding clean images as reference.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption><title>The visualization of our method with parameter clip</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-7.tif"/>
</fig>
</sec>
<sec id="s4_3">
<label>4.3</label><title>Attack Ability</title>
<p>We evaluate attack ability in terms of the backdoor attack success rate (BASR) and the impact on the original accuracy (OA) of normal classification. The accuracy of clean images on the classifier without a backdoor is denoted OA-C. The detailed information and hyperparameter settings are given in <xref ref-type="table" rid="table-3">Table 3</xref>, and the corresponding images are shown in <xref ref-type="sec" rid="s4_1">Subsections 4.1</xref> and <xref ref-type="sec" rid="s4_2">4.2</xref>. For all three datasets and various hyperparameters, the BASR is almost 100%, while the original accuracy on clean images drops only slightly.</p>
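<p>The two metrics reduce to simple accuracy computations; the following sketch (with our own function names) assumes integer label arrays:</p>

```python
import numpy as np

def basr(poisoned_preds, target_label):
    """Backdoor attack success rate: fraction of triggered images
    classified as the attacker's target label."""
    return float(np.mean(np.asarray(poisoned_preds) == target_label))

def original_accuracy(clean_preds, true_labels):
    """OA: accuracy of the (backdoored) classifier on clean images."""
    return float(np.mean(np.asarray(clean_preds) == np.asarray(true_labels)))
```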
<table-wrap id="table-3">
<label>Table 3</label>
<caption><title>Detailed information of attack ability</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>OA-C(%)</th>
<th>Generator</th>
<th>Hyperparameter</th>
<th>BASR(%)</th>
<th>OA(%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CIFAR-10</td>
<td>95.02</td>
<td>MPM-based</td>
<td>Combination of <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>1 and <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>2 MPM; <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>r</mml:mi></mml:math></inline-formula> &#x003D; &#x002B;<inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:math></inline-formula></td>
<td>100</td>
<td>94.59</td>
</tr>
<tr>
<td></td>
<td></td>
<td>HardTanh-based</td>
<td><inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>i</mml:mi></mml:math></inline-formula> &#x003D; 12; <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>r</mml:mi></mml:math></inline-formula> &#x003D; 1e-3</td>
<td>99.07</td>
<td>94.82</td>
</tr>
<tr>
<td>GTSRB</td>
<td>99.50</td>
<td>MPM-based</td>
<td>Combination of <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>1 and <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula>2 MPM; <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>r</mml:mi></mml:math></inline-formula> &#x003D; &#x002B;<inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:math></inline-formula></td>
<td>99.79</td>
<td>99.37</td>
</tr>
<tr>
<td></td>
<td></td>
<td>HardTanh-based</td>
<td><inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>i</mml:mi></mml:math></inline-formula> &#x003D; 40; <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mi>r</mml:mi></mml:math></inline-formula> &#x003D; 1e-3</td>
<td>99.02</td>
<td>99.36</td>
</tr>
<tr>
<td>Selected-ImageNet</td>
<td>92.66</td>
<td>HardTanh-based</td>
<td><inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mi>i</mml:mi></mml:math></inline-formula> &#x003D; 10; <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mi>r</mml:mi></mml:math></inline-formula> &#x003D; &#x002B;<inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:math></inline-formula></td>
<td>99.80</td>
<td>91.60</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>These experiments indicate that our method makes the classifier misclassify images carrying their unique triggers while maintaining good performance on benign inputs. We notice that under the restriction of parameter clip, the BASR is slightly inferior to that of the attack without parameter clip, even though we have enhanced the strength of the trigger; nevertheless, the attack success rate still exceeds 99%. In our application scenario, the user cannot perceive any anomaly in the well-trained model supplied by an outsourced computation provider.</p>
</sec>
<sec id="s4_4">
<label>4.4</label><title>Defense Experiments</title>
<p>We evaluate our attack against the classical model-based defenses Neural Cleanse [<xref ref-type="bibr" rid="ref-18">18</xref>] and pruning [<xref ref-type="bibr" rid="ref-19">19</xref>] on the CIFAR-10 and GTSRB datasets. Neural Cleanse is an effective detector that reveals whether a well-trained network contains a backdoor. A common backdoor changes the decision boundary of the classifier, generating a shortcut between the target label and the others. Neural Cleanse measures the minimum modification required to move all clean samples to a certain label; for a classifier containing a backdoor, the pattern of the target label is much smaller than those of the other labels. Neural Cleanse detects the backdoor by the Anomaly Index metric, with 2 as the threshold. As shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, our method with parameter clip passes this detector. The explanation is that each image owns its adaptive trigger, so Neural Cleanse cannot find a universal pattern for the entire dataset.</p>
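<p>Neural Cleanse's Anomaly Index is based on the median absolute deviation (MAD) of the per-label reversed-trigger L1 norms; a sketch using the standard 1.4826 consistency constant is shown below (the exact normalization in the original tool may differ):</p>

```python
import numpy as np

def anomaly_index(trigger_norms):
    """Anomaly Index of the smallest reversed trigger (Neural Cleanse style).

    trigger_norms: L1 norms of the reversed trigger for every label.
    An index above 2 flags the model as containing a backdoor.
    """
    norms = np.asarray(trigger_norms, dtype=float)
    med = np.median(norms)
    mad = 1.4826 * np.median(np.abs(norms - med))  # robust deviation scale
    return float(np.abs(norms.min() - med) / mad)
```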
<fig id="fig-8">
<label>Figure 8</label>
<caption><title>Experimental results of Neural Cleanse</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-8.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-9">Fig. 9</xref> takes CIFAR-10 as an example and shows some candidate triggers reversed by Neural Cleanse. The trigger of our method cannot be reversely constructed. <xref ref-type="fig" rid="fig-10">Fig. 10</xref> illustrates the minimum modification for each label on CIFAR-10 found by Neural Cleanse. Taking the label &#x201C;plane&#x201D; as an example, the ordinate represents the minimum modification required to make the classifier misclassify all images as &#x201C;plane&#x201D;. If a backdoor attack created a shortcut for the target label, the minimum modification for that label would be significantly smaller than for the other labels. Our method keeps the minimum modification for the target label reasonable and does not produce the abnormally small outlier that would be detected as a backdoor.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption><title>The reversed candidate triggers for each class on CIFAR-10 dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-9.tif"/>
</fig><fig id="fig-10">
<label>Figure 10</label>
<caption><title>The minimum modification for each label over CIFAR-10 generated by Neural Cleanse</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-10.tif"/>
</fig>
<p>Apart from Neural Cleanse, pruning focuses on neuron analysis. It mitigates or removes the backdoor by pruning the neurons that are inactive on clean images. We also evaluate our method against pruning. <xref ref-type="fig" rid="fig-11">Fig. 11</xref> presents the accuracy of the original task and of the backdoor attack with respect to the number of pruned neurons on CIFAR-10 and GTSRB. For both datasets, the accuracy on clean images drops substantially by the time the backdoor is removed. In particular, on CIFAR-10 the backdoor accuracy decreases significantly only when the 509th neuron (of 512 in total) is pruned, at which point the accuracy on clean images has dropped to 55%.</p>
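<p>The pruning defense can be sketched as ranking neurons by their mean activation on clean images and zeroing out the least active first. The array layout below is our own assumption; in practice the activations would be recorded from the layer being pruned:</p>

```python
import numpy as np

def pruning_order(activations):
    """Indices of neurons in the order pruning would remove them:
    neurons least active on clean images first.

    activations: (num_images, num_neurons) array of recorded activations.
    """
    mean_act = np.asarray(activations).mean(axis=0)
    return np.argsort(mean_act)

def prune_mask(activations, k):
    """Binary mask keeping all but the k least-active neurons."""
    order = pruning_order(activations)
    mask = np.ones(order.shape[0], dtype=float)
    mask[order[:k]] = 0.0  # zero out the k neurons pruned first
    return mask
```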
<fig id="fig-11">
<label>Figure 11</label>
<caption><title>Experimental results for pruning on the CIFAR-10 and GTSRB dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-11.tif"/>
</fig>
</sec>
<sec id="s4_5">
<label>4.5</label><title>Ablation Studies</title>
<p>To demonstrate the efficacy of our adaptive mask, we train the classifier without the adaptive mask, using a common <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula>-norm loss. From <xref ref-type="fig" rid="fig-12">Fig. 12</xref>, we can see that the generator outputs an almost identical trigger for all images; in particular, the trigger is fixed in the upper left corner on the GTSRB dataset. In this case, our method degenerates into BadNets. A universal trigger easily creates shortcuts between different labels, which makes it easy for Neural Cleanse to detect and reversely construct. The adaptive mask module is thus essential for creating a unique trigger for each image. The threshold <italic>T</italic> mentioned in <xref ref-type="disp-formula" rid="eqn-3">(3)</xref> is an important hyperparameter that controls the size of the masks; if the adaptive masks are too small, the backdoor attack may fail. Therefore, we employ HILL with <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> and edge detection with <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:math></inline-formula> for GTSRB and CIFAR-10, respectively. The hyperparameter <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:mi>b</mml:mi></mml:math></inline-formula> is an important factor controlling the overall image quality of poisoned images. We set <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:mi>b</mml:mi></mml:math></inline-formula> to 2 when the maximum modification range is 40 on GTSRB. 
For the other cases, the maximum modification range is relatively small, and we set the hyperparameter <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:mi>b</mml:mi></mml:math></inline-formula> to 0.</p>
<fig id="fig-12">
<label>Figure 12</label>
<caption><title>Experimental results for the backdoor attack without adaptive mask module</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-12.tif"/>
</fig>
<p>Besides the adaptive mask module, parameter clip is an important component of our method for countering detectors; it ensures that the classifier implanted with the backdoor can evade detection. We remove the parameter clip and limit the intensity of the trigger to less than 3; the visual performance is shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. In this case, the anomaly index given by Neural Cleanse becomes 8, which is much larger than the threshold of 2.</p>

<p>We analyze how the hyperparameter <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:mi>a</mml:mi></mml:math></inline-formula> affects the backdoor attack success rate and the accuracy on clean images. We train the classifier on CIFAR-10 and GTSRB with <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:mi>a</mml:mi></mml:math></inline-formula> varying from 0.2 to 1.0. The accuracy on both clean and malicious images is almost unchanged across different values of <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mi>a</mml:mi></mml:math></inline-formula>. As shown in <xref ref-type="fig" rid="fig-13">Fig. 13</xref>, our method is robust to the choice of <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:mi>a</mml:mi></mml:math></inline-formula>. We set <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mi>a</mml:mi></mml:math></inline-formula> to 1 in all experiments.</p>
<fig id="fig-13">
<label>Figure 13</label>
<caption><title>Poisoned and clean images accuracy on the CIFAR10 and GTSRB when changing <inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:mi>a</mml:mi></mml:math></inline-formula></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_25923-fig-13.tif"/>
</fig>
</sec>
</sec>
<sec id="s5">
<label>5</label><title>Conclusions</title>
<p>In this paper, we propose a content-based adaptive backdoor attack. This is the first work to generate an adaptive trigger for each clean image. The backdoored classifier performs well on clean images and is completely fooled when the trigger appears. Compared with existing attacks, our method achieves high attack ability together with stealthiness of both the backdoored classifier and the poisoned images. This work reveals an insidious backdoor attack, which poses a great challenge to the AI security field. In the future, we will extend our method to other applications.</p>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear"><title>References</title>
<ref id="ref-1"><label>1.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Ren</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Deep residual learning for image recognition</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, <publisher-loc>Las Vegas, USA</publisher-loc>.</mixed-citation></ref>
<ref id="ref-2"><label>2.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Vaswani</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Shazeer</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Parmar</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Uszkoreit</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Jones</surname>, <given-names>L.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2017</year>). <article-title>Attention is all you need</article-title>. <source>Advances in Neural Information Processing Systems</source>. <comment>arXiv preprint arXiv:1706.03762</comment>.</mixed-citation></ref>
<ref id="ref-3"><label>3.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>He</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Gkioxari</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Doll&#x00E1;r</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Girshick</surname>, <given-names>R. B.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Mask R-CNN</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source><italic>,</italic> <volume>42</volume><italic>,</italic> <fpage>386</fpage>&#x2013;<lpage>397</lpage>.</mixed-citation></ref>
<ref id="ref-4"><label>4.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Radford</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Narasimhan</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Salimans</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Sutskever</surname>, <given-names>I.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Improving language understanding by generative pre-training</article-title>. <source>Computer Science</source>.</mixed-citation></ref>
<ref id="ref-5"><label>5.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Radford</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Child</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Luan</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Amodei</surname>, <given-names>D.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2019</year>). <article-title>Language models are unsupervised multitask learners</article-title>. <source>OpenAI Blog</source><italic>,</italic> <volume>1</volume><italic>(</italic><issue>8</issue><italic>),</italic> <fpage>9</fpage>.</mixed-citation></ref>
<ref id="ref-6"><label>6.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Brown</surname>, <given-names>T. B.</given-names></string-name>, <string-name><surname>Mann</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Ryder</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Subbiah</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Kaplan</surname>, <given-names>J.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2020</year>). <article-title>Language models are few-shot learners</article-title>. <comment>arXiv preprint arXiv: 2005.14165</comment>.</mixed-citation></ref>
<ref id="ref-7"><label>7.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Alfeld</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Zhu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Barford</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Data poisoning attacks against autoregressive models</article-title>. <conf-name>AAAI Conference on Artificial Intelligence</conf-name><italic>,</italic> <publisher-loc>Phoenix, Arizona</publisher-loc>.</mixed-citation></ref>
<ref id="ref-8"><label>8.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Gu</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Dolan-Gavitt</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Garg</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Badnets: Identifying vulnerabilities in the machine learning model supply chain</article-title>. <comment>arXiv preprint arXiv: 1708.06733</comment>.</mixed-citation></ref>
<ref id="ref-9"><label>9.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Aafer</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Lee</surname>, <given-names>W. C.</given-names></string-name>, <string-name><surname>Zhai</surname>, <given-names>J.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2018</year>). <article-title>Trojaning attack on neural networks</article-title>. <source>NDSS</source>.</mixed-citation></ref>
<ref id="ref-10"><label>10.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Nguyen</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Tran</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Input-aware dynamic backdoor attack</article-title>. <comment>arXiv preprint arXiv: 2010.08138</comment>.</mixed-citation></ref>
<ref id="ref-11"><label>11.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Shafahi</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Huang</surname>, <given-names>W. R.</given-names></string-name>, <string-name><surname>Najibi</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Suciu</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Studer</surname>, <given-names>C.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2018</year>). <article-title>Poison frogs! Targeted clean-label poisoning attacks on neural networks</article-title>. <comment>arXiv preprint arXiv: 1804.00792</comment>.</mixed-citation></ref>
<ref id="ref-12"><label>12.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xue</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2022</year>). <article-title>One-to-n &#x0026; n-to-one: Two advanced backdoor attacks against deep learning models</article-title>. <source>IEEE Transactions on Dependable and Secure Computing</source><italic>,</italic> <volume>19</volume><italic>(</italic><issue>3</issue><italic>),</italic> <fpage>1562</fpage>&#x2013;<lpage>1578</lpage>. <pub-id pub-id-type="doi">10.1109/TDSC.2020.3028448</pub-id></mixed-citation></ref>
<ref id="ref-13"><label>13.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Saha</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Subramanya</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Pirsiavash</surname>, <given-names>H.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Hidden trigger backdoor attacks</article-title>. <conf-name>Proceedings of the AAAI Conference on Artificial Intelligence</conf-name><italic>,</italic> vol. <volume>34</volume>, pp. <fpage>11957</fpage>&#x2013;<lpage>11965</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>14.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tran</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Madry</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Spectral signatures in backdoor attacks</article-title>. <source>NeurIPS</source>.</mixed-citation></ref>
<ref id="ref-15"><label>15.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Cheng</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Xu</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>P. Y.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>P.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2020</year>). <article-title>Defending against backdoor attack on deep neural networks</article-title>. <comment>arXiv preprint arXiv: 2002.12162</comment>.</mixed-citation></ref>
<ref id="ref-16"><label>16.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Selvaraju</surname>, <given-names>R. R.</given-names></string-name>, <string-name><surname>Cogswell</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Das</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Vedantam</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Parikh</surname>, <given-names>D.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2017</year>). <article-title>Grad-CAM: Visual explanations from deep networks via gradient-based localization</article-title>. <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>, pp. <fpage>618</fpage>&#x2013;<lpage>629</lpage>. <publisher-loc>Venice</publisher-loc>.</mixed-citation></ref>
<ref id="ref-17"><label>17.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhu</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Ning</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Xin</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>H.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Gangsweep: Sweep out neural backdoors by gan</article-title>. <conf-name>Proceedings of the 28th ACM International Conference on Multimedia</conf-name>, pp. <fpage>3173</fpage>&#x2013;<lpage>3181</lpage>. <publisher-loc>Seattle, USA</publisher-loc>.</mixed-citation></ref>
<ref id="ref-18"><label>18.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wang</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Yao</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Shan</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Viswanath</surname>, <given-names>B.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2019</year>). <article-title>Neural cleanse: Identifying and mitigating backdoor attacks in neural networks</article-title>. <conf-name>2019 IEEE Symposium on Security and Privacy (SP)</conf-name>, pp. <fpage>707</fpage>&#x2013;<lpage>723</lpage>. <publisher-name>IEEE</publisher-name>, <publisher-loc>San Francisco, USA</publisher-loc>.</mixed-citation></ref>
<ref id="ref-19"><label>19.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Dolan-Gavitt</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Garg</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Fine-pruning: Defending against backdooring attacks on deep neural networks</article-title>. <conf-name>International Symposium on Research in Attacks, Intrusions, and Defenses</conf-name><italic>,</italic> pp. <fpage>707</fpage>&#x2013;<lpage>723</lpage>. <publisher-name>Springer</publisher-name>, <publisher-loc>Heraklion, Greece</publisher-loc>.</mixed-citation></ref>
<ref id="ref-20"><label>20.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Huang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>X.</given-names></string-name></person-group> (<year>2014</year>). <article-title>A new cost function for spatial image steganography</article-title>. <conf-name>2014 IEEE International Conference on Image Processing (ICIP)</conf-name>, pp. <fpage>4206</fpage>&#x2013;<lpage>4210</lpage>. <publisher-name>IEEE</publisher-name>, <publisher-loc>Paris</publisher-loc>.</mixed-citation></ref>
<ref id="ref-21"><label>21.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ronneberger</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Fischer</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Brox</surname>, <given-names>T.</given-names></string-name></person-group> (<year>2015</year>). <article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>. <conf-name>International Conference on Medical Image Computing and Computer-Assisted Intervention</conf-name>, pp. <fpage>234</fpage>&#x2013;<lpage>241</lpage>. <publisher-name>Springer</publisher-name>, <publisher-loc>Munich, Germany</publisher-loc>.</mixed-citation></ref>
<ref id="ref-22"><label>22.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhong</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Qian</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>X.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Batch steganography via generative network</article-title>. <source>IEEE Transactions on Circuits and Systems for Video Technology</source><italic>,</italic> <volume>31</volume><italic>(</italic><issue>1</issue><italic>),</italic> <fpage>88</fpage>&#x2013;<lpage>97</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>23.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Deng</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Dong</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Socher</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>L. J.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>K.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2009</year>). <article-title>Imagenet: A large-scale hierarchical image database</article-title>. <conf-name>2009 IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>248</fpage>&#x2013;<lpage>255</lpage>. <publisher-name>IEEE</publisher-name>, <publisher-loc>Miami Beach, FL, USA</publisher-loc>.</mixed-citation></ref>
<ref id="ref-24"><label>24.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Ren</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Identity mappings in deep residual networks</article-title>. <conf-name>European Conference on Computer Vision</conf-name>, pp. <fpage>630</fpage>&#x2013;<lpage>645</lpage>. <publisher-name>Springer</publisher-name>, <publisher-loc>Amsterdam, The Netherlands</publisher-loc>.</mixed-citation></ref>
</ref-list>
</back>
</article>