<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">IASC</journal-id>
<journal-id journal-id-type="nlm-ta">IASC</journal-id>
<journal-id journal-id-type="publisher-id">IASC</journal-id>
<journal-title-group>
<journal-title>Intelligent Automation &#x0026; Soft Computing</journal-title>
</journal-title-group>
<issn pub-type="epub">2326-005X</issn>
<issn pub-type="ppub">1079-8587</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">39600</article-id>
<article-id pub-id-type="doi">10.32604/iasc.2023.039600</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Attentive Neighborhood Feature Augmentation for Semi-supervised Learning</article-title>
<alt-title alt-title-type="left-running-head">Attentive Neighborhood Feature Augmentation for Semi-supervised Learning</alt-title>
<alt-title alt-title-type="right-running-head">Attentive Neighborhood Feature Augmentation for Semi-supervised Learning</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Liu</surname><given-names>Qi</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Li</surname><given-names>Jing</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref><email>lijing@gzhu.edu.cn</email></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Wang</surname><given-names>Xianmin</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>xianmin@gzhu.edu.cn</email></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Zhao</surname><given-names>Wenpeng</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<aff id="aff-1"><label>1</label><institution>School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou</institution>, <addr-line>510002</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University</institution>, <addr-line>Fuzhou, 350121</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Authors: Jing Li. Email: <email>lijing@gzhu.edu.cn</email>; Xianmin Wang. Email: <email>xianmin@gzhu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>23</day>
<month>6</month>
<year>2023</year></pub-date>
<volume>37</volume>
<issue>2</issue>
<fpage>1753</fpage>
<lpage>1771</lpage>
<history>
<date date-type="received"><day>07</day><month>2</month><year>2023</year></date>
<date date-type="accepted"><day>14</day><month>4</month><year>2023</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Liu et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Liu et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_IASC_39600.pdf"></self-uri>
<abstract>
<p>Recent state-of-the-art semi-supervised learning (SSL) methods usually use data augmentations as core components. Such methods, however, are limited to simple transformations, such as augmentations under the instance&#x2019;s naive representations or augmentations under the instance&#x2019;s semantic representations. To tackle this problem, we offer a unique insight into data augmentation and propose a novel data-augmentation-based semi-supervised learning method, called Attentive Neighborhood Feature Augmentation (ANFA). The motivation of our method lies in the observation that the relationship between a given feature and its neighborhood may contribute to constructing more reliable transformations for the data, further helping the classifier to distinguish ambiguous features in low-density regions. Specifically, we first project the labeled and unlabeled data points into an embedding space and then construct a neighbor graph that serves as a similarity measure based on the similar representations in the embedding space. Next, we employ an attention mechanism to transform the target features into augmented ones based on the neighbor graph. Finally, we formulate a novel semi-supervised loss by encouraging the predictions for interpolations of augmented features to be consistent with the corresponding interpolations of the predictions for the target features. We carried out experiments on the SVHN and CIFAR-10 benchmark datasets, and the results demonstrate that our method outperforms state-of-the-art methods when the number of labeled examples is limited.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Semi-supervised learning</kwd>
<kwd>attention mechanism</kwd>
<kwd>feature augmentation</kwd>
<kwd>consistency regularization</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>62072127</award-id>
<award-id>62002076</award-id>
<award-id>61906049</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Natural Science Foundation of Guangdong Province</funding-source>
<award-id>2023A1515011774</award-id>
<award-id>2020A1515010423</award-id>
</award-group>
<award-group id="awg3">
<funding-source>CNKLSTISS</funding-source>
<award-id>6142111180404</award-id>
</award-group>
<award-group id="awg4">
<funding-source>Science and Technology Program of Guangzhou</funding-source>
<award-id>202002030131</award-id>
</award-group>
<award-group id="awg5">
<funding-source>Guangdong basic and applied basic research fund joint fund Youth Fund</funding-source>
<award-id>2019A1515110213</award-id>
</award-group>
<award-group id="awg6">
<funding-source>Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University)</funding-source>
<award-id>MJUKF-IPIC202101</award-id>
</award-group>
<award-group id="awg8">
<funding-source>Scientific research project for Guangzhou University</funding-source>
<award-id>RP2022003</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Deep neural networks have achieved favorable performance on a wide variety of tasks [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-5">5</xref>]. Training deep neural networks commonly requires a large amount of labeled training data. However, since collecting labeled data usually requires expert knowledge, sufficient labeled data is unavailable for many learning tasks. To address this problem, numerous semi-supervised learning (SSL) methods have been developed, which exploit abundant unlabeled data to improve the performance of deep models and relieve the pressure caused by the lack of labeled data.</p>
<p>Existing SSL methods are mainly based on the low-density separation assumption, that is, the decision boundary learned by the model is supposed to lie in low-density regions of the instances. Consistency regularization is a typical way to implement the low-density separation assumption and has been widely used on many benchmarks. The main idea of consistency regularization is to enforce the model to produce the same output distribution for an input instance and its perturbed version. Conventional consistency-regularization-based methods mainly focus on how to construct effective perturbations. For instance, Laine et al. [<xref ref-type="bibr" rid="ref-6">6</xref>] generated different perturbations with two network models and encouraged their predictions to agree. Miyato et al. [<xref ref-type="bibr" rid="ref-7">7</xref>] produced the worst-case perturbations along the adversarial direction during adversarial training [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>], and then enforced the outputs from the original example and its perturbed version to be consistent.</p>
<p>Recently, data augmentation has quickly become the mainstream technique of consistency regularization in SSL due to its powerful capability of expanding the examples&#x2019; feature representations. The essence of data augmentation is to expand the feature representations derived from the given training dataset. To this end, numerous data-augmentation-based SSL methods have been developed. For instance, Verma et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] proposed the interpolation consistency training (ICT) algorithm to train deep neural networks in the semi-supervised learning paradigm. This algorithm enforces the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. Xie et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] presented a new perspective on how to effectively noise unlabeled examples, verifying that the quality of the noising produced by advanced data augmentation methods is very important for semi-supervised learning. Berthelot et al. [<xref ref-type="bibr" rid="ref-12">12</xref>] presented the MixMatch approach, which guesses low-entropy labels for data-augmented unlabeled examples and mixes labeled and unlabeled data using the MixUp strategy. Sohn et al. [<xref ref-type="bibr" rid="ref-13">13</xref>] presented the FixMatch algorithm to simplify existing SSL approaches: the model&#x2019;s predictions on weakly-augmented unlabeled images are used to construct pseudo-labels, and the model is then trained to predict these pseudo-labels when fed strongly-augmented versions of the same images.</p>
<p>The aforementioned methods commonly generate augmented instances on their naive representations, which are unable to derive abstract semantic representations for the learning of semi-supervised models. Accordingly, inspired by the idea of feature fusion [<xref ref-type="bibr" rid="ref-14">14</xref>&#x2013;<xref ref-type="bibr" rid="ref-18">18</xref>], several works focus on augmenting data by merging the feature representations of the instances from the semantic layer. Verma et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] proposed a manifold mixup method to encourage neural networks to predict less confidently on interpolations of hidden representations. This method leveraged semantic interpolations as an additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. Upchurch et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] proposed a deep feature interpolation (DFI) method for automatic high-resolution image transformation. DFI can be used as a new baseline to evaluate more complex algorithms and provides a practical answer to the question of which image transformation tasks are still challenging after the advent of deep learning. Kuo et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] proposed a novel learned feature-based refinement and augmentation method which produces a varied set of complex transformations. The transformations, combined with traditional image-based augmentation, can be used as part of the consistency-based regularization loss.</p>
<p>The existing feature fusion methods boost the capability of the feature representations to some extent [<xref ref-type="bibr" rid="ref-22">22</xref>]. However, they only consider the information of a single given example when merging the features. Overlooking the relationship between the given feature and its neighborhood may lead to false predictions for the unlabeled examples and further limit the performance of SSL. To clarify this phenomenon and put forward our motivation, we take a simple example, as shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1a</xref>, in low-density feature embedding regions it is difficult for the classifier to distinguish the ambiguous unlabeled features. Here, the ambiguous unlabeled features are the features that are derived from the unlabeled examples and have approximately identical margins to the boundaries in the embedding space. Since the ambiguous unlabeled features have similar representations, they may yield similar outputs from the SSL model and thus generate unreliable pseudo labels for the unlabeled examples. This leads to false decision boundaries during the training process. In contrast, as shown in <xref ref-type="fig" rid="fig-1">Fig. 1b</xref>, if the neighborhood of the given feature is considered, the representation of the feature can be strengthened and refined. Based on the cluster characteristics of the neighborhood, the ambiguous unlabeled features obtain discriminative representations, which contributes to yielding more reliable pseudo labels for the training. Therefore, it is reasonable to generate diverse and abstract transformations by exploiting the neighborhood information of examples in their semantic feature spaces. To this end, we use the self-attention [<xref ref-type="bibr" rid="ref-23">23</xref>] mechanism to aggregate the neighbor features, and then apply a neighbor graph to refine and augment the target features. By creating such a neighborhood graph, it is possible to obtain more discriminative feature representations, which help to produce more trustworthy decision boundaries for the SSL model and more precise pseudo labels (as shown in <xref ref-type="fig" rid="fig-1">Fig. 1c</xref>).</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>A simple case to clarify the motivation of our method. The blue and red circles represent unlabeled samples that have been divided into different clusters by the SSL model. Triangles represent labeled samples. The gray circles represent feature representations that are difficult for the classifier to discriminate. (a) illustrates that the ambiguous unlabeled samples are difficult to classify by their representations. (b) indicates the use of ANFA to aggregate neighboring samples; the thicker the line, the greater the attention weight. (c) shows that the SSL model can correctly classify unlabeled samples with refined features</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_39600-fig-1.tif"/></fig>
<p>According to the foregoing analysis, this paper proposes a novel feature augmentation framework called Attentive Neighborhood Feature Augmentation (ANFA) for SSL. First, given labeled and unlabeled data examples, we project them to an embedding space and construct a neighborhood graph based on the similarity of their representations in the embedding space. Second, we refine the features by weighting the neighbor representations of the target features, where the weights are adaptively acquired based on the similarity between the target features and the neighborhood graph. Finally, we mix up the target and refined features to obtain the interpolated features and then propose a novel consistency regularization loss that encourages the predictions of the interpolated features to be consistent with their corresponding interpolated pseudo-labels. Moreover, we test our method on the standard SSL datasets SVHN [<xref ref-type="bibr" rid="ref-24">24</xref>] and CIFAR-10 [<xref ref-type="bibr" rid="ref-25">25</xref>] with the neural network architectures CNN-13 and WRN28-2 [<xref ref-type="bibr" rid="ref-26">26</xref>], and the experimental results demonstrate that our approach outperforms the baseline methods.</p>
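The first two steps above (neighbor graph construction and attention-based feature refinement) can be sketched as follows. This is a minimal NumPy illustration with hypothetical choices of our own (Euclidean k-nearest neighbors, k = 3, dot-product attention over the neighbors); the paper's actual architecture and hyperparameters may differ:

```python
import numpy as np

def knn_graph(F, k):
    """Indices of the k nearest neighbors (Euclidean) of each embedded feature."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude each point itself
    return np.argsort(d2, axis=1)[:, :k]                  # (n, k) neighbor indices

def attentive_refine(F, nbrs):
    """Refine each feature as an attention-weighted sum of its neighbors,
    with weights from a softmax over scaled dot-product similarities."""
    d = F.shape[1]
    refined = np.empty_like(F)
    for i, idx in enumerate(nbrs):
        scores = F[idx] @ F[i] / np.sqrt(d)  # similarity to each neighbor
        w = np.exp(scores - scores.max())    # stable softmax
        w /= w.sum()
        refined[i] = w @ F[idx]              # aggregated neighbor feature
    return refined

rng = np.random.default_rng(0)
F = rng.normal(size=(10, 8))                 # toy batch of 10 embedded features
refined = attentive_refine(F, knn_graph(F, k=3))
```

Each refined feature is a convex combination of its neighbors, so ambiguous points are pulled toward the cluster their neighborhood supports.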
<p>This paper is organized as follows. First, we survey the related work and analyze its advantages and disadvantages in Section 2. Then, we elaborate on the proposed method in Section 3. Next, we conduct experiments and analyze the results in Section 5.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Work</title>
<p>In the past, many semi-supervised deep learning methods have been developed. In this section, we focus on some related works, including the consistency regularization methods, augmentation methods, and the attention scheme.</p>
<sec id="s2_1"><label>2.1</label><title>Consistency Regularization Methods</title>
<p>Current state-of-the-art SSL methods mostly use this technique. The key idea of consistency regularization methods is that the model should be robust to local perturbations in the input space, which requires the deep neural network to produce consistent predictions for the original samples and their slightly perturbed versions. In image classification tasks, the approach is to make the model&#x2019;s predictions invariant to texture or geometric changes in the image.</p>
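As an illustration of this general idea (not of any specific published method), a consistency loss can be written as the squared difference between predictions on a sample and its perturbed version. The toy linear-softmax `model` and Gaussian `perturb` below are our own assumptions:

```python
import numpy as np

def consistency_loss(model, x, perturb, rng):
    """Mean squared difference between predictions on x and on a perturbed x.

    `model` maps a batch of inputs to class probabilities; `perturb`
    produces a slightly modified copy of x (e.g., additive Gaussian noise).
    """
    p_clean = model(x)
    p_noisy = model(perturb(x, rng))
    return np.mean((p_clean - p_noisy) ** 2)

# toy setup: a fixed linear "model" with a softmax output
W = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])  # 3 features -> 2 classes
def model(x):
    logits = x @ W
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gaussian_perturb(x, rng, eps=0.05):
    return x + eps * rng.normal(size=x.shape)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
loss = consistency_loss(model, x, gaussian_perturb, rng)
```

Minimizing this term over unlabeled data pushes the decision boundary away from regions where small input changes flip the prediction.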
<p>Consistency regularization techniques differ in how they choose the perturbation <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>&#x03B4;</mml:mi></mml:math></inline-formula>. One simple option is to use a random perturbation <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>&#x03B4;</mml:mi></mml:math></inline-formula>, that is, to add Gaussian noise to the image. However, random perturbation is inefficient in high dimensions because only a small fraction of input perturbations can push the decision boundary toward low-density regions. To alleviate this problem, Virtual Adversarial Training [<xref ref-type="bibr" rid="ref-7">7</xref>] searches for adversarial perturbation directions that maximize the change in model predictions. This involves computing the gradient with respect to the classifier input [<xref ref-type="bibr" rid="ref-27">27</xref>&#x2013;<xref ref-type="bibr" rid="ref-29">29</xref>], which can be very expensive for large neural network models. In addition to adding perturbations to the image, we can also add perturbations to the model. Laine et al. implemented this approach by training two perturbed neural network models, using dropout [<xref ref-type="bibr" rid="ref-30">30</xref>] to randomly drop a part of the network parameters as the perturbation process. In supervised learning, the Mixup [<xref ref-type="bibr" rid="ref-31">31</xref>] method encourages the model&#x2019;s prediction for a linear combination of two samples to be close to the linear combination of their labels, interpolating between the two samples to obtain new samples and enhance the generalization ability of the model. Verma et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] proposed interpolation consistency training (ICT) to introduce Mixup into semi-supervised learning by using pseudo-labels of unlabeled data. ICT encourages predictions on interpolated sample pairs to be consistent with their interpolated predictions. Wei et al. [<xref ref-type="bibr" rid="ref-32">32</xref>] proposed FMCmatch to further develop sample-mixing augmentation, improving the Cutout and Mixup methods to generate samples that effectively smooth the training space. However, simply cutting and mixing in the image space may produce meaningless samples and introduce noise, pushing the image off the low-dimensional manifold in the high-dimensional embedding space. Chen et al. [<xref ref-type="bibr" rid="ref-33">33</xref>] proposed attention-based label consistency regularization, which uses channel and sample attention to describe the similarity of different samples, maintaining label consistency across samples and enhancing the smoothness of label prediction between data. However, this approach is limited to the similarity of samples within the same batch and cannot describe the similarity among global samples.</p>
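The ICT-style interpolation consistency described above can be sketched as follows. The toy linear-softmax `model` is a hypothetical stand-in for a neural network, and the mixing coefficient is drawn from a Beta distribution as in Mixup:

```python
import numpy as np

def interpolation_consistency(model, x1, x2, lam):
    """ICT-style consistency: the prediction at a mixed input should match
    the same mix of the individual predictions (used as soft pseudo-labels)."""
    x_mix = lam * x1 + (1 - lam) * x2                  # interpolated inputs
    target = lam * model(x1) + (1 - lam) * model(x2)   # interpolated predictions
    return np.mean((model(x_mix) - target) ** 2)

# toy linear-softmax model: logits interpolate exactly under a linear map,
# but the softmax is nonlinear, so the loss is generally nonzero
W = np.array([[2.0, -1.0], [0.0, 1.0]])
def model(x):
    logits = x @ W
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
lam = rng.beta(0.5, 0.5)                               # Mixup coefficient
loss = interpolation_consistency(model, x1, x2, lam)
```

When lam is 0 or 1 the mixed input collapses to one endpoint and the loss vanishes; intermediate values penalize non-smooth behavior between samples.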
<p>Recently, a series of methods that combine consistency regularization with other semi-supervised learning techniques have achieved the best performance, such as MixMatch [<xref ref-type="bibr" rid="ref-12">12</xref>], ReMixMatch [<xref ref-type="bibr" rid="ref-34">34</xref>], and FixMatch [<xref ref-type="bibr" rid="ref-13">13</xref>], which use strong data augmentation to create perturbations while also using pseudo-labels, entropy minimization, sharpening, and other techniques to improve the confidence of the model. At the same time, several works have improved graph-based methods to better extract intrinsic features from raw data. Yang et al. [<xref ref-type="bibr" rid="ref-35">35</xref>] used self-paced regularization to better factorize matrices and introduced adaptive graphs using dynamic neighbor assignment to find low-dimensional manifolds. Chen et al. [<xref ref-type="bibr" rid="ref-36">36</xref>] improved the graph non-negative matrix factorization (GNMF) method, introducing the <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> norm to enhance the sparsity of the factorized matrices and improve the robustness of feature extraction with GNMF.</p>
<p>We summarize the advantages and disadvantages of some consistency regularization methods in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Key findings and limitations of some typical consistency regularization methods</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Key findings</th>
<th align="left">Limitations</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">TE [<xref ref-type="bibr" rid="ref-1">1</xref>]</td>
<td align="left">Better prediction by ensembling the outputs of the network in previous epochs.</td>
<td align="left">Expensive calculation in a huge dataset.</td>
</tr>
<tr>
<td align="left">VAT [<xref ref-type="bibr" rid="ref-7">7</xref>]</td>
<td align="left">Better generalization by learning adversarial perturbations.</td>
<td align="left">Additional backpropagation to compute the adversarial direction.</td>
</tr>
<tr>
<td align="left">ICT [<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">Reduce overfitting to labeled points under high confidence.</td>
<td align="left">Random interpolation may generate unreal samples leading to prediction bias.</td>
</tr>
<tr>
<td align="left">MixMatch [<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td align="left">Unifying the dominant approaches of semi-supervised learning.</td>
<td align="left">Multiple forward and back propagation calculations.</td>
</tr>
<tr>
<td align="left">FeatMatch [<xref ref-type="bibr" rid="ref-21">21</xref>]</td>
<td align="left">Better feature learning by exploiting category information.</td>
<td align="left">Neighborhood information is ignored during feature learning.</td>
</tr>
<tr>
<td align="left">FMCmatch [<xref ref-type="bibr" rid="ref-32">32</xref>]</td>
<td align="left">Smoothed the training space using more diverse image transformations.</td>
<td align="left">Random Cutout and Mixup introduce noise.</td>
</tr>
<tr>
<td align="left">ALC [<xref ref-type="bibr" rid="ref-33">33</xref>]</td>
<td align="left">Smoothed label predictions across data using channel and sample attention.</td>
<td align="left">Similarity measurement limited to batch samples.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_2"><label>2.2</label><title>Data Augmentation</title>
<p>For SSL with deep models, most recent works incorporate different data augmentation methods into their baseline models to achieve higher performance. Data augmentation alleviates the problem of limited data by performing diverse but reasonable transformations on the data and has been widely used in the training of deep models [<xref ref-type="bibr" rid="ref-37">37</xref>]. Data augmentation increases data diversity and prevents overfitting in the training of deep models. Simple data augmentation methods include random flips, blurs, translations, geometric transformations, changing the contrast and color of images, and so on. In addition, complex augmentation operations also exist. Mixup enforces interpolation smoothness between every two training samples by generating new training samples through a convex combination of two images and their corresponding labels. It has been shown that models trained with Mixup are robust to out-of-distribution data and facilitate the uncertainty calibration of the network. In recent years, strong image-processing augmentation strategies for SSL have attracted attention. In image classification, Unsupervised Data Augmentation (UDA) [<xref ref-type="bibr" rid="ref-11">11</xref>] adopts AutoAugment [<xref ref-type="bibr" rid="ref-38">38</xref>], which uses reinforcement learning [<xref ref-type="bibr" rid="ref-39">39</xref>,<xref ref-type="bibr" rid="ref-40">40</xref>] to search for the best combination of different image augmentation operations based on the validation performance of a trained model. In addition, the CTAugment proposed in ReMixMatch [<xref ref-type="bibr" rid="ref-34">34</xref>] and the RandAugment [<xref ref-type="bibr" rid="ref-41">41</xref>] used in FixMatch [<xref ref-type="bibr" rid="ref-13">13</xref>] use different strategies to maximize the effect of data augmentation.</p>
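Simple augmentations like the random flips and crops mentioned above can be sketched in a few lines of NumPy; the function names and the padding size here are our own illustrative choices:

```python
import numpy as np

def random_flip(img, rng):
    """Horizontally flip an (H, W, C) image with probability 0.5."""
    return img[:, ::-1, :] if rng.random() < 0.5 else img

def random_crop(img, pad, rng):
    """Pad by `pad` pixels on each side, then crop back to the original size
    at a random offset (a common CIFAR/SVHN-style augmentation)."""
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
img = rng.random(size=(32, 32, 3))   # a toy 32x32 RGB image
aug = random_crop(random_flip(img, rng), 4, rng)
```

Both operations preserve the image shape and pixel range, so the augmented sample remains a plausible input for the classifier.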
<p>We summarize the key findings and limitations of some data augmentation methods in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Key findings and limitations of some typical data augmentation methods</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Key findings</th>
<th align="left">Limitations</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Mixup [<xref ref-type="bibr" rid="ref-31">31</xref>]</td>
<td align="left">A linear combination between two samples and their corresponding labels can improve generalization.</td>
<td align="left">Simple interpolation may produce meaningless samples.</td>
</tr>
<tr>
<td align="left">AutoAugment [<xref ref-type="bibr" rid="ref-38">38</xref>]</td>
<td align="left">Automatically search for the best data augmentation policy.</td>
<td align="left">Using reinforcement learning as a search algorithm requires additional training.</td>
</tr>
<tr>
<td align="left">CTAugment [<xref ref-type="bibr" rid="ref-34">34</xref>]</td>
<td align="left">Using control theory to dynamically infer the magnitude of the transition during training.</td>
<td align="left">Dynamic updates require additional computational cost.</td>
</tr>
<tr>
<td align="left">RandAugment [<xref ref-type="bibr" rid="ref-41">41</xref>]</td>
<td align="left">Only two augmentation parameters are needed: the number and magnitude of augmentation transformations.</td>
<td align="left">For different datasets, the two augmentation parameters still need to be determined, which incurs a large experimental cost.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_3"><label>2.3</label><title>Attention</title>
<p>Vaswani et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] define scaled dot-product attention as an operation that maps a query and a set of key-value pairs to an output: the dot products of the query with the keys are computed and scaled, and a softmax function is used for normalization to compute the attention weights. It can be expressed as follows:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mtext>Attention</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>Q</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>K</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext>softmax</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:msup><mml:mrow><mml:mtext>QK</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:msqrt><mml:msub><mml:mrow><mml:mtext>d</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>k</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:msqrt></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mtext>V</mml:mtext></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the dimension of the keys. Attention mechanisms focus on the input characteristics that are most relevant to the task, reduce the attention paid to irrelevant characteristics, and can even filter out irrelevant features, thereby improving the efficiency and accuracy of task processing.</p>
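Eq. (1) can be implemented directly; the following NumPy sketch (with toy shapes of our own choosing) computes the scaled dot products and the softmax-normalized attention weights:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V as in Eq. (1)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) scaled similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                              # (n_q, d_v) attended values

# toy example: 2 queries, 3 key-value pairs, d_k = d_v = 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
```

Note that a zero query attends uniformly, so its output is just the mean of the value vectors; larger dot products concentrate the weights on the most similar keys.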
<p>In recent years, attention mechanisms have been successfully applied to various computer vision tasks [<xref ref-type="bibr" rid="ref-42">42</xref>,<xref ref-type="bibr" rid="ref-43">43</xref>]. SENet [<xref ref-type="bibr" rid="ref-44">44</xref>] obtains the weight of each channel of the input feature layer and uses its weight to make the network focus on more important information [<xref ref-type="bibr" rid="ref-45">45</xref>]. Residual attention networks [<xref ref-type="bibr" rid="ref-46">46</xref>] are built by stacking attention modules that generate attention-aware features. As the modules go deeper, the attention-aware functions from different modules change adaptively. CBAM [<xref ref-type="bibr" rid="ref-47">47</xref>] sequentially infers the attention map along two independent dimensions of channel and space and then multiplies the attention map with the input feature map for adaptive feature refinement [<xref ref-type="bibr" rid="ref-48">48</xref>].</p>
<p>We summarize the key findings and limitations of some attention methods in <xref ref-type="table" rid="table-3">Table 3</xref>.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Key findings and limitations of some attention methods</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Key findings</th>
<th align="left">Limitations</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">SENet [<xref ref-type="bibr" rid="ref-44">44</xref>]</td>
<td align="left">Automatically obtain the importance of each channel.</td>
<td align="left">Ignoring the importance of spatial information.</td>
</tr>
<tr>
<td align="left">Residual attention networks [<xref ref-type="bibr" rid="ref-46">46</xref>]</td>
<td align="left">Multiple attention modules can be stacked.</td>
<td align="left">Can only effectively capture local information, but cannot establish remote channel dependencies.</td>
</tr>
<tr>
<td align="left">CBAM [<xref ref-type="bibr" rid="ref-47">47</xref>]</td>
<td align="left">Simultaneously calculate the attention map of the two dimensions of channel and space.</td>
<td align="left">Only consider the calculation of the local area, ignoring the information of other global areas.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s3"><label>3</label><title>Methodology</title>
<p>In this section, we present our work for semi-supervised deep learning. A glimpse of our approach is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. Our approach consists of three parts, namely neighbor graph representation, feature augmentation, and consistency regularization. We first construct a neighborhood feature graph that represents the relationship between the target feature and its neighbors. Then, based on the neighborhood feature graph, we augment the features by attention mechanism. Finally, we propose a new loss that encourages the prediction at an interpolation of features to be consistent with the interpolation of the predictions at those features.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>The pipeline of attentive neighborhood feature augmentation for semi-supervised learning</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_39600-fig-2.tif"/></fig>
<sec id="s3_1"><label>3.1</label><title>Preliminary</title>
<p>In SSL tasks, we are given a labeled dataset <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mrow><mml:mtext>D</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>l</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mtext>&#x00A0;</mml:mtext></mml:math></inline-formula> and an unlabeled dataset <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, where <italic>L</italic> and <italic>U</italic> denote the sizes of <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mrow><mml:mtext>D</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>l</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mrow><mml:mtext>D</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>u</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, respectively. 
Formally, a feature extractor <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with parameter <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow><mml:mtext>&#x00A0;</mml:mtext></mml:math></inline-formula> is used to extract input image features <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, a classifier <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mrow><mml:mtext>h</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x03D5;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and the memory bank <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mrow><mml:mrow><mml:mi>&#x02133;</mml:mi></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mtext>k</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, where <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msub><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> is the extracted feature of input sample <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>; for labeled data, <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the ground-truth label, while for unlabeled data, <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the pseudo-label; <italic>k</italic> is the size of <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mrow><mml:mrow><mml:mi>&#x02133;</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>.</p>
</sec>
<sec id="s3_2"><label>3.2</label><title>Neighborhood Feature Graph</title>
<p>In order to efficiently leverage the knowledge of neighbors for regularization, we propose to construct a graph among the samples and their neighborhoods in the feature semantic space. To select suitable neighbors from the dataset, we propose to use <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mi>k</mml:mi><mml:mrow><mml:mtext>-</mml:mtext></mml:mrow><mml:mrow><mml:mtext>nearest</mml:mtext></mml:mrow></mml:math></inline-formula> neighbor representation in the feature space to extract neighbors for each sample.</p>
<p>We first extract the feature <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and label prediction <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mrow><mml:mover><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> for each unlabeled sample <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> at every iteration of the training loop, and collect and record them in a memory bank <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mrow><mml:mrow><mml:mi>&#x02133;</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> as <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> pairs. We pre-train a feature extractor and then use the extracted features and pseudo-labels to initialize the memory bank. During each forward pass in the training loop, we separate the features and pseudo-labels and push them into the memory bank. Since the training of the model influences the extracted features, we update the features corresponding to the current training samples after each iteration. 
To obtain more accurate predictions, we use the target predictions generated by the teacher model [<xref ref-type="bibr" rid="ref-49">49</xref>]. Based on the features in the memory bank, we calculate the cosine similarity between features and construct a similarity matrix <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mrow><mml:mtext mathvariant="bold">S</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> with
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>ij</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow></mml:math></inline-formula> is a measurement of the 
similarity between samples <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, and <italic>n</italic> is the number of training samples. Compared with other similarity metrics, such as Euclidean distance, we find that cosine similarity performs better. A higher similarity indicates that the two samples are closer in the feature space, so we choose the <italic>k</italic> samples with the highest similarity as the neighbors of each input sample and construct a neighborhood graph <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mrow><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>n</mml:mtext></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mtext>n</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> as follows:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2265;</mml:mo><mml:msubsup><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:msubsup><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the weight of node <italic>i</italic> (sample <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>) and <italic>j</italic> (sample <inline-formula id="ieqn-32"><mml:math 
id="mml-ieqn-32"><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>). <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msubsup><mml:mrow><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext>k</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> denotes the <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mrow><mml:mtext>k</mml:mtext></mml:mrow><mml:mrow><mml:mtext>-</mml:mtext></mml:mrow><mml:mrow><mml:mtext>th&#xA0;</mml:mtext></mml:mrow></mml:math></inline-formula> value in the <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow><mml:mrow><mml:mtext>-</mml:mtext></mml:mrow><mml:mrow><mml:mtext>th&#xA0;</mml:mtext></mml:mrow></mml:math></inline-formula> row of <italic>S</italic> when the elements of the <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow><mml:mrow><mml:mtext>-</mml:mtext></mml:mrow><mml:mrow><mml:mtext>th&#xA0;</mml:mtext></mml:mrow></mml:math></inline-formula> row of <italic>S</italic> are ranked in descending order from large to small. The embedding of a sample can thus take advantage of the neighborhood graph to exploit richer information. After going over the whole dataset, we use the features saved in the memory bank to calculate the global similarity matrix and build the neighbor graph through the <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mi>k</mml:mi><mml:mrow><mml:mtext>-</mml:mtext></mml:mrow><mml:mrow><mml:mtext>nearest</mml:mtext></mml:mrow></mml:math></inline-formula> neighbor algorithm.</p>
</sec>
<sec id="s3_3"><label>3.3</label><title>Feature Augmentation</title>
<p>With a neighborhood feature graph built by the process described above, we propose a learned feature augmentation module via self-attention to improve target feature embedding by aggregating the neighborhood features. The proposed module refines input image features in the feature space by leveraging important neighborhood information.</p>
<p>Formally, given a neighborhood feature graph <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mrow><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>, for an input sample with extracted feature <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msub><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and its <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow><mml:mrow><mml:mtext>-</mml:mtext></mml:mrow><mml:mrow><mml:mtext>th</mml:mtext></mml:mrow></mml:math></inline-formula> neighbor feature <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, we linearly project them into an embedding space as:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x03D5;</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mspace width="1em" /></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x03D5;</mml:mi><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mrow><mml:mtext>w</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>a</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msub><mml:mrow><mml:mtext>w</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>b</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> are the learned parameters of the FC layers <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mi>&#x03D5;</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mi>&#x03D5;</mml:mi><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, respectively. 
We define the attention function using a softmax function as:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mi>w</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>In detail, we first compute the dot-product similarity between <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msub><mml:mrow><mml:mtext>p</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mrow><mml:mtext>p</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, and obtain the final attention weights by normalizing the similarities with the softmax operation. Then, we aggregate the neighborhood information to augment the input sample feature, which can be denoted as:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03C8;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> is a non-linear transformation. In this work, <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03C8;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> is implemented as a two-layer Multi-Layer Perceptron (MLP) with ReLU activations, i.e., FC-ReLU-FC-ReLU. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the detailed architecture of the proposed module.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>Illustration of the proposed attentive neighborhood feature augmentation module</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_39600-fig-3.tif"/></fig>
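<p>The projection, attention, and aggregation steps of Eqs. (4)-(6) can be sketched as follows; all weight matrices here are random stand-ins for the learned FC and MLP parameters, so this is an illustration of the computation, not the trained module:</p>

```python
# Sketch of attentive neighborhood feature augmentation, Eqs. (4)-(6):
# project target and neighbor features, attend over neighbors, then add
# an MLP-transformed aggregate back to the projected target feature.
import numpy as np

rng = np.random.default_rng(2)
d, k = 8, 4
f_x = rng.normal(size=d)          # target feature
f_nbrs = rng.normal(size=(k, d))  # k neighbor features from the graph

W_a = rng.normal(size=(d, d))     # stand-in for phi_a in Eq. (4)
W_b = rng.normal(size=(d, d))     # stand-in for phi_b in Eq. (4)
W1 = rng.normal(size=(d, d))      # psi_t: FC-ReLU-FC-ReLU stand-ins
W2 = rng.normal(size=(d, d))

p_x = W_a @ f_x                   # Eq. (4): projected target
p_n = f_nbrs @ W_b.T              # Eq. (4): projected neighbors, shape (k, d)

logits = p_n @ p_x                # Eq. (5): dot-product scores, shape (k,)
w = np.exp(logits - logits.max())
w /= w.sum()                      # softmax attention weights over neighbors

agg = w @ p_n                              # weighted neighbor aggregate
psi = np.maximum(W2 @ np.maximum(W1 @ agg, 0), 0)  # psi_t as FC-ReLU-FC-ReLU
F_x = p_x + psi                   # Eq. (6): augmented feature
print(F_x.shape)  # (8,)
```

The residual form of Eq. (6) keeps the projected target feature intact and only adds the attended neighborhood context on top of it.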
</sec>
<sec id="s3_4"><label>3.4</label><title>Consistency Regularization</title>
<p>We obtain refined features by aggregating neighborhood information via the module described above. To relate refined features containing knowledge of neighbors to each other, we employ the Mixup strategy, which encourages predictions based on linear combinations of two features to approximate linear combinations of their pseudo-labels.</p>
<p>Formally, given two random refined features <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msub><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msub><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and their pseudo labels <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msub><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:msub><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, the Mixup can be written as follows:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the interpolation between the refined features <inline-formula id="ieqn-55"><mml:math 
id="mml-ieqn-55"><mml:msub><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:msub><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> is sampled from the distribution <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mrow><mml:mtext>Beta</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
<p>The goal of the feature Mixup model is to minimize the divergence between the model prediction on the interpolated feature <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msub><mml:mrow><mml:mtext>h</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x03D5;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and the soft label <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, which, on an unlabeled minibatch <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:msub><mml:mrow><mml:mtext>B</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>u</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> of size <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mrow><mml:mtext>U&#xA0;</mml:mtext></mml:mrow></mml:math></inline-formula>, can be formulated as:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo>|</mml:mo><mml:mi>U</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:msub><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>B</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>&#x03D5;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula></p>
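<p>A minimal sketch of the Mixup consistency computation in Eqs. (7) and (8) for a single feature pair follows; a toy random linear map plays the role of the classifier, and the features and pseudo-labels are synthetic:</p>

```python
# Sketch of the Mixup consistency term, Eqs. (7)-(8): interpolate two
# refined features and their pseudo-labels with lambda ~ Beta(alpha, alpha),
# then penalize the squared gap between the prediction on the mixed
# feature and the mixed label.
import numpy as np

rng = np.random.default_rng(3)
d, c, alpha = 8, 5, 0.75
W = rng.normal(size=(c, d))            # stand-in for the classifier h_phi

def softmax(z):
    z = z - z.max()                    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

F_i, F_j = rng.normal(size=d), rng.normal(size=d)  # two refined features
y_i, y_j = np.eye(c)[1], np.eye(c)[3]              # one-hot pseudo-labels

lam = rng.beta(alpha, alpha)           # Eq. (7): mixing coefficient in [0, 1]
F_mix = lam * F_i + (1 - lam) * F_j    # interpolated feature
y_mix = lam * y_i + (1 - lam) * y_j    # interpolated soft label

# Eq. (8) for one pair: squared L2 gap between prediction and soft label.
loss_mix = np.sum((softmax(W @ F_mix) - y_mix) ** 2)
print(loss_mix >= 0)  # True
```

In Eq. (8) this per-pair penalty is averaged over all pairs formed from the unlabeled minibatch.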
</sec>
<sec id="s3_5"><label>3.5</label><title>Loss Function</title>
<p>Given a labeled data minibatch <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:msub><mml:mrow><mml:mtext>B</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>l</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> of size <italic>L</italic> and an unlabeled data minibatch <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:msub><mml:mrow><mml:mtext>B</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>u</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> of size <italic>U</italic>, the loss function of our approach consists of two terms: a supervised loss <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> applied to labeled data and a consistency regularization term <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. Specifically, <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the cross-entropy loss [<xref ref-type="bibr" rid="ref-50">50</xref>] applied to labeled data <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:mi>x</mml:mi></mml:math></inline-formula> with label <italic>y</italic>:
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:msub><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mtext>H</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>&#x03D5;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
where <italic>y</italic> is the label of <italic>x</italic> and <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msub><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> is the augmented feature of <italic>x</italic>.</p>
<p>Therefore, the total loss can be written as:
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula>where <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow></mml:math></inline-formula> is the weight of the consistency regularization term.</p>
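As a concrete illustration, the two loss terms and their weighted combination in Eq. (10) can be sketched as follows. The function names and toy inputs are hypothetical, and the sketch operates on NumPy arrays of class probabilities rather than actual network outputs.

```python
import numpy as np

def cross_entropy(y_onehot, probs, eps=1e-12):
    # L_sup (Eq. 9): mean cross-entropy H(y, h_phi(F_x)) over the labeled minibatch
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

def consistency_loss(pred_mixed, target_mixed):
    # L_mix (Eq. 8): mean squared error between predictions on mixed
    # features and the correspondingly mixed pseudo-labels
    return np.mean(np.sum((pred_mixed - target_mixed) ** 2, axis=1))

def total_loss(y_onehot, probs_labeled, pred_mixed, target_mixed, alpha=0.5):
    # Eq. 10: L = L_sup + alpha * L_mix
    return cross_entropy(y_onehot, probs_labeled) + alpha * consistency_loss(pred_mixed, target_mixed)
```

In practice `alpha` would be chosen from the search space {0.1, 0.2, 0.5, 1.0} described in the implementation details.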
<p>Our proposed method for SSL is summarized in Algorithm 1.
</p>
<fig id="fig-7">
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_39600-fig-7.tif"/>
</fig>
<p><bold>Algorithm 1:</bold> The proposed Attentive Neighborhood Feature Augmentation (ANFA) Algorithm for semi-supervised learning</p>
</sec>
<sec id="s3_6"><label>3.6</label><title>Complexity Analysis</title>
<p>Because our method builds a global neighborhood graph, it unavoidably incurs additional computational complexity and memory overhead. Specifically, we must pre-train the feature extractor on labeled data before computing the similarity matrix. In the test phase, we retrieve the neighborhood of each test sample from the memory bank created during training and directly construct the neighborhood subgraph. Although these additional computations are required, our method converges much faster than strong augmentation-based methods such as FixMatch and ReMixMatch: they typically require thousands of training epochs, whereas our method converges within 500.</p>
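A minimal sketch of the global neighborhood graph construction described above, assuming cosine similarity over stored feature vectors. The helper name `build_neighbor_graph` and the exact memory-bank mechanics are assumptions, not the paper's implementation.

```python
import numpy as np

def build_neighbor_graph(features, k=16):
    """Top-k neighborhood graph from a cosine-similarity matrix.
    `features` plays the role of the memory bank of feature vectors."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                    # O(N^2 d) pairwise similarity matrix
    np.fill_diagonal(sim, -np.inf)   # exclude each sample from its own neighborhood
    # indices of the k most similar samples per row
    return np.argsort(-sim, axis=1)[:, :k]
```

The quadratic cost of the similarity matrix is exactly the overhead discussed in this section.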
</sec>
</sec>
<sec id="s4"><label>4</label><title>Experiments</title>
<p>In this section, we evaluate the proposed framework on commonly used SSL benchmark datasets, CIFAR-10 [<xref ref-type="bibr" rid="ref-25">25</xref>] and SVHN [<xref ref-type="bibr" rid="ref-24">24</xref>], and discuss the experimental results. We report error rates averaged over 5 runs with different data-splitting seeds. Specifically, we first briefly introduce the SSL benchmark datasets. Then, we present the implementation details of our framework. Finally, we conduct ablation studies to validate its effectiveness for SSL.</p>
<sec id="s4_1"><label>4.1</label><title>Datasets</title>
<sec id="s4_1_1"><label>4.1.1</label><title>SVHN</title>
<p>SVHN is a street view house numbers dataset with 73,257 training samples and 26,032 test samples from 10 digit classes. The samples are <inline-formula id="ieqn-105"><mml:math id="mml-ieqn-105"><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>32</mml:mn></mml:math></inline-formula> pixel RGB images. Each sample is a digit cropped from a street-view house number, and the class label is the identity of the digit in the image. Following the standard approach in SSL, we randomly select a certain number of training samples as labeled data and discard the labels of the remaining data, treating them as unlabeled. For SVHN, we randomly select 25, 50, or 100 labeled samples from each class. The batch size is set to 64 for labeled data and 128 for unlabeled data.</p>
</sec>
<sec id="s4_1_2"><label>4.1.2</label><title>CIFAR10</title>
<p>CIFAR10 is a natural image dataset with 50,000 training samples and 10,000 test samples belonging to 10 natural classes. The samples are RGB images of size <inline-formula id="ieqn-106"><mml:math id="mml-ieqn-106"><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>32</mml:mn></mml:math></inline-formula>. The images come from real natural objects with large inter-class differences and a certain degree of recognition difficulty, making CIFAR10 a classic benchmark for image classification tasks. For the semi-supervised experiments, we randomly select 25, 50, or 100 labeled samples from each class. The batch size is set to 64 for labeled data and 128 for unlabeled data.</p>
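The labeled/unlabeled split used for both datasets can be sketched as follows; `split_labeled` is a hypothetical helper illustrating the standard per-class sampling protocol described above.

```python
import numpy as np

def split_labeled(labels, per_class, seed=0):
    """Randomly pick `per_class` labeled samples from each class;
    the remaining samples are treated as unlabeled data."""
    rng = np.random.default_rng(seed)
    labeled = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        labeled.extend(rng.choice(idx, size=per_class, replace=False))
    labeled = np.array(sorted(labeled))
    unlabeled = np.setdiff1d(np.arange(len(labels)), labeled)
    return labeled, unlabeled
```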
</sec>
</sec>
<sec id="s4_2"><label>4.2</label><title>Implementations</title>
<p><bold>Data Augmentation.</bold> We adopt standard data augmentation and data normalization in the preprocessing phase following our baselines. On the CIFAR10 dataset, we first augment the training data by random horizontal flipping and random translation (within [&#x2212;2, 2] pixels), and then apply global contrast normalization and ZCA normalization based on statistics of all training samples. On the SVHN dataset, we first augment the training data with random translations. Inspired by [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>], we also employ the RandAugment [<xref ref-type="bibr" rid="ref-41">41</xref>] strategy to augment the training samples, which gives us a strong baseline.</p>
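The ZCA normalization step mentioned above can be sketched as follows, assuming flattened training images and a small regularization constant; the exact statistics and epsilon used in the paper are not specified, so this is illustrative only.

```python
import numpy as np

def zca_whiten(X, eps=1e-2):
    """ZCA whitening computed from training-set statistics.
    X: (n_samples, n_features) matrix of flattened images."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]
    U, S, _ = np.linalg.svd(cov)
    # symmetric whitening matrix U diag(1/sqrt(S + eps)) U^T
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return Xc @ W, mean, W
```

At test time the same `mean` and `W` fitted on the training set would be applied to test images.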
<p><bold>Model Architecture</bold>. We conduct our experiments using a 13-layer CNN and the Wide-Resnet-28-2 architecture. For CNN-13, we adopt exactly the same 13-layer convolutional neural network architecture as in [<xref ref-type="bibr" rid="ref-10">10</xref>], which removes the dropout layers used by its variants in other SSL methods. The Wide-Resnet-28-2 architecture [<xref ref-type="bibr" rid="ref-51">51</xref>] is a specific residual network that was selected through an extensive hyperparameter search for comparing consistency-based semi-supervised algorithms, and it has been adopted as the standard benchmark architecture in recent state-of-the-art SSL methods.</p>
<p><bold>Training</bold>. We use an SGD optimizer with a momentum of 0.9 and a weight decay factor of <inline-formula id="ieqn-107"><mml:math id="mml-ieqn-107"><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>; the batch size is 64 for labeled data and 128 for unlabeled data. We conduct a hyperparameter search over the hyperparameter introduced by our method, the consistency coefficient <inline-formula id="ieqn-108"><mml:math id="mml-ieqn-108"><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi></mml:mrow></mml:math></inline-formula> (we searched over the values in &#x007B;0.1, 0.2, 0.5, 1.0&#x007D;). During training, we set an initial learning rate of 0.1, decay it with the cosine annealing strategy, and obtain the final results after 500 epochs. We adopt standard data augmentation such as random cropping and horizontal flipping. As our method relies on the feature representation to build the neighborhood feature graph, we pre-train the model on labeled training samples only for 10 epochs.</p>
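The cosine annealing of the learning rate from 0.1 over 500 epochs can be sketched as below; the precise schedule (e.g. per-step vs. per-epoch annealing, final learning rate) is an assumption.

```python
import math

def cosine_lr(epoch, total_epochs=500, lr_init=0.1):
    """Cosine-annealed learning rate: starts at lr_init and
    decays smoothly to 0 by the final epoch."""
    return 0.5 * lr_init * (1 + math.cos(math.pi * epoch / total_epochs))
```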
</sec>
<sec id="s4_3"><label>4.3</label><title>Results</title>
<p>We show our results on the CIFAR10 and SVHN datasets in <xref ref-type="table" rid="table-4">Tables 4</xref> and <xref ref-type="table" rid="table-5">5</xref> and we have the following observations.</p>
<table-wrap id="table-4"><label>Table 4</label><caption><title>Comparison of our ANFA with state-of-the-art methods on CIFAR-10</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="center" colspan="2">CNN-13</th>
<th align="center" colspan="2">WRN-28-2</th>
</tr>
<tr>
<th align="left"></th>
<th align="center">1000 </th>
<th align="center">4000</th>
<th align="center">1000</th>
<th align="center">4000</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">PI-Model [<xref ref-type="bibr" rid="ref-1">1</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">12.36&#x2009;&#x00B1;&#x2009;0.31</td>
<td align="left">23.07&#x2009;&#x00B1;&#x2009;0.66</td>
<td align="left">17.41&#x2009;&#x00B1;&#x2009;0.37</td>
</tr>
<tr>
<td align="left">TE [<xref ref-type="bibr" rid="ref-1">1</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">12.16&#x2009;&#x00B1;&#x2009;0.24</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
</tr>
<tr>
<td align="left">MeanTeacher [<xref ref-type="bibr" rid="ref-49">49</xref>]</td>
<td align="left">21.55&#x2009;&#x00B1;&#x2009;1.48</td>
<td align="left">12.31&#x2009;&#x00B1;&#x2009;0.28</td>
<td align="left">17.32&#x2009;&#x00B1;&#x2009;4.00</td>
<td align="left">10.36&#x2009;&#x00B1;&#x2009;0.25</td>
</tr>
<tr>
<td align="left">SNTG [<xref ref-type="bibr" rid="ref-52">52</xref>]</td>
<td align="left">18.41&#x2009;&#x00B1;&#x2009;0.52</td>
<td align="left">10.93&#x2009;&#x00B1;&#x2009;0.14</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
</tr>
<tr>
<td align="left">VAT [<xref ref-type="bibr" rid="ref-7">7</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">10.55</td>
<td align="left">18.68&#x2009;&#x00B1;&#x2009;0.40</td>
<td align="left">11.05&#x2009;&#x00B1;&#x2009;0.31</td>
</tr>
<tr>
<td align="left">ICT [<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">15.48&#x2009;&#x00B1;&#x2009;0.78</td>
<td align="left">7.29&#x2009;&#x00B1;&#x2009;0.02</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">7.66&#x2009;&#x00B1;&#x2009;0.17</td>
</tr>
<tr>
<td align="left">PLCB [<xref ref-type="bibr" rid="ref-53">53</xref>]</td>
<td align="left">6.85&#x2009;&#x00B1;&#x2009;0.15</td>
<td align="left">5.97&#x2009;&#x00B1;&#x2009;0.15</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">6.28&#x2009;&#x00B1;&#x2009;0.30</td>
</tr>
<tr>
<td align="left">MixMatch [<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">6.84</td>
<td align="left">7.75&#x2009;&#x00B1;&#x2009;0.32</td>
<td align="left">6.24&#x2009;&#x00B1;&#x2009;0.06</td>
</tr>
<tr>
<td align="left">UDA [<xref ref-type="bibr" rid="ref-11">11</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">6.39&#x2009;&#x00B1;&#x2009;0.32</td>
<td align="left">5.27&#x2009;&#x00B1;&#x2009;0.11</td>
</tr>
<tr>
<td align="left">DMT [<xref ref-type="bibr" rid="ref-54">54</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">8.49&#x2009;&#x00B1;&#x2009;0.90</td>
<td align="left">5.79&#x2009;&#x00B1;&#x2009;0.19</td>
</tr>
<tr>
<td align="left">SimPLE [<xref ref-type="bibr" rid="ref-55">55</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>5.16</bold></td>
<td align="left"><bold>5.05</bold></td>
</tr>
<tr>
<td align="left">DNLL [<xref ref-type="bibr" rid="ref-56">56</xref>]</td>
<td align="left">12.13</td>
<td align="left">7.94</td>
<td align="left">7.97</td>
<td align="left">5.71</td>
</tr>
<tr>
<td align="left">ANFA(Ours)</td>
<td align="left"><bold>6.70&#x2009;&#x00B1;&#x2009;0.13</bold></td>
<td align="left"><bold>5.33&#x2009;&#x00B1;&#x2009;0.05</bold></td>
<td align="left">6.52&#x2009;&#x00B1;&#x2009;0.10</td>
<td align="left">5.57&#x2009;&#x00B1;&#x2009;0.15</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-5"><label>Table 5</label><caption><title>Comparison of our ANFA with state-of-the-art methods on SVHN</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="center" colspan="3">CNN-13</th>
<th align="center" colspan="3">WRN-28-2</th>
</tr>
<tr>
<th align="left"></th>
<th align="center">250</th>
<th align="center">500</th>
<th align="center">1000</th>
<th align="center">250</th>
<th align="center">500</th>
<th align="center">1000</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">PI-Model [<xref ref-type="bibr" rid="ref-1">1</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">6.65&#x2009;&#x00B1;&#x2009;0.53</td>
<td align="left">4.82&#x2009;&#x00B1;&#x2009;0.17</td>
<td align="left">18.96&#x2009;&#x00B1;&#x2009;1.92</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">7.54&#x2009;&#x00B1;&#x2009;0.06</td>
</tr>
<tr>
<td align="left">TE [<xref ref-type="bibr" rid="ref-1">1</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">5.12&#x2009;&#x00B1;&#x2009;0.13</td>
<td align="left">4.42&#x2009;&#x00B1;&#x2009;0.16</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
</tr>
<tr>
<td align="left">MT [<xref ref-type="bibr" rid="ref-49">49</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">21.55&#x2009;&#x00B1;&#x2009;1.48</td>
<td align="left">12.31&#x2009;&#x00B1;&#x2009;0.28</td>
<td align="left">6.45&#x2009;&#x00B1;&#x2009;2.43</td>
<td align="left">3.82&#x2009;&#x00B1;&#x2009;0.17</td>
<td align="left">3.75&#x2009;&#x00B1;&#x2009;0.10</td>
</tr>
<tr>
<td align="left">SNTG [<xref ref-type="bibr" rid="ref-52">52</xref>]</td>
<td align="left">4.29&#x2009;&#x00B1;&#x2009;0.23</td>
<td align="left">3.99&#x2009;&#x00B1;&#x2009;0.24</td>
<td align="left">3.86&#x2009;&#x00B1;&#x2009;0.27</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
</tr>
<tr>
<td align="left">VAT [<xref ref-type="bibr" rid="ref-7">7</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">8.41&#x2009;&#x00B1;&#x2009;1.01</td>
<td align="left">7.44&#x2009;&#x00B1;&#x2009;0.79</td>
<td align="left">5.98&#x2009;&#x00B1;&#x2009;0.21</td>
</tr>
<tr>
<td align="left">ICT [<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">4.78&#x2009;&#x00B1;&#x2009;0.68</td>
<td align="left">4.23&#x2009;&#x00B1;&#x2009;0.15</td>
<td align="left">3.89&#x2009;&#x00B1;&#x2009;0.04</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
</tr>
<tr>
<td align="left">PLCB [<xref ref-type="bibr" rid="ref-53">53</xref>]</td>
<td align="left">3.66&#x2009;&#x00B1;&#x2009;0.12</td>
<td align="left">3.64&#x2009;&#x00B1;&#x2009;0.04</td>
<td align="left">3.55&#x2009;&#x00B1;&#x2009;0.08</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
</tr>
<tr>
<td align="left">MixMatch [<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td align="left">3.59</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">3.39</td>
<td align="left">3.78&#x2009;&#x00B1;&#x2009;0.26</td>
<td align="left"><bold>3.64&#x2009;&#x00B1;&#x2009;0.46</bold></td>
<td align="left">3.27&#x2009;&#x00B1;&#x2009;0.31</td>
</tr>
<tr>
<td align="left">SimPLE [<xref ref-type="bibr" rid="ref-55">55</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">3.96&#x2009;&#x00B1;&#x2009;0.10</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">2.75&#x2009;&#x00B1;&#x2009;0.15</td>
</tr>
<tr>
<td align="left">FixMatch [<xref ref-type="bibr" rid="ref-13">13</xref>]</td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left">&#x2013;</td>
<td align="left"><bold>2.64&#x2009;&#x00B1;&#x2009;0.64</bold></td>
<td align="left"><bold>&#x2013;</bold></td>
<td align="left"><bold>2.36&#x2009;&#x00B1;&#x2009;0.19</bold></td>
</tr>
<tr>
<td align="left">ANFA(Ours)</td>
<td align="left"><bold>3.41&#x2009;&#x00B1;&#x2009;0.12</bold></td>
<td align="left"><bold>3.39&#x2009;&#x00B1;&#x2009;0.07</bold></td>
<td align="left"><bold>3.20&#x2009;&#x00B1;&#x2009;0.08</bold></td>
<td align="left">3.56&#x2009;&#x00B1;&#x2009;0.11</td>
<td align="left"><bold>3.45&#x2009;&#x00B1;&#x2009;0.21</bold></td>
<td align="left">3.12&#x2009;&#x00B1;&#x2009;0.05</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For CIFAR10, our method achieves comparable results with state-of-the-art methods. It is worth mentioning that the current leading methods on CIFAR-10 require thousands of training epochs. In contrast, our approach converges much more easily. Meanwhile, our method outperforms all the baselines under the CNN-13 architecture with 1&#x2005;k and 4&#x2005;k labeled training samples.</p>
<p>The task on SVHN is much easier than that on CIFAR-10, and the baselines already achieve quite high accuracy. Nonetheless, our method still demonstrates a clear improvement over the baselines across different numbers of labeled data. In particular, under the CNN-13 architecture our method outperforms all of the baselines with 250, 500, and 1&#x2005;k labeled training samples, and our result with 250 labeled samples already beats the results of all baselines with 500 labeled samples.</p>
</sec>
<sec id="s4_4"><label>4.4</label><title>Ablation Study</title>
<p><bold>Comparison with other attention functions.</bold> In the proposed method, we investigate the impact of various attention functions, choosing classical ones for the experiments: dot-product attention, additive attention, hard attention, and multi-head attention. The experimental findings on the CIFAR-10 dataset are shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. The proposed method performs the same with the additive attention function as with the dot-product attention function, but the latter is faster to compute because it can be implemented with highly optimized matrix multiplication. With the hard attention function, performance is slightly lower because the one-hot weight discards some local information. Multi-head attention performs slightly better than dot-product attention, but it requires more memory and computation. In conclusion, we employ dot-product attention, which trades slightly lower performance for lower computational overhead.</p>
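For illustration, the dot-product and hard attention variants compared above can be sketched as follows; the scaling factor and helper names are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def dot_product_attention(query, neighbors):
    """Scores via a single matrix product, softmax-normalized:
    fast because it reduces to optimized matrix multiplication."""
    scores = neighbors @ query / np.sqrt(query.shape[0])
    return softmax(scores)

def hard_attention(query, neighbors):
    """One-hot weight on the most similar neighbor, which
    discards the rest of the local information."""
    w = np.zeros(len(neighbors))
    w[np.argmax(neighbors @ query)] = 1.0
    return w
```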
<fig id="fig-4"><label>Figure 4</label><caption><title>Comparison with other attention functions on CIFAR-10</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_39600-fig-4.tif"/></fig>
<p><bold>Effectiveness of Attentive Aggregation.</bold> We propose an attention-based feature augmentation module that aggregates neighboring features to enhance the features of the target instance, which improves the performance of the model. To show the effectiveness of attention-based aggregation, we compare the proposed attentive aggregation with average feature aggregation, the most straightforward strategy for summarizing features. We adopt ICT as the baseline model and conduct experiments on the CIFAR10 dataset with 4000 labeled samples, providing the comparison results in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. We observe that the attention-based neighborhood feature augmentation module improves the performance of the ICT model (from 91.4&#x0025; to 93.2&#x0025;), showing that neighborhood information helps the model learn discriminative feature embeddings. Meanwhile, attention-based aggregation performs better than average aggregation and converges faster, because the adaptive weights learned by attention fully capture the neighborhood information.</p>
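The two aggregation strategies compared here can be sketched as follows, with `average_aggregate` and `attentive_aggregate` as hypothetical helpers operating on a target feature and its neighbor features.

```python
import numpy as np

def average_aggregate(target, neighbors):
    # Plain mean over the neighborhood: every neighbor weighted equally.
    return neighbors.mean(axis=0)

def attentive_aggregate(target, neighbors):
    # Attention-weighted sum: neighbors similar to the target contribute more.
    scores = neighbors @ target
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return w @ neighbors
```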
<fig id="fig-5"><label>Figure 5</label><caption><title>Test classification accuracy on CIFAR-10</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_39600-fig-5.tif"/></fig>
<p><bold>Evaluation of the Neighborhood Feature Graph Size.</bold> We find that different numbers of neighbors affect the performance of the experiment. Our previous experiments on CIFAR10 fixed the size of the neighbor graph to 16. Here we explore different neighbor graph sizes for our attentive neighborhood feature augmentation. Specifically, we conduct experiments with different neighbor graph sizes on the CIFAR10 and SVHN datasets, respectively, and present the results in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. It can be seen from the figure that the final performance will be reduced if the number of neighbors is too large or too small. This may be explained by the fact that a too-small number of neighbors will not obtain sufficient neighbor information, while a too-large number of neighbors will introduce irrelevant neighbors, which may weaken the effectiveness of neighborhood aggregation and thus impair the target features [<xref ref-type="bibr" rid="ref-57">57</xref>].</p>
<fig id="fig-6"><label>Figure 6</label><caption><title>Evaluation of the neighbor graph size on CIFAR-10 (a) and SVHN (b)</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_39600-fig-6.tif"/></fig>
<p><bold>Combination of Augmentation Strategy.</bold> Since our method employs a data augmentation strategy, we further investigate the impact of commonly used pixel-based data augmentation strategies on its performance. We conduct ablation experiments on the CIFAR10 dataset with the WRN-28-2 architecture to study the influence of a strong augmentation policy (RandAugment) and Mixup on experimental performance. The results are shown in <xref ref-type="table" rid="table-6">Table 6</xref>. As we can see, strong data augmentation techniques give a clear boost to our approach. Our method combines well with other pixel-based augmentation strategies, as various transformations provide richer neighborhood information and drive our model to learn better feature representations for refinement.</p>
<table-wrap id="table-6"><label>Table 6</label><caption><title>Comparison of our ANFA with data augmentation on CIFAR-10</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Ablation</th>
<th align="left">4000 labeled</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">ANFA w/o data augmentation</td>
<td align="left">91.56</td>
</tr>
<tr>
<td align="left">ANFA with Mixup</td>
<td align="left">93.01</td>
</tr>
<tr>
<td align="left">ANFA with RandAugment</td>
<td align="left">94.43</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Conclusion</title>
<p>In this paper, we propose a novel data augmentation method for semi-supervised learning that exploits the neighborhood information of a given instance in the semantic feature space. First, for the target instance, we construct a neighbor graph based on a similarity matrix calculated from its neighbor features in the semantic layer. Second, we refine the target features with an attention-based module according to the neighbor graph. Finally, we mix up the target features and their corresponding predictions and propose a novel consistency loss as the consistency regularization. We conducted experiments on the SVHN and CIFAR10 datasets. The experimental results demonstrate that our proposal is superior to state-of-the-art SSL methods under the CNN-13 architecture when the number of labeled examples is small. Moreover, the attention-based module in our method can be combined with mainstream semi-supervised learning methods to further improve SSL performance. Note that creating the neighborhood graph may be time-consuming when the number of training examples is large. Thus, for future work, we will focus on reducing the time complexity of constructing the neighborhood graph by exploring a parallel computation strategy. In addition, we will consider the scenario where the training dataset is unbalanced.</p>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p>This work was supported by the National Natural Science Foundation of China (Nos. 62072127, 62002076, 61906049), Natural Science Foundation of Guangdong Province (Nos. 2023A1515011774, 2020A1515010423), Project 6142111180404 supported by CNKLSTISS, Science and Technology Program of Guangzhou, China (No. 202002030131), Guangdong basic and applied basic research fund joint fund Youth Fund (No. 2019A1515110213), Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF-IPIC202101), and Scientific research project for Guangzhou University (No. RP2022003).</p></sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p></sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Ding</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Mohammadi</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Liu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Sequential order-aware coding-based robust subspace clustering for human action recognition in untrimmed videos</article-title>,&#x201D; <source>IEEE Transactions on Image Processing</source>, vol. <volume>32</volume>, pp. <fpage>13</fpage>&#x2013;<lpage>28</lpage>, <year>2023</year>; <pub-id pub-id-type="pmid">36459602</pub-id></mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Lam</surname></string-name></person-group>, &#x201C;<article-title>Blockchain-based secure key management for mobile edge computing</article-title>,&#x201D; <source>IEEE Transactions&#x2002;on Mobile Computing</source>, vol. <volume>22</volume>, no. <issue>1</issue>, pp. <fpage>100</fpage>&#x2013;<lpage>114</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wei</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>A spatiotemporal and motion information extraction network for action recognition</article-title>,&#x201D; <source>Wireless Networks</source>, vol. <volume>29</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>17</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wei</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zhou</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Contrastive distortion-level learning-based no-reference image-quality assessment</article-title>,&#x201D; <source>International Journal of Intelligent Systems</source>, vol. <volume>37</volume>, no. <issue>11</issue>, pp. <fpage>8730</fpage>&#x2013;<lpage>8746</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Enhancing blockchain-based filtration mechanism via IPFS for collaborative intrusion detection in IoT networks</article-title>,&#x201D; <source>Journal of Systems Architecture</source>, vol. <volume>127</volume>, pp. <fpage>102510</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Laine</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Aila</surname></string-name></person-group>, &#x201C;<article-title>Temporal ensembling for semi-supervised learning</article-title>,&#x201D; <comment>arXiv preprint arXiv:1610.02242</comment>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Miyato</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Maeda</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Koyama</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Ishii</surname></string-name></person-group>, &#x201C;<article-title>Virtual adversarial training: A regularization method for supervised and semi-supervised learning</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>41</volume>, no. <issue>8</issue>, pp. <fpage>1979</fpage>&#x2013;<lpage>1993</lpage>, <year>2018</year>; <pub-id pub-id-type="pmid">30040630</pub-id></mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Li</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Generative adversarial training for supervised and semi-supervised learning</article-title>,&#x201D; <source>Frontiers in Neurorobotics</source>, vol. <volume>16</volume>, pp. <fpage>859610</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Luo</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Kou</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Hou</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>An effective and practical gradient inversion attack</article-title>,&#x201D; <source>International Journal of Intelligent Systems</source>, vol. <volume>37</volume>, no. <issue>11</issue>, pp. <fpage>9373</fpage>&#x2013;<lpage>9389</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Verma</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Kawaguchi</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Lamb</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Kannala</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Bengio</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Interpolation consistency training for semi-supervised learning</article-title>,&#x201D; <source>Neural Networks</source>, vol. <volume>145</volume>, pp. <fpage>90</fpage>&#x2013;<lpage>106</lpage>, <year>2022</year>; <pub-id pub-id-type="pmid">34735894</pub-id></mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Dai</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Hovy</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Luong</surname></string-name> and <string-name><given-names>Q. V.</given-names> <surname>Le</surname></string-name></person-group>, &#x201C;<article-title>Unsupervised data augmentation for consistency training</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>33</volume>, pp. <fpage>6256</fpage>&#x2013;<lpage>6268</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Berthelot</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Carlini</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Goodfellow</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Oliver</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Papernot</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Mixmatch: A holistic approach to semi-supervised learning</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>32</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>11</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Sohn</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Berthelot</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Carlini</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Fixmatch: Simplifying semi-supervised learning with consistency and confidence</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>33</volume>, pp. <fpage>596</fpage>&#x2013;<lpage>608</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Xia</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Zou</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Yang</surname></string-name></person-group>, &#x201C;<article-title>FFTI: Image inpainting algorithm via features fusion and Two-steps inpainting</article-title>,&#x201D; <source>Journal of Visual Communication and Image Representation</source>, vol. <volume>93</volume>, pp. <fpage>103776</fpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Xia</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Ren</surname></string-name></person-group>, &#x201C;<article-title>Improved anti-occlusion object tracking algorithm using unscented rauch-tung-striebel smoother and kernel correlation filter</article-title>,&#x201D; <source>Journal of King Saud University-Computer and Information Sciences</source>, vol. <volume>34</volume>, no. <issue>8</issue>, pp. <fpage>6008</fpage>&#x2013;<lpage>6018</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Xia</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Zou</surname></string-name></person-group>, &#x201C;<article-title>MFFN: Image super-resolution via multi-level features fusion network</article-title>,&#x201D; <source>The Visual Computer</source>, vol. <volume>39</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Phonevilay</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Gu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Xia</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Image super-resolution reconstruction based on feature map attention mechanism</article-title>,&#x201D; <source>Applied Intelligence</source>, vol. <volume>51</volume>, pp. <fpage>4367</fpage>&#x2013;<lpage>4380</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Kuang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Oblivious transfer for privacy-preserving in VANET&#x2019;s feature matching</article-title>,&#x201D; <source>IEEE Transactions on Intelligent Transportation Systems</source>, vol. <volume>22</volume>, no. <issue>7</issue>, pp. <fpage>4359</fpage>&#x2013;<lpage>4366</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Verma</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Lamb</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Beckham</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Najafi</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Mitliagkas</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Manifold mixup: Better representations by interpolating hidden states</article-title>,&#x201D; in <conf-name>Int. Conf. on Machine Learning</conf-name>, <conf-loc>Long Beach, CA, USA</conf-loc>, pp. <fpage>6438</fpage>&#x2013;<lpage>6447</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Upchurch</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gardner</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Pleiss</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Pless</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Snavely</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Deep feature interpolation for image content changes</article-title>,&#x201D; in <conf-name>Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Honolulu, HI, USA</conf-loc>, pp. <fpage>7064</fpage>&#x2013;<lpage>7073</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Kuo</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Huang</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Kira</surname></string-name></person-group>, &#x201C;<article-title>Featmatch: Feature-based augmentation for semi-supervised learning</article-title>,&#x201D; in <conf-name>Computer Vision&#x2013;ECCV 2020: 16th European Conf., Glasgow, UK, August 23&#x2013;28, 2020, Proc., Part XVIII 16</conf-name>, <conf-loc>Springer Int. Publishing</conf-loc>, pp. <fpage>479</fpage>&#x2013;<lpage>495</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yin</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Ding</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>TrajectoryCNN: A new spatio-temporal feature learning network for human motion prediction</article-title>,&#x201D; <source>IEEE Transactions on Circuits and Systems for Video Technology</source>, vol. <volume>31</volume>, no. <issue>6</issue>, pp. <fpage>2133</fpage>&#x2013;<lpage>2146</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Shazeer</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Parmar</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Uszkoreit</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Jones</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Attention is all you need</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>30</volume>, pp. <fpage>5998</fpage>&#x2013;<lpage>6008</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Netzer</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Coates</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bissacco</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Wu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Reading digits in natural images with unsupervised feature learning</article-title>,&#x201D; <year>2011</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="thesis"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Krizhevsky</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Hinton</surname></string-name></person-group>, &#x201C;<article-title>Learning multiple layers of features from tiny images</article-title>,&#x201D; <source>Master&#x2019;s Thesis</source>, <publisher-name>University of Toronto</publisher-name>, pp. <fpage>32</fpage>&#x2013;<lpage>33</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zagoruyko</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Komodakis</surname></string-name></person-group>, &#x201C;<article-title>Wide residual networks</article-title>,&#x201D; <comment>arXiv preprint arXiv:1605.07146</comment>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Kuang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tan</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>The security of machine learning in an adversarial setting: A survey</article-title>,&#x201D; <source>Journal of Parallel and Distributed Computing</source>, vol. <volume>130</volume>, pp. <fpage>12</fpage>&#x2013;<lpage>23</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Hou</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Adversarial attacks on deep-learning-based SAR image target recognition</article-title>,&#x201D; <source>Journal of Network and Computer Applications</source>, vol. <volume>162</volume>, no. <issue>12</issue>, pp. <fpage>102632</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Lai</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Huo</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Hou</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>A universal detection method for adversarial examples and fake images</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>22</volume>, no. <issue>9</issue>, pp. <fpage>3445</fpage>, <year>2022</year>; <pub-id pub-id-type="pmid">35591134</pub-id></mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Srivastava</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Hinton</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Krizhevsky</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Salakhutdinov</surname></string-name></person-group>, &#x201C;<article-title>Dropout: A simple way to prevent neural networks from overfitting</article-title>,&#x201D; <source>The Journal of Machine Learning Research</source>, vol. <volume>15</volume>, no. <issue>1</issue>, pp. <fpage>1929</fpage>&#x2013;<lpage>1958</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Cisse</surname></string-name>, <string-name><given-names>Y. N.</given-names> <surname>Dauphin</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Lopez-Paz</surname></string-name></person-group>, &#x201C;<article-title>Mixup: Beyond empirical risk minimization</article-title>,&#x201D; <comment>arXiv preprint arXiv:1710.09412</comment>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wei</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wei</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Kong</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Xing</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>FMixcutmatch for semi-supervised deep learning</article-title>,&#x201D; <source>Neural Networks</source>, vol. <volume>133</volume>, pp. <fpage>166</fpage>&#x2013;<lpage>176</lpage>, <year>2021</year>; <pub-id pub-id-type="pmid">33217685</pub-id></mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Ling</surname></string-name></person-group>, &#x201C;<article-title>Attention-based label consistency for semi-supervised deep learning based image classification</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>453</volume>, pp. <fpage>731</fpage>&#x2013;<lpage>741</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Berthelot</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Carlini</surname></string-name>, <string-name><given-names>E. D.</given-names> <surname>Cubuk</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kurakin</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring</article-title>,&#x201D; <comment>arXiv preprint arXiv:1911.09785</comment>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Che</surname></string-name>, <string-name><given-names>M. F.</given-names> <surname>Leung</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Adaptive graph nonnegative matrix factorization with the self-paced regularization</article-title>,&#x201D; <source>Applied Intelligence</source>, vol. <volume>52</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Che</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>M. F.</given-names> <surname>Leung</surname></string-name></person-group>, &#x201C;<article-title>Graph non-negative matrix factorization with alternative smoothed L0 regularizations</article-title>,&#x201D; <source>Neural Computing and Applications</source>, vol. <volume>34</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Ou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Zhu</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Kwong</surname></string-name></person-group>, &#x201C;<article-title>A novel rank learning based no-reference image quality assessment method</article-title>,&#x201D; <source>IEEE Transactions on Multimedia</source>, vol. <volume>24</volume>, pp. <fpage>4197</fpage>&#x2013;<lpage>4211</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>E. D.</given-names> <surname>Cubuk</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Zoph</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Mane</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Vasudevan</surname></string-name>, <string-name><given-names>Q. V.</given-names> <surname>Le</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Autoaugment: Learning augmentation strategies from data</article-title>,&#x201D; in <conf-name>Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Long Beach, CA, USA</conf-loc>, pp. <fpage>113</fpage>&#x2013;<lpage>123</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Jie</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Xiao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Reinforcement learning based energy efficient robot relay for unmanned aerial vehicles against smart jamming</article-title>,&#x201D; <source>Science China Information Sciences</source>, vol. <volume>65</volume>, no. <issue>1</issue>, pp. <fpage>112304</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Zoph</surname></string-name> and <string-name><given-names>Q. V.</given-names> <surname>Le</surname></string-name></person-group>, &#x201C;<article-title>Neural architecture search with reinforcement learning</article-title>,&#x201D; in <source>Proc. of Int. Conf. on Learning Representations</source>, <conf-loc>Toulon, France</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>E. D.</given-names> <surname>Cubuk</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Zoph</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Shlens</surname></string-name> and <string-name><given-names>Q. V.</given-names> <surname>Le</surname></string-name></person-group>, &#x201C;<article-title>Randaugment: Practical automated data augmentation with a reduced search space</article-title>,&#x201D; in <conf-name>Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops</conf-name>, <conf-loc>Seattle, WA, USA</conf-loc>, pp. <fpage>702</fpage>&#x2013;<lpage>703</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Ji</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Zhai</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zong</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Task-aware swapping for efficient DNN inference on DRAM-constrained edge systems</article-title>,&#x201D; <source>International Journal of Intelligent Systems</source>, vol. <volume>37</volume>, no. <issue>11</issue>, pp. <fpage>8155</fpage>&#x2013;<lpage>8169</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Jie</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Jin</surname></string-name></person-group>, &#x201C;<article-title>GATrust: A multi-aspect graph attention network model for trust assessment in OSNs</article-title>,&#x201D; <source>IEEE Transactions on Knowledge and Data Engineering</source>, vol. <volume>32</volume>, pp. <fpage>1</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Squeeze-and-excitation networks</article-title>,&#x201D; in <conf-name>Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Salt Lake City, UT, USA</conf-loc>, pp. <fpage>7132</fpage>&#x2013;<lpage>7141</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>L.</given-names> <surname>You</surname></string-name></person-group>, &#x201C;<article-title>MACT: A multi-channel anonymous consensus based on Tor</article-title>,&#x201D; <source>World Wide Web</source>, vol. <volume>25</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>25</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Qian</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Residual attention network for image classification</article-title>,&#x201D; in <conf-name>Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Honolulu, HI, USA</conf-loc>, pp. <fpage>3156</fpage>&#x2013;<lpage>3164</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Woo</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Park</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>I.</given-names> <surname>Kweon</surname></string-name></person-group>, &#x201C;<article-title>Cbam: Convolutional block attention module</article-title>,&#x201D; in <conf-name>Proc. of the European Conf. on Computer Vision (ECCV)</conf-name>, <conf-loc>Munich, Germany</conf-loc>, pp. <fpage>3</fpage>&#x2013;<lpage>19</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Li</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Energy trading scheme based on consortium blockchain and game theory</article-title>,&#x201D; <source>Computer Standards &#x0026; Interfaces</source>, vol. <volume>84</volume>, pp. <fpage>103699</fpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Tarvainen</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Valpola</surname></string-name></person-group>, &#x201C;<article-title>Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>30</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>10</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Zhou</surname></string-name></person-group>, &#x201C;<article-title>DE-RSTC: A rational secure two-party computation protocol based on direction entropy</article-title>,&#x201D; <source>International Journal of Intelligent Systems</source>, vol. <volume>37</volume>, no. <issue>11</issue>, pp. <fpage>8947</fpage>&#x2013;<lpage>8967</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-51"><label>[51]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Oliver</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Odena</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Raffel</surname></string-name>, <string-name><given-names>E. D.</given-names> <surname>Cubuk</surname></string-name> and <string-name><given-names>I. J.</given-names> <surname>Goodfellow</surname></string-name></person-group>, &#x201C;<article-title>Realistic evaluation of deep semi-supervised learning algorithms</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>31</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-52"><label>[52]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Luo</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Ren</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Smooth neighbors on teacher graphs for semi-supervised learning</article-title>,&#x201D; in <conf-name>Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Salt Lake City, UT, USA</conf-loc>, pp. <fpage>8896</fpage>&#x2013;<lpage>8905</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-53"><label>[53]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Arazo</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Ortego</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Albert</surname></string-name>, <string-name><given-names>N. E.</given-names> <surname>O&#x2019;Connor</surname></string-name> and <string-name><given-names>K.</given-names> <surname>McGuinness</surname></string-name></person-group>, &#x201C;<article-title>Pseudo-labeling and confirmation bias in deep semi-supervised learning</article-title>,&#x201D; in <conf-name>Int. Joint Conf. on Neural Networks</conf-name>, <conf-loc>Glasgow, United Kingdom</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-54"><label>[54]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Gu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Cheng</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>DMT: Dynamic mutual training for semi-supervised learning</article-title>,&#x201D; <source>Pattern Recognition</source>, vol. <volume>130</volume>, pp. <fpage>108777</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-55"><label>[55]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Hu</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Nevatia</surname></string-name></person-group>, &#x201C;<article-title>SimPLE: Similar pseudo label exploitation for semi-supervised classification</article-title>,&#x201D; in <conf-name>Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Nashville, TN, USA</conf-loc>, pp. <fpage>15099</fpage>&#x2013;<lpage>15108</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-56"><label>[56]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Xiao</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Hao</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Dong</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Qiu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Semi-supervised learning with pseudo-negative labels for image classification</article-title>,&#x201D; <source>Knowledge-Based Systems</source>, vol. <volume>260</volume>, pp. <fpage>110166</fpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-57"><label>[57]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Barni</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>Improving cost learning for JPEG steganography by exploiting JPEG domain knowledge</article-title>,&#x201D; <source>IEEE Transactions on Circuits and Systems for Video Technology</source>, vol. <volume>32</volume>, no. <issue>6</issue>, pp. <fpage>4081</fpage>&#x2013;<lpage>4095</lpage>, <year>2021</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>