<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">JNM</journal-id>
<journal-id journal-id-type="nlm-ta">JNM</journal-id>
<journal-id journal-id-type="publisher-id">JNM</journal-id>
<journal-title-group>
<journal-title>Journal of New Media</journal-title>
</journal-title-group>
<issn pub-type="epub">2579-0129</issn>
<issn pub-type="ppub">2579-0110</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">32447</article-id>
<article-id pub-id-type="doi">10.32604/jnm.2022.032447</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Build Gaussian Distribution Under Deep Features for Anomaly Detection and Localization</article-title>
<alt-title alt-title-type="left-running-head">Build Gaussian Distribution Under Deep Features for Anomaly Detection and Localization</alt-title>
<alt-title alt-title-type="right-running-head">Build Gaussian Distribution Under Deep Features for Anomaly Detection and Localization</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Wang</surname><given-names>Mei</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>18801585101@163.com</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Xu</surname><given-names>Hao</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Chen</surname><given-names>Yadang</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<aff id="aff-1"><label>1</label><institution>School of Computer &#x0026; Software, Nanjing University of Information Science and Technology</institution>, <addr-line>Nanjing, 210044</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>College of Computer Science and Engineering, Chongqing University of Technology</institution>, <addr-line>Chongqing, 400054</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Mei Wang. Email: <email>18801585101@163.com</email></corresp>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2022-12-09"><day>9</day>
<month>12</month>
<year>2022</year></pub-date>
<volume>4</volume>
<issue>4</issue>
<fpage>179</fpage>
<lpage>190</lpage>
<history>
<date date-type="received"><day>18</day><month>5</month><year>2022</year></date>
<date date-type="accepted"><day>20</day><month>6</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Wang et al.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Wang et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_JNM_32447.pdf"></self-uri>
<abstract>
<p>Anomaly detection in images has attracted a lot of attention in the field of computer vision. It aims at identifying images that deviate from the norm and segmenting the defect within those images. However, anomalous samples are difficult to collect comprehensively, and labeled data is costly to obtain in many practical scenarios. We propose a simple framework for unsupervised anomaly detection. Specifically, the proposed method directly employs a CNN pre-trained on ImageNet to extract deep features from normal images, reduces dimensionality via Principal Component Analysis (PCA), builds the distribution of normal features with a multivariate Gaussian (MVG), and determines whether a test image is abnormal according to its Mahalanobis distance. We further investigate which features are most effective in detecting anomalies. Extensive experiments on the MVTec anomaly detection dataset show that the proposed method achieves 98.6&#x0025; AUROC in image-level anomaly detection and outperforms previous methods by a large margin.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Anomaly detection</kwd>
<kwd>dimensionality reduction</kwd>
<kwd>multivariate gaussian</kwd>
<kwd>visual inspection</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Anomaly detection (AD) is of utmost importance for numerous tasks in the field of computer vision; it aims at precisely identifying abnormal images and is often framed as a binary classification task. AD has great significance and application value in industrial inspection [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-3">3</xref>], medical image analysis [<xref ref-type="bibr" rid="ref-4">4</xref>], and surveillance [<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-6">6</xref>]. However, defective samples are difficult to collect comprehensively, and labeled data is costly to obtain in many practical scenarios. Previous studies addressed this challenge by following an unsupervised learning paradigm, such as the one-class support vector machine (OC-SVM) [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>]. However, these solutions are very sensitive to the feature space used. Hence, DeepSVDD [<xref ref-type="bibr" rid="ref-9">9</xref>] was proposed, first introducing deep one-class classification to AD. More recently, self-supervised learning methods [<xref ref-type="bibr" rid="ref-10">10</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>] have designed pretext tasks that help the model extract more discriminative features.</p>
<p>Another feasible solution for AD is to use generative models, such as the AutoEncoder (AE) [<xref ref-type="bibr" rid="ref-12">12</xref>] and the Generative Adversarial Network (GAN) [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>]. These methods require neither pre-trained models nor extra training data, and they detect anomalies via the reconstruction errors of the test images. However, these approaches treat each image as a whole, neglecting local information, and the generalization ability of the AE may allow it to reconstruct abnormal inputs well, causing the anomaly detection task to fail. Hence, this kind of approach is not widely used.</p>
<p>Deep learning methods have been introduced for AD, and a useful property of neural networks is that representations learned on vast datasets can be transferred to data-poor tasks, which is very convenient for industrial anomaly detection. Recent work thus leverages deep CNNs pre-trained on natural images (e.g., ResNet18) to extract general features of normal images and build a distribution for anomaly detection.</p>
<p>We propose an unsupervised anomaly detection method that uses only anomaly-free images at training time, making it very attractive for industrial anomaly detection. We extract deep features from a pre-trained network, model their distribution via an MVG, and use the Mahalanobis distance [<xref ref-type="bibr" rid="ref-15">15</xref>] to detect defects. To alleviate the bias of the pre-trained network towards the task of natural image classification, we adopt mid-level feature representations. We also reduce redundancy in the extracted features to retain the features critical for AD and to shorten running time. Finally, we evaluate the proposed method on the challenging MVTec anomaly detection dataset [<xref ref-type="bibr" rid="ref-3">3</xref>] and achieve an image-level anomaly detection AUROC of 98.6&#x0025; and a pixel-level anomaly detection AUROC of 96.6&#x0025;.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Work</title>
<p>In the current research literature, the existing methods of anomaly detection can be roughly divided into three categories: the reconstruction-based method, the classification-based method, and the distribution-based method.</p>
<sec id="s2_1"><label>2.1</label><title>Classification-based Methods</title>
<p>Classification-based anomaly detection methods aim to find a separating manifold between normal data and the rest of the input space. One paradigm is the one-class Support Vector Machine (OC-SVM) [<xref ref-type="bibr" rid="ref-7">7</xref>], which trains a classifier to perform this separation. One of its most successful variants is support vector data description (SVDD) [<xref ref-type="bibr" rid="ref-16">16</xref>], a long-standing algorithm for AD that finds the minimal hypersphere containing at least a given fraction of the data. DeepSVDD [<xref ref-type="bibr" rid="ref-9">9</xref>] trains a neural network by minimizing the volume of a hypersphere that encloses the network representations of the data. However, DeepSVDD requires training a neural network, the feature center must be designated by hand in the feature space, and model degradation occurs easily. PatchSVDD [<xref ref-type="bibr" rid="ref-10">10</xref>] extends SVDD to a patch-based method using self-supervised learning [<xref ref-type="bibr" rid="ref-17">17</xref>]; it achieves more accurate anomaly localization, but its assumption that adjacent patch features should be aggregated during training is not always reasonable.</p>
</sec>
<sec id="s2_2"><label>2.2</label><title>Reconstruction-based Methods</title>
<p>The most common anomaly detection methods are reconstruction-based. They rest on the assumption that each normal sample can be reconstructed accurately, whereas an abnormal image incurs a large reconstruction loss. A typical reconstruction-based method builds on AutoEncoders (AE) [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-20">20</xref>]. Reference [<xref ref-type="bibr" rid="ref-21">21</xref>] designs a deep yet efficient convolutional autoencoder and detects anomalous regions within images via feature reconstruction. Deep generative models based on the generative adversarial network (GAN) [<xref ref-type="bibr" rid="ref-22">22</xref>] can also be used in this way. Furthermore, GAN-based methods offer additional anomaly score metrics, such as the output of the discriminator [<xref ref-type="bibr" rid="ref-23">23</xref>] and the latent space distance [<xref ref-type="bibr" rid="ref-24">24</xref>]. To improve the reconstruction quality of the image, [<xref ref-type="bibr" rid="ref-25">25</xref>] proposes constructing a GAN ensemble for anomaly detection, as the ensemble often outperforms a single GAN. However, these methods treat the image as a whole; it may be difficult for the generator to reconstruct images faithfully, leading to poor anomaly detection results.</p>
</sec>
<sec id="s2_3"><label>2.3</label><title>Distribution-based Methods</title>
<p>Distribution-based methods build a density estimation model for the distribution of normal data. Kernel density estimation [<xref ref-type="bibr" rid="ref-26">26</xref>], Gaussian models, and nearest neighbors can all be seen as distribution-based. Recently, pre-trained networks have been used to extract deep features, and reusing pre-trained features has become widespread in anomaly detection [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>]. These methods achieve strong performance in detecting anomalous images but suffer from a critical drawback: real image data rarely follows simple parametric distributional assumptions. Student-teacher knowledge distillation [<xref ref-type="bibr" rid="ref-28">28</xref>] and normalizing flows [<xref ref-type="bibr" rid="ref-29">29</xref>] have also been explored; flows learn bijective transformations between data distributions. Since flow-based methods perform no dimensionality reduction, their computational cost is significant.</p>
</sec>
</sec>
<sec id="s3"><label>3</label><title>Method</title>
<p>The proposed method is based on the assumption that discriminative features do not vary enormously within the normal data of an anomaly detection task. The pipeline of the proposed method for unsupervised anomaly detection is depicted in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Overview of the proposed method</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="JNM_32447-fig-1.png"/></fig>
<p>To coincide with the existing literature, we denote the pre-trained extractor by <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>&#x03D5;</mml:mi></mml:math></inline-formula> and define <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mrow><mml:mtext>X</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>N</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> to denote the set of normal images at training time. Accordingly, <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mrow><mml:mtext>X</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> denotes the set of test images, containing both normal and abnormal samples <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mrow><mml:mtext>X</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>:</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow></mml:math></inline-formula> is the label of the test image.</p>
<sec id="s3_1"><label>3.1</label><title>Feature Extraction</title>
<p>A network pre-trained on a large-scale dataset ensures the extraction of universal features. Hence, to avoid costly neural network optimization, we adopt the pre-trained EfficientNet [<xref ref-type="bibr" rid="ref-30">30</xref>] as the backbone and use layers <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mn>3</mml:mn><mml:mrow><mml:mo>&#x223C;</mml:mo></mml:mrow><mml:mn>7</mml:mn></mml:math></inline-formula> (indexed from 0) to alleviate the bias towards image classification. Specifically, we denote the final output of intermediate layer <italic>l</italic> by <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>&#x03D5;</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy="false">&#x2192;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
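As an illustration, intermediate-layer extraction of this kind can be sketched with PyTorch forward hooks. The small sequential network below is a hypothetical stand-in for EfficientNet-b4, and the hooked layer indices are illustrative only, not the paper's layers 3∼7:

```python
import torch
import torch.nn as nn

# Stand-in backbone: a few conv stages play the role of the pre-trained
# network's intermediate blocks (depths and widths here are illustrative).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
)

features = {}  # cache of intermediate outputs, keyed by layer name

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# register hooks on the chosen intermediate layers
for idx in (2, 4):
    backbone[idx].register_forward_hook(save_output(f"layer{idx}"))

backbone.eval()
with torch.no_grad():
    backbone(torch.randn(1, 3, 64, 64))  # one forward pass fills `features`
```

In practice the hooks would be registered on the chosen backbone blocks, and the cached outputs reused as the deep features for all subsequent steps, with no further network optimization.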
</sec>
<sec id="s3_2"><label>3.2</label><title>Dimensionality Reduction</title>
<p>A huge number of features are extracted by <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>&#x03D5;</mml:mi></mml:math></inline-formula>, but they may carry redundant information; hence, finding features that discriminate between normal and anomalous images is critical for AD. For a good trade-off between accuracy and running time, we propose applying the opposite operation of principal component analysis (PCA): retaining the principal components with the least variance (i.e., those with the smallest eigenvalues). For convenience, we use OPCA to denote this opposite operation of PCA in the following. Unlike standard PCA, which keeps the high-variance components, OPCA retains the low-variance directions; these directions are tightly constrained on normal data, so deviations along them are particularly indicative of anomalies.</p>
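A minimal NumPy sketch of the OPCA step, assuming a hypothetical `opca` helper that operates on an (m, D) matrix of normal features; the variance threshold mirrors the 0.05 default used in the experiments:

```python
import numpy as np

def opca(features, var_thresh=0.05):
    """Project (m, D) normal features onto their low-variance directions."""
    mu = features.mean(axis=0)
    X = features - mu
    cov = X.T @ X / (len(X) - 1)        # sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    ratio = evals / evals.sum()
    # keep the smallest-eigenvalue components whose cumulative share of
    # the total variance stays below the threshold
    keep = np.cumsum(ratio) <= var_thresh
    W = evecs[:, keep]                  # (D, k) projection matrix
    return X @ W, W, mu
```

Because `numpy.linalg.eigh` returns eigenvalues in ascending order, the cumulative-sum mask selects the smallest-eigenvalue directions first; standard PCA would instead keep the opposite end of the spectrum.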
</sec>
<sec id="s3_3"><label>3.3</label><title>Fit the Gaussian Distribution</title>
<p>The probability density function of multivariate Gaussian distribution (MVG) is given by:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mi>&#x03C6;</mml:mi><mml:mrow><mml:mi>&#x03BC;</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x03A3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>:=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:msqrt><mml:mo stretchy="false">(</mml:mo><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>|</mml:mo><mml:mo movablelimits="true" form="prefix">det</mml:mo><mml:mi mathvariant="normal">&#x03A3;</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:msqrt></mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x03A3;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:math></disp-formula>with <italic>n</italic> being the number of dimensions.
The MVG parameters comprise the mean vector, <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>&#x03BC;</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, and the symmetric covariance matrix, <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi mathvariant="normal">&#x03A3;</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, and <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi mathvariant="normal">&#x03A3;</mml:mi></mml:math></inline-formula> must be positive definite.</p>
<p>We learn the parameters of multivariate Gaussian distribution from the output of different layers of the backbone. Since the sample covariance matrix is only well-conditioned when the number of dimensions <italic>n</italic> is much lower than the number of samples <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>m</mml:mi></mml:math></inline-formula>, we use the empirical mean <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> and estimate <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi mathvariant="normal">&#x03A3;</mml:mi></mml:math></inline-formula> using shrinkage as proposed by Ledoit&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-31">31</xref>]. We approximate both mean <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> and covariance matrix <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi mathvariant="normal">&#x03A3;</mml:mi></mml:math></inline-formula> empirically from normal data <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, &#x2026;, <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> based on the sample covariance:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mover><mml:mi mathvariant="normal">&#x03A3;</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mover><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover></mml:math></inline-formula> denotes the empirical mean of the observations and <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mrow><mml:mover><mml:mi mathvariant="normal">&#x03A3;</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> denotes the empirical covariance matrix.</p>
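The parameter estimation of Eqs. (1) and (2) can be sketched as follows. For simplicity, the hypothetical `fit_mvg` helper shrinks the sample covariance toward a scaled identity with a fixed coefficient; this is a simplified stand-in for the data-driven Ledoit-Wolf estimate [31] used in the paper:

```python
import numpy as np

def fit_mvg(features, shrink=0.1):
    """Estimate MVG parameters (mu, Sigma) from (m, D) normal features."""
    mu = features.mean(axis=0)
    X = features - mu
    S = X.T @ X / (len(X) - 1)                 # sample covariance, Eq. (2)
    D = S.shape[0]
    # shrink toward a scaled identity so Sigma stays positive definite
    # even when the dimension D is not far below the sample count m;
    # the fixed coefficient stands in for the Ledoit-Wolf estimate
    sigma = (1.0 - shrink) * S + shrink * (np.trace(S) / D) * np.eye(D)
    return mu, sigma
```

With any positive shrinkage coefficient, the estimated covariance is symmetric and positive definite, so the Mahalanobis distance of the next subsection is always well defined.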
</sec>
<sec id="s3_4"><label>3.4</label><title>Anomaly Scoring</title>
<p>Under the distribution with mean <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> and covariance <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi mathvariant="normal">&#x03A3;</mml:mi></mml:math></inline-formula>, Mahalanobis distance [<xref ref-type="bibr" rid="ref-15">15</xref>] is used to get a distance measure for a particular point <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>x</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, which is defined as:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>M</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:msup><mml:mi mathvariant="normal">&#x03A3;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msqrt><mml:mo>.</mml:mo></mml:math></disp-formula></p>
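Eq. (3) translates directly into code; `mahalanobis` below is an illustrative helper, not part of the original implementation:

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """Eq. (3): distance of feature vector x from the fitted Gaussian."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(sigma) @ diff))
```

For example, with mu = 0 and Sigma = I the distance reduces to the Euclidean norm, so x = (3, 4) yields M(x) = 5.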
<p>To make the image-level anomaly score <italic>S</italic> more robust for test images, we measure the distance between the intermediate outputs of each layer of the network by <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref> and then perform a simple summation to combine the anomaly scores of the different layers. The scoring function for the test image <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is shown in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>, where <italic>L</italic> is the total number of layers of <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mi>&#x03D5;</mml:mi></mml:math></inline-formula>.
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>:=</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>Anomaly localization, which aims to detect anomalous pixels, is another important criterion for evaluating the validity of the method. Since intermediate features maintain spatial dimensions, we apply the Mahalanobis distance [<xref ref-type="bibr" rid="ref-15">15</xref>] to the features of each intermediate layer via <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>, yielding a matrix <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> of size <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>; we then upsample the spatial anomaly scores of each layer using bilinear interpolation and take their unweighted mean as the final per-pixel anomaly score.</p>
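The per-pixel scoring can be sketched in PyTorch as follows; `anomaly_map` is a hypothetical helper that applies Eq. (3) at every spatial location of one layer's (C, h, w) feature map and upsamples the result bilinearly:

```python
import torch
import torch.nn.functional as F

def anomaly_map(feat, mu, sigma_inv, out_size):
    """Per-pixel Mahalanobis scores for one layer, upsampled to out_size."""
    C, h, w = feat.shape
    x = feat.reshape(C, -1).T - mu  # (h*w, C) centered location features
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu) at each location, Eq. (3)
    d = torch.sqrt((x @ sigma_inv * x).sum(dim=1)).reshape(1, 1, h, w)
    # upsample the score map to the input resolution via bilinear interpolation
    return F.interpolate(d, size=out_size, mode="bilinear",
                         align_corners=False)[0, 0]
```

The per-layer maps produced this way would then be averaged without weighting to obtain the final localization result.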
</sec>
</sec>
<sec id="s4"><label>4</label><title>Experiments</title>
<sec id="s4_1"><label>4.1</label><title>Experimental Setup</title>
<p><bold>Dataset.</bold> We evaluate the proposed method on the MVTec AD dataset (MVTec). MVTec contains 15 categories of industrial products (10 object and 5 texture categories) with a total of 5354 images. MVTec follows the standard protocol in which no anomalous images are used during training. Each category has very few training images, which poses a unique challenge for learning deep representations.</p>
<p><bold>Experimental Settings.</bold> The proposed method is implemented in PyTorch 1.2.0 with CUDA 11.3, and all experiments run on an NVIDIA A100-PCIE-40GB GPU. All images in MVTec are resized to a fixed resolution (e.g., 380 <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 380 or 224 <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 224), and anomaly detection is performed on one category at a time. We adopt the EfficientNet-b4 network with layers <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mn>3</mml:mn><mml:mrow><mml:mo>&#x223C;</mml:mo></mml:mrow><mml:mn>7</mml:mn></mml:math></inline-formula> as the backbone and take the outputs of the intermediate layers as features. The default variance threshold of the OPCA approach is 0.05. We also use the ResNet-18 network as a supplementary experiment.</p>
<p><bold>Evaluation Metrics.</bold> Image-level anomaly classification and pixel-level anomaly localization performance are measured via the Area Under the Receiver Operating Characteristic curve (AUROC). However, AUROC is biased in favor of large anomalies. Hence, the Per-Region-Overlap (PRO) score was proposed to evaluate pixel-level anomaly localization; a higher PRO score indicates better localization performance.</p>
</sec>
<sec id="s4_2"><label>4.2</label><title>Comparison with State-of-the-Art</title>
<p>The quantitative results on image-level anomaly classification across the 15 classes are summarized in <xref ref-type="table" rid="table-1">Table 1</xref>, where we compare against the state-of-the-art performance reported in the existing literature. The best result for each category is highlighted in boldface. We do not reproduce those methods but take the corresponding values directly from the cited sources. The proposed method significantly outperforms the current state of the art with 98.6&#x0025; AUROC in anomaly classification. In all 15 categories, the proposed method achieves an AUROC of at least 93.1&#x0025;, indicating that it can effectively handle different kinds of defects. Furthermore, the results show that not all deep features are useful for the anomaly detection task; on the contrary, reducing the number of features improves detection efficiency and reduces memory requirements.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Quantitative comparison with different methods on anomaly detection performance (AUROC&#x0025;) on MVTec AD dataset</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Category</th>
<th align="left">SPADE [<xref ref-type="bibr" rid="ref-2">2</xref>]</th>
<th align="left">U-Student [<xref ref-type="bibr" rid="ref-1">1</xref>]</th>
<th align="left">DifferNet [<xref ref-type="bibr" rid="ref-32">32</xref>]</th>
<th align="left">PaDiM [<xref ref-type="bibr" rid="ref-27">27</xref>]</th>
<th align="left">Ours</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Carpet</td>
<td align="left">-</td>
<td align="left">95.3</td>
<td align="left">92.9</td>
<td align="left">-</td>
<td align="left"><bold>100</bold></td>
</tr>
<tr>
<td align="left">Grid</td>
<td align="left">-</td>
<td align="left">98.7</td>
<td align="left">84.0</td>
<td align="left">-</td>
<td align="left"><bold>99.1</bold></td>
</tr>
<tr>
<td align="left">Leather</td>
<td align="left">-</td>
<td align="left">93.4</td>
<td align="left">97.1</td>
<td align="left">-</td>
<td align="left"><bold>100</bold></td>
</tr>
<tr>
<td align="left">Tile</td>
<td align="left">-</td>
<td align="left">95.8</td>
<td align="left">99.4</td>
<td align="left">-</td>
<td align="left"><bold>99.8</bold></td>
</tr>
<tr>
<td align="left">Wood</td>
<td align="left">-</td>
<td align="left">95.5</td>
<td align="left"><bold>99.8</bold></td>
<td align="left">-</td>
<td align="left">99.3</td>
</tr>
<tr>
<td align="left">Bottle</td>
<td align="left">-</td>
<td align="left">96.7</td>
<td align="left">99.0</td>
<td align="left">-</td>
<td align="left"><bold>100</bold></td>
</tr>
<tr>
<td align="left">Cable</td>
<td align="left">-</td>
<td align="left">82.3</td>
<td align="left">95.9</td>
<td align="left">-</td>
<td align="left"><bold>99.3</bold></td>
</tr>
<tr>
<td align="left">Capsule</td>
<td align="left">-</td>
<td align="left">92.8</td>
<td align="left">86.9</td>
<td align="left">-</td>
<td align="left"><bold>99.0</bold></td>
</tr>
<tr>
<td align="left">Hazelnut</td>
<td align="left">-</td>
<td align="left">91.4</td>
<td align="left">99.3</td>
<td align="left">-</td>
<td align="left"><bold>100</bold></td>
</tr>
<tr>
<td align="left">Metal_nut</td>
<td align="left">-</td>
<td align="left">94.0</td>
<td align="left">96.1</td>
<td align="left">-</td>
<td align="left"><bold>99.9</bold></td>
</tr>
<tr>
<td align="left">Pill</td>
<td align="left">-</td>
<td align="left">86.7</td>
<td align="left">88.8</td>
<td align="left">-</td>
<td align="left"><bold>93.5</bold></td>
</tr>
<tr>
<td align="left">Screw</td>
<td align="left">-</td>
<td align="left">87.4</td>
<td align="left"><bold>96.3</bold></td>
<td align="left">-</td>
<td align="left">93.1</td>
</tr>
<tr>
<td align="left">Toothbrush</td>
<td align="left">-</td>
<td align="left"><bold>98.6</bold></td>
<td align="left"><bold>98.6</bold></td>
<td align="left">-</td>
<td align="left">96.9</td>
</tr>
<tr>
<td align="left">Transistor</td>
<td align="left">-</td>
<td align="left">83.6</td>
<td align="left">91.1</td>
<td align="left">-</td>
<td align="left"><bold>99.3</bold></td>
</tr>
<tr>
<td align="left">Zipper</td>
<td align="left">-</td>
<td align="left">95.8</td>
<td align="left">95.1</td>
<td align="left">-</td>
<td align="left"><bold>99.1</bold></td>
</tr>
<tr>
<td align="left">Average</td>
<td align="left">85.5</td>
<td align="left">92.5</td>
<td align="left">94.9</td>
<td align="left">97.9</td>
<td align="left"><bold>98.6</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Anomaly localization requires a more fine-grained result that assigns a label to each pixel. Localization performance is thus an important criterion for verifying the method&#x2019;s validity. We compare the localization performance to current state-of-the-art results in <xref ref-type="table" rid="table-2">Table 2</xref>; the proposed method outperforms the others by at least 0.6 percentage points in AUROC.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Comparison of anomaly localization performance. (Pixel-level AUROC&#x0025; and PRO&#x0025;)</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">AE<sub>L2</sub> [<xref ref-type="bibr" rid="ref-3">3</xref>]</th>
<th align="left">P-SVDD [<xref ref-type="bibr" rid="ref-10">10</xref>]</th>
<th align="left">U-Student [<xref ref-type="bibr" rid="ref-1">1</xref>]</th>
<th align="left">CutPaste [<xref ref-type="bibr" rid="ref-11">11</xref>]</th>
<th align="left">Ours</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">AUROC (&#x0025;)</td>
<td align="left">82</td>
<td align="left">95.7</td>
<td align="left">-</td>
<td align="left">96.0</td>
<td align="left"><bold>96.6</bold></td>
</tr>
<tr>
<td align="left">PRO (&#x0025;)</td>
<td align="left">79</td>
<td align="left">-</td>
<td align="left">85.7</td>
<td align="left">-</td>
<td align="left"><bold>87.0</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The anomaly localization heatmaps of the proposed method on different classes are shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. Remarkably, the proposed method can precisely locate defects in the images. This can be attributed to the method selecting the features with the lowest variance, which are effective for the anomaly detection task.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>The visualization results on part categories of MVTec AD dataset</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="JNM_32447-fig-2.png"/></fig>
<p>A visualization of the qualitative evaluation is presented in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>; anomalies are highlighted in red. To investigate the robustness of the method, we classify the defect types into two categories: large defects and subtle defects. Compared to SPADE [<xref ref-type="bibr" rid="ref-2">2</xref>], which also uses a pre-trained network to extract anomaly-free features, the proposed method locates anomalies more precisely. Compared to PaDiM [<xref ref-type="bibr" rid="ref-27">27</xref>], the runner-up in <xref ref-type="table" rid="table-1">Table 1</xref>, our method is still competitive and accurately segments defects of different sizes and types.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>The visualization results on part categories of MVTec AD dataset</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="JNM_32447-fig-3.png"/></fig>
<p>The proposed method generally surpasses previous methods by a wide margin, yielding 98.6&#x0025; in image-level anomaly detection, 96.6&#x0025; in pixel-level anomaly localization, and an 87.0&#x0025; PRO score. The <italic>Dimensionality Reduction</italic> block chooses critical features to distinguish anomaly-free images from anomalous ones. While choosing discriminative features, we also discard noise in the features, which prevents our method from falsely localizing non-anomalous regions.</p>
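<p>The scoring scheme underlying these results&#x2014;fitting a Gaussian distribution to anomaly-free deep features and measuring how far a test feature falls from it&#x2014;can be sketched minimally as below. This is an illustrative NumPy sketch of the standard Mahalanobis-distance formulation, not the authors&#x2019; implementation; the small ridge term added to the covariance is our assumption to keep it invertible.</p>

```python
import numpy as np

def fit_gaussian(features, ridge=0.01):
    """Fit a multivariate Gaussian to anomaly-free feature vectors.
    features: (n_samples, d). The ridge keeps the covariance invertible."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_scores(features, mean, cov_inv):
    """Anomaly score per feature vector: Mahalanobis distance to the
    fitted Gaussian. Larger distance means more anomalous."""
    diff = features - mean
    return np.sqrt(np.einsum('nd,de,ne->n', diff, cov_inv, diff))
```

<p>A feature vector far from the normal-data distribution receives a high score, so thresholding the scores separates anomalous regions from anomaly-free ones.</p>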
</sec>
<sec id="s4_3"><label>4.3</label><title>Limitations</title>
<p>In addition, we show some failure cases in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>; the anomaly types from top to bottom are: a <italic>defective</italic> toothbrush, a <italic>hole</italic> on a hazelnut, and a <italic>scratch</italic> on a capsule. Normal samples are provided as a reference. One limitation concerns pixel-wise anomaly localization, for instance, the defects on the <italic>toothbrush</italic> and <italic>hazelnut</italic>: the proposed method can locate the abnormal regions but lacks accurate localization of the anomalous pixels. Another limitation is that it may miss subtle anomalies, such as the <italic>scratch</italic> on the <italic>capsule</italic>.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>Bad case of false detection type</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="JNM_32447-fig-4.png"/></fig>
</sec>
<sec id="s4_4"><label>4.4</label><title>Running Time</title>
<p>Running time is another dimension of interest. The main purpose of dimensionality reduction is to retain the most discriminative features for anomaly detection. We measure the running time of the proposed method on an NVIDIA A100-PCIE-40 GB GPU with a serial implementation. The running time and detection performance for each category of the MVTec AD dataset are listed in <xref ref-type="table" rid="table-3">Table 3</xref>. Note that each category has a different number of test images; we report the average running time per category.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Running time and anomaly segmentation performance of each category on the proposed method</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Category</th>
<th align="left">Carpet</th>
<th align="left">Grid</th>
<th align="left">Leather</th>
<th align="left">Tile</th>
<th align="left">Wood</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Test images</td>
<td align="left">117</td>
<td align="left">78</td>
<td align="left">124</td>
<td align="left">117</td>
<td align="left">79</td>
</tr>
<tr>
<td align="left">RT (s)</td>
<td align="left">9</td>
<td align="left">6</td>
<td align="left">8</td>
<td align="left">6</td>
<td align="left">6</td>
</tr>
<tr>
<td align="left">AUROC (&#x0025;)</td>
<td align="left">97.7</td>
<td align="left">94.9</td>
<td align="left">97.5</td>
<td align="left">93.4</td>
<td align="left">90.0</td>
</tr>
<tr>
<td align="left">Category</td>
<td align="left">Capsule</td>
<td align="left">Hazelnut</td>
<td align="left">Metal_nut</td>
<td align="left">Pill</td>
<td align="left">Screw</td>
</tr>
<tr>
<td align="left">Test images</td>
<td align="left">132</td>
<td align="left">110</td>
<td align="left">115</td>
<td align="left">167</td>
<td align="left">160</td>
</tr>
<tr>
<td align="left">RT (s)</td>
<td align="left">8</td>
<td align="left">10</td>
<td align="left">7</td>
<td align="left">11</td>
<td align="left">14</td>
</tr>
<tr>
<td align="left">AUROC (&#x0025;)</td>
<td align="left">98.5</td>
<td align="left">97.8</td>
<td align="left">98.2</td>
<td align="left">97.2</td>
<td align="left">99.6</td>
</tr>
<tr>
<td align="left">Category</td>
<td align="left">Bottle</td>
<td align="left">Cable</td>
<td align="left">Toothbrush</td>
<td align="left">Transistor</td>
<td align="left">Zipper</td>
</tr>
<tr>
<td align="left">Test images</td>
<td align="left">83</td>
<td align="left">150</td>
<td align="left">42</td>
<td align="left">100</td>
<td align="left">151</td>
</tr>
<tr>
<td align="left">RT (s)</td>
<td align="left">4</td>
<td align="left">9</td>
<td align="left">2</td>
<td align="left">6</td>
<td align="left">9</td>
</tr>
<tr>
<td align="left">AUROC (&#x0025;)</td>
<td align="left">97.1</td>
<td align="left">96.5</td>
<td align="left">98.6</td>
<td align="left">97.3</td>
<td align="left">96.4</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="tfn1_1"><p>Abbreviations: RT &#x003D; running time.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>The proposed method uses EfficientNet-b4 pre-trained on ImageNet as its backbone, which allows us to focus our attention on the anomaly detection task itself. The effectiveness of the proposed method suggests that it is better to use a pre-trained model than to learn a model of normality from scratch on task-specific datasets.</p>
</sec>
<sec id="s4_5"><label>4.5</label><title>Ablation Studies</title>
<p>We perform ablation studies on the MVTec AD dataset to answer the following questions: how much does dimensionality reduction affect the performance of the proposed method, and which layer of the backbone provides the most information-rich features for anomaly detection?</p>
<sec id="s4_5_1"><label>4.5.1</label><title>Influence of Dimensionality Reduction</title>
<p>The goal of <italic>Dimensionality Reduction</italic> is to obtain features appropriate for the anomaly detection task and to save computing time. First, we investigate the effect on anomaly detection of reducing part of the features versus using all features. The results on the EfficientNet-b4 network are shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. For instance, OPCA-0.05 means we retain the lowest-variance principal components that together account for 5&#x0025; of the overall variance; conversely, PCA-0.95 means we remove the principal components that account for the lowest 5&#x0025; of the total variance. Note that retaining components accounting for only 5&#x0025; of the variance can improve detection and localization performance.</p>
<fig id="fig-5"><label>Figure 5</label><caption><title>Anomaly detection performance on EfficientNet-b4 network with different variance threshold</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="JNM_32447-fig-5.png"/></fig>
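<p>Under our reading of these settings (OPCA-<italic>t</italic> keeps the least-variance principal directions accounting for a fraction <italic>t</italic> of the total variance, while PCA-<italic>t</italic> keeps the conventional top directions accounting for <italic>t</italic>), the selection step can be sketched as follows. The eigendecomposition-based implementation is illustrative only, not the authors&#x2019; code.</p>

```python
import numpy as np

def select_components(features, ratio, keep_lowest=True):
    """Project features onto principal components that together account
    for `ratio` of the total variance. keep_lowest=True keeps the
    least-variance directions (the OPCA setting); False keeps the
    conventional top-variance components (the PCA setting)."""
    centered = features - features.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    if not keep_lowest:                        # eigh returns ascending order
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, ratio) + 1)   # smallest k reaching `ratio`
    return centered @ eigvecs[:, :k]
```

<p>On anisotropic data, the OPCA setting therefore retains only a handful of low-variance directions, which is where deviations of anomalous samples from the normal-data distribution are most visible.</p>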
<p>To complete the experimental picture, we also use ResNet-18 to generate hierarchical convolutional features for the images. We extract features from different layers of ResNet-18 using a default setting of layers <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mn>0</mml:mn><mml:mrow><mml:mo>&#x223C;</mml:mo></mml:mrow><mml:mn>3</mml:mn></mml:math></inline-formula> (indexed from 0). The experimental results are shown in <xref ref-type="table" rid="table-4">Table 4</xref>, with the best results in boldface. Unlike deep networks such as EfficientNet, shallow (non-deep) networks should use all features for the anomaly detection task. This can be explained by the fact that shallow networks extract only simple image features, so filtering out part of the features easily loses richer representations of the image and reduces detection performance.</p>
<table-wrap id="table-4"><label>Table 4</label><caption><title>Anomaly detection performance under different variance thresholds of PCA on ResNet-18</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Image-level</th>
<th align="left">Pixel-level</th>
<th align="left">PRO</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">All features</td>
<td align="left"><bold>94.9</bold></td>
<td align="left"><bold>97.0</bold></td>
<td align="left"><bold>89.9</bold></td>
</tr>
<tr>
<td align="left">OPCA-0.05</td>
<td align="left">92.6</td>
<td align="left">96.6</td>
<td align="left">88.7</td>
</tr>
<tr>
<td align="left">OPCA-0.15</td>
<td align="left">92.8</td>
<td align="left">96.8</td>
<td align="left">89.1</td>
</tr>
<tr>
<td align="left">PCA-0.95</td>
<td align="left">94.7</td>
<td align="left">96.7</td>
<td align="left">88.2</td>
</tr>
</tbody>
</table>
</table-wrap>
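<p>The hierarchical features referred to above are typically formed by resizing each layer&#x2019;s feature map to a common spatial size and concatenating along the channel axis. A minimal NumPy sketch of this common recipe (nearest-neighbour upsampling; illustrative, not the authors&#x2019; implementation) is:</p>

```python
import numpy as np

def aggregate_layers(feature_maps):
    """Concatenate feature maps from several layers along channels,
    upsampling each to the largest spatial size by nearest neighbour.
    feature_maps: list of arrays shaped (c_i, h_i, w_i)."""
    h = max(f.shape[1] for f in feature_maps)
    w = max(f.shape[2] for f in feature_maps)
    resized = []
    for f in feature_maps:
        rows = np.arange(h) * f.shape[1] // h   # nearest source row
        cols = np.arange(w) * f.shape[2] // w   # nearest source column
        resized.append(f[:, rows][:, :, cols])
    return np.concatenate(resized, axis=0)
```

<p>Each spatial position of the result then carries a multi-scale descriptor combining low-level texture and higher-level semantics, which is what the per-position Gaussian is fitted to.</p>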
</sec>
<sec id="s4_5_2"><label>4.5.2</label><title>Network Hierarchy Selection</title>
<p>Going higher in the network hierarchy provides more global context, but at the cost of reduced resolution and a heavier ImageNet class bias. In <xref ref-type="table" rid="table-5">Table 5</xref>, we show the performance obtained when a single layer of the EfficientNet-b4 network is selected to extract features and these features are used to detect and segment anomalies. We observe that convolutional features at different semantic levels provide diverse and valuable information for detecting anomalies. Features from layer five achieve the best detection performance, while features from layer four give the best localization performance.</p>
<table-wrap id="table-5"><label>Table 5</label><caption><title>The performance of anomaly detection with single layer of EfficientNet-b4 network</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Layer</th>
<th align="left">3</th>
<th align="left">4</th>
<th align="left">5</th>
<th align="left">6</th>
<th align="left">7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Anomaly detection</td>
<td align="left">94.5</td>
<td align="left">97.1</td>
<td align="left"><bold>98.4</bold></td>
<td align="left">96.4</td>
<td align="left">95.1</td>
</tr>
<tr>
<td align="left">Anomaly localization</td>
<td align="left">94.9</td>
<td align="left"><bold>97.2</bold></td>
<td align="left">97.0</td>
<td align="left">94.7</td>
<td align="left">93.7</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Conclusion</title>
<p>In this study, we propose a novel framework for the challenging problem of unsupervised anomaly detection on the MVTec AD dataset. It comprehensively demonstrates that the principal components carrying little variance in normal data are crucial for the anomaly detection task. Experimental results show that the proposed method can detect and locate anomalies quickly and effectively, achieving strong performance on the MVTec AD dataset. Furthermore, the proposed method uses features extracted by a pre-trained CNN; we argue that using pre-trained CNNs is a promising research direction in anomaly detection. Subsequent work can fine-tune the pre-trained network to obtain more discriminative and compact features and further improve anomaly detection performance.</p>
</sec>
</body>
<back>
<ack>
<p>The author thanks their tutor for guidance on this article and their classmates for their help.</p>
</ack>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> The author received no specific funding for this study.</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The author declares that they have no conflicts of interest to report regarding the present study.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Bergmann</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Fauser</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Sattlegger</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Steger</surname></string-name></person-group>, &#x201C;<article-title>Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings</article-title>,&#x201D; in <conf-name>2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Seattle, WA, USA, pp. <fpage>4182</fpage>&#x2013;<lpage>4191</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Cohen</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Hoshen</surname></string-name></person-group>, &#x201C;<article-title>Sub-image anomaly detection with deep pyramid correspondences</article-title>,&#x201D; arXiv preprint arXiv:2005.02357, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Bergmann</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Fauser</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Sattlegger</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Steger</surname></string-name></person-group>, &#x201C;<article-title>MVTec AD&#x2014;A comprehensive real-world dataset for unsupervised anomaly detection</article-title>,&#x201D; in <conf-name>2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Long Beach, CA, USA, pp. <fpage>9584</fpage>&#x2013;<lpage>9592</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Baur</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Wiestler</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Albarqouni</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Navab</surname></string-name></person-group>, &#x201C;<article-title>Deep autoencoding models for unsupervised anomaly segmentation in brain MR images</article-title>,&#x201D; in <conf-name>Int. MICCAI Brainlesion Workshop</conf-name>, <publisher-name>Granada, Spain, Springer</publisher-name>, pp. <fpage>161</fpage>&#x2013;<lpage>169</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Luo</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Lian</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Gao</surname></string-name></person-group>, &#x201C;<article-title>Future frame prediction for anomaly detection&#x2013;A new baseline</article-title>,&#x201D; in <conf-name>Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition</conf-name>, Salt Lake City, UT, USA, pp. <fpage>6536</fpage>&#x2013;<lpage>6545</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Park</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Noh</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Ham</surname></string-name></person-group>, &#x201C;<article-title>Learning memory-guided normality for anomaly detection</article-title>,&#x201D; in <conf-name>Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Seattle, WA, USA, pp. <fpage>14372</fpage>&#x2013;<lpage>14381</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Sch&#x00F6;lkopf</surname></string-name>, <string-name><given-names>R. C.</given-names> <surname>Williamson</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Smola</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Shawe-Taylor</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Platt</surname></string-name></person-group>, &#x201C;<article-title>Support vector method for novelty detection</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>12</volume>, pp. <fpage>582</fpage>&#x2013;<lpage>588</lpage>, <year>1999</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y. Q.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>X. S.</given-names> <surname>Zhou</surname></string-name> and <string-name><given-names>T. S.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>One-class SVM for learning in image retrieval</article-title>,&#x201D; in <conf-name>Proc. 2001 Int. Conf. on Image Processing</conf-name>, Thessaloniki, Greece, vol. <volume>1</volume>, pp. <fpage>34</fpage>&#x2013;<lpage>37</lpage>, <year>2001</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Ruff</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Vandermeulen</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Goernitz</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Deecke</surname></string-name>, <string-name><given-names>S. A.</given-names> <surname>Siddiqui</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Deep one-class classification</article-title>,&#x201D; in <conf-name>Int. Conf. on Machine Learning</conf-name>, Stockholm, Sweden, pp. <fpage>4393</fpage>&#x2013;<lpage>4402</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yi</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Yoon</surname></string-name></person-group>, &#x201C;<article-title>Patch svdd: Patch-level svdd for anomaly detection and segmentation</article-title>,&#x201D; in <conf-name>Proc. of 15th Asian Conf. on Computer Vision</conf-name>, Kyoto, Japan, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C. L.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Sohn</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yoon</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Pfister</surname></string-name></person-group>, &#x201C;<article-title>Cutpaste: Self supervised learning for anomaly detection and localization</article-title>,&#x201D; in <conf-name>2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Nashville, TN, USA, pp. <fpage>9659</fpage>&#x2013;<lpage>9669</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Sakurada</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Yairi</surname></string-name></person-group>, &#x201C;<article-title>Anomaly detection using autoencoders with nonlinear dimensionality reduction</article-title>,&#x201D; in <conf-name>The MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis</conf-name>, New York, United States, pp. <fpage>4</fpage>&#x2013;<lpage>11</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Akcay</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Atapour-Abarghouei</surname></string-name> and <string-name><given-names>T. P.</given-names> <surname>Breckon</surname></string-name></person-group>, &#x201C;<article-title>GANomaly: Semi-supervised anomaly detection via adversarial training</article-title>,&#x201D; in <conf-name>2018 14th Asian Conf. on Computer Vision</conf-name>, <publisher-name>Perth, Australia, Springer</publisher-name>, pp. <fpage>622</fpage>&#x2013;<lpage>637</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Pidhorskyi</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Almohsen</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Doretto</surname></string-name></person-group>, &#x201C;<article-title>Generative probabilistic novelty detection with adversarial autoencoders</article-title>,&#x201D; in <source>Advances in Neural Information Processing Systems</source>, Cambridge, Massachusetts (USA), vol. <volume>31</volume>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P. C.</given-names> <surname>Mahalanobis</surname></string-name></person-group>, &#x201C;<article-title>On the generalised distance in statistics</article-title>,&#x201D; <source>National Institute of Science of India</source>, vol. <volume>2</volume>, pp. <fpage>49</fpage>&#x2013;<lpage>55</lpage>, <year>1936</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. M.</given-names> <surname>Tax</surname></string-name> and <string-name><given-names>R. P.</given-names> <surname>Duin</surname></string-name></person-group>, &#x201C;<article-title>Support vector data description</article-title>,&#x201D; <source>Machine Learning</source>, vol. <volume>54</volume>, no. <issue>1</issue>, pp. <fpage>45</fpage>&#x2013;<lpage>66</lpage>, <year>2004</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Tack</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Mo</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Jeong</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Shin</surname></string-name></person-group>, &#x201C;<article-title>CSI: Novelty detection via contrastive learning on distributionally shifted instances</article-title>,&#x201D; in <source>Advances in Neural Information Processing Systems</source>, Cambridge, Massachusetts (USA), vol. <volume>33</volume>, pp. <fpage>11839</fpage>&#x2013;<lpage>11852</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. P.</given-names> <surname>Kingma</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Welling</surname></string-name></person-group>, &#x201C;<article-title>Auto-encoding variational Bayes</article-title>,&#x201D; <source>stat</source><italic>,</italic> vol. <volume>1050</volume>, pp. 1, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Gong</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Le</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Saha</surname></string-name>, <string-name><given-names>M. R.</given-names> <surname>Mansour</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection</article-title>,&#x201D; in <conf-name>Proc. of the IEEE/CVF Int. Conf. on Computer Vision</conf-name>, Seoul, Korea (South), pp. <fpage>1705</fpage>&#x2013;<lpage>1714</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T. W.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>W. H.</given-names> <surname>Kuo</surname></string-name>, <string-name><given-names>J. H.</given-names> <surname>Lan</surname></string-name>, <string-name><given-names>C. F.</given-names> <surname>Ding</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Hsu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Anomaly detection neural network with dual auto-encoders GAN and its industrial inspection applications</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>20</volume>, no. <issue>12</issue>, pp. 3336, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Shi</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>Z. Q.</given-names> <surname>Qi</surname></string-name></person-group>, &#x201C;<article-title>Unsupervised anomaly segmentation via deep feature reconstruction</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>424</volume>, pp. <fpage>9</fpage>&#x2013;<lpage>22</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Perera</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Nallapati</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Xiang</surname></string-name></person-group>, &#x201C;<article-title>OCGAN: One-class novelty detection using GANs with constrained latent representations</article-title>,&#x201D; in <conf-name>2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Long Beach, CA, USA, pp. <fpage>2893</fpage>&#x2013;<lpage>2901</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Sabokrou</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Khalooei</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Fathy</surname></string-name> and <string-name><given-names>E.</given-names> <surname>Adeli</surname></string-name></person-group>, &#x201C;<article-title>Adversarially learned one-class classifier for novelty detection</article-title>,&#x201D; in <conf-name>2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Salt Lake City, UT, USA, pp. <fpage>3379</fpage>&#x2013;<lpage>3388</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Abati</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Porrello</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Calderara</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Cucchiara</surname></string-name></person-group>, &#x201C;<article-title>Latent space autoregression for novelty detection</article-title>,&#x201D; in <conf-name>2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Long Beach, CA, USA, pp. <fpage>481</fpage>&#x2013;<lpage>490</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>L. P.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>GAN ensemble for anomaly detection</article-title>,&#x201D; arXiv preprint arXiv: 2012.07988, vol. <volume>7</volume>, no. <issue>8</issue>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L. J.</given-names> <surname>Latecki</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Lazarevic</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Pokrajac</surname></string-name></person-group>, &#x201C;<article-title>Outlier detection with kernel density functions</article-title>,&#x201D; in <conf-name>Int. Workshop on Machine Learning and Data Mining in Pattern Recognition</conf-name>, <publisher-name>Berlin, Heidelberg, Springer</publisher-name>, pp. <fpage>61</fpage>&#x2013;<lpage>75</lpage>, <year>2007</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Defard</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Setkov</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Loesch</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Audigier</surname></string-name></person-group>, &#x201C;<article-title>PaDiM: A patch distribution modeling framework for anomaly detection and segmentation</article-title>,&#x201D; in <conf-name>Int. Conf. on Pattern Recognition</conf-name>, <publisher-name>Milan, Italy, Springer</publisher-name>, pp. <fpage>475</fpage>&#x2013;<lpage>489</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Salehi</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Sadjadi</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Baselizadeh</surname></string-name>, <string-name><given-names>M. H.</given-names> <surname>Rohban</surname></string-name> and <string-name><given-names>H. R.</given-names> <surname>Rabiee</surname></string-name></person-group>, &#x201C;<article-title>Multiresolution knowledge distillation for anomaly detection</article-title>,&#x201D; in <conf-name>2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition</conf-name>, Nashville, TN, USA, pp. <fpage>14897</fpage>&#x2013;<lpage>14907</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. P.</given-names> <surname>Kingma</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Dhariwal</surname></string-name></person-group>, &#x201C;<article-title>Glow: Generative flow with invertible 1 &#x00D7; 1 convolutions</article-title>,&#x201D; in <source>Advances in Neural Information Processing Systems</source>, Cambridge, Massachusetts (USA), vol. <volume>31</volume>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Tan</surname></string-name> and <string-name><given-names>Q.</given-names> <surname>Le</surname></string-name></person-group>, &#x201C;<article-title>Efficientnet: Rethinking model scaling for convolutional neural networks</article-title>,&#x201D; in <conf-name>Int. Conference on Machine Learning. PMLR</conf-name>, pp. <fpage>6105</fpage>&#x2013;<lpage>6114</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O.</given-names> <surname>Ledoit</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Wolf</surname></string-name></person-group>, &#x201C;<article-title>A well-conditioned estimator for large dimensional covariance matrices</article-title>,&#x201D; <source>Journal of Multivariate Analysis</source>, vol. <volume>88</volume>, no. <issue>2</issue>, pp. <fpage>365</fpage>&#x2013;<lpage>411</lpage>, <year>2004</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Rudolph</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Wandt</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Rosenhahn</surname></string-name></person-group>, &#x201C;<article-title>Same same but DifferNet: Semi-supervised defect detection with normalizing flows</article-title>,&#x201D; in <conf-name>2021 IEEE Winter Conf. on Applications of Computer Vision (WACV)</conf-name>, Waikoloa, HI, USA, pp. <fpage>1906</fpage>&#x2013;<lpage>1915</lpage>, <year>2021</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>