Anomaly Detection on Textured Images with Convolutional Neural Network for Quality Control of Micrometric Woven Meshes

Industrial woven meshes are made of metal and are often used in the construction, industrial and residential sectors. The context of this work is defect detection in industrial fabrics during the quality control process. This control is often performed manually and can be quite tedious and time-consuming. We propose here a method to automatically detect defects in micrometric steel meshes using a Convolutional Neural Network. The database used in this work comes from the real problem of anomaly detection on micrometric woven meshes. The detection is performed through supervised classification with a Convolutional Neural Network using a VGG19 architecture. To this aim, we propose a pipeline and a strategy to tackle the small amount of data. It includes i) augmenting the database with translations, rotations and symmetries, ii) using pre-trained weights and iii) checking the learning curve behaviour through cross-validation. The proposed method has been evaluated by automatically detecting whether metallic fabrics have defects. The results obtained show that, despite the small size of our databases, a detection accuracy of 96% was reached.


Introduction
Industrial woven meshes are made of metal and can be found in various industries: architecture (perforated sheets for facade cladding, railings, sunscreens, etc.), aerospace (engine soundproofing or air filtering), and other applications including silt fences, erosion control, pipeline protection and paving products. Defects can occur during the manufacturing process: the weaving can be altered, showing bumps or other irregularities. It is crucial to detect them, as the final product could lose its overall purpose (e.g., filtering). Companies working with industrial woven meshes therefore have to use a quality control process. This process is currently performed manually on sampled data. Some methods use visual observation, with a magnifying glass depending on the size. Others rely on the sense of touch of the employee in charge of quality control to detect the bumps.
We propose here a method to detect such defects based on computer vision applied to non-geometric, textured objects. These images differ from natural images representing a shape, such as a cat for example. We focus on woven meshes whose images do not have contours but represent a repeating pattern. One approach to identifying defects in these images is to consider them as textures and to apply classification algorithms. The classification then involves two classes: with defect vs. without defect. We used a VGG19 Convolutional Neural Network (CNN). The industrial conditions of this application do not allow a large database to be collected. To cope with that, data augmentation was performed for the training, and pre-trained weights from an existing database were used. The learning curve behaviour also had to be monitored through cross-validation. To sum up, the main objective of this work is to implement a complete framework that can automatically detect whether a manufactured micrometric woven mesh contains defects, by applying machine learning techniques to a picture acquired under controlled conditions.

State of the Art
Texture represents an important aspect in visual perception, involved in object characterization and identification. In recent years, this property has been extensively studied and used in computer vision applications, including image classification. Even though textures are easily identified and classified by human beings, they do not have a unique definition that can be used in computer vision applications. Therefore, to capture the wide variety of information lying in textures, different methods have been proposed in the literature.
For instance, textures can be characterized by statistical methods. In this case, the texture is defined by taking into consideration the spatial distribution of the contained grey values, including approaches like the grey level co-occurrence matrices [1], the autocorrelation features [2], the variograms [3,4], the local binary patterns [5], etc.
Moreover, the textural information can be extracted by using stochastic models. First, the textures are analyzed using a multiscale, or multiresolution, representation. At this point, the Fourier transform [6], the Gabor filters [7], or the wavelet transform [8] can be considered. Next, the extracted coefficients are statistically modelled, to obtain the final texture signature. The employed statistical models include the univariate generalized Gaussian distributions [9], or Gamma distributions [10]. Even though these univariate models have been successfully used for modelling filtered coefficients, they cannot consider all the information lying in images, like the spatial, or spectral dependencies. Therefore, multivariate models have been proposed, including copula-based distributions [11], spherically invariant random vectors, multivariate generalized Gaussian distributions [12], or Riemannian distributions [13].
In addition, texture can be captured through feature encoding methods, which are used to model the local information contained in non-stationary images. In this case, low-level features are extracted to represent the texture information. Then a codebook is generated to identify the features containing the significant information, by using the k-means or the expectation-maximization algorithm. In the end, all the features are projected onto the codebook space.
At this stage, some of the most employed algorithms are the bag of words model [14,15], the vectors of locally aggregated descriptors [16], and the Fisher vectors [17]. First, these methods were proposed for non-parametric features in the Euclidean space, and they were then extended to features defined on the Riemannian manifold, giving descriptors like bag of Riemannian words [18], Riemannian Fisher vectors [19], Riemannian vectors of locally aggregated descriptors [20], extrinsic vectors of locally aggregated descriptors [21], etc.
Later, CNN methods emerged in the literature. A large review has been done in [22]. In most methods, the CNN is used to produce a representation of the patches, which is then used in more classic dictionary-based classification (Fisher vectors, bag of words, etc.). This combination of approaches leads to complex frameworks. In the cases studied in the literature, the texture is complex and a naive use of simple CNN methods such as VGG [23], developed by the Visual Geometry Group at the University of Oxford, would not work, which explains the complexity of the proposed approaches. In the following, we will show that in the case of very regular textures, the VGG approach can achieve suitable performance. Some deep learning approaches have been used for defect detection on fabric material. A large review can be found in [24]. For this application (see, e.g., [25]), the state-of-the-art methods need a large database with many defect samples. In practice, it is extremely difficult to provide such a quantity of defect images. The method of [25] tackles this problem by generating synthetic defect images. In our work, the proposed deep learning approach achieves competitive results with a small database of real defect samples.

Database
Our database comes from an industrial context of metal woven meshes provided by our collaborating company. There are two datasets, and both have been manually labelled by the quality control service of the company.
Dataset A contains images of regular grids where the defect is hardly visible to the eye. When these grids have a defect, they are not perfectly flat but curved. During the quality control of these grids, defects are detected by touch because it is difficult to see them with the naked eye. However, when the grids are curved, a slight difference in illumination may occur.
Dataset B contains images of grids, also with a regular grid pattern. When there is a defect, there is an irregularity in the grid pattern, with grid lines spaced further apart than the others. Our approach thus consists, for a given grid type (A or B), in establishing a classification model (defect vs. without defect) based on machine learning. Fig. 1 illustrates, for each grid type, an example image of each class, with and without defects.

Classification Method
The learning process consists in dividing the dataset between a train and a test set and in computing the objective function on the test set while learning the model on the training set. In principle, if the algorithm works normally, during the iterations of the optimization algorithm the accuracy increases on the training data but may not on the test data (overfitting). To check that the algorithm behaves correctly, these values are monitored during the iterations (see for example Fig. 4).
To check if a very simple CNN model could work on our data, we chose the VGG19 architecture [23], composed of 19 weight layers: 16 convolutional layers followed by three fully connected layers, the last one feeding a softmax function. Each convolutional layer is activated by a ReLU function, and max-pooling layers between groups of convolutional layers progressively reduce the spatial resolution. The loss function used is the cross-entropy. This configuration is among the most common ones (see Fig. 2). Since there are not enough images in the database to learn from scratch, the network learning was performed by transfer, using the weights of the network trained on the ImageNet database [26]. Experiments on learning with and without pre-trained weights are presented in the Results and Discussion section. The use of a pre-trained architecture requires images with the same dimensions as the database with which it was pre-trained, so the images were resized to 224 × 224 with bilinear interpolation. The network was then fine-tuned on the grid database with the Adam optimizer, a learning rate of 10−5 and a batch size of 8. The number of epochs varies in our experiments depending on the convergence of the learning curves.
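A minimal Keras sketch of this transfer-learning setup is given below. It is not the authors' exact code: the size of the dense head (256 units) is an assumption, and only the hyperparameters stated above (VGG19 with ImageNet weights, 224 × 224 inputs, Adam at 10−5, cross-entropy loss) come from the text.

```python
# Sketch of the transfer-learning configuration described above:
# VGG19 pre-trained on ImageNet, fine-tuned with Adam (lr = 1e-5) and
# a cross-entropy loss for the two classes (defect vs. without defect).
import tensorflow as tf
from tensorflow.keras import layers, models


def build_model(num_classes: int = 2, weights: str = "imagenet") -> tf.keras.Model:
    base = tf.keras.applications.VGG19(
        weights=weights,             # "imagenet" = pre-trained weights (transfer)
        include_top=False,           # drop the original 1000-class head
        input_shape=(224, 224, 3),   # images resized by bilinear interpolation
    )
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),  # head width is an assumption
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss="sparse_categorical_crossentropy",  # cross-entropy loss
        metrics=["accuracy"],
    )
    return model
```

Fine-tuning would then be run with `model.fit(..., batch_size=8)` for a number of epochs adjusted to the convergence of the learning curves.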

Data Augmentation
The classification was initially carried out without data augmentation, then with data augmentation transforming the images by rotations, translations, or horizontal and vertical symmetries. The data augmentation was performed using a generator applying rotations and translations drawn between 0 and a maximum value. Rotations from 0 to 2 degrees and from 0 to 15 degrees were tested, as well as translations of up to 10% of the image size. Translations were performed on both the horizontal and vertical axes.
In the data augmentation with translations and rotations, the missing parts of the image resulting from the transformation were filled by periodically repeating the texture of the image. An illustration of the data augmentation is given in Fig. 3.
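Assuming a Keras-style generator, this augmentation can be sketched as follows; `fill_mode="wrap"` is what periodically repeats the texture in the areas uncovered by a rotation or translation. The exact generator class is an assumption, not confirmed by the text.

```python
# Sketch of the on-the-fly augmentation described above: rotations,
# translations up to 10% of the image, horizontal/vertical symmetries,
# with missing areas filled by periodically repeating the texture.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # rotations in [0, 15] degrees (0-2 also tested)
    width_shift_range=0.10,   # horizontal translations up to 10% of the width
    height_shift_range=0.10,  # vertical translations up to 10% of the height
    horizontal_flip=True,     # horizontal symmetry
    vertical_flip=True,       # vertical symmetry
    fill_mode="wrap",         # periodically repeat the texture in missing areas
)
# Training would then draw augmented batches of 8 images, e.g.:
# train_generator = augmenter.flow(x_train, y_train, batch_size=8)
```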

Validation Process
To avoid over-fitting, the model was first tested on the data without pre-training to adjust the hyperparameters. The database is small, which means that only a small amount of data can be set aside for testing. For this purpose, K-fold cross-validation was used with five training/test sets. The K-fold cross-validation method consists of making K combinations of training/test sets, allowing testing on a larger set of images while keeping enough images for training. The training is repeated five times (K = 5) on different data and tested on different data. The dataset was divided into five folds, each fold providing 20 test images and 80 training images: 80 of the 100 images are randomly selected to build the prediction model and the remaining 20 are used for testing. Within each fold, one third of the 80 training images is held out as a validation set. This process is repeated over the five folds to check that the results are robust.
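The splitting scheme above can be sketched with scikit-learn; the array shapes are placeholders standing in for the 100 labelled grid images, and the variable names are illustrative.

```python
# Sketch of the 5-fold cross-validation on the 100 labelled images:
# each fold has 80 training images (one third of which serve as the
# validation set) and 20 test images.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
images = rng.random((100, 16, 16))     # small placeholder for the 100 images
labels = rng.integers(0, 2, size=100)  # with defect (1) vs. without defect (0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kfold.split(images):
    assert len(train_idx) == 80 and len(test_idx) == 20
    # Hold out one third of the 80 training images for validation monitoring.
    fit_idx, val_idx = train_test_split(train_idx, test_size=1 / 3, random_state=0)
    # model.fit(images[fit_idx], labels[fit_idx],
    #           validation_data=(images[val_idx], labels[val_idx]))
    # accuracy is then evaluated on images[test_idx].
```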

Results and Discussion
As previously discussed, the learning process is not efficient when learning everything from scratch without pre-trained weights. The learning curves corresponding to this experiment are presented in Fig. 5. The curves are not stable, the accuracy does not converge toward 100% and the loss function is not even decreasing. The weights cannot be learnt on so little data without initial values closer to a reliable model.
To tackle this issue, we adopt a pre-trained model and fine-tune it on our dataset. With VGG19 pre-trained on ImageNet we obtain an average of 15.8 correct images out of 20 on dataset B, representing an average accuracy of 79% with a variance of 3.36 (computed on the number of correct images). The weights provided by ImageNet are a good initialization, as they represent patterns that can also occur in our dataset, like horizontal or vertical features. The accuracy of the training learning curve converges to 100%, but the accuracy of the validation curve does not (see Fig. 4). The dataset may be too easy to classify and not representative of the errors occurring in most fabrication issues. Moreover, the testing set may not be large enough.
To tackle the issue of the small amount of data, we propose to use classical data augmentation techniques. When such approaches are used, only the training and validation data are augmented. The accuracy of both the training and the validation learning curves then converges to 100% (see Fig. 6). Tables 1 and 2 show the number of correctly classified images for each fold; the reported accuracy is the average over these folds. The data augmentation has reduced the differences in performance between the folds, as well as improved the overall performance of the model. Dataset A has a score of 100%, but this does not mean that the model is perfect, knowing that each test set contains only 20 images. If the database had been larger, the 100% accuracy may not have been achieved: a single error leads to a 5% drop, and it cannot be excluded that the model would make mistakes. Finally, the results obtained with the VGG19 architecture were compared with those obtained from the commonly used InceptionV3 architecture [27]. With this last model, 100 epochs were required to reach an acceptable convergence and the final accuracy on database B was only 90%, lower than the 96% reached with VGG19 (see Fig. 7).
As limitations of our work, two elements have to be noted: firstly, the database is too small to generalize the accuracy at a large scale and, secondly, only two kinds of grids have been tested. Both limitations will be addressed in future work with the methods detailed in the next section.
Moreover, whereas deep learning approaches have shown a powerful prediction ability during the last decade, the explanation of their results is still an open research subject (see, e.g., [28,29]), and state-of-the-art studies are not able to give a physical interpretation of the results.

Conclusion
A new kind of application was presented in this paper: the direct use of CNN-based classification in the domain of micrometric industrial fabrics for quality control. This application has the advantage of working with images acquired in a controlled environment, on a material that should have a regular pattern. We show that such properties allow using much less learning data than most works in the literature, with good classification accuracy. We used a pre-trained VGG19 architecture and on-the-fly augmentation. In the end, our learning process, trained on only 80 images, can accurately predict whether there is a defect on a sample of a specific metallic woven mesh.
As perspectives, we plan to increase our database of industrial fabric images to check whether our method could be scaled while maintaining its robust behaviour. More data is needed, along with a new labelling protocol with bounding boxes to detect localized defects. Our strategy will then be to use weakly supervised networks. Another avenue of research we are considering is the generation of synthetic images of woven meshes, given the fact that everything is controlled (lighting, camera position, material, sewing patterns). In future works, we would also like to develop unsupervised decision rules that can work without a database containing samples of defects.

Figure 1: Example of images from both database A and B with or without defects

Figure 2: Architecture of the VGG19 network

Figure 3: Example of augmented images generated on a batch of 8 images

Figure 4: Dataset B learning curves without augmentation

Figure 5: Learning curves on dataset B without using pre-trained weights

Figure 6: Accuracy learning curves on dataset B with the train (blue) and validation (orange) data when using data augmentation (a) Accuracy learning curve (b) Loss learning curve

Figure 7: Learning curves on dataset A without using pre-trained weights with the train (blue) and validation (orange) data: (a) accuracy vs. epoch and (b) loss vs. epoch

Table 1: Training results on grid A

Table 2: Training results on grid B