Visualization for Explanation of Deep Learning-Based Fault Diagnosis Model Using Class Activation Map

The permanent magnet synchronous motor (PMSM) is widely used in various production processes because of its high efficiency, fast response, and high power density. With the continuous promotion of new energy vehicles, timely detection of PMSM faults can significantly reduce the accident rate of these vehicles, further enhance consumers’ trust in their safety, and thus promote their popularity. Existing fault diagnosis methods based on deep learning can only distinguish different PMSM faults and cannot interpret or analyze them. Convolutional neural networks (CNNs) show remarkable accuracy in image data analysis. However, because of the “black box” problem of deep learning models, it is uncertain whether the diagnostic results give the user accurate information. This paper proposes a motor fault diagnosis method based on an improved deep residual network (ResNet) and gradient-weighted class activation mapping (Grad-CAM) to analyze demagnetization and eccentricity faults of PMSMs; the visual interpretation method overcomes the uncertainty limitation of CNN-based fault diagnosis. The improved ResNet uses ResNet9 as the backbone network, replaces the last convolution layer with atrous spatial pyramid pooling (ASPP), and adds a multi-scale feature fusion module (MSFFM) and a channel attention module (CAM). The proposed model not only retains the effective extraction of image features by ResNet9 but also enlarges the receptive field of the network through the atrous convolution pyramid and fuses the features of the network at different scales through the MSFFM, further improving the diagnostic accuracy of the network on different types of fault features.
The diagnostic effect of the network is verified on a self-made data set, which includes five states: normal (He), 25% demagnetization (De25), 50% demagnetization (De50), 10% static eccentricity (Se10), and 20% static eccentricity (Se20). The training set contains 6000 images, and the test set contains 1500. The average diagnostic accuracy of the improved ResNet on this data set is 99.00%, which is 1.04%, 8.89%, 4.58%, and 7.22% higher than that of the multi-column convolutional neural network (MCNN), bi-directional long short-term memory (Bi-LSTM), deep belief network (DBN), and recurrent neural network (RNN) models, respectively. Finally, Grad-CAM was used to globally average pool over the final output feature map of the network to obtain feature weights, which were superimposed on the original image to obtain gradient activation heat maps of the different grayscale images. The warmer the tone of the heat map, the greater the impact on the network diagnosis results. In this way, the demagnetization and eccentricity fault characteristics of the PMSM were determined, giving a visual characterization for quantitative analysis.


Introduction
Due to its wide constant-power speed range, excellent dynamic performance, relatively simple structure, easy maintenance, and other advantages [1], the permanent magnet synchronous motor (PMSM) has been widely used in social production [2]. Therefore, it is of great social and economic significance to analyze and diagnose the faults of PMSMs.
To date, motor fault diagnosis methods fall into three main categories: model-based, signal-based, and data-driven methods [3]. The model-based method usually requires establishing a mathematical model and using equations to describe the structure of the faulty motor accurately. Modeling methods include those based on classical state estimation or process parameter estimation [4], the finite element method, etc. [5,6]. This method conducts fault analysis based on the model. It has the advantage of going deep into the essence of motor operation but the disadvantage of relying on an accurate mathematical model of the motor. The second approach is signal-based and does not rely on precise mathematical models. At present, there are several methods to diagnose different types of motor faults by signal: (1) spectrum analysis; (2) the vector decomposition method [7]; (3) the time-frequency analysis method [8], etc. However, these methods require corresponding expertise. The data-driven method of motor fault diagnosis originates from the extensive application and progress of AI technologies in different fields [9], such as aerospace [10], industrial production [11], and construction engineering [12]. Unlike the signal-based fault diagnosis method, a neural network does not need the corresponding expert knowledge. It only needs a large amount of data for training. After its parameters are fixed, it can diagnose different types of motor faults.
Various deep learning and machine learning algorithms have been widely used in mechanical fault diagnosis [13]. Literature [14] and literature [15] proposed the dislocated time series convolutional neural network (DTS-CNN) and hierarchical convolutional neural networks (TICNN), respectively, to detect motor bearing faults by processing motor vibration signals. In literature [16], the stator current of a PMSM is transformed by the fast Fourier transform (FFT), and its amplitude is imported into a one-dimensional CNN for training to diagnose the demagnetization and bearing faults of the PMSM. Tamilselvan and Wang proposed a DNN fault diagnosis method based on the restricted Boltzmann machine. Still, this method requires many sensors to collect the related signals. In this method, each layer is treated as a deep network structure and trained. Although this method has an excellent diagnostic effect on machine faults, it has a lot of background noise, which is difficult to eliminate.
Literature [17] proposed an uncrewed aerial vehicle (UAV) image target detection method based on an improved you only look once (YOLO) algorithm. From the perspective of model efficiency, it effectively solved the real-time detection problem of multi-scale targets by using target box dimension clustering, pre-training network classification, multi-scale detection training, and changed screening rules for candidate boxes. Literature [18] proposed a CNN-based automatic weed detection system for UAV images. The overall accuracy of the convolutional neural network-learning vector quantization (CNN-LVQ) model developed for weed detection was significantly improved through strict hyperparameter adjustment. Literature [19] optimized the climate depth long short-term memory (CDLSTM) model for predicting temperature and rainfall in Himalayan states. It uses Facebook's FB-Prophet model to predict and evaluate the performance of the developed CDLSTM model, which has higher accuracy and a lower error rate than the previous algorithm. Literature [20] proposed a brain tumor classification model (DCNNBT) based on a new deep convolutional neural network. By scaling the image resolution, depth of layers, and width of channels and strictly optimizing the hyperparameters, the model achieved a classification accuracy of up to 99.18%, which was significantly higher than other studies based on the same database. Literature [21] applied a five-layer CNN to large-scale plant classification in the natural environment. Plant identification based on experimental data has high accuracy: the model achieved the highest recognition rate of 96% on the NU108 dataset, and the accuracy on NU101 drone images reached 97.8%. However, most fault diagnosis methods based on deep learning are black boxes. It is impossible to know which image features the neural network bases its classification on, and the mapping established by the neural network needs descriptive analysis. Therefore, a complete CNN and Grad-CAM method for motor fault diagnosis is proposed in this paper, and demagnetization faults and eccentricity faults of PMSMs are diagnosed.
The major innovations of the article are as follows:
1. A multi-scale feature fusion module, an attention mechanism, and an atrous convolution pyramid are added to the traditional ResNet network, which enhances the feature extraction capability of the model and improves the fault diagnosis accuracy of the motor.
2. Grad-CAM is introduced to solve the "black box" problem in the fault diagnosis process of the CNN. The image features on which the diagnosis is based are studied, and the original frequency signal corresponding to those features is determined.
The rest of this paper is arranged as follows. The second section introduces the establishment of symmetrical and asymmetric PMSM models in the finite element software Altair Flux and their co-simulation with MATLAB to obtain the current, torque, and speed data of different faulty motors during operation. The third section converts the time-domain stator current signals of the different faulty motors into frequency-domain signals and converts the frequency-domain signals into grayscale images to make data sets. A convolutional neural network architecture based on channel attention, the multi-scale feature fusion module, and ASPP is then constructed, and the training parameters are configured. In the fourth section, the produced data set is imported into the built CNN for training and compared with other algorithms. Then, Grad-CAM is used to analyze the gray images of the demagnetization fault and eccentricity fault and to find the fault characteristics of both.

Build Different Types of Fault Models of PMSM
The finite element modeling software Altair Flux is often used for modeling and simulation in the electromagnetic field [22]. Creating a model with it can be divided into four steps: establishing the motor geometric model, setting the physical properties, setting the model-solving parameters and solving, and processing the results. At every step of the process, Flux saves the relevant files. Once "post-processing" is performed to visualize or save the results, the solution results of each parameter can be obtained.
Motor faults can be divided into electrical faults, mechanical faults, magnetic faults, and so on. In this article, the static eccentricity fault (Se10, Se20) and the demagnetization fault (De25, De50) are taken as examples. A CNN is built, and feature visualization technology is used to diagnose these two kinds of faults.
Based on the geometric parameters of PMSM in the Flux official tutorial, this paper redesigned the electrical parts of the motor and the structure of the permanent magnet and established an 8-pole, 48-slot PMSM model, whose structural parameters are shown in Table 1.

Establishment of PMSM Model for Demagnetization Fault and Eccentricity Fault
When the motor has an eccentricity fault, its topology becomes asymmetric. When the eccentricity fault model of the motor is built in Flux, the complete geometric structure of the PMSM needs to be constructed. In this paper, 10% and 20% static eccentricity fault models of the PMSM are built in Flux. As shown in Fig. 1a, the coordinates of the stator center $O_s$ and the rotor axis $O_r$ are set as (0.06, 0) and (0, 0), respectively, so that the distance between them is 0.06 mm (the uniform air gap length is 0.6 mm). In this way, the motor model with a 10% static eccentricity fault is obtained. Fig. 1b shows the setting method of the PMSM model for the 20% eccentricity fault: the stator and rotor axes are set as (0.12, 0) and (0, 0), respectively, so that the distance between them is 0.12 mm. Then, in the mechanical properties, the center of rotation of the rotor is set to (0, 0) so that it coincides with the axis of the rotor. Thus, the PMSM models with 10% and 20% static eccentricity faults are obtained.
Due to its high energy density and low price, neodymium iron boron (NdFeB) is commonly used to make permanent magnets for PMSMs. However, it also has the disadvantages that its magnetization is greatly affected by working conditions and that its demagnetization is irreversible. The demagnetization failure of a PMSM is usually uniform demagnetization of the permanent magnets.
In Flux, the type of permanent magnet is described by "Remanent flux density" and "Relative permeability", which define the remanence $B_r$ of the permanent magnet material and the slope of the magnetic curve. In this paper, three magnetic curves are defined: those of normal (He), 25% demagnetized (De25), and 50% demagnetized (De50) materials. A PMSM model with demagnetization faults of different degrees in the permanent magnets on opposite pairs is established.
As shown in Fig. 2, the yellow, blue, and red lines represent the magnetic curves of normal, 25% demagnetized, and 50% demagnetized permanent magnets. Their remanence $B_r$ is set to 1.2, 0.9, and 0.6 T, respectively, and their relative permeability is set to 1.05. Figs. 3a and 3b show the magnetic flux densities of permanent magnets with 25% and 50% demagnetization, respectively. It can be seen from the figure that the deeper the demagnetization of the permanent magnets, the sparser the magnetic flux density.

Co-Simulation and Result Analysis
In this paper, the finite element models of different types of PMSM established in Altair Flux were imported into a vector control system in MATLAB for co-simulation. The vector control system uses the $i_d = 0$ current control strategy and the space vector pulse width modulation (SVPWM) algorithm to modulate the motor supply voltage vector. Fig. 4 shows the PMSM finite element model in a loop simulation system co-simulated by Altair Flux and MATLAB-Simulink [23].
Fig. 5 shows the simulation results of the normal, 25% demagnetization, and 50% demagnetization PMSMs at a rated speed of 4000 rpm and a load of 0 N·m. Figs. 5a-5c are the simulation results of the stator three-phase currents of the three types of motors, respectively. It can be seen from the figure that demagnetization increases the time for the motor to enter the steady state. Fig. 5d shows the speed variation of the three motors. It can be seen that a demagnetization fault increases the time required for the motor speed to reach a steady state. Fig. 5e shows the electromagnetic torque comparison. In the same way, the greater the degree of demagnetization, the longer the adjustment time required for the motor to reach a steady state and the smaller the electromagnetic torque. In summary, a demagnetization fault is equivalent to increasing the motor load, thus increasing the adjustment time for the motor to reach a steady state.
Fig. 6 shows the simulation results of the normal, 10% eccentricity, and 20% eccentricity PMSMs when the given speed is 4000 rpm and the load is 0 N·m. Figs. 6a-6c are the simulation results of the stator currents of the three types of motors, respectively, and Figs. 6d and 6e compare the speeds and electromagnetic torques of the three motors. It can be seen that a pure static eccentricity fault has no apparent influence on the stator current, rotational speed, or electromagnetic torque of the PMSM [24].

Data Set Making and CNN Model Building

Stator Current FFT Transformation
The FFT is a standard signal processing method that quickly transforms a signal from the time domain to the frequency domain. The discrete Fourier transform (DFT) is the most basic method of digital signal processing: after the input signal is sampled $N$ times at the sampling frequency $f_s$, DFT processing of the $N$ discrete points yields $N$ complex numbers. Each complex number represents the amplitude and phase information of a frequency component, with components spaced at intervals of $f_s/N$, which gives the spectrum information of the input signal [25].
For a discrete time-domain signal $x(n)$ of length $N$, the DFT result is denoted $X(k)$, which is defined by the DFT sum and can be expressed in terms of the rotation factor. The FFT mainly uses the periodicity, symmetry, and scaling of the rotation factor to carry out a butterfly iterative calculation of the DFT and so reduce its computation amount, but its basic properties are still the same as those of the DFT.
In the PMSM fault model built with Altair Flux and MATLAB, the simulation collected 8000 data points of the stator current time-domain signal over 0.25 s, at a sampling frequency of 32 kHz. The spectra of the stator three-phase currents of the normal motor, the 25% and 50% demagnetization fault motors, and the 10% and 20% static eccentricity fault motors obtained through the co-simulation are shown in Fig. 7. Since the two-sided spectrum is symmetrical, only the one-sided spectrum needs to be analyzed.
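For reference, the textbook form of the DFT and of the three rotation-factor properties mentioned above is reproduced below. This is the standard definition, not a transcription of the paper's own numbered equations, and the symbols $x(n)$, $N$, and $W_N$ are the usual conventions assumed here.

```latex
\begin{align}
X(k) &= \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \qquad k = 0, 1, \dots, N-1,
        \qquad W_N = e^{-j 2\pi / N} \\
W_N^{k+N} &= W_N^{k}        && \text{(periodicity)} \\
W_N^{k+N/2} &= -W_N^{k}     && \text{(symmetry)} \\
W_N^{2k} &= W_{N/2}^{k}     && \text{(scaling)}
\end{align}
```

The butterfly computation splits the sum into even- and odd-indexed samples and reuses the half-length transforms through these identities, which reduces the cost from $O(N^2)$ to $O(N \log N)$.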
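The sampling figures above (8000 points, 32 kHz, 0.25 s) imply a frequency resolution of 4 Hz. A minimal NumPy sketch of the one-sided amplitude spectrum follows; the 200 Hz sine wave is a hypothetical stand-in for the simulated stator current, and `one_sided_spectrum` is an illustrative helper, not code from the paper.

```python
import numpy as np

FS = 32_000   # sampling frequency in Hz (from the text)
N = 8_000     # number of samples collected over 0.25 s (from the text)

def one_sided_spectrum(signal, fs):
    """Return (frequencies, amplitudes) of the one-sided spectrum."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)           # keeps only non-negative frequencies
    amplitude = np.abs(spectrum) / n
    amplitude[1:-1] *= 2.0                   # even n: double all bins except DC and Nyquist
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # bin spacing is fs / n
    return freqs, amplitude

# Hypothetical test signal: a 10 A, 200 Hz sine instead of the simulated current.
t = np.arange(N) / FS
current = 10.0 * np.sin(2 * np.pi * 200.0 * t)
freqs, amp = one_sided_spectrum(current, FS)
```

Note that `rfft` returns `N/2 + 1` bins (4001 here, including the Nyquist bin), whereas the text works with the first 4000 points of the two-sided spectrum.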

The Frequency Domain Signal is Converted into a Gray Image
Because high-dimensional representations often contain more data features, this paper first raises the frequency-domain signal of the motor stator current into a two-dimensional gray image and then produces a data set and imports it into the CNN [26]. The scattering of data sample values within the range of gray values can be regarded as a probability problem. According to the law of large numbers, when the amount of data is large enough, the data will obey the Gaussian distribution. Therefore, the function that maps the data into the gray range is set as a one-dimensional Gaussian distribution function in this paper, as shown in formula (5). Fig. 8 shows the process of image conversion based on the autocorrelation matrix [27].
In formula (5), $x_{ij}$ is the element in the $i$th row and the $j$th column of the autocorrelation matrix, $\mu$ is the mean value of the original data, and $\sigma^2$ is the variance of the original data.
Therefore, the pixel in the $i$th row and $j$th column of the grayscale image generated by the image conversion method based on the autocorrelation matrix reflects the intensity of the $i$th data point in the original data, and the generated grayscale image is symmetrical about its diagonal.
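The conversion can be sketched as follows under two explicit assumptions, since formula (5) and the exact matrix construction are not reproduced here: the autocorrelation matrix is taken to be the outer product of the data window with itself, and the gray value is an unnormalized Gaussian of $x_{ij}$ scaled to 0-255. The function name and the random input are illustrative only.

```python
import numpy as np

def spectrum_to_gray(window):
    """Map a 1-D data window to a symmetric 8-bit grayscale image (one possible reading)."""
    x = np.asarray(window, dtype=float)
    mu = x.mean()    # mean of the original data, as in the text
    var = x.var()    # variance of the original data, as in the text
    # Assumption: the "autocorrelation matrix" is the outer product x x^T.
    acm = np.outer(x, x)
    # Assumption: unnormalized Gaussian mapping, so values lie in (0, 1].
    gauss = np.exp(-((acm - mu) ** 2) / (2.0 * var))
    return np.round(255.0 * gauss).astype(np.uint8)

rng = np.random.default_rng(0)
image = spectrum_to_gray(rng.random(500))   # 500 points give a 500 x 500 image
```

Because the outer product is symmetric, the resulting image is symmetric about its diagonal, matching the property stated above.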

Make a Data Set
In this paper, an image conversion method based on an autocorrelation matrix is used to convert the frequency-domain signal of the motor stator current into a gray-level image. In the spectrum, a window of 500 data points is taken every 30 data points and raised in dimension. The way of data interception is shown in Fig. 9.
The training of a CNN generally requires large data sets, and reading the grayscale images in a data set one by one occupies a lot of computing resources. In the TensorFlow framework, sample images can be made into TFRecord files, which store the data set in a binary encoding. During training, the records are decoded and then imported into the CNN. This method significantly saves computing resources.
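The interception scheme (a 500-point window advanced by 30 points) can be sketched in a few lines. The helper name `sliding_windows` and the placeholder input are assumptions for illustration; whether the last partial window is kept or dropped in the paper is not stated, and this sketch drops it.

```python
import numpy as np

def sliding_windows(spectrum, width=500, step=30):
    """Cut a 1-D spectrum into overlapping windows of `width` points every `step` points."""
    spectrum = np.asarray(spectrum)
    if len(spectrum) < width:
        raise ValueError("spectrum is shorter than one window")
    n_windows = (len(spectrum) - width) // step + 1   # incomplete trailing window is dropped
    starts = np.arange(n_windows) * step
    return np.stack([spectrum[s:s + width] for s in starts])

# Placeholder for the 4000-point one-sided spectrum; the values are just indices.
half_spectrum = np.arange(4000)
windows = sliding_windows(half_spectrum)
```

Each row of `windows` would then be converted into one grayscale image.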

Multiscale Feature Fusion
The CNN was first proposed by the French-American computer scientist Yann LeCun in 1989, inspired by the neurons of the primate visual system. It is a deep feedforward neural network [28] that has a strong ability to extract features from input data and is used specifically for processing input data with grid-like structures. However, as the number of CNN layers increases, the image information contained in the feature map gradually decreases during forward propagation. As a result, the gradient of the neural network vanishes. ResNet adds skip connections to the convolutional layers, which not only solves the vanishing gradient problem but also speeds up the training of the network and benefits the backpropagation of the gradient.
Literature [29] improved the Selective Kernel Network (SKNet) through skip connections, which learned the weight of shallow image information and fused it with the extracted global features to obtain more image features [30]. Although shallow feature maps contain a lot of semantic information, their resolution is limited. Deep feature maps have a lower level of semantic information but a higher resolution [31]. This problem can be solved by collecting image features of different levels simultaneously. In addition, the attention module is widely used in various CNN structures. This paper therefore introduces a multi-scale feature fusion module (MSFFM) based on a channel attention module to fuse image features of different levels [32]. To solve the semantic gap problem in the process of feature fusion, MSFFM calculates the relevance between pixels of different feature maps through addition and multiplication operations between matrix elements and uses this relevance as the weight vector of the deep feature maps. $S_{ij}$ measures the correlation between the $i$th channel in the shallow feature maps and the $j$th channel in the deep feature maps. $N$ is the number of pixels on the feature map, and $P$ and $Q$ are the shallow and deep feature maps created by different convolution layers in the CNN, respectively, where $\{P, Q\} \in \mathbb{R}^{C \times N}$.
As shown in Fig. 11, the generated feature maps are first imported into the corresponding convolution layers, which compress the number of channels and the size of the feature maps to produce feature maps $A$ and $B$, $\{A, B\} \in \mathbb{R}^{C \times H \times W}$, where $H$ and $W$ are the size of the feature map [33]. Then, the shallow feature $A$ and the deep feature $B$ are reshaped into $P$ and $Q$, respectively, where $N = H \times W$ is the number of elements on each feature map. Then, the two matrices are multiplied, with one of them transposed, and the spatial attention map $S \in \mathbb{R}^{N \times N}$ is computed with the softmax function.
Then, $Q$ is multiplied by the spatial attention map $S$ to generate the feature map $L \in \mathbb{R}^{C \times H \times W}$. Finally, an element-wise sum of $B$ and $L$ is performed to obtain the final output $O \in \mathbb{R}^{C \times H \times W}$. The output is given by formula (8), in which $\alpha$ is initialized to 0 and gradually learns to allocate more weight, $q_i$ represents the $i$th pixel in the low-dimensional feature maps, and $B_j$ is the $j$th channel in the high-dimensional feature map. From (8), it can be inferred that each position of the final feature $O$ is a weighted sum of all the high-level features. Because the final semantic features are generated from shallow features, they contain rich semantic information [34].
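The fusion steps described above can be sketched in NumPy as follows. This is one plausible reading of the text, not the authors' implementation: the softmax axis, the choice of which matrix is transposed, and the scalar form of $\alpha$ are assumptions, and the function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msffm_fuse(A, B, alpha=0.0):
    """Fuse shallow map A and deep map B, both of shape (C, H, W)."""
    C, H, W = A.shape
    N = H * W
    P = A.reshape(C, N)                 # shallow features, one column per pixel
    Q = B.reshape(C, N)                 # deep features, one column per pixel
    S = softmax(P.T @ Q, axis=-1)       # (N, N) attention map; each row sums to 1 (assumed axis)
    L = (Q @ S.T).reshape(C, H, W)      # each pixel is a weighted sum of deep features
    O = alpha * L + B                   # element-wise sum with the deep feature map
    return O, S

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 3))
B = rng.standard_normal((4, 3, 3))
O, S = msffm_fuse(A, B, alpha=0.0)
```

With `alpha = 0`, as at initialization, the output equals `B`; as `alpha` grows during training, the attention-weighted term contributes more.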

Channel Attention Module
Shallow features contain a lot of semantic information. Every channel generates a specific feature map, and every feature map has a significant impact on the final output. Through the channel attention module shown in Fig. 12, the consistency of feature mapping at each layer is enhanced by changing the feature weights of each channel [35]. The channel attention module (CAM) reweights each channel based on all elements of each feature map. First, a global average pooling layer is used to compress the feature map size and the number of feature channels [36]. Then, the corresponding weight vector is generated by the rectified linear unit (ReLU) activation function and the sigmoid function. Finally, the output feature map is generated by multiplying the weight vector with the input feature map. In this way, global information is integrated into the weight vector, which in turn ensures the reliability of the feature mapping [37].
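The reweighting sequence (global average pooling, ReLU, sigmoid, channel-wise multiplication) can be sketched as below. The two weight matrices `w1` and `w2` and the reduction to two hidden units are hypothetical; the paper's layer sizes are not given in this section.

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1: (C_hidden, C) and w2: (C, C_hidden) are hypothetical learned weights.
    """
    squeeze = fmap.mean(axis=(1, 2))                  # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)            # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid -> per-channel weight in (0, 1)
    return fmap * weights[:, None, None], weights     # broadcast over H and W

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 5))
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
out, weights = channel_attention(fmap, w1, w2)
```

Each channel is scaled by a single number, so the spatial pattern inside a channel is preserved while its contribution to later layers changes.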

Atrous Convolutional Spatial Pyramid Pooling
Atrous convolutional spatial pyramid pooling (ASPP) incorporates atrous convolutions into spatial pyramid pooling (SPP). Atrous convolution can systematically aggregate image feature information at different scales and levels without losing resolution [38].
The basic principle of atrous convolution is shown in formula (9). In Eq. (9), $i$ is the coordinate of each element in the two-dimensional matrix, $r$ is the expansion rate of the atrous convolution, and $k$ is the dimension of the convolution kernel [39]. Ordinary convolution can be viewed as an atrous convolution with an expansion rate of 1.
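A one-dimensional sketch makes the role of the expansion rate concrete. It assumes the commonly used form $y[i] = \sum_{j} x[i + r \cdot j]\, w[j]$, which may differ in notation from the paper's formula (9); the function name and the test data are illustrative.

```python
import numpy as np

def atrous_conv1d(x, w, rate=1):
    """Valid-mode 1-D atrous convolution: y[i] = sum_j x[i + rate * j] * w[j]."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    span = rate * (len(w) - 1)            # distance between first and last sampled input
    out_len = len(x) - span
    if out_len <= 0:
        raise ValueError("input is too short for this kernel and rate")
    y = np.zeros(out_len)
    for j, wj in enumerate(w):
        y += wj * x[rate * j: rate * j + out_len]
    return y

x = np.arange(10.0)
w = np.array([1.0, 0.0, -1.0])
y_rate1 = atrous_conv1d(x, w, rate=1)     # ordinary (rate-1) case
y_rate2 = atrous_conv1d(x, w, rate=2)     # samples inputs two positions apart
```

With `rate=1` the result matches an ordinary valid cross-correlation, which is the sense in which ordinary convolution is the special case with an expansion rate of 1.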
For standard convolution operations, the sliding step size of the convolution kernel can be divided into three situations [40]: (1) a step size greater than 1, in which downsampling is performed while the convolution is being performed, so the dimension of the convolutional feature map declines; (2) a step size equal to 1, which is the ordinary convolution; and (3) a step size smaller than 1, in which the feature map is upsampled, so the dimension of the feature map increases.
Atrous convolution inserts blank elements between the elements of the convolution kernel. The input is left unchanged, and the weights of the inserted kernel positions are set to 0, thereby expanding the receptive field. Convolving with a step size greater than 1 can achieve the same effect. However, downsampling is then performed simultaneously with the convolution, which reduces the feature map's size, so this option is unsuitable [41].
Assuming that the expansion rate of the atrous convolution is $r$ and the dimension of the convolution kernel is $k$, the extent $F$ of the receptive field obtained is $F = (r - 1)(k - 1) + k$.
The structure of the atrous convolutional spatial pyramid pooling is shown in Fig. 13. Compared with SPP, an atrous convolution is inserted in front of each parallel pooling window to increase the extraction range of image features. The parallel atrous convolution layers have four different expansion rates to extract feature information at different levels and scales [42].
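As a quick numerical check of how the expansion rate enlarges the receptive field, the sketch below uses the standard expression for the extent of a dilated kernel, $F = k + (k - 1)(r - 1)$, which is assumed here to correspond to the paper's formula. The rates in the example are illustrative, not the values from the paper's ASPP table.

```python
def dilated_kernel_extent(k, r):
    """Extent of a k-tap kernel with expansion rate r: F = k + (k - 1) * (r - 1)."""
    if k < 1 or r < 1:
        raise ValueError("k and r must be positive integers")
    return k + (k - 1) * (r - 1)

# Illustrative rates for a 3 x 3 kernel.
extents = {r: dilated_kernel_extent(3, r) for r in (1, 2, 4, 6)}
```

The number of learned weights stays at $k \times k$ regardless of $r$; only the spacing between sampled positions changes.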

Construction of a GCNN Model for Atrous Convolutional Pyramid
Based on the traditional CNN model, multi-scale feature modules and channel attention are added to form an 8-layer convolutional neural network, including six convolutional layers and two fully connected layers. Max pooling is used for all pooling layers. In this paper, the improved ResNet network based on MSFFM-CAM-ASPP is named GCNN. An atrous convolution pyramid is introduced behind the sixth pooling layer to enlarge the field of view and enhance the feature extraction capability of the network. After pooling, the first five pooling layers are connected to the fusion layer by multi-scale feature modules, and the data features of different levels are extracted. Finally, these features are fused with the features extracted by ASPP. The network structure is shown in Fig. 14.

Figure 14: GCNN neural network structure
The four-layer ASPP adopts average pooling layers. Although max pooling is the common choice, it outputs only a maximum value and is unsuitable for GCNN networks with added atrous convolutional kernels. At each level of ASPP, feature maps with dimensions of (w, h) (w for width, h for height) are divided equally. For example, if a map is divided into four modules, the dimension of every module is (w/2, h/2); if it is divided into nine modules, the dimension of every module is (w/3, h/3); and if it is divided into 36 modules, the dimension of every module is (w/6, h/6). This article divides the four-layer feature maps into 1, 4, 9, and 36 modules, respectively, so that 50 sub-regions are obtained and 50 different sets of image features can be extracted. These features are fused with the feature maps obtained from the first six convolutional layers through skip connections.
Every layer of ASPP in Fig. 14 has an atrous convolutional kernel, and the dimensions of the convolutional kernel are set to 3 × 3. Table 4 shows the parameters of the four-layer ASPP. Atrous convolutional kernels with high expansion rates lose feature information and reduce the accuracy of the network. Therefore, it is common to use atrous convolutional kernels with low expansion rates to segment feature maps.
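The count of 50 sub-regions (1 + 4 + 9 + 36) can be reproduced with a small average-pooling sketch over 1 × 1, 2 × 2, 3 × 3, and 6 × 6 grids. The 12 × 12 single-channel feature map is a hypothetical size chosen so that every grid divides it evenly, and the atrous convolution in front of each pooling branch is omitted for brevity.

```python
import numpy as np

def grid_average_pool(fmap, bins):
    """Average-pool a 2-D map over a bins x bins grid of equal sub-regions."""
    h, w = fmap.shape
    if h % bins or w % bins:
        raise ValueError("feature map size must be divisible by the number of bins")
    blocks = fmap.reshape(bins, h // bins, bins, w // bins)
    return blocks.mean(axis=(1, 3)).ravel()           # one value per sub-region

def pyramid_features(fmap, levels=(1, 2, 3, 6)):
    """Concatenate the pooled values of all pyramid levels."""
    return np.concatenate([grid_average_pool(fmap, b) for b in levels])

fmap = np.arange(144, dtype=float).reshape(12, 12)    # hypothetical 12 x 12 feature map
features = pyramid_features(fmap)
```

For a multi-channel map the same pooling would be applied per channel, giving 50 values for each channel.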

Parameter Configuration of the GCNN Model
The GCNN network parameters are the optimal hyperparameters determined by the grid search method. With these settings, the diagnostic accuracy for the different faults is close to the maximum accuracy allowed by the theory. This article adopts a step decay learning rate, and the selected decay function is:
$$k = k_0 \cdot a^{\lfloor n/m \rfloor}$$
where $k_0$ is the learning rate at the beginning of GCNN training; $a$ is the attenuation coefficient; $n$ is the number of rounds GCNN has trained so far; $m$ is a constant used to adjust how often the learning rate changes; $\lfloor n/m \rfloor$ denotes $n/m$ rounded down; and $k$ is the current learning rate of GCNN.
The initial learning rate is 0.01, the attenuation coefficient $a$ is 0.85, and 30 rounds of training with $m = 5$ give faster and better training results. The learning rate of GCNN during training is shown in Fig. 15 [43].

Figure 15: Step decay learning rate
GCNN adopts the cross entropy as its loss function, which estimates the difference between the theoretical value and the actual value. The smaller the cross-entropy value, the closer the two probability distributions are. It is described by formula (12) [44]. In formula (12), $x$ stands for the sample, $n$ stands for the total number of samples, $y$ stands for the theoretical value, and $a$ stands for the actual value.
The model parameters of GCNN other than those of ASPP are shown in Table 3, and the model parameters of ASPP are shown in Table 4, where k is the size of the ordinary convolution kernel, c is the number of output feature channels, d is the expansion rate of the atrous convolution kernel, p is the padding mode, r is the type of activation function, and b is the bias.
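The step-decay schedule with the stated values (initial rate 0.01, attenuation coefficient 0.85, m = 5, 30 rounds) can be tabulated in a few lines. The function name is illustrative, and the sketch assumes that rounds are counted from 0.

```python
def step_decay_lr(n, k0=0.01, a=0.85, m=5):
    """Learning rate in round n: k = k0 * a ** floor(n / m)."""
    return k0 * a ** (n // m)

# Assumption: rounds are numbered 0..29, so the rate drops at rounds 5, 10, 15, 20, 25.
schedule = [step_decay_lr(n) for n in range(30)]
```

Under this numbering the rate stays constant for five rounds at a time and is multiplied by 0.85 at each step, ending at 0.01 × 0.85⁵ in the last five rounds.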

GCNN Training Results and Comparison Experiment
The produced data set was imported into the built GCNN model for training. The training results are shown in Fig. 16. The accuracy on the test set is 99.08%, and the loss is 0.0035. The diagnostic accuracy for the De25, De50, He, Se10, and Se20 motors is 99.80%, 99.60%, 99.70%, 98.10%, and 98.20%, respectively, all of which are high. In addition, cross-validation was used to repeat the experiment many times to enhance the reliability of the training results and avoid the interference of random factors. After many experiments, it was found that the average accuracy of five-fold cross-validation is the highest. The verification results of GCNN are shown in Table 5. The difference between this average and the result obtained with randomly split data is only 0.08%, which supports the reliability of the experimental results.
To demonstrate the advantage of GCNN, this paper conducts comparison experiments with MCNN [27], Bi-LSTM [45], DBN [46], and RNN [47], and the experimental results are shown in Fig. 17. The optimal hyperparameters of the comparison algorithms are also determined by the grid search method, so that their diagnostic accuracy for the different faults is close to the maximum accuracy allowed by the theory. All models use the same self-made data set as GCNN. The accuracy rates of MCNN, Bi-LSTM, DBN, and RNN on the test set are 97.96%, 90.11%, 94.42%, and 91.78%, respectively, all lower than the 99.00% of GCNN. The losses of the four comparison algorithms are 0.099, 0.242, 0.168, and 0.231, respectively, all higher than that of GCNN. The comparison experiments verify the superiority of GCNN.

Noise Interference Experiment
In order to verify the anti-interference ability and robustness of the model proposed in this paper against noise, Gaussian noise, black pepper and salt noise, white pepper and salt noise, and black pepper and salt mixed noise with white pepper and salt were added to the self-made data set, respectively.The gray images after adding noise are shown in Figs.18-21.After adding noise, different network fault diagnosis accuracy rates are shown in Table 6, and the parameters of each network for comparison experiment are determined as the optimal parameters by the grid search method.It can be seen from the results of noise interference experiments that the fault diagnosis accuracy of the proposed GCNN network for different types of permanent magnet synchronous motors has not changed much in the face of noise interference such as Gaussian noise, black pepper and salt noise, white pepper and salt noise, black and white pepper, and salt noise, and is always maintained within 0.5%.The ability of anti-noise interference and robustness are obviously better than the other four networks.Different channels of the convolutional neural network convolutional layer can extract other fault features from images, and these fault features are related to PMSM fault types to some extent [48].The correlation between fault characteristics and fault types of different channels is also distinct, so assigning different weights to these fault characteristics is necessary.However, obtaining the weight parameters of output features of various media from the convolutional neural network is problematic, resulting in the need for more interpretability of the mapping relationship established by deep learning [49].
To solve this problem, this paper adopts the Grad-CAM visualization method to calculate the weight $\alpha_k$ of each channel; the feature map of each channel is multiplied by its weight and the results are summed to generate the activation heat map [50]. The color of each pixel in the heat map reflects the relevance of the corresponding features: the warmer the color, the stronger the activation and the higher the relevance, so the heat map directly shows the basis on which the CNN identifies fault characteristics [51].
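The Grad-CAM computation is given by formulas (13) and (14):
$$\alpha_k = \frac{1}{Z}\sum_{i=1}^{p}\sum_{j=1}^{q}\frac{\partial y^c}{\partial A^k_{ij}} \tag{13}$$
$$L_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_{k}\alpha_k A^k\right) \tag{14}$$
where $\alpha_k$ is the weight of the $k$-th channel, $A^k$ is the output feature map of the $k$-th channel, $A^k_{ij}$ is the value of the element of $A^k$ at coordinates $(i, j)$, $y^c$ is the network's score for class $c$, $p$ and $q$ are the length and width of the feature map, $Z = p \times q$, and $L_{\text{Grad-CAM}}$ is the resulting Grad-CAM heat map [52].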
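The channel weighting and weighted sum that Grad-CAM performs can be sketched in a few lines. The example below follows the standard Grad-CAM formulation (channel weights obtained by global average pooling of the gradients, then a ReLU over the weighted sum of feature maps); it is not the authors' implementation. The function name `grad_cam` and the toy numbers are illustrative, and in practice the gradients would come from backpropagation through the trained network.

```python
def grad_cam(feature_maps, gradients):
    """Return the channel weights and the Grad-CAM map for one class.

    feature_maps[k][i][j]: activation of channel k at position (i, j).
    gradients[k][i][j]: derivative of the class score with respect to that activation.
    """
    p = len(feature_maps[0])       # feature-map height
    q = len(feature_maps[0][0])    # feature-map width
    z = p * q

    # Channel weights: global average pooling of the gradients.
    alphas = [sum(sum(row) for row in grad_k) / z for grad_k in gradients]

    # Weighted sum over channels, followed by ReLU.
    cam = [[max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, feature_maps)))
            for j in range(q)]
           for i in range(p)]
    return alphas, cam


# Toy input: two channels of size 2 x 2 (made-up numbers, not from the paper).
A = [[[1.0, 2.0], [3.0, 4.0]],
     [[4.0, 3.0], [2.0, 1.0]]]
dY_dA = [[[0.5, 0.5], [0.5, 0.5]],
         [[-0.25, -0.25], [-0.25, -0.25]]]

alphas, cam = grad_cam(A, dY_dA)
print(alphas)   # channel weights
print(cam)      # heat map before upsampling and colour mapping
```

For this toy input the channel weights are 0.5 and -0.25, and the ReLU clips the one negative entry of the weighted sum to zero. To overlay the map on a gray image, it would additionally be upsampled to the image size and normalized, which is omitted here.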

Method of Determining Fault Characteristics
The PMSM fault diagnosis method based on Grad-CAM is divided into two stages: selecting the heat maps and determining the original data corresponding to each heat map. The specific implementation steps are as follows.
Selection of heat maps: The time-domain signal of each PMSM fault type obtained by co-simulation has 8000 data points, and the two-sided spectrum generated by the FFT also has 8000 data points, evenly distributed over a 32 kHz horizontal axis. Since the two-sided spectrum is symmetric, only the half spectrum formed by the first 4000 data points needs to be analyzed. Each gray image based on the autocorrelation matrix is built from a window of 500 data points, and the window is shifted by 30 points between consecutive images, so adjacent images overlap heavily and it is unnecessary to study every heat map. In theory, all the data points of the one-sided spectrum can be covered by analyzing the Grad-CAM of gray images Nos. 1, 17, 33, 49, 65, 81, 97, 113, and 116 for the demagnetization and eccentricity faults.
Determining the raw data corresponding to the heat map: Since each gray image is constructed from 500 data points, it is strictly symmetric about its diagonal, and the pixels in each row and column reflect the intensity of the corresponding data point in the original data. Therefore, the data points in the one-sided spectrum can be found by locating the pixels in the warm-colored regions of the heat map. In this way, the fault characteristics of demagnetization and eccentricity of the PMSM are determined.
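The two stages can be illustrated with a short sketch. It rests on several assumptions that the text does not spell out: gray images are numbered from 1, image n starts at data index (n - 1) × 30, row r and column c of an image correspond to the r-th and c-th data points of its 500-point window, and the heat map is scaled to [0, 1]. The function names and the 0.8 threshold for a "warm" pixel are hypothetical.

```python
TOTAL_POINTS = 8000                      # two-sided spectrum length (from the text)
SPAN_HZ = 32_000                         # width of the frequency axis (from the text)
RESOLUTION_HZ = SPAN_HZ / TOTAL_POINTS   # frequency step per data point
WINDOW = 500                             # data points per gray image (from the text)
STRIDE = 30                              # shift between consecutive images (from the text)
SELECTED = [1, 17, 33, 49, 65, 81, 97, 113, 116]


def window_range(image_no):
    """First and last data index (0-based, inclusive) covered by a gray image.

    Assumption: images are numbered from 1 and image n starts at (n - 1) * STRIDE.
    """
    start = (image_no - 1) * STRIDE
    return start, start + WINDOW - 1


def warm_frequencies(heat_map, image_no, threshold=0.8):
    """Frequencies (Hz) of the data points behind the warm pixels of a heat map.

    heat_map: square list of lists with values scaled to [0, 1].
    threshold: hypothetical cut-off above which a pixel counts as warm.
    """
    offset, _ = window_range(image_no)
    hot = set()
    for r, row in enumerate(heat_map):
        for c, value in enumerate(row):
            if value >= threshold:
                # Row r and column c each point back to one data point of the window.
                hot.update((offset + r, offset + c))
    return sorted(i * RESOLUTION_HZ for i in hot)


# Stage 1: which part of the one-sided spectrum do the selected images span?
covered = set()
for n in SELECTED:
    first, last = window_range(n)
    covered.update(range(first, last + 1))
    print(f"image {n:3d}: points {first}-{last}, "
          f"{first * RESOLUTION_HZ:.0f}-{last * RESOLUTION_HZ:.0f} Hz")
print(f"{len(covered)} of {TOTAL_POINTS // 2} half-spectrum points covered")

# Stage 2: toy 4 x 4 symmetric heat map standing in for a real one (made-up values).
toy = [[0.1, 0.2, 0.1, 0.0],
       [0.2, 0.9, 0.3, 0.1],
       [0.1, 0.3, 0.2, 0.85],
       [0.0, 0.1, 0.85, 0.1]]
freqs = warm_frequencies(toy, image_no=17)
print(freqs)
```

With these assumed conventions the frequency resolution is 4 Hz per point, neighbouring selected images overlap by at least 20 points, and together they span indices 0 to 3949 (roughly 0 to 15.8 kHz). The toy heat map for image No. 17 maps back to 1924, 1928, and 1932 Hz. A different numbering or windowing convention would shift these values.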

Fault Characteristics of Demagnetization and Static Eccentricity of PMSM
Grad-CAM heat maps of gray images Nos. 1, 17, 33, 49, 65, 81, 97, 113, and 116 for the 25% and 50% demagnetization faults were selected for analysis and are shown in Figs. 22 and 23. Analysis of the Grad-CAM heat maps shows 21 fault features in Se10: 180 Hz, 2, 3.04, 3.84, 4.16, 4.64, 4.96, 5.44, 5.76, 6.76, 8, 9.8, 10.88, 11.72, 12.24, ...
In addition, because traditional data-driven fault diagnosis methods cannot provide interpretative analysis, this paper introduces Grad-CAM to analyze eccentricity and demagnetization faults of different degrees, identify their fault characteristics, and mark them on the spectrum. However, the Grad-CAM-based interpretative analysis proposed in this paper can only determine the general range of the fault characteristics and cannot locate them precisely. Future research will look for a way to locate the different fault characteristics accurately.
The experimental data in this paper were all obtained by co-simulation with Flux and MATLAB and do not involve complex scenes with multiple heterogeneous targets. Our research team is purchasing faulty motors and the corresponding drive platform and plans to better reproduce the operating state of permanent magnet synchronous motors in new energy vehicles in future research.

Conclusions
This paper presents a fault diagnosis method based on an improved ResNet9 and Grad-CAM. It is used to diagnose demagnetization and eccentricity faults of permanent magnet synchronous motors and to determine their fault frequencies. The data set was obtained by co-simulation with Flux and MATLAB and contains 7,500 fault images of different types of permanent magnet synchronous motors; its specific parameters are shown in Table 2. The overall fault diagnosis accuracy of the improved ResNet9 network is 99.00%, which is higher than that of the other deep learning algorithms compared, and the network remains robust under different types of noise. In addition, Grad-CAM is used to address the "black box" problem in neural network fault diagnosis: the demagnetization and eccentricity faults of permanent magnet synchronous motors are analyzed, and their fault characteristics are determined.

Figure 17: Results of the comparison experiments: (a) accuracy curve; (b) loss curve

Table 1: Geometric parameters of the PMSM model

Table 3: GCNN network model parameters

Table 4: ASPP model parameters

Table 6: Results of the noise interference experiments