CNN Based Features Extraction and Selection Using EPO Optimizer for Cotton Leaf Diseases Classification

Worldwide, cotton is the most profitable cash crop, yet its production suffers every year because of several diseases. Computerized methods that detect disease at an early stage may reduce these production losses. Although several methods have been proposed for the detection of cotton diseases, limitations remain because of low-quality images, variations in size, shape, and orientation, and complex backgrounds. These factors call for novel feature extraction and selection methods for accurate cotton disease classification. Therefore, this research proposes an optimized feature-fusion model in which two pre-trained architectures, EfficientNet-b0 and Inception-v3, are used to extract features; each model yields a feature matrix of size N × 1000. The extracted features are then serially concatenated into a feature matrix of size N × 2000. The most prominent features are selected using the Emperor Penguin Optimizer (EPO) method. The method is evaluated on two publicly available datasets: the Kaggle cotton disease dataset-I and the Kaggle cotton-leaf-infection dataset-II. The EPO method returns feature vectors of length 1 × 755 and 1 × 824 on dataset-I and dataset-II, respectively. Classification is performed using 5-, 7-, and 10-fold cross-validation. The Quadratic Discriminant Analysis (QDA) classifier provides an accuracy of 98.9% on 5 folds, 98.96% on 7 folds, and 99.07% on 10 folds using the Kaggle cotton disease dataset-I, while the Ensemble Subspace K Nearest Neighbor (KNN) classifier provides 99.16% on 5 folds, 98.99% on 7 folds, and 99.27% on 10 folds using the Kaggle cotton-leaf-infection dataset-II.


Introduction
Cotton is called "White Gold" and the "King of Fibers"; it holds a superior status among cash crops and is the main raw material for the textile industry. It is also a considerable agricultural asset around the globe that provides income to a large number of farmers [1]. Cotton is the most essential cash crop in Pakistan, as the country earns 55% of its foreign exchange from it. In Pakistan, 65% of cotton is grown in Punjab and the rest in Sindh [2]. Several diseases have had a negative impact on cotton production in recent decades [3]. Cotton can be affected by disease at any growth phase [4]. Across the globe, agricultural land is shrinking day by day because of hazards such as lack of water resources, population growth, and plant leaf diseases [5,6]. Crops are affected by various stresses present in the environment, such as water deficiency, insects, weeds, and fungi; hence, early detection of diseases and monitoring of crop health are strategies for better agricultural production [7]. Recognizing plant pathology at an early stage is difficult because no symptoms appear at the initial stage [8,9]. Smart farming has several applications, including weather forecasting, precision farming, and data collection and analysis; crop disease detection is also a subset of smart farming.

Disease detection with the naked eye is inaccurate and time-consuming [10], so accurate, timely, and reliable advice is required at a low cost [11]. It is therefore very important to move towards advanced strategies for the control and automatic diagnosis of disease. Several pesticides are available that cure disease efficiently and boost crop cultivation, but finding the most suitable pesticide for a particular disease is difficult and requires expert advice that is costly and time-consuming. Hence, there is a demand for an accurate, efficient, and affordable machine-supported approach to recognizing cotton leaf disorders [12,13]. Several challenges degrade classification results, such as low-quality images, variation in orientation, and complex backgrounds. To overcome these limitations, a method is proposed with the following contributions:
1. Relevant feature extraction and optimized feature selection are challenging tasks for accurate classification. Therefore, two pre-trained models, EfficientNet-b0 and Inception-v3, are selected after extensive experimentation to extract features.
2. The features are serially concatenated; after that, the optimum features are selected using the EPO method and passed to the classifiers for binary and multiclass classification of cotton diseases.
The article is organized as follows: Section 2 reviews the literature, Section 3 defines the proposed method, Section 4 presents the results and discussion, and Section 5 gives the conclusion.

Related Work
Many machine-supported methods have been used to obtain effective results, including detection [14-18], optimization [19], fusion [20-25], and Convolutional Neural Networks (CNN) [26]. The Mask Region-based CNN (Mask RCNN) object detection algorithm has been applied to instance segmentation and to the recognition of diseases and pests on cotton leaves [27]. A meta deep learning-based model was used to identify various cotton leaf diseases, with the aim of achieving both generalization and good accuracy [28]. Machine learning plays a major role in the agricultural sector; transfer learning with the Mask RCNN object detection algorithm was used to assess its effectiveness in discovering cotton leaf diseases in practical situations [29]. Deep learning was used to determine the status of cotton plant diseases from real-time samples of plants and leaves, with a model built using TensorFlow, Keras, and Google Colab [30]. CNNs have been used to improve pest recognition [31] and to recognize cotton leaf disorders and pests [32]. Another framework recognizes leaf diseases from cotton plant leaves: noise removal and image reconstruction are performed in preprocessing, followed by threshold-based segmentation, after which Gray Level Co-Occurrence Matrix (GLCM) attributes are derived and classified by a Euclidean distance classifier [33]. The segmentation of cotton leaf samples was performed using an improved factorization-based model; a number of texture and color attributes were extracted from the segmented image and classified using various machine learning algorithms [34]. A metric learning-based framework was developed for cotton leaf diseases, in which S-DenseNet is constructed to perform classification on a small sample [35]. Bilateral filtering is used for noise removal, after which the Chan-Vese approach is combined with a level set method without re-initialization [36].

Simple linear iterative clustering and roughness measure-based approaches were introduced to detect cotton leaf disorders, with GLCM used to extract features and a Support Vector Machine (SVM) used for classification [37]. To discover and classify cotton leaf diseases, captured RGB samples are first converted into another color space, and segmentation is then done by Otsu's global thresholding; the features are obtained with GLCM and classified with a multi-SVM [38]. In another mechanism, the samples are first preprocessed using histogram equalization, then segmented using k-means clustering, and finally the disorder is categorized using a neural network [39]. Local information and an active gradient-based automatic segmentation model are used for the segmentation of cotton leaves [40]. A neuro-fuzzy-based methodology was introduced to identify cotton leaf diseases, in which the graph cut procedure is used with an adaptive fuzzy inference approach for segmentation [41]. A CNN model is used for the detection of cotton leaf disorders [42]. Deep learning and other approaches have also been employed for the detection process [43][44][45][46].
The limitations that occur in the classification of cotton diseases are mentioned in Table 1. To classify disease types, color features alone were not enough, so texture features are also extracted.

Ref. | Year | Method | Limitation
[35] | 2021 | Metric learning approach-based framework | Better data-gathering methods and a low-shot learning approach with generalization should be explored to improve robustness.
[41] | 2014 | Neuro-fuzzy-based methodology | The study covers only some diseases; it should be extended to classify more disease classes and to detect diseases in other plants.
Hence, existing methods have several limitations: the datasets are too small, so augmentation is needed; color features alone are not enough to classify the disease type, so texture features must also be extracted; and classification accuracy needs improvement. To overcome these limitations, this study proposes an improved feature extraction, fusion, selection, and optimization model.
The pre-trained EfficientNet family has various architectures; this research uses the EfficientNet-b0 architecture, which provides better accuracy than the other pre-trained models. Similarly, Inception-v3 is used because it lowers the error rate compared with its predecessors [47].

Proposed Methodology
In the presented work, cotton images are supplied to the EfficientNet-b0 and Inception-v3 models to obtain feature vectors. The feature vectors are then concatenated (fused) serially. After fusion, the most significant features are chosen with the help of EPO, and finally classification is done using machine learning classifiers. The whole architecture is depicted in Fig. 1.
In this research, a feature vector of dimension 1 × 1000 is retrieved from the fully connected layer of EfficientNet-b0, and 1 × 1000 features are obtained from the prediction layer of Inception-v3. The features are serially concatenated into a vector of dimension 1 × 2000.
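The serial concatenation step can be illustrated with a minimal sketch. It assumes the two feature sets are already available as per-image rows of length 1000; the function name `serial_fuse` and the placeholder values are hypothetical and stand in for the real CNN outputs, which are not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def serial_fuse(feats_a: Matrix, feats_b: Matrix) -> Matrix:
    """Concatenate two per-image feature matrices column-wise (serial fusion)."""
    if len(feats_a) != len(feats_b):
        raise ValueError("both extractors must describe the same N images")
    # Row i of the result is row i of feats_a followed by row i of feats_b.
    return [row_a + row_b for row_a, row_b in zip(feats_a, feats_b)]


# Hypothetical stand-ins for the two 1000-dimensional CNN outputs (N = 3 images).
n_images = 3
effnet_feats = [[0.1 * i] * 1000 for i in range(n_images)]
inception_feats = [[0.2 * i] * 1000 for i in range(n_images)]

fused = serial_fuse(effnet_feats, inception_feats)
print(len(fused), len(fused[0]))  # N rows, 2000 columns
```

Keeping the two blocks side by side, rather than averaging or summing them, preserves every extracted feature and leaves the reduction to the later selection step.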

Fusion of Deep Learning Features
Data fusion is performed in several machine learning algorithms. It is an essential task that combines more than one feature vector into a single vector. In this work, two feature vectors are obtained from the deep learning models (EfficientNet-b0 and Inception-v3) and serially concatenated. Eqs. (1) and (2) describe the initial feature vectors mathematically.
Hence, the single fused vector is presented in Eq. (3).
Here, the fused vector has 1 × 2000 features, which is the combined count of the two feature vectors, for each dataset.
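One way to write the three relations is sketched below. The symbols F_E, F_I, and F_fused are assumed names, since the notation of Eqs. (1) to (3) is not given in the text; only the dimensions are taken from the description above.

```latex
\begin{align}
F_{E} &= \left[ e_{1}, e_{2}, \ldots, e_{1000} \right] \in \mathbb{R}^{1 \times 1000}
  && \text{(EfficientNet-b0 features)} \\
F_{I} &= \left[ i_{1}, i_{2}, \ldots, i_{1000} \right] \in \mathbb{R}^{1 \times 1000}
  && \text{(Inception-v3 features)} \\
F_{fused} &= \left[ F_{E} \;\; F_{I} \right] \in \mathbb{R}^{1 \times 2000}
  && \text{(serial concatenation)}
\end{align}
```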

Feature Selection Using Emperor Penguin Optimizer
After the fusion of the feature vectors, the optimal features are selected using the EPO optimizer, as presented in Fig. 2.
EPO [52,53] is a bio-inspired optimization algorithm that mimics the huddling behavior of emperor penguins. In the proposed approach, the fused feature vector is passed to the EPO algorithm, which selects the most significant features for classifying the samples. First, it initializes the population P_ep(y), where y = 1, 2, 3, ..., n. Next, the initial parameters such as T, S, Max_iteration, and S() are chosen, and the huddle boundary is determined using Eqs. (4) and (5). During the huddle, the emperor penguins position themselves on the boundary of a polygon-shaped grid. ∅ describes the wind velocity and α defines the gradient of ∅.
The vector ω is combined with ∅ to generate the complex potential, where i is the imaginary constant and F is an analytic function.
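For orientation, the huddle boundary in the original EPO description is usually written as below, with Ψ the wind velocity, Φ its gradient, and Ω the vector combined with Φ. How these symbols map onto ∅, α, and ω in this section is an assumption (∅ presumably plays the role of Ψ and α that of Φ), so this is a sketch rather than a verified copy of Eqs. (4) and (5).

```latex
\begin{align}
\Phi &= \nabla \Psi \\
F &= \Phi + i\,\Omega
\end{align}
```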

Figure 2: Features extraction, fusion, and selection approach of presented work
In the next step, the temperature T is calculated using Eq. (6).
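In the EPO literature the temperature profile is commonly written as follows. It is shown here with this section's symbols (y for the iteration and S for the radius); the name T' for the profile is an assumption, and the form is a sketch of the standard formulation rather than a verified copy of Eq. (6).

```latex
T' = T - \frac{\mathrm{Max}_{iteration}}{y - \mathrm{Max}_{iteration}},
\qquad
T =
\begin{cases}
0, & \text{if } S > 1 \\
1, & \text{if } S < 1
\end{cases}
```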
In Eq. (6), y is the current iteration, S is the radius in [0, 1], T is the time taken to find the optimal solution in the given search space, and Max_iteration is the maximum number of iterations. The distance between the penguins is then calculated by Eqs. (7) to (11).
N is the movement parameter that maintains a gap between search agents to avoid collision, and its value is set to 2; P_grid(Accuracy) is the polygon grid accuracy; and Rand() is a random function. The function S() is computed using g = 2 and k = 3, the control parameters for fine searching, whose values lie in [2, 3] and [1.5, 2], respectively. After that, the positions of the search agents are updated by Eq. (12).
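The distance and position-update relations of the standard EPO formulation can be sketched as follows, written with this section's symbols (N, g, k, y). The names D_ep, A, and C are assumptions carried over from the EPO literature, T' denotes the temperature profile of Eq. (6), and the exact form of Eqs. (7) to (12) in this work may differ.

```latex
\begin{align}
\vec{D}_{ep} &= \left| S(\vec{A}) \cdot \vec{P}(y) - \vec{C} \cdot \vec{P}_{ep}(y) \right| \\
\vec{A} &= \left( N \times \left( T' + P_{grid}(\mathrm{Accuracy}) \right) \times \mathrm{Rand}() \right) - T' \\
P_{grid}(\mathrm{Accuracy}) &= \left| \vec{P} - \vec{P}_{ep} \right| \\
\vec{C} &= \mathrm{Rand}() \\
S(\vec{A}) &= \left( \sqrt{ g \cdot e^{-y/k} - e^{-y} } \right)^{2} \\
\vec{P}_{ep}(y+1) &= \vec{P}(y) - \vec{A} \cdot \vec{D}_{ep}
\end{align}
```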
In Eq. (12), P_ep(y + 1) defines the next updated location. After this step, the algorithm checks whether any search agent has gone beyond the boundary of the search space and amends its position. It then computes the fitness value of each search agent and updates the previous optimal solution if a better one is found. The search continues until a stopping criterion is satisfied; otherwise, the temperature profile T is recalculated and the whole process is repeated until the optimal solution is obtained. In the presented work, the 1 × 2000 fused vector is supplied to the algorithm to obtain the optimal features for both datasets. The algorithm returns 1 × 755 significant features for the cotton disease dataset and 1 × 824 optimal features for the cotton-leaf-infection dataset. The chosen parameters of EPO, listed in Table 2, are finalized after experimentation to reduce the classification error rate. The convergence plot of the EPO framework is shown in Fig. 3, which plots the fitness value against the number of iterations; in this experiment, the error rate remains constant after twenty iterations.
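The loop described above can be sketched in a few lines. This is a simplified illustration, not the implementation used in this work: it assumes continuous positions in [0, 1] that are thresholded at 0.5 to decide whether a feature is kept, follows the standard EPO update as commonly stated, and uses a hypothetical toy fitness (`toy_error`) in place of the classification error rate. All function and parameter names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[bool]


def epo_select(fitness: Callable[[Mask], float], dim: int, n_agents: int = 10,
               max_iter: int = 30, seed: int = 0,
               move: float = 2.0) -> Tuple[Mask, float, List[float]]:
    """Simplified EPO-style search for a feature mask that minimizes `fitness`."""
    rng = random.Random(seed)

    def to_mask(pos: Sequence[float]) -> Mask:
        return [p > 0.5 for p in pos]  # assumed rule: keep feature j if pos[j] > 0.5

    # Initialize the population and the best (optimal) solution found so far.
    agents = [[rng.random() for _ in range(dim)] for _ in range(n_agents)]
    fits = [fitness(to_mask(a)) for a in agents]
    best_fit = min(fits)
    best = list(agents[fits.index(best_fit)])
    history: List[float] = []

    for it in range(max_iter):  # it < max_iter, so the division below is safe
        radius = rng.random()
        temp = 0.0 if radius > 1.0 else 1.0  # radius lies in [0, 1), so temp is 1
        temp_profile = temp - max_iter / (it - max_iter)
        g = rng.uniform(2.0, 3.0)  # control parameters for fine searching
        k = rng.uniform(1.5, 2.0)
        # Social force; (sqrt(x))^2 equals x for the non-negative x that arises here.
        social = max(0.0, g * math.exp(-it / k) - math.exp(-it))
        for agent in agents:
            for j in range(dim):
                p_grid = abs(best[j] - agent[j])
                a = move * (temp_profile + p_grid) * rng.random() - temp_profile
                c = rng.random()
                dist = abs(social * best[j] - c * agent[j])
                # Position update, clamped back into the search space.
                agent[j] = min(1.0, max(0.0, best[j] - a * dist))
            fit = fitness(to_mask(agent))
            if fit < best_fit:  # keep the best solution seen so far
                best_fit, best = fit, list(agent)
        history.append(best_fit)
    return to_mask(best), best_fit, history


# Hypothetical toy problem: every third feature is "relevant".
relevant = [i % 3 == 0 for i in range(30)]


def toy_error(mask: Mask) -> float:
    return sum(m != r for m, r in zip(mask, relevant)) / len(relevant)


mask, err, history = epo_select(toy_error, dim=30)
print(sum(mask), "of", len(mask), "features kept; toy error:", round(err, 3))
```

Because the best solution is only replaced by a strictly better one, `history` is non-increasing, which mirrors the flattening of a convergence plot once no better subset is found.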

Classification
Machine learning classifiers are used to perform classification. The input samples are labeled for supervised learning and divided into training and testing sets. On the cotton disease dataset, QDA [54][55][56] is used to assign the cotton samples to the relevant class, and the Ensemble Subspace KNN [57][58][59] classifier is used to classify the cotton-leaf-infection dataset.

Two publicly available datasets are used in this work. The cotton disease dataset [60] is downloaded from Kaggle; it consists of two classes, diseased and healthy, and is augmented using translation and flip techniques. The cotton-leaf-infection dataset [61] is also downloaded from Kaggle; it consists of four classes, namely Bacterial-Blight (BB), Curl-Virus (CV), Fusarium-Wilt (FW), and Healthy (H), and all images are resized. The evaluation is conducted in MATLAB on a system with a 6th-generation Core i5 processor. Fig. 4 shows the confusion matrices and Receiver Operating Characteristic (ROC) curves of both datasets on 5, 7, and 10 folds.

The results of the proposed methodology on the Kaggle cotton disease dataset-I are obtained using 5, 7, and 10 folds. The QDA classifier separates the two classes with an overall accuracy of 98.9% on 5 folds, 98.96% on 7 folds, and 99.07% on 10 folds; the detailed results for each class are presented in Table 3. The results on the Kaggle cotton-leaf-infection dataset-II are also obtained using 5, 7, and 10 folds, with the Ensemble Subspace KNN classifier. The overall accuracy is 99.16%, 98.99%, and 99.27% on 5, 7, and 10 folds, respectively; the detailed results for each class are given in Table 4. The classification outcomes on the benchmark datasets are also reported as means and standard deviations in Table 5.
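The evaluation protocol can be sketched as follows: a k-fold splitter for the 5-, 7-, and 10-fold runs, and two helpers that summarize a confusion matrix. The function names, the sample count of 100, and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from this work.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must lie between 2 and the number of samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (rows = true class, columns = predicted)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def per_class_recall(cm: List[List[int]]) -> List[float]:
    """Correctly classified share of each true class."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


for k in (5, 7, 10):
    test_sizes = [len(test) for _, test in k_fold_indices(100, k)]
    print(k, "folds, test sizes:", test_sizes)

# Hypothetical two-class confusion matrix (diseased, healthy).
cm = [[48, 2], [1, 49]]
print(overall_accuracy(cm), per_class_recall(cm))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracies gives one score per fold count, which is how figures for 5, 7, and 10 folds can be compared.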
The outcomes of the proposed approach in terms of the standard deviation and mean of the ROC are depicted in Fig. 5. The performance of the proposed method is compared with existing approaches to validate the model's effectiveness. Table 6 presents the comparison between the proposed methodology and other techniques. As shown in Table 6, the researchers in [28] report that a custom CNN, ResNet50, and VGG16 obtained accuracies of 95.37%, 98.32%, and 98.10%, respectively, and that their proposed meta deep learning strategy achieved an accuracy of 98.53%. Transfer learning with Mask RCNN, with the loss value decreased through more optimization iterations, achieved an accuracy of 94% [29]. A CNN framework inspired by AlexNet was proposed for the classification of healthy and diseased cotton plants and leaves, with an accuracy of 97.98% [30]. For cotton disease detection and pest recognition, a CNN deep learning technique gave an accuracy of 96.4% [32]. Using simple linear iterative clustering and roughness measures, with GLCM features and supervised learning by SVM, an accuracy of 94% was obtained [37]. A CNN architecture used for the detection of cotton leaf disorders provided an accuracy of 97.13% [42].
Finally, the proposed work obtains feature vectors from two pre-trained CNN models, EfficientNet-b0 and Inception-v3, and concatenates them serially. The most important task, the selection of significant features for better classification, is performed using the EPO algorithm. Supervised learning is then performed with QDA on the cotton disease dataset using 5, 7, and 10 folds, with accuracies of 98.9%, 98.96%, and 99.07%, respectively, and with Ensemble Subspace KNN on the cotton-leaf-infection dataset, with accuracies of 99.16%, 98.99%, and 99.27%, respectively.

Conclusion
The classification of cotton leaf disorders is a challenging task because of low-quality images, complex backgrounds, and differences in the size, color, and shape of leaves. Detecting disorders in leaves at an early phase helps farmers take precautions to save the crop from heavy losses. Therefore, this research presents a methodology in which two pre-trained frameworks, EfficientNet-b0 and Inception-v3, are used for feature extraction. The extracted features are fused by serial concatenation. After that, the EPO optimizer returns the significant features and removes redundant and irrelevant ones, which helps to provide better results. Finally, QDA and Ensemble Subspace KNN classifiers are used for classification. The proposed framework achieves an accuracy of 99.27% on the multiclass classification of the cotton-leaf-infection dataset-II and 99.07% on the cotton disease dataset-I using 10-fold cross-validation.
This study covers a maximum of four classes; it may be expanded to cover more classes in the future. Furthermore, this methodology will be deployed in a mobile application that supports real-time detection. Researchers may also study remotely acquired high-resolution satellite image samples to obtain better results with optimization.

Figure 1: Proposed methodology for cotton disease detection

The huddle boundary is determined using Eqs. (4) and (5).

In the distance equations, y indicates the current iteration, P defines the optimal solution, P_ep indicates the position vector, and S() is the social force that moves towards the best optimal solution. One of these equations contains the term N × T + P_grid(Accuracy) × Rand().

Figure 3: Convergence plot of EPO model

Figure 4: Results of presented work (a) Confusion matrix, (b) ROC curves

Table 1: Limitations in existing techniques

Table 2: Chosen parameters of EPO

Table 3: Results of the presented methodology using the cotton disease dataset

Table 4: Results of the proposed methodology using the cotton leaf infection dataset

Table 6: Comparison of the presented approach with different existing approaches on various datasets