Improved Soil Quality Prediction Model Using Deep Learning for Smart Agriculture Systems

Soil sustains a vast range of life on Earth, and soil quality plays a significant role in agricultural practice everywhere. Evaluating soil quality is therefore important for determining the amount of nutrients that the soil requires for proper yield. In the present decade, the application of deep learning models has had a great impact in many fields of research. With the increasing availability of soil data, there is a growing demand for remotely available open-source models, which motivates the use of deep learning to predict soil quality. With that concern, this paper proposes a novel model called Improved Soil Quality Prediction Model using Deep Learning (ISQP-DL). The work considers the chemical, physical and biological factors of soil in a particular area to estimate the soil quality. Firstly, pH ratings of soil samples are collected from the soil testing laboratory, the acidic range is categorized through the soil test, and the resulting data are given as input to the Deep Neural Network Regression (DNNR) model. Secondly, soil nutrient data are given as the second input to the DNNR model. Using this data set, the DNNR method evaluates the fertility rate, from which the soil quality is estimated. The same DNNR model is used for both training and testing. The results show that the proposed model is effective for Soil Quality Prediction (SQP), with good fitting, and that its generality is enhanced by the input features, yielding a higher rate of classification accuracy. The proposed model achieves an accuracy rate of 96.7%, which is higher than that of the existing models compared.


Introduction
The main objective of soil management in agriculture is to increase the yield based on soil features and quality. The enormous growth of population and global limitations are reducing soil quality and fertility in many places. In modern agriculture, crop health is an important factor for increased productivity. Moreover, considerable growth in crop yield can be achieved through effective measurement of soil quality and crop health management. Soil quality can be managed effectively with efficient resource management and counteractive measures to supply soil nutrients. Agricultural experts must detect and address problems in a timely manner by analyzing the soil resources and management methods. Soil fertility rate calculation with an Artificial Neural Network is done via a back-propagation neural network model, which uses a feed-forward architecture with a back-propagation weight adjustment algorithm. In the KNN (K-Nearest Neighbour) model for soil fertility rate computation, the input data are grouped and the calculated factors are limited through model training. In the EDL-ASQE (Enhanced Deep Learning Model for IoT based Automated Soil Quality Evaluation) model, the fertility rate of the soil is identified by pre-grouping the soil into high and low moisture levels, so an initial pre-grouping model is required. The model proposed in this paper therefore uses deep learning methods to address these problems and to improve the accuracy of the soil evaluation metrics. The major soil features considered in this work are as follows.

1. Chemical Features:
i) Soil Texture: defines the particle size in soil based on the texture
ii) Water Retention Character: measures the water holding capacity of soil after irrigation
2. Physical Features:
i) Extractable N, K, P: denotes the nutrients, productivity and environmental quality
ii) pH: denotes the threshold rate of biological and chemical soil activities
3. Biological Features:
i) Natural Manure: natural manure that is available in soil for plant growth

In the present scenario, prediction and classification problems are processed effectively with machine learning and deep learning methods. The general flow of the soil test model with incorporated deep learning methods is presented in Fig. 1.
The model comprises three phases: a pre-processing phase, a training phase and a testing phase. Additionally, the incorporation of deep learning methods effectively reduces several difficulties faced by experts. Previously, Artificial Neural Networks (ANN) were used for determining soil fertility measures [1]. Some soil features, such as Electrical Conductivity (EC) and water capacity, are evaluated using Partial Least Square Regression [2]. Many research works have applied machine learning techniques to agricultural problems [3]. J48, Apriori and K-Nearest Neighbour (KNN) classifiers are used for classifying the wheat yield in [4]. An unbiased linear prediction technique has been utilized in [5] for determining the organic carbon ratio in soil. The same soil mineral has been evaluated using boosted regression trees in [6]. Genetic algorithm based feature selection is integrated with random forest for analyzing the rate of organic carbon in [7]. Different ML techniques are incorporated for determining the soil minerals, types and moisture rate [8]. The work presented in [9] classifies village soil by fertility rate using classifiers such as Random Forest, Support Vector Machine, Neural Networks and Bagging. The soil data are classified into low, medium and high rates of fertility. The work in [10] contains data on the district-wise soil fertility rate of India, which supports decision making on the rate of fertilizers to be used and the procedure for distribution.
From the above survey, the following findings are identified. i) The Artificial Neural Network technique takes a longer processing time to evaluate soil quality and gives lower precision, specificity, recall and F1-Score values for soil samples. ii) Other techniques, such as the Partial Least Square Regression method, the K-Nearest Neighbour method and the Unbiased Linear Prediction technique, need complex calculations to evaluate the soil fertility rate.
The main objective of this work is to classify the area-based soil quality index using sample data obtained from the regions in and around Coimbatore and Erode districts, Tamil Nadu, India. Based on the classification results, an index report is framed to support decision making and to provide fertilizer recommendations to the experts. Additionally, the deep learning based soil quality analysis helps to minimize unwanted fertilizer usage and measures the soil and environmental quality. The proposed Improved Soil Quality Prediction Model using Deep Learning (ISQP-DL) aims to classify the soil data indices and pH rates of sample data from the Coimbatore and Erode regions based on the aforementioned soil factors. The Soil Quality Report (SQR) is further provided for experimentation and evaluation. The fertility rates are categorized into six levels of soil nutrient rates: Very-Less, Less, Medium, Modest, High and Max-rate. For training and testing, a Deep Neural Network Regression (DNNR) model with multiple hidden layers is used. The major advantage of the model is that it can correlate parameter combinations, which may reduce the computational complexity and enhance generalization. Furthermore, the model improves the classification accuracy of the quality indices based on the pH rate and nutrients of the sample soil data. The incorporation of neural network operations provides a better rate of quality prediction than statistical techniques.
The remainder of this paper is organized as follows. Section 2 reviews works on machine learning techniques used for various categories of agricultural processes and soil quality analysis. Section 3 describes the study site data and the working process of the proposed model, with diagrams, in detail. The results and comparisons are presented with graphs in Section 4. Finally, the conclusion of the proposed model and ideas for future enhancement are given in Section 5.

Related Works
In recent agricultural research, many studies have used machine learning techniques. For determining the soil fertility rate, the work in [11] used different classification methods such as Naive Bayes (NB), Random Forest (RF) and J48. The results stated that the J48 algorithm provided better results than the other models. Further, in the study [12], a comparative analysis has been presented for machine learning models such as NB, JRip and J48 in classifying soil categories. The evaluations have been carried out with 110 input samples and the results stated that the JRip model produced a higher rate of accuracy. In [13], data mining methods were incorporated for measuring the yield rate and enhancing the gain. Further, a comparative study has been done in [14] between two techniques, Support Vector Machine (SVM) and Artificial Neural Networks (ANN). The results showed that SVM produced better results than ANN. Fuzzy C-Means Clustering has been used in [15] for classification.
In a different approach, the Coactive Neuro-Fuzzy Inference System (CANFIS) [16] has been used for measuring soil temperature. The results showed that the model produced maximal error. Furthermore, in [17], Extreme Learning Machine (ELM) and a Self-Adaptive Evolutionary (SaE) model are combined to form the SaE-ELM algorithm. The model concentrated on measuring temperature, pressure and solar radiation. Gene Expression Programming (GEP) has been used in [18] for estimating soil temperature, using data obtained from 31 regions in Iran.
Different sets of soil data with different geographical data were given as input for the derivations. Soil data samples from different depths are collected and given to the Group Method of Data Handling (GMDH) for estimating soil temperature in [19]. However, the implementation of those models is not cost effective and involves some time complexity [20]. The results indicate that accessible and reliable soil temperature data are only partially available, so an effective model is required. Recently, the Fractionally Autoregressive Integrated Moving Average (FARIMA) has been developed for measuring soil temperature in [21]. The obtained results are compared with those of GEP and an Artificial Intelligence model, and the FARIMA model is found to be inadequate for determining the extreme rates of soil temperature [22,23].

Proposed Model
The soil sample data are acquired from the region around Coimbatore and Erode, as presented in Section 3.1, and cover soil features such as Organic Carbon, pH, Phosphorus and Potassium rates. The Soil Quality Report is generated from the derived Fertility Rate (FR) of the soil. For classification, the DNNR algorithm is incorporated and the soil data samples are classified into the classes Very-Less (A), Less (B), Medium (C), Modest (D), High (E) and Max-rate (F).

Information of Study Site
In this work, the soil samples are obtained from two regions of Tamil Nadu, India, namely Coimbatore and Erode, which are located at 11°1′6″N 76°58′21″E and 11°21′0″N 77°44′0″E, respectively. The location maps of the two regions are presented in Fig. 2, and the soil types in those regions are depicted geographically in Fig. 3. The map shows that the two regions consist mainly of two types of soil, black and red. The soil analyses are carried out based on nutrients for determining and managing the area-wise soil health information. The data were obtained from the soil evaluation laboratory during the years 2016-2020. The soil data samples are evaluated based on features such as soil pH, OC (Organic Carbon), Potassium and Phosphorus, and the ranges are classified based on the results given in Tabs. 1 and 2.

Computation of Fertility Rate (FR) of Soil Samples
The Fertility Rate for all the agricultural lands in and around the observed soil regions is computed as in Eq. (1), where A, B, C, D, E and F represent the ranges of cultivation, max-rate, high, modest, medium, less and very-less, respectively, in any particular region. Based on FR, the soils are classified into 3 classes as in Tab. 3.

Procedure of Improved Soil Quality Prediction Model (ISQP)
The procedure of ISQP for the classification of soil data is presented in Fig. 4. The sample soil data for each classification process are randomly ordered; 80% of the data are used for training and validation, and 20% are held out for testing. Ten-fold cross validation is applied to the training and validation portion, so that in each fold 90% of those data are used for training and 10% for validation. The classification model is trained on the soil features computed in the computation phase. Moreover, for the process of classification, the features of soil nutrients are considered in the range [10,150] and the pH classification is defined as [10,200]. In particular, the classification process is carried out with respect to the FR and the other features defined in the previous section.
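The data partitioning described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sample count of 500, the fixed random seed and the helper names `train_test_split_indices` and `k_fold_indices` are assumptions chosen only for illustration.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train_val, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random ordering of the samples
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists; each fold is held out once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical data set size; the paper does not state the number of samples.
train_val, test = train_test_split_indices(n_samples=500)
splits = list(k_fold_indices(train_val, k=10))

print(len(train_val), len(test))            # sizes of the 80% / 20% portions
print(len(splits[0][0]), len(splits[0][1])) # sizes of one 90% / 10% fold
```

With ten folds, every sample in the 80% portion is used for validation exactly once and for training nine times, while the 20% test portion is never seen during training.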

Classification Model Construction
In the proposed model, DNNR is used for classification of soil samples. It is a regression neural network with multiple hidden layers, at least two. The network structure of DNNR is depicted in Fig. 5. The benefit of DNNR is that the model can combine varied feature combinations and thereby amalgamate hidden attributes. This effectively reduces the computational complexity and enhances the generalization ability. The network structure comprises three kinds of layers, an input layer, hidden layers and an output layer, whose nodes are connected with each other. The number of hidden layers is varied based on the number of soil samples. The operations are explained below.
1. The number of soil features determines the number of input nodes in the DNNR structure. The successive hidden layers reduce the number of attributes, which helps to avoid overfitting.
2. Neurons in the hidden layers perform the aggregation operation and the rectifier activation. The Rectified Linear Activation Function (RLAF) is used to enhance the network depth and increase the training speed. The formula for computing RLAF is given as Eq. (2).
3. In DNNR, the output layer differs from that of the traditional classification model. The regression function for deriving the output is given as Eq. (3), where 'b' is the bias rate, 'W' denotes the weight factor and 'i' is the number of nodes.
5. Finally, the optimization function is derived with the RLAF, where $\nabla_{i,t} A(\cdot)$ denotes the gradient rate of the 'i'th node in the 't'th round, 'E' is the minimal angle rate and $B_{i,t}$ is the combined result with the previous round. The expression is given as Eq. (5).
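The forward computation described in steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' network: it assumes the standard rectifier max(0, x) for the RLAF and a weighted sum plus bias for the regression output, and the layer sizes, weights and feature values are hypothetical.

```python
def relu(x):
    """Rectified linear activation: passes positive values, clips negatives to 0."""
    return max(0.0, x)


def dense(inputs, weights, biases):
    """Weighted aggregation per node; weights[j][i] links input i to node j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def dnnr_forward(features, hidden_layers, output_weights, output_bias):
    """Pass features through ReLU hidden layers, then a linear regression output."""
    activations = features
    for weights, biases in hidden_layers:
        activations = [relu(z) for z in dense(activations, weights, biases)]
    # Regression output: sum of W_i * a_i plus bias b, with no activation applied.
    return sum(w * a for w, a in zip(output_weights, activations)) + output_bias


# Hypothetical two-feature input and two hidden layers (the stated minimum).
hidden_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0], [-1.0, 2.0]], [0.5, 0.0]),
]
prediction = dnnr_forward([1.0, 2.0], hidden_layers,
                          output_weights=[0.5, 1.0], output_bias=0.1)
print(prediction)
```

Because the output node applies no activation, the network returns an unbounded real value, which is what allows a fertility score to be regressed and then mapped to a class.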

Performance Evaluation Metrics
The proposed model is evaluated based on the following metrics.
i) Mean Absolute Error (MAE), derived in Eq. (6)
ii) Mean Square Error (MSE), computed in Eq. (7)
iii) Root Mean Square Error (RMSE), calculated in Eq. (8)
The classification model is additionally evaluated on the basis of the counts of True Positive (A), True Negative (B), False Positive (C) and False Negative (D). The mathematical computations are provided in Eqs. (9)-(11).
iv) Sensitivity Rate
v) Specificity Rate
vi) Precision Rate
vii) Accuracy Rate of Classification: the efficiency of the proposed model is derived from the rate of classification accuracy, which is formulated in Eq. (12).
viii) F1-Score: the F1 score is the harmonic mean of the precision and recall values, as given in Eq. (13).

Using the values collected from the sensors and IoT devices for the soil samples, the training and testing processes are carried out with the deep learning model. The sample data contain missing and duplicate records, which are removed in the pre-processing phase before the data are given for training. The model efficiency is measured using the evaluation metrics provided in Section 3.5. Fig. 8 presents the results obtained for Precision, Specificity, Recall and F1-Score for the proposed model and the compared works. Fig. 8 shows that the proposed ISQP model achieves higher results than the compared neural network and machine learning models.
For any classification model, the model efficiency depends on the accuracy rate of classification, for which the formula is presented in Section 3.5. Here, the average accuracy rate is computed over the evaluation of 10 samples for both the Coimbatore and Erode regions, and the values are plotted with respect to the models. Fig. 9 shows that the proposed model achieves a higher rate of accuracy in classifying the soil samples into their exact categories.
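The classification metrics of Section 3.5 can be made concrete with a short sketch based on their standard definitions, using the counts A (true positive), B (true negative), C (false positive) and D (false negative). The counts below are hypothetical and are not taken from the paper's experiments; for the six-class problem, it is assumed that such counts are obtained per class in a one-vs-rest manner.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics; assumes every denominator is non-zero."""
    sensitivity = tp / (tp + fn)          # recall: share of positives found
    specificity = tn / (tn + fp)          # share of negatives correctly rejected
    precision = tp / (tp + fp)            # share of positive predictions that are right
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F1 is the harmonic mean of precision and recall (sensitivity).
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for a single class: A=45, B=40, C=5, D=10.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy is useful here because soil classes may be unevenly represented, and accuracy alone can hide poor performance on a rare class.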
The graphs in Figs. 10 and 11 display the results obtained for the evaluation of the fertility rate of soil samples in and around the regions of Coimbatore and Erode, respectively. The fertility scores predicted by the proposed model are compared with the real values from the available sources. The analysis shows that the two values do not deviate greatly in any of the comparisons.
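The deviation between predicted and real fertility scores can be quantified with the error measures named in Section 3.5. The sketch below uses the standard definitions of MAE, MSE and RMSE; the score values are hypothetical placeholders and are not the values plotted in Figs. 10 and 11.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equally long sequences of scores."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean square error
    rmse = math.sqrt(mse)                      # root mean square error
    return mae, mse, rmse


# Hypothetical real and predicted fertility scores for four samples.
actual_scores = [2.0, 3.0, 5.0, 4.0]
predicted_scores = [2.5, 3.0, 4.0, 4.5]
mae, mse, rmse = regression_errors(actual_scores, predicted_scores)
print(mae, mse, rmse)
```

MSE and RMSE penalize large individual deviations more heavily than MAE, so a small gap between MAE and RMSE suggests that no single sample is predicted much worse than the rest.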
Through the effective implementation of DNNR, the proposed model reduces the computational and time complexities. The missing values and duplications in the input data are also handled before training. Hence, the proposed model provides the soil quality results in less time than the other compared models, as shown in Fig. 12. The pie charts in Figs. 13 and 14 display the final classification results of soil samples of the Coimbatore and Erode regions, respectively. Based on those results, the Soil Quality Report is generated and submitted to the cloud server for further reference and proceedings. The results show that the proposed model achieves a classification accuracy of 96.7%, which can help agricultural experts.
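The handling of missing and duplicate records before training can be sketched as below. The paper does not specify how this step is implemented, so this is only one plausible approach: the field names `ph`, `oc`, `p` and `k` and the sample records are hypothetical, and incomplete records are simply dropped rather than imputed.

```python
def clean_records(records, required_fields):
    """Drop records with a missing required value, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        values = tuple(record.get(field) for field in required_fields)
        if any(value is None for value in values):
            continue  # missing measurement: skip the record
        if values in seen:
            continue  # duplicate of an earlier record: skip it
        seen.add(values)
        cleaned.append(record)
    return cleaned


# Hypothetical soil records with one duplicate and one missing value.
raw_records = [
    {"ph": 6.8, "oc": 0.55, "p": 18.0, "k": 210.0},
    {"ph": 6.8, "oc": 0.55, "p": 18.0, "k": 210.0},
    {"ph": 7.4, "oc": None, "p": 22.0, "k": 190.0},
    {"ph": 5.9, "oc": 0.41, "p": 12.0, "k": 160.0},
]
clean = clean_records(raw_records, required_fields=("ph", "oc", "p", "k"))
print(len(raw_records), len(clean))
```

Removing such records before the split also prevents an identical sample from appearing in both the training and the testing portions.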

Conclusion and Future Work
This paper develops a novel model called Improved Soil Quality Prediction Model using Deep Learning (ISQP-DL) for soil quality evaluation and for defining better soil and crop management patterns. Through effective analysis, excessive use of fertilizers can be avoided and the soil nutrients can be preserved and utilized effectively. To that end, the model first analyzes soil features such as OC, Potassium, Phosphorus, pH, Temperature and soil moisture. Based on the values and thresholds, the results are classified into six classes, Very-Less (A), Less (B), Medium (C), Modest (D), High (E) and Max-rate (F), using the deep learning technique called DNNR. With the classification results, farmers and agricultural experts can make better decisions regarding soil resource management and fertilizer recommendations. Moreover, the model achieves an accuracy of about 97% with lower computational and time complexity than the compared works.
In the future, the work can be enhanced to support automated irrigation with IoT implementations.
Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.