Desertification Detection in Makkah Region based on Aerial Images Classification

Desertification has become a global threat and has caused a crisis, especially in Middle Eastern countries such as Saudi Arabia. Makkah is one of the most important cities in Saudi Arabia and needs to be protected from desertification. The vegetation area in Makkah has been damaged by desertification driven by wind, floods, overgrazing, and global climate change. This damage can be reversed, provided that urgent action is taken to prevent further degradation of the vegetation area. In this paper, we propose an automatic desertification detection system based on deep learning techniques. Aerial images are classified using Convolutional Neural Networks (CNN) to detect land state variation in real time. CNNs have been widely used for computer vision applications such as image classification, image segmentation, and quality enhancement. The proposed CNN model was trained and evaluated on the Aerial Image Dataset (AID). Compared to state-of-the-art methods, the proposed model achieves better performance while being suitable for embedded implementation, reaching an accuracy of 96.47%. These results support the suitability of the proposed CNN model for detecting desertification from aerial images.

process to ensure food availability. Automatic desertification detection plays a major role in saving damaged land. Detecting the desertification process in aerial images captured by satellites or aircraft helps prevent further damage to the land.
With the rapid development of image processing techniques, the performance of computer vision applications has also improved. Recent deep learning techniques [3] have been widely deployed in many applications, such as object detection [4], traffic sign detection [5], indoor object detection and recognition [6,7], pedestrian detection [8], and image segmentation [9]. Most of these applications are based on Convolutional Neural Networks (CNN) [10]. A CNN is a deep learning model inspired by the biological nervous system; it is composed of millions of artificial neurons stacked in a hierarchical structure. Unlike traditional machine learning techniques, which require handcrafted feature extraction, a CNN learns features directly from data. A CNN can be trained on very large amounts of data, which helps it generalize without overfitting.
The availability of aerial images provided by satellites and drones has made it possible to monitor wide land areas. Aerial images have been used to monitor desertification, water levels, and deforestation. The most common causes of desertification are the degradation of vegetation cover through deforestation and overgrazing, in addition to desert movement. Aerial images can be used for real-time detection of these desertification factors, so that action can be taken to avoid further damage. An automatic desertification detection system can therefore be built by combining the performance of CNN models with the availability of aerial images.
In this work, we propose to use deep learning techniques to detect land state variation and sand movement in real time. The efficientNet model [11] was used to classify the land state and its variation in real time to detect the desertification process. The efficientNet model is a lightweight, high-performance CNN model for image classification. Its largest variant, efficientNet B7, achieved a top-5 accuracy of 97% on the ImageNet image classification dataset [12] while being 8.4 times smaller and 6.1 times faster at inference than the best existing CNN of comparable accuracy. The efficientNet model introduced a compound scaling technique that balances the dimensions of the network. A CNN model can be scaled along three dimensions: the resolution, the depth, and the width. The resolution is the dimension of the input image, the depth is the number of hidden layers, and the width is the number of channels at the end of the feature extraction stage. Zagoruyko et al. [13] showed that making a deep neural network wider can improve the performance of the CNN model. The compound scaling technique was proposed to find the balance between these dimensions given the available computational resources. The efficientNet model is suitable for embedded implementation due to its small size and high performance.
The Aerial Image Dataset (AID) [14] was used to train and evaluate the proposed CNN model. The dataset is composed of satellite images of different outdoor scenes such as desert, vegetation, city, and many others. For the desertification detection application, only desert- and vegetation-related classes were considered, and the other classes were used as negative samples. The proposed CNN model achieved an accuracy of 96.47%. It can be integrated into drones for instant detection of desertification factors such as deforestation and overgrazing. In addition, Landsat 7 images were used to analyze the progression of desertification between 2002 and 2021.
The main contributions of this work are as follows: (1) a desertification detection system based on aerial image classification; (2) the use of the efficientNet CNN model for aerial image classification to detect desertification; and (3) the evaluation of the proposed CNN model on the Aerial Image Dataset, where it achieves high performance.
The rest of the paper is organized as follows: Section 2 reviews related works; Section 3 describes the proposed approach in detail; Section 4 presents the experimental results and discussion; and Section 5 concludes the study.

Related Works
The desertification process affects the livelihood of millions of people and all types of living beings. Recovering damaged land is still possible, provided that immediate action is taken. Desertification detection has become an important research field, and many models have been proposed to address this problem.
Azzouzi et al. [15] proposed a desertification monitoring system for Biskra, Algeria, based on supervised classification. A Support Vector Machine (SVM) was used to classify satellite images covering changes over a 25-year period. Several pre-processing steps were applied to the input images, including radiometric calibration, atmospheric normalization, geometric correction, and image registration. The ENVI software was used for image pre-processing with 255 control points. An accuracy of 95.15% was achieved on the Landsat dataset.
A desertification detection model for Naiman Banner, China, was proposed by Ye et al. [16]. The model is based on the Albedo-MSAVI (Modified Soil Adjusted Vegetation Index) feature space and accounts for the effects of the soil background. MSAVI reflects the spatial density of vegetation and therefore provides information on the growth status of plants, which makes it a robust indicator for desertification monitoring. Albedo, the fraction of incident solar radiation reflected by a surface, is an important parameter that characterizes the radiation properties of land covers such as snow, vegetation, and desert, and it is sensitive to soil moisture. Two models were proposed to detect desertification. The first is a point-to-point model, which estimates the desertification level by measuring the distance between two points; the larger the distance, the more severe the desertification. The second is a point-to-line model, in which a larger distance between a point and a line indicates more severe desertification. An accuracy of 93.3% was achieved on the Landsat 8 dataset.
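To make the two feature-space models more concrete, the following minimal sketch computes MSAVI from red and near-infrared reflectance using the commonly used MSAVI2 formulation, then measures a point-to-point and a point-to-line distance in the (Albedo, MSAVI) plane. The reference point, the reference line, and the pixel values are hypothetical placeholders; the exact reference geometry and Albedo computation used by Ye et al. [16] are not given here, so this is an illustration of the geometry, not their implementation.

```python
import math

def msavi(nir, red):
    """MSAVI2 from near-infrared and red surface reflectance (values in [0, 1])."""
    a = 2.0 * nir + 1.0
    return (a - math.sqrt(a * a - 8.0 * (nir - red))) / 2.0

def point_to_point(p, q):
    """Euclidean distance between two (albedo, MSAVI) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_to_line(p, a, b, c):
    """Distance from p = (albedo, MSAVI) to the line a*albedo + b*MSAVI + c = 0."""
    return abs(a * p[0] + b * p[1] + c) / math.hypot(a, b)

# Hypothetical pixel: albedo 0.25, NIR reflectance 0.40, red reflectance 0.10
pixel = (0.25, msavi(0.40, 0.10))
reference = (0.10, 0.60)  # hypothetical low-albedo, dense-vegetation reference point
print(round(pixel[1], 4))  # 0.4417
print(point_to_point(pixel, reference))
print(point_to_line(pixel, a=1.0, b=-1.0, c=0.0))  # hypothetical reference line: MSAVI = albedo
```

With a dense-vegetation reference point, a pixel drifting toward high albedo and low MSAVI moves farther away, which is the behaviour the point-to-point model interprets as more severe desertification.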
Afrasinei et al. [17] proposed a desertification detection system for Biskra, Algeria, and OumZessar, Tunisia, based on machine learning, using both supervised and unsupervised learning. First, the imagery was visually interpreted for land mapping. Then, a land cover nomenclature was defined based on guidelines from the literature. As a result, detailed land cover/land use (LCLU) maps were created to serve as base maps for further analysis. The classification models were then designed and deployed: a decision tree was used for supervised learning, and the IsoData of Knepper PC was used as the unsupervised classification method. The proposed methods were evaluated on the Landsat dataset. Data from 1984 to 2015 and from 1984 to 2014 were used for Biskra and OumZessar, respectively, to detect changes in land type. In Biskra, the proposed methods were evaluated on validation data and using Google Earth data. In OumZessar, the data were collected manually and then divided into training and validation data. The supervised method based on a decision tree achieved an accuracy of 85% on the proposed data.
A desertification detection approach was used to identify land state variation in the Yunnan-Guizhou Plateau, China [18]. The approach was used to analyze satellite data from 1989 to 2016. The input data were pre-processed with atmospheric correction, geometric correction, and image registration. The Fast Line-of-sight Atmospheric Analysis of Spectral Hypercube (FLAASH) module of ENVI 5.3 was used for atmospheric correction. Geometric correction was performed by selecting ground control points from topographic maps, with rectification errors of less than one pixel. The corrected images were then used to rectify the original images through image-to-image registration with a second-order polynomial model, with a total root-mean-square error of less than 0.5 pixel. The approach combined several desertification indicators, such as fractional vegetation coverage, the fraction of underlying coverage, slope, and gully density, to determine the desertification degree. A rule-based tree model was used to classify the images by iteratively segmenting them into smaller subdivisions based on rule sets; the classifier used the desertification indicators to classify each pixel of the input image. Several Landsat datasets were combined for evaluation: Landsat Thematic Mapper (TM) images acquired in 1989 and from 1992 to 2010, Landsat 7 ETM+ images acquired in 2001, and Landsat OLI images acquired from 2013 to 2016. An accuracy of 90.16% was achieved.
These works are based on traditional machine learning techniques [19], which require expert knowledge for feature extraction [20]. Moreover, the performance of such classifiers [21] appears to have reached a plateau, leaving little room for further improvement. In addition, these desertification systems analyze time-series data to detect land state variation, whereas a real-time desertification detection system is needed to detect variations as they occur and take the necessary action to stop this threat.

Proposed Approach
In this paper, we propose a desertification detection system for the Makkah region based on a CNN model. We first describe the area of interest and then present the proposed desertification detection system in detail.

Description of Makkah Region
The study reported in this paper focuses on the Makkah region, Saudi Arabia. Makkah is an important city for Saudis and for Muslims worldwide. It is located in western Saudi Arabia, in the central part of the Hijaz Mountains, inland from the Red Sea coast. Makkah has a total area of 1200 km². It is known for its high temperatures, which do not go below 20°C, and its low humidity; the average summer temperature is 40°C. According to the General Authority for Statistics of the Kingdom of Saudi Arabia, the latest recorded population of the Makkah region was 1.579 million in 2015. The city has experienced rapid urbanization and exhibits a variety of land use and land cover types. Every year, Makkah receives more than 3 million visitors for the Hajj pilgrimage. In addition, internal and external migration for religious or economic reasons has increased the population.
A satellite overview of Makkah city is presented in Fig. 1. As shown in Fig. 1, most of the city is desert and the rest is urban space. It is very important to prevent further sand movement from the nearby cities, which damages land productivity. The city is facing a desertification problem, and early detection may help avoid this damage.

Desertification Detection Using CNN Model
Desertification detection can be achieved by classifying aerial images. In this work, we propose a CNN model because of the success of CNNs in many image processing applications, where they have substantially raised state-of-the-art performance. However, CNN models need a large amount of training data and considerable computational effort. To overcome these limitations, many techniques have been proposed to reduce the training time, the amount of training data, and the computational cost.
The transfer learning technique allows an existing CNN model, designed for a specific application, to be reused for a new application with minor modifications. Transfer learning reduces the need for large amounts of training data and shortens the training time. It also tends to speed up model convergence and helps achieve high performance. In this work, transfer learning was applied to a CNN model for aerial image classification to detect desertification.
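As a rough illustration of why transfer learning reduces the amount of data needed, the sketch below counts the parameters of the classification head that must be learned from scratch when the 1000-class ImageNet head of efficientNet B0 is replaced by a new one. It assumes the standard B0 configuration, whose final 1 × 1 convolution outputs 1280 channels, and a hypothetical six-way output (five scene classes plus one negative class); the backbone keeps its ImageNet weights and is not counted.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

FEATURES = 1280  # channels of efficientNet B0's final 1x1 convolution (after global pooling)
imagenet_head = dense_params(FEATURES, 1000)  # original ImageNet classifier
new_head = dense_params(FEATURES, 6)          # hypothetical: 5 scene classes + 1 negative class

print(imagenet_head)  # 1281000
print(new_head)       # 7686
```

Only the new head starts from random values; fine-tuning may still update the backbone, but it starts from ImageNet features rather than from scratch.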
We propose the use of the efficientNet model because of its small size and high performance. The model was designed to be easily integrated into embedded devices. The efficientNet model makes two main contributions. Firstly, it uses Inverted Residual Bottlenecks, which were originally proposed in the mobileNet v2 model [23]; in the efficientNet model, these blocks are named MBConv. The main idea of these bottlenecks is the use of depthwise and pointwise convolutions. A depthwise convolution applies a separate spatial filter to each input channel, so it has the same number of input and output channels. A pointwise convolution is a convolution layer with a kernel size of 1 × 1. An Inverted Residual Bottleneck is composed of three stages. The first stage is an expansion layer, a 1 × 1 convolution followed by a batch normalization layer and a non-linear activation layer; it expands the number of channels so that the following depthwise convolution operates on a richer representation. The second stage is a depthwise convolution layer followed by a batch normalization layer and a non-linear activation layer. The third stage is a pointwise convolution followed by a batch normalization layer; it projects (compresses) the features back to a small number of channels and, in mobileNet v2, is kept linear (without activation) to avoid losing information. In mobileNet v2, the non-linearity is a modified rectified linear unit (ReLU6), which sets negative activations to zero, clips activations above six to six, and leaves all other values unchanged; bounding the activations makes the model more robust when it runs with low-precision arithmetic. The efficientNet model replaces ReLU6 with the Swish activation. The architecture of the Inverted Residual Bottleneck is presented in Fig. 2. In general, a convolution layer both filters and compresses features before passing them to the next layer; the Inverted Residual Bottleneck was proposed to separate these two processes.
Filtering is performed by the depthwise convolution and compression by the pointwise convolution. Separating the two processes computes a similar function to a regular convolution layer, but faster and with far fewer parameters.
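The three-stage block and the filtering/compression split can be sketched in NumPy as below. This is a simplified, hypothetical inverted residual block following the mobileNet v2 design: the 1 × 1 expansion increases the number of channels, a 3 × 3 depthwise convolution filters each channel, and the final 1 × 1 projection is linear. Batch normalization is omitted, the stride is 1, the weights are random, and ReLU6 is used; efficientNet's MBConv differs in details not shown here. The last two helpers compare weight counts of a regular and a depthwise separable convolution.

```python
import numpy as np

def relu6(x):
    """ReLU6: negative activations become 0, activations above 6 become 6."""
    return np.clip(x, 0.0, 6.0)

def pointwise_conv(x, w):
    """1x1 convolution: x is (H, W, C_in), w is (C_in, C_out); a per-pixel matrix product."""
    return x @ w

def depthwise_conv3x3(x, k):
    """3x3 depthwise convolution, zero 'same' padding; x is (H, W, C), k is (3, 3, C)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += padded[i:i + h, j:j + w, :] * k[i, j, :]  # each channel uses its own filter
    return out

def inverted_residual(x, w_expand, k_dw, w_project):
    y = relu6(pointwise_conv(x, w_expand))  # stage 1: expand channels
    y = relu6(depthwise_conv3x3(y, k_dw))   # stage 2: filter each channel spatially
    y = pointwise_conv(y, w_project)        # stage 3: linear projection to fewer channels
    return x + y if y.shape == x.shape else y  # skip connection when shapes match

def conv_params(k, c_in, c_out):
    """Weights of a regular k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Weights of a k x k depthwise convolution followed by a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

rng = np.random.default_rng(0)
c, t = 16, 6  # hypothetical input channels and expansion factor
x = rng.standard_normal((8, 8, c))
y = inverted_residual(
    x,
    rng.standard_normal((c, c * t)) * 0.1,
    rng.standard_normal((3, 3, c * t)) * 0.1,
    rng.standard_normal((c * t, c)) * 0.1,
)
print(y.shape)                      # (8, 8, 16)
print(conv_params(3, 32, 64))       # 18432
print(separable_params(3, 32, 64))  # 2336
```

For a 3 × 3 layer mapping 32 to 64 channels, the separable version needs roughly 7.9 times fewer weights than a regular convolution.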
The second contribution of the efficientNet model is compound scaling, which aims to balance the dimensions of the model. A CNN model has three main dimensions: resolution, depth, and width. The resolution is the dimension of the input image, the depth represents the number of layers, and the width is the number of channels at the output of the feature extraction part. Compound scaling can be applied to any CNN model to enhance its performance, but the achievable performance depends on the quality of the baseline model. The baseline efficientNet model was therefore obtained using Neural Architecture Search (NAS) [24], optimizing for both accuracy and floating-point operations (FLOPS). The efficientNet B0 model is illustrated in Fig. 3.
Once a good baseline model has been selected, the compound scaling technique is applied. Scaling each dimension of the CNN model independently can improve performance, but the accuracy gain diminishes as the model grows, so this alone does not reach the best achievable performance. Therefore, compound scaling was proposed to balance the CNN dimensions with respect to the available computational resources. The compound scaling technique uses a compound coefficient φ to scale the depth, width, and resolution uniformly, as presented in Eq. (1).
$$d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1 \qquad (1)$$
where α, β, and γ are constants that specify how the available computational resources are assigned to the depth, width, and resolution of the model, respectively, and φ is a user-specified coefficient that controls model scaling with respect to the available resources. The FLOPS of a CNN model are proportional to d, w², and r². Hence, doubling the depth doubles the FLOPS, whereas doubling the width or the resolution multiplies the FLOPS by four. Since most of the FLOPS come from the convolution layers, the total increase in FLOPS can be computed as Eq. (2).
$$\frac{\mathrm{FLOPS}(\phi)}{\mathrm{FLOPS}(0)} = \alpha^{\phi} \cdot \left(\beta^{\phi}\right)^{2} \cdot \left(\gamma^{\phi}\right)^{2} = \left(\alpha \cdot \beta^{2} \cdot \gamma^{2}\right)^{\phi} \qquad (2)$$
Assuming that the available computational resources are twice the needed FLOPS, the constraint α · β² · γ² ≈ 2 was considered. Thus, for any new value of the scaling coefficient φ, the total number of FLOPS increases by approximately 2^φ times. Compound scaling is particularly effective when a high-resolution image is used as input: the CNN model then needs more layers to increase the receptive field and more channels to capture the finer patterns of the input image. Previous works have shown a strong relationship between depth and width; the efficientNet model added the input resolution to this relationship, and increasing the resolution improved the performance of the model. The CNN model dimensions are thus scaled uniformly based on a set of fixed scaling coefficients: given 2^N times more computational resources, the depth can be increased by α^N, the width by β^N, and the resolution by γ^N.
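The proportionality FLOPS ∝ d · w² · r² can be checked on a toy network made of identical stride-1 convolution layers. The layer count, channel count, and input size below are hypothetical; the sketch counts multiply-accumulate operations directly, so doubling the depth doubles the count while doubling the width or the resolution quadruples it.

```python
def conv_macs(height, width, k, c_in, c_out):
    """Multiply-accumulate operations of a stride-1, 'same'-padded k x k convolution."""
    return height * width * k * k * c_in * c_out

def toy_network_macs(depth, channels, resolution, k=3):
    """A toy CNN: `depth` identical conv layers with `channels` input/output channels."""
    return depth * conv_macs(resolution, resolution, k, channels, channels)

base = toy_network_macs(depth=10, channels=32, resolution=224)
print(toy_network_macs(20, 32, 224) / base)  # 2.0 (depth doubled)
print(toy_network_macs(10, 64, 224) / base)  # 4.0 (width doubled)
print(toy_network_macs(10, 32, 448) / base)  # 4.0 (resolution doubled)
```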
To find appropriate values of α, β, and γ, a small grid search was performed. For the efficientNet B0 model, the best values found were α = 1.2, β = 1.1, and γ = 1.15, obtained under the assumption that the available computational resources are twice the needed FLOPS.
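The first step of the search can be sketched as follows: candidate (α, β, γ) triples on a small grid are kept only if α · β² · γ² is close to 2, and the best one is selected with an `evaluate` callable. The grid spacing, the tolerance, and `evaluate` are hypothetical; in practice `evaluate` would train the φ = 1 model and return its validation accuracy, which is far too expensive to show here.

```python
import itertools

def feasible_candidates(grid, target=2.0, tol=0.1):
    """Yield (alpha, beta, gamma) with alpha * beta**2 * gamma**2 within tol of target."""
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        if abs(alpha * beta**2 * gamma**2 - target) <= tol:
            yield alpha, beta, gamma

def grid_search(grid, evaluate, target=2.0, tol=0.1):
    """Return the feasible triple with the highest score from evaluate(alpha, beta, gamma)."""
    return max(feasible_candidates(grid, target, tol), key=lambda abg: evaluate(*abg))

grid = [round(1.0 + 0.05 * i, 2) for i in range(7)]  # 1.0, 1.05, ..., 1.3
print(1.2 * 1.1**2 * 1.15**2)  # about 1.92: the reported B0 values satisfy the constraint
```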
The scaling technique consists of two steps: first, φ is fixed to 1 and a small grid search is performed to find the best values of α, β, and γ; second, α, β, and γ are kept fixed and the value of the scaling coefficient φ is varied to scale the model up and reach the best performance.
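The second step can be sketched by fixing the reported B0 coefficients and sweeping φ. Following the original efficientNet formulation, the sketch assumes that α scales depth, β scales width, and γ scales resolution, and it uses the 224 × 224 input of efficientNet B0; rounding to whole layers and channels, which real implementations need, is ignored.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # coefficients reported for efficientNet B0
BASE_RESOLUTION = 224                 # efficientNet B0 input size in pixels

def compound_scale(phi):
    """Depth/width multipliers, input resolution, and FLOPS ratio for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = BASE_RESOLUTION * GAMMA ** phi
    flops_ratio = depth * width ** 2 * (GAMMA ** phi) ** 2  # equals (alpha*beta^2*gamma^2)^phi
    return depth, width, round(resolution), flops_ratio

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input {r}x{r}, FLOPS x{f:.2f}")
```

Each increment of φ multiplies the FLOPS by about 1.92, close to the targeted factor of 2.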
The proposed efficientNet model was used for aerial image classification to detect desertification in the Makkah region. The model was designed to detect land state variation and desertification indicators such as deforestation and sand movement. The proposed desertification detection approach is presented in Fig. 4.

Experiments and Results
The proposed model was evaluated on a desktop computer running Linux, equipped with an Intel i7 CPU, 32 GB of RAM, and an Nvidia GTX960 GPU. The proposed model was developed based on  To train and evaluate the proposed CNN model, we use the Aerial Image Dataset (AID) [14]. The dataset was collected from Google Earth imagery, and the images were post-processed as RGB renderings of the original optical aerial images. It has been shown that the RGB renderings do not differ significantly from the original optical images, especially for pixel-level land use/cover mapping. Consequently, both RGB and optical images have been used to evaluate scene recognition techniques. AID has 30 aerial scene classes, but in this work we consider only five: bare land, desert, farmland, forest, and mountain. The dataset contains 10000 images in total, with 220 to 420 samples per class. The image size is fixed at 600 × 600 pixels, and the spatial resolution ranges from about half a meter to eight meters per pixel. In this work, 60% of the data were used for training and 40% for testing. The whole dataset was used, with the 25 remaining scene classes serving as negative samples.
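The class selection and the 60/40 split can be sketched as below. The AID folder names, the per-class (stratified) split, and the fixed random seed are assumptions; the text does not say how the split was drawn. The file names in the usage example are hypothetical.

```python
import random
from collections import defaultdict

# Assumed AID folder names for the five scenes kept in this work
TARGET_SCENES = {"BareLand", "Desert", "Farmland", "Forest", "Mountain"}

def relabel(scene):
    """Keep the five target scenes; map every other AID scene to one negative class."""
    return scene if scene in TARGET_SCENES else "Negative"

def stratified_split(samples, train_fraction=0.6, seed=0):
    """Split (path, scene) pairs so that each label keeps the same train/test ratio."""
    by_label = defaultdict(list)
    for path, scene in samples:
        by_label[relabel(scene)].append(path)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        paths = by_label[label]
        rng.shuffle(paths)
        cut = round(train_fraction * len(paths))
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test

# Usage with hypothetical file names
samples = [(f"desert_{i}.jpg", "Desert") for i in range(10)] + \
          [(f"airport_{i}.jpg", "Airport") for i in range(10)]
train, test = stratified_split(samples)
print(len(train), len(test))  # 12 8
```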
The transfer learning technique was applied to the efficientNet model, which was then used for scene recognition from aerial images. The model weights were initialized with ImageNet weights. The model was trained with the Adam optimizer, which adapts the step size of each parameter using running estimates of the first and second moments of the gradients, making training less sensitive to the choice of the global learning rate. A batch size of 8 images was used. The model was trained for 290000 iterations with an early stopping condition that stops the training process if the accuracy does not change for 10000 iterations. The cross-entropy loss was used as the loss function. Data augmentation was applied to increase the amount of training data and improve the performance of the model. As shown in Tab. 1, the proposed model achieved high performance with few FLOPS. These results indicate that the proposed model can be implemented on an embedded device for integration into aircraft and drones for instant desertification detection. The image acquisition system collects images, and the desertification detection system analyzes them instantly to determine the land state. The results are then compared to the previously stored state; if the state has changed from vegetation cover to desert, a desertification warning is generated.
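For reference, the sketch below shows the standard Adam update (per-parameter steps scaled by bias-corrected first- and second-moment estimates of the gradient) together with an early-stopping check using the 10000-iteration patience described above. The hyperparameter defaults are the usual Adam values and are assumptions, since the learning rate used in this work is not reported; the quadratic objective in the usage example is a toy stand-in for the real loss.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

class EarlyStopping:
    """Signal a stop when accuracy has not improved for `patience` iterations."""

    def __init__(self, patience=10_000):
        self.patience = patience
        self.best = -np.inf
        self.best_iteration = 0

    def should_stop(self, iteration, accuracy):
        if accuracy > self.best:
            self.best, self.best_iteration = accuracy, iteration
        return iteration - self.best_iteration >= self.patience

# Toy usage: minimize (theta - 3)^2 with a larger-than-default step size
theta, m, v = np.array(0.0), 0.0, 0.0
for t in range(1, 501):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(float(theta))  # close to 3.0
```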
The achieved results were compared with those of state-of-the-art desertification detection methods, as presented in Tab. 2. The proposed method outperformed the state-of-the-art methods, which indicates that the proposed model is effective for scene recognition from aerial images and, consequently, for desertification detection. The proposed method was also more accurate and required less computation. The CNN model learned effectively from the aerial images and extracted more relevant features, which led to high classification accuracy.
The proposed method was also evaluated on Landsat images to detect land cover changes over the years. The collected images span the period from 2002 to 2021. Fig. 5 shows the region where the images were collected and an example Landsat path. In this test, only wild regions were considered; city regions were removed. The studied region is mostly bare land with some desert. The vegetation cover is very sparse: only a few desert plants exist, and the rest of the region is rock.
The study of desertification changes in the Makkah region between 2002 and 2021 showed little change in the amount of vegetation cover. Fig. 6 presents an example of the Landsat images from 2002 and 2021. The main idea of the proposed work is to prevent future desertification by monitoring the land state and the quantity and quality of vegetation. The proposed method aims to detect land state changes in real time through its integration into aerial imaging devices such as drones and aircraft. Its low computational complexity allows it to be implemented on a wide range of embedded devices.
The obtained results demonstrate the efficiency of the proposed CNN model for desertification detection. The efficientNet model was chosen for its suitability for embedded implementation, and its lightweight size shows that it can fit into the small memory of embedded devices. Besides, the performance of the model was enhanced by the availability of the Aerial Image Dataset (AID), which contains 10000 images from 30 scenes. In this work, we consider only 5 scenes to be recognized: bare land, desert, farmland, forest, and mountain. These are the scenes relevant to desertification detection. A change of the land state from vegetation cover to desert, or sand movement from desert into vegetated areas, is considered a desertification indicator. Bare land and mountain scenes were included to distinguish between the remaining land states. Most of the studied region is mountains, desert, and bare land; only a small area has vegetation.

Tab. 2: Comparison with state-of-the-art desertification detection methods
Method                    Accuracy (%)
SVM [15]                  95.15
MSAVI [16]                93.3
Hong et al. [18]          90.13
efficientNet B0 (ours)    96.4
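The change-based warning can be sketched as a comparison between the stored and the newly predicted class of each image tile. The tile identifiers and the grouping of farmland and forest as vegetation cover are assumptions made for illustration; a real system would also need georeferenced tiles so that the same area is compared across acquisitions.

```python
VEGETATION = {"Farmland", "Forest"}  # assumed vegetation-cover scene classes
DESERT = "Desert"

def desertification_warnings(previous, current):
    """Return ids of tiles whose predicted class changed from vegetation cover to desert.

    previous, current: dicts mapping a tile id to its predicted scene class.
    """
    return sorted(
        tile for tile, old in previous.items()
        if old in VEGETATION and current.get(tile) == DESERT
    )

stored = {"t1": "Forest", "t2": "Desert", "t3": "Farmland", "t4": "Mountain"}
latest = {"t1": "Desert", "t2": "Desert", "t3": "Farmland", "t4": "Desert"}
print(desertification_warnings(stored, latest))  # ['t1']
```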

Conclusion
Desertification degrades land productivity and threatens all types of life. It is caused by natural factors and by human activities such as overgrazing, deforestation, and the overuse of insecticides. Desertification detection plays a major role in preserving land productivity: detecting the desertification process at an early stage allows a recovery process to be applied before further damage occurs. In this paper, we proposed a desertification detection system based on the analysis of aerial images. The proposed system relies on a lightweight CNN model suitable for embedded implementation and possible integration into aircraft and drones. The efficientNet model was used for aerial image classification to determine the land state and detect the desertification process. The proposed model was trained and evaluated on the Aerial Image Dataset (AID), where it achieved an accuracy of 96.47%. Compared to state-of-the-art desertification detection methods, the proposed method achieves better performance while remaining suitable for embedded implementation. These results demonstrate the effectiveness of the proposed method for desertification detection. Moreover, the proposed method can detect desertification instantly from aerial images: if the desertification process is detected at an early stage, a warning is generated so that a recovery process can start before serious damage is done to land productivity.