Planetscope Nanosatellites Image Classification Using Machine Learning

To adopt sustainable crop practices in a changing climate, it is crucial to understand how climatic parameters and water requirements interact with vegetation on a spatiotemporal scale. The Planetscope (PS) constellation of more than 130 nanosatellites from Planet Labs has revolutionized high-resolution vegetation assessment. PS-derived Normalized Difference Vegetation Index (NDVI) maps are among the highest-resolution data available and can transform agricultural practices and management on a large scale. In the current study, high-resolution PS nanosatellite data was used for spatiotemporal assessment of agriculture in the Al-Qassim region, Kingdom of Saudi Arabia (KSA). The NDVI time series was used to assess changes in the vegetation pattern in the study area. The study area has sparse vegetation, and exposed soil appears bright because of low soil moisture, which constrains NDVI. Therefore, a machine learning (ML) based Random Forest (RF) classification model was used and compared with NDVI in terms of vegetation extent and computational cost. RF is one of the most precise classification methods because it can model the complexity of input variables, handle outliers, treat noise effectively, and avoid overfitting. Multinomial Logistic Regression (MLR) was implemented to compare the performance of NDVI and RF-based classification. The RF model provided good accuracy (98%) for all vegetation classes based on user accuracy, producer accuracy, and kappa coefficient.


Introduction
Agriculture is a crucial sector in Saudi Arabia due to the increasing population and significant economic growth [1]. Despite various limitations, such as changing climate, low rainfall, limited water resources, hyper aridity, and scattered cultivable areas, the agricultural sector has prioritized improving food security and achieving self-dependency [2]. In an arid area, agricultural activities mainly depend on the availability of water drawn from aquifers (shallow/deep) and seasonal water [3]. This makes it crucial to estimate water consumption when planning water resources. Because water use data is unavailable at the microlevel, satellite-based GRACE data becomes a vital tool for estimating water extraction. Few studies in the literature have discussed water consumption for irrigation using GRACE [4]. Furthermore, [5] used GRACE data (2002-2016) to understand the Groundwater Storage Change (GWSC) in Saudi Arabia.
Estimation of crop extent and its interaction with climatic parameters is an emerging and vital research area. Geospatial techniques are promising for estimating and monitoring agricultural growth, environmental change, and water storage change. Previous studies used MODIS (Moderate Resolution Imaging Spectroradiometer) and the Landsat suite (TM 5, Landsat 7 ETM+ and Landsat 8 OLI) to obtain the vegetation extent and change across the globe, e.g., [6-10]. Remote Sensing (RS) based NDVI is a technique that can estimate the extent of vegetation in an area and is a measure of crop health. Some studies analyzed climatic parameters such as temperature and rainfall with vegetation based on Landsat NDVI [9,11-15]; the central issue of these studies was that the comparison was made with a limited number of images from different times and at low spatial resolution (30 m).
Planet Labs' PS constellation (175 nanosatellites) revolutionized RS-based vegetation assessment, see Fig. 1. PS-derived NDVI maps are among the highest-resolution data available and can transform agricultural practices and management on a large scale. NDVI time series data is helpful for farmers, and government authorities can use this high-resolution information for sustainable agriculture planning, matching crops with the changing environment, and allocating resources such as seeds, fertilizers, and, more importantly, water for significant development. Additionally, these NDVI-based maps help detect disease in crops and estimate possible crop yield to ensure food security.
Few studies have used PS data for vegetation analysis. [16] used PS data to estimate the crop sowing date. [17] used PS data to map Striga weed based on RF. [18] used PS data with other satellite data to monitor crop growth. [19] used ML-based classification to make land-use maps based on PS data and observed an accuracy between 87% and 96%. [20] used PS data to analyze vegetation phenology in Kenya and advocated that PS images provide more detailed NDVI observations due to their high spatiotemporal resolution. [21] used PS data to estimate the development stages of winter wheat. [22] fused PS data with Sentinel-2 to classify the vegetation based on NDVI. [23] developed an enhancement method to improve the quality of PS data using Landsat 8 and MODIS datasets. The primary issue with the previous studies was that the vegetation derived from PS data required calibration of the atmospheric correction against other satellites. Another issue was the length of the PS time series; some researchers used six months of data while others used 12 months. The present investigation overcame these limitations by using the level 3B surface reflectance products (3B_AnalyticMS_SR.tif) and a continuous time series of more than 2.5 years for good insight into crop growth assessment.
Machine learning algorithms generally used in image classification are Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), and Maximum Likelihood Classifier (MLC) [24-27]. [28,29] used the RF algorithm to analyze sugarcane vegetation. Several studies have advocated using RF classification for remote sensing data [17,30-33]. RF uses several decision trees based on ensemble modeling of training samples and variables. The RF model can process high-dimensional data, handle multicollinearity, and provide high accuracy with less computation time [10,34]. Very few studies have used PS data for RF-based classification. A comparative assessment of any ML model is essential to understand its performance. The novelty of the present investigation is the use of MLR between NDVI and RF-based vegetation classification for three categories: high vegetation, low vegetation, and no vegetation.
The assessment of water extraction and the impact of climatic parameters on vegetation is a significant contribution attempted in the present investigation. In the present research, high-resolution PS data and climatic parameters were applied to the spatiotemporal assessment of vegetation in the Al-Qassim region in KSA. There are five primary objectives of the current investigation: (1) monitoring the agricultural pattern change for improved decision-making and sustainable agricultural development; (2) estimating water extraction based on GRACE and GRACE-FO; (3) vegetation classification using the RF model; (4) performance comparison between NDVI and RF-based vegetation classification using MLR; and (5) assessing the interaction of vegetation with climate parameters and water extraction.
The novelty of the present investigation is to develop and apply an RF classification model to a continuous PS nanosatellite vegetation time series of 28 months, which is the longest continuous time series analysis of nanosatellite images for crop pattern growth and change. Another novelty of the present investigation is the assessment of the impacts of gravity-based groundwater change and climatic parameters on high-resolution vegetation change.

Study Area
The Al-Qassim region has a population of almost 1.5 million and a cultivated area of 1029 km², making Qassim the third-largest agricultural region, with around 12.32% of the KSA total in 2009 [35]. The Qassim region is well documented in earlier work on vegetation change [36]. The primary crops are dates, wheat, and clover, representing approximately 40.35%, 23.39%, and 15.18% of the cultivated area, respectively [37]. The Qassim region has a hot desert climate and rainfall of around 93.44 mm.

High-Resolution PS Data
A total of 29 level 3B surface reflectance products (3B_AnalyticMS_SR.tif) supplied by Planet Labs were used in the present investigation, see Tab. 1. It is worth noting that the very high spatial resolution (3 m) and the temporal resolution of 1 day make PS suitable for assessing the vegetation pattern with high accuracy. The surface reflectance product (SR, Planet Surface Reflectance Product) involves various quality checks and corrections, including atmospheric correction [38], and was used directly in the present investigation. The present investigation captures the vegetation details through PS nanosatellites from September 2018 to April 2021 (28 months), with gaps in 2019 for March, June, September, and October (4 months).

GRACE Data
Terrestrial water storage (TWS) was obtained from the GRACE and GRACE-FO satellites from Sep 2018 to April 2021 at monthly temporal and half-degree spatial resolution. GRACE release 06 (RL06) V 2.0 global mass concentration block (mascon) products were used in the present investigation, acquired from CSR (Center for Space Research at the University of Texas, Austin) and the Jet Propulsion Laboratory (JPL).

GRACE Processing
GRACE does not distinguish between anomalies from the several components of TWS (Terrestrial Water Storage), which include surface water storage, canopy water, and soil moisture content. Therefore, the non-groundwater components had to be subtracted to obtain the groundwater storage (GWS). Surface water and canopy water are almost negligible for the current investigation area because the region is semi-arid. Therefore, only the soil moisture content (SMC) must be subtracted from the TWS to obtain the GWS, as in Eq. (1).
$$\mathrm{GWS} = \mathrm{TWS} - \mathrm{SMC} \tag{1}$$
SMC was obtained from the Global Land Data Assimilation System (GLDAS) by averaging three different land surface models (LSMs), viz. NOAH, CLM, and VIC (https://disc.gsfc.nasa.gov/). The GRACE JPL, CSR, and average data are shown in Fig. 2. The SMC and TWS errors were added in quadrature to obtain the error in GWS [4,41].
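The subtraction of soil moisture from TWS and the quadrature error propagation described above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the function name, argument names, and all numeric values are hypothetical, and it assumes the three LSM soil moisture anomalies are already expressed in the same units (mm) and on the same grid cell as the TWS anomaly.

```python
import math


def groundwater_storage(tws, smc_models, tws_err, smc_err):
    """Return (GWS, GWS error) in mm for a single month and grid cell.

    tws        : terrestrial water storage anomaly (mm)
    smc_models : soil moisture anomalies (mm), one value per land surface model
    tws_err    : uncertainty of the TWS anomaly (mm)
    smc_err    : uncertainty of the averaged soil moisture anomaly (mm)
    """
    # Average the land surface models (e.g. NOAH, CLM, VIC) into one SMC value.
    smc = sum(smc_models) / len(smc_models)
    # Groundwater storage is what remains after removing soil moisture from TWS.
    gws = tws - smc
    # Independent errors are combined in quadrature: sqrt(a^2 + b^2).
    gws_err = math.hypot(tws_err, smc_err)
    return gws, gws_err


# Hypothetical values chosen only to make the arithmetic easy to follow.
gws, gws_err = groundwater_storage(
    tws=-12.0, smc_models=[-3.0, -2.0, -4.0], tws_err=3.0, smc_err=4.0
)
print(gws, gws_err)
```

Adding the errors in quadrature assumes the TWS and SMC uncertainties are independent; correlated errors would need a covariance term.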

NDVI
NDVI is an index of plant "greenness" or photosynthetic activity [7,9,33]. It relies on the principle that vegetation absorbs the red light required for photosynthesis but reflects near-infrared (NIR) light. Non-vegetated areas or stressed vegetation do the reverse, i.e., they reflect more red light and relatively less NIR light.
NDVI can be obtained from Eq. (2), where NIR and Red are the reflectance in the near-infrared and red bands, respectively.
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{2}$$
NDVI depends on various factors such as total vegetation cover, soil moisture, and vegetation stress. Being a ratio, NDVI has advantages such as compensating for illumination differences due to topography and supporting multi-temporal comparison of images collected in different seasons. Fig. 3 illustrates the NDVI of 60 pivot fields.
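A per-pixel NDVI calculation from red and NIR reflectance can be sketched as below. This is an illustrative sketch using plain Python lists so that it runs without extra libraries; in practice an array library and a raster reader would normally be used. The function names, the zero-denominator convention, and the sample reflectance values are assumptions made for the example.

```python
def ndvi(red, nir):
    """NDVI for one pixel from red and near-infrared reflectance."""
    total = nir + red
    if total == 0:
        # Assumption: a pixel with no signal in either band is reported as 0.
        return 0.0
    return (nir - red) / total


def ndvi_image(red_band, nir_band):
    """Apply NDVI pixel by pixel to two equally sized 2-D bands."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Hypothetical 2 x 2 reflectance values (unitless, 0 to 1).
red = [[0.10, 0.30], [0.05, 0.0]]
nir = [[0.50, 0.30], [0.45, 0.0]]
result = ndvi_image(red, nir)
for row in result:
    print([round(v, 3) for v in row])
```

Bright exposed soil tends to have similar red and NIR reflectance, which is why such pixels fall near zero in this example.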

Trend and Correlation Analysis
Mann-Kendall trend tests [42,43] were carried out to detect trends and changes in vegetation, GWSC, and climate variables over the years of analysis. Sen's slope values [44] were used to understand the trend of GWSC for the study area from Sep 2018 to April 2021. The Mann-Kendall S statistic was evaluated chronologically over the time series (Eq. (3), with the sign function given in Eq. (4)). The variance of S, VAR(S), was also estimated (Eq. (5)). The standardized test statistic Z (Eq. (6)) [45] was also computed for the statistical analyses.
$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sgn}(X_j - X_i) \tag{3}$$
$$\operatorname{sgn}(X_j - X_i) = \begin{cases} +1, & X_j - X_i > 0 \\ 0, & X_j - X_i = 0 \\ -1, & X_j - X_i < 0 \end{cases} \tag{4}$$
$$\mathrm{VAR}(S) = \frac{1}{18}\left[ n(n-1)(2n+5) - \sum_{p=1}^{q} t_p (t_p - 1)(2 t_p + 5) \right] \tag{5}$$
$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\mathrm{VAR}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\mathrm{VAR}(S)}}, & S < 0 \end{cases} \tag{6}$$
Here, $X_i$ and $X_j$ are chronologically ordered values of the variable in the time series, $n$ is the total number of observations, $t_p$ is the number of ties for the $p$th value, and $q$ is the number of tied values. Positive and negative values of Z indicate increasing and decreasing trends, respectively. The Pearson correlation was used to understand the relationships between water extraction (GWSC), NDVI, and climatic parameters for the study area between Sep 2018 and April 2021.
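The trend test and slope estimate described above can be sketched in a few lines. This is a minimal sketch of the standard Mann-Kendall test with tie correction and of Sen's slope, assuming equally spaced observations; the function names and the sample series are hypothetical, and the two-sided p-value is an addition for convenience that the text does not mention.

```python
import math
from statistics import NormalDist, median


def mann_kendall(x):
    """Return (S, VAR(S), Z, two-sided p) for a chronologically ordered series."""
    n = len(x)
    # S: sum of the signs of all pairwise later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    # Count how often each value occurs to build the tie correction.
    counts = {}
    for value in x:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    # Standardized statistic with continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, var_s, z, p


def sens_slope(x):
    """Median of all pairwise slopes, assuming unit spacing between observations."""
    slopes = [
        (x[j] - x[i]) / (j - i)
        for i in range(len(x) - 1)
        for j in range(i + 1, len(x))
    ]
    return median(slopes)


# Hypothetical monthly series with a steady increase.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
s, var_s, z, p = mann_kendall(series)
slope = sens_slope(series)
print(s, round(var_s, 3), round(z, 3), round(p, 4), slope)
```

A positive Z with a small p-value indicates an increasing trend; Sen's slope then gives its magnitude per time step (per month for monthly data).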

Random Forest-Based Vegetation Classification
Defining the function was the first step in the RF classification; it specifies the features used for data splitting, the impurity function, the number of samples, and the stopping criteria. In this study, the Gini coefficient function was used for RF classification, which can be expressed as Eq. (7), where n represents the classes and v is the vertex (node) of the trees.
A total of 300 decision trees were chosen for the RF model.
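To make the role of the impurity function concrete, the sketch below computes the Gini impurity of a tree node and the weighted impurity of a candidate split. It assumes the standard definition, $G(v) = 1 - \sum_{i=1}^{n} p_i(v)^2$, where $p_i(v)$ is the fraction of samples of class $i$ at node $v$; the symbol $p_i$, the function names, and the class labels in the example are assumptions, since the exact form of Eq. (7) is not reproduced in the text.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 minus the sum of squared class fractions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical node holding two training pixels from each of three classes.
node = ["high", "high", "low", "low", "none", "none"]
parent = gini_impurity(node)
children = split_impurity(["high", "high"], ["low", "low", "none", "none"])
print(round(parent, 4), round(children, 4))
```

Each tree chooses, at every node, the split that lowers this weighted impurity the most. In a library such as scikit-learn, the configuration described above would correspond to `RandomForestClassifier(n_estimators=300, criterion="gini")`; which software the authors used for RF is not stated.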

Multinomial Logistic Regression Between NDVI and RF Model Classification
MLR was applied to find the correlation between NDVI and RF-classified vegetation. Logistic regression (LR) is suitable for two categories. For more than two categories, MLR is applicable; in the present investigation, three classes (high vegetation, low vegetation, and no vegetation) were used in MLR. The multinom function from the nnet package of the R language was applied. The baseline level of the outcome was set using the relevel function. The MLR model was evaluated using stratified k-fold cross-validation, which preserves the class balance in training. Three repeats and ten folds were selected to evaluate the MLR. The multinom function has no option for calculating p-values for the regression coefficients; therefore, p-values were calculated using z-tests.
H0 (null hypothesis): There is no effect of NDVI on the RF-based vegetation classification.
H1 (alternative hypothesis): There is a significant effect of NDVI on the RF-based vegetation classification.
The z-statistic was computed using Eq. (10).
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \tag{10}$$
where $\bar{x}$ = mean of the sample; $\mu$ = mean of pixels; $\sigma$ = standard deviation of pixels; $n$ = number of observations. A significance level (α) of 0.05 and a two-tailed z-test were used. The null hypothesis was rejected when the p-value was less than α and was not rejected when the p-value was higher than α. The relationship between NDVI and RF classification was considered statistically significant when the null hypothesis was rejected.
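The two-tailed z-test decision rule can be sketched as follows. This is an illustrative sketch rather than the authors' R workflow: the function names and numeric inputs are hypothetical, and it assumes the z-statistic takes the usual one-sample form, with the sample mean, pixel mean, pixel standard deviation, and number of observations as defined in the text.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, population_mean, population_sd, n):
    """One-sample z-statistic: distance of the sample mean in standard errors."""
    return (sample_mean - population_mean) / (population_sd / math.sqrt(n))


def two_tailed_p(z):
    """Probability of a value at least as extreme as z in either tail."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def reject_null(z, alpha=0.05):
    """Reject the null hypothesis when the two-tailed p-value is below alpha."""
    return two_tailed_p(z) < alpha


# Hypothetical inputs chosen so that z is close to -2.5.
z = z_statistic(sample_mean=0.45, population_mean=0.50, population_sd=0.10, n=25)
p_value = two_tailed_p(z)
print(round(z, 3), round(p_value, 4), reject_null(z))
```

With α = 0.05, the two-tailed rule is equivalent to rejecting the null hypothesis when |z| exceeds about 1.96.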

Accuracy Assessment
The RF machine learning algorithm was applied for vegetation classification. The classification accuracies were calculated using the confusion matrix. The Kappa statistics were also evaluated [46,47].
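The accuracy measures named above can all be derived from a single confusion matrix, as sketched below. The sketch assumes rows are classified (map) classes and columns are reference classes, uses the common confusion-matrix form of Cohen's kappa, and assumes every class has at least one classified and one reference sample. The function name and the matrix values are hypothetical and are not the study's results.

```python
def accuracy_metrics(matrix):
    """Return (overall accuracy, user accuracies, producer accuracies, kappa).

    matrix[i][j] counts samples classified as class i whose reference class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = sum(matrix[i][i] for i in range(k))

    overall = diagonal / n
    # User accuracy: share of pixels mapped as a class that truly belong to it.
    users = [matrix[i][i] / row_totals[i] for i in range(k)]
    # Producer accuracy: share of reference pixels of a class mapped correctly.
    producers = [matrix[j][j] / col_totals[j] for j in range(k)]
    # Kappa compares observed agreement with agreement expected by chance.
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    kappa = (n * diagonal - chance) / (n * n - chance)
    return overall, users, producers, kappa


# Hypothetical counts for high vegetation, low vegetation, and no vegetation.
confusion = [
    [45, 4, 1],
    [3, 40, 2],
    [2, 1, 52],
]
overall, users, producers, kappa = accuracy_metrics(confusion)
print(round(overall, 4), [round(u, 3) for u in users], round(kappa, 4))
```

Kappa is lower than overall accuracy here because part of the observed agreement would be expected by chance alone.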
Cohen's kappa coefficient (K) [47] is given as Eq. (11), where N represents the number of measurements. The SMC and TWS errors were added in quadrature to obtain the error (±1.2 mm) in GWS [4,41].

Results and Discussion

Extraction of Groundwater Based on GWSC
We processed GRACE GWSC based on CSR and JPL mascon data for the same footprint as the PS imagery in the Qassim region. The highest annual water extraction was observed in Aug 2020 for GWS based on both JPL and CSR. Positive water levels were observed in March 2020 and Feb 2021 for GWSJPL and in March 2019 and February 20 for GWSCSR.

Vegetation Based on NDVI
A total of 28 PS images were processed between September 2018 and April 2021 for 60 pivot fields, see Fig. 3. Our findings indicated that the agricultural activities were almost similar for the 60 pivot agriculture fields analyzed in the present investigation. The 60 pivot fields were grouped into three categories based on NDVI time series behavior. The highest NDVI values were observed in April 2020 and Jan 2021, whereas the lowest NDVI values were observed in Sep 2019, June 2020, and Dec 2020. Fields 1 to 7 and 41 to 60 showed significantly increasing NDVI values from September 2018 to April 2021; fields 10 to 40 showed stable trends.

Trend Analysis
The vegetation extent based on NDVI showed an increasing trend for 30 pivot fields, with a Kendall's Tau of 0.15 and a slope of 0.004. The other 30 pivot fields showed a Kendall's Tau of 0, meaning no trend or stable NDVI values. The trend analysis gave a GWSC change rate of −0.8 mm/year and −0.06 mm/month for GWSJPL. The change rate for GWSCSR was larger in magnitude than for GWSJPL, i.e., −1.6 mm/year and −0.12 mm/month. The monthly rainfall showed a decreasing trend, with a Kendall's Tau of −0.60 and a slope of −0.01, based on CHIRPS gridded data. The highest monthly rainfall (110 mm) was observed in Nov 2018. The months from June to October showed no rainfall, whereas April showed higher rainfall in all three years. The annual rainfall for the years 2019 and 2020 was 149 mm and 194 mm, respectively. Tmax was higher (45 °C) during May-June, and Tmin was lower (5 °C) during winters. The temperature, wind speed, and evaporation trends were stable; additionally, the evaporation aligned between Tmin and Tmax. The highest wind speed (ws) was observed in Oct 2020.

Correlation Analysis
Some studies analyzed the climatic impact on vegetation, especially using NDVI [13,16-18]. An attempt was made to study the impact of climatic parameters such as temperature and rainfall on vegetation, see Fig. 4. The PME MET station rainfall data was available for only one study area from 2009-2018 with gaps; additionally, one station can cover 10 km² [45]. Therefore, the latest CHIRPS and MERRA-2 datasets were used in the present investigation. The present investigation found interesting interactions between vegetation, environment variables, and water extraction. A correlation was also found between environment variables. Rainfall showed a correlation of 0.47 with reference evaporation. The correlation of rainfall with GWSJPL and GWSCSR was 0.38 and 0.26, respectively, indicating that GWSJPL has a better relationship with rain than GWSCSR. The rainfall and temperature (Tmax, Tmin, Tavg

RF Classification
RF provided good accuracy for all three classes, i.e., high vegetation, low vegetation, and no vegetation, based on user accuracy, producer accuracy, and kappa, see Fig. 5. The method performed well in vegetation identification, with high producer and user accuracy. The error margin was relatively low, see Tab. 2. RF is one of the most precise classification algorithms due to its ability to model the complexity of input variables. It can handle noise and outliers, such as the exposed soil in the current study, which leads to saturation of the indices. As a result, the results obtained from RF are generally better than those of other ML classification algorithms.

Multinomial Regression Results
An attempt was made to evaluate the performance of vegetation classification between NDVI and RF. The correlation results for all three classes are given in Tab. 3. All three classes showed a strong correlation between NDVI values and RF-based vegetation classification. The z-value is below −1.96 and the p-value is < 0.05 for all three classes; therefore, the null hypothesis is rejected, and it can be concluded that there is a strong correlation between NDVI and RF. The results indicated that NDVI and RF gave almost similar results, which was further confirmed at 30 random points using the sub-meter resolution Google Earth image dataset for the available months between Sept 2018 and April 2021.

Conclusions
Accurate vegetation knowledge at high resolution (3 m) can be achieved using revolutionary nanosatellite datasets such as Planetscope (PS) for understanding agricultural sprawl. In a changing climate, it is important to understand the interaction of climatic parameters and water requirements with vegetation on a spatiotemporal scale to achieve sustainable crop practices. In the present investigation, high-resolution PS nanosatellite data was used for spatiotemporal assessment of agriculture in the Al-Qassim region, KSA. For this purpose, an NDVI time series covering 29 months was used. The present investigation used GRACE and GRACE-FO data to estimate the water used for vegetation. The RF model was used to classify the vegetation, and MLR was applied to compare NDVI and RF-based classification performance. The present investigation found interesting interactions between vegetation, environment variables, and water extraction. Rainfall and water extraction through GWSJPL showed an R-value of 0.38, while reference evaporation showed an R-value of −0.53 with GWSJPL. Temperature and GWSJPL showed a significant relationship, with an R-value of 0.41. Among all climatic parameters, reference evaporation showed a good relationship with the NDVI of all fields, with an R-value of 0.37. GWS also showed a good relationship with all 60 pivot fields, with an R-value of 0.4. The correlation of rainfall with GWSJPL and GWSCSR was 0.38 and 0.26, respectively, indicating that GWSJPL has a better relationship with rain than GWSCSR. The future scope of the present research will be to use other high-resolution datasets and to extend the study area with additional datasets such as soil moisture models to understand the agricultural pattern with different climatic parameters and techniques [48,49].
The selection of crop type and harvesting time should be compatible with the changing environmental conditions and water availability to achieve sustainable agriculture. Future agricultural practice depends on making informed decisions using high-resolution temporal assessment of matching crops according to the environment.