Assessing the Efficacy of Improved Learning in Hourly Global Irradiance Prediction

Increasing global energy consumption has become an urgent problem as natural energy sources such as oil, gas, and uranium are rapidly running out. Research into renewable energy sources such as solar energy is being pursued to counter this. Solar energy is one of the most promising renewable energy sources, as it has the potential to meet the world’s energy needs indefinitely. This study aims to develop and evaluate artificial intelligence (AI) models for predicting hourly global irradiation. The hyperparameters were optimized using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton training algorithm and STATISTICA software. Data from two stations in Algeria with different climatic zones were used to develop the model. Various error measurements were used to determine the accuracy of the prediction models, including the correlation coefficient, the mean absolute error, and the root mean square error (RMSE). The optimal support vector machine (SVM) model showed exceptional


Introduction
Growing concern about the effects of climate change and the need to diversify energy sources have led to a significant increase in the development of renewable energy sources. Among these, solar energy has emerged as a viable option due to its abundance and potential to reduce carbon emissions. Moreover, as oil and gas resources become less available, developing renewable energy sources is becoming increasingly important for Algeria's long-term energy security. According to a recent Intergovernmental Panel on Climate Change (IPCC) report, solar energy has the potential to meet a significant portion of the world's energy needs, and Algeria is no exception [1].
Accurately estimating the amount of solar radiation reaching the Earth's surface is critical for various applications such as photovoltaic systems, heating, medical research, agriculture, and architecture. This is usually done with solar measurement devices such as solarimeters or pyranometers. However, solar radiation is difficult to measure in many places in Algeria because the instruments are expensive and the measurement systems are complex. Even though there are several meteorological stations in different locations in Algeria, measurements may not always be available due to power outages or limits on the number of variables that can be recorded [2].
To address these challenges, researchers have developed models that use readily available meteorological data to predict global solar radiation (GSR) more accurately. These predictive models continue to advance, but their results vary by location. Therefore, it is important to use sophisticated GSR prediction techniques to improve the accuracy of solar energy potential estimates in Algeria [3].
Many research efforts have been made to predict solar radiation (SR) in different areas of the world using various techniques such as artificial intelligence and empirical methods. One popular method is the multilayer perceptron artificial neural network (ANN-MLP). However, other methods, such as decision tree models, support vector machines (SVMs), and feed-forward radial basis functions (FF-RBFs), have also been used to estimate solar radiation. Several studies [4-10] have used SVMs, and others have used FF-RBF. Loutfi et al. [11] presented three different Feed-Forward Neural Network (FFNN) model topologies for generating global, direct, and diffuse hourly solar radiation in Fez, Morocco. In a comparative study [12], several models were implemented, including a decision tree model, a random forest model, generalized linear models, an artificial neural network, a linear regression model, and an adaptive neuro-fuzzy inference system model. Bamisile et al. [13] developed and compared eight artificial intelligence models for solar radiation prediction at different time intervals (hourly, every minute, and daily average) using datasets from six African countries. They found that different AI models suited different solar radiation estimation tasks. The extreme gradient boosting (XGBoost) algorithm was the best model for 10 of the 13 case studies considered in that work. They concluded that the models predict hourly solar irradiance more accurately than the daily average and minute time steps.

Solar energy is a focus of Algeria's ambitious energy policy, which allocates significant resources to solar thermal and photovoltaic resources. Projections indicate that solar energy will account for more than 37% of the country's electricity generation by 2030 [14]. The annual sunshine duration in Algeria is more than 3,900 h in the Sahara and 3,000 h on the plateaus. The daily energy gain on a horizontal surface of 1 m² averages 5 kWh [10] in most regions of the country [15]. Thus, solar energy is a good basis for helping the country meet its energy needs. However, accurate solar radiation estimates are needed to exploit this potential fully. In this study, we use state-of-the-art machine learning techniques to improve the predictability of solar energy potential in Algeria.
The main objective of this research is to develop a method for optimizing the hyperparameters of conventional machine learning models, namely the multilayer perceptron (MLP) and support vector machines (SVMs), and thereby increase the reliability of hourly global irradiance predictions. We used the FNN-MLP and SVM models to forecast global solar irradiance at one-hour intervals at stations with different climates in Algeria. The remainder of the paper is organized as follows: the materials and processes are discussed in Section 2, the construction of the model is covered in Section 3, and the results and discussion are presented in Section 4. The paper ends with a conclusion.

Studied Region and Database Collection
In this investigation, two radiometric stations were used to compile the database. The first station, "Shems", located in Bouzareah in Algeria, recorded experimental data using Kipp and Zonen pyranometers to measure the global horizontal irradiation (GHI). The second station, located in Tamanrasset in the Sahara Desert in southern Algeria, is equipped with an Eppley PSP pyranometer and lies in an arid desert environment with the highest solar energy resources. Table 1 presents comprehensive information on the two stations used for training and testing purposes, including station ID, station name, latitude, longitude, elevation, climate zone, and data periods. The table indicates that the training station, Bouzareah (BOU), is located in the Mediterranean climate zone and covers the period from January 1, 2014, to December 31, 2014. In contrast, the testing station, Tamanrasset (TAM), is in the hot desert climate zone and covers the period from July 1, 2019, to December 17, 2020. The information in the table is essential for understanding the data used in the investigation and interpreting the results obtained.
Table 2 provides statistical analysis results for the input and output variables of solar radiation prediction at the BOU and TAM stations: temperature (TMP), humidity (HUM), wind speed (WSP), and global horizontal irradiance (GHI). The BOU station has a maximum temperature of 44.11°C, a minimum humidity of 8.28%, a mean wind speed of 4.62 m/s, and a mean GHI of 517.88 Wh/m². The TAM station has a maximum temperature of 38.5°C, a minimum humidity of 2%, a mean wind speed of 5.19 m/s, and a mean GHI of 678.76 Wh/m². The correlation between these variables and solar radiation is further explored in Table 3. The results show that temperature correlates positively with solar radiation at both stations, while humidity has a negative correlation at BOU and a weak negative correlation at TAM. Wind speed has a weak negative correlation at BOU and a positive correlation at TAM. These correlations can be useful for developing accurate solar irradiance prediction models and designing effective solar energy systems.

Fig. 1 displays the frequency counts of GHI for the two stations, BOU and TAM, at different bin center intervals. The plot reveals that TAM has higher frequency counts of GHI than BOU at all bin center intervals. The highest frequency counts for TAM are at the bin centers of 250, 350, and 450 Wh/m², with 508, 254, and 192 counts, respectively. In contrast, BOU has its highest frequency counts at the bin centers of 250, 450, and 550 Wh/m², with 508, 490, and 458 counts, respectively. These results indicate that TAM receives more GHI than BOU due to their different geographic locations and climatic conditions. Such analysis is valuable for understanding the variability of GHI and designing solar energy systems.
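The frequency analysis behind Fig. 1 amounts to counting hourly GHI values per bin. The following is a minimal sketch only, assuming 100 Wh/m² wide bins (inferred from the quoted bin centers of 250, 350, 450, and 550 Wh/m²); the function name `ghi_frequency` and the sample values are hypothetical and are not taken from the station data.

```python
from collections import Counter


def ghi_frequency(ghi_values, bin_width=100.0):
    """Count GHI samples per bin and key each count by its bin center (Wh/m^2)."""
    counts = Counter()
    for value in ghi_values:
        if value < 0:
            continue  # skip physically impossible readings
        index = int(value // bin_width)       # which bin the sample falls in
        center = (index + 0.5) * bin_width    # e.g. bin [200, 300) -> center 250
        counts[center] += 1
    return dict(sorted(counts.items()))


# Hypothetical hourly GHI samples in Wh/m^2 (illustrative only).
sample = [210.0, 260.5, 299.9, 340.0, 455.0, 470.2, 612.3]
frequency = ghi_frequency(sample)
```

Applying the same function to each station's record would give the two frequency series compared in the figure, provided the bin width assumed here matches the one used by the authors.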

Feedforward Neural Networks Multi-Layer Perceptron (FNN-MLP)
The feedforward neural network multilayer perceptron (FNN-MLP) is modeled on the information processing of the human brain. Its ability to learn from its environment makes it well suited to modeling nonlinear systems that are difficult to characterize analytically. Although the architecture allows arbitrarily small approximation errors with suitable weight values, finding those weights efficiently during training remains an obstacle. The multilayer perceptron, whose architecture comprises multiple layers of neurons, is today's most widely used supervised neural network for approximation problems.
The FNN-MLP consists of three layers: an input layer, a hidden layer, and an output layer. Synaptic weights W_ij connect the neurons in one layer to the neurons in the next. These weights determine the relative importance of each input to the output of each neuron. Each neuron collects the weighted signals transmitted by the neurons that precede it in the layered structure and passes the result through an activation function. After this step, an output signal is generated, ready for transmission to the neurons in the subsequent layer [11,16]. An FNN-MLP with three layers is shown in Fig. 2 and is used to predict GHI. Here, $w^{I}$ and $w^{h}$ are the connection weights (input-hidden and hidden-output), and $b^{h}$ and $b^{o}$ are the column vectors of the hidden and output neuron biases, respectively.

The following Eqs. (1) and (2) express the GHI model in terms of all inputs $x_i$.

The outputs $Z_j$ of the hidden layer:

$$Z_j = f_h\left(\sum_{i} w^{I}_{ji}\, x_i + b^{h}_{j}\right) \tag{1}$$

The output GHI:

$$\mathrm{GHI} = f_o\left(\sum_{j} w^{h}_{j}\, Z_j + b^{o}\right) \tag{2}$$

where $f_h$ and $f_o$ denote the activation functions of the hidden and output layers. The following mathematical formula, which accounts for all inputs and represents the global solar radiation, is produced when Eqs. (1) and (2) are combined:

$$\mathrm{GHI} = f_o\left(\sum_{j} w^{h}_{j}\, f_h\left(\sum_{i} w^{I}_{ji}\, x_i + b^{h}_{j}\right) + b^{o}\right) \tag{3}$$

The FNN-MLP framework was optimized for GHI prediction using MATLAB 2020b. The optimization covered the methods, database distribution, layer depth, neuron count, and activation functions. Table 4 displays the optimized FNN-MLP model's structure. The linear transfer function (identity)
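To make the layered computation concrete, the forward pass of a three-layer perceptron can be sketched in a few lines. This is an illustrative sketch, not the authors' MATLAB implementation: the hyperbolic tangent in the hidden layer is an assumption (the paper tests several hidden activations), the identity output follows the linear transfer function mentioned above, and every weight and input value is hypothetical.

```python
import math


def mlp_forward(x, w_input, b_hidden, w_hidden, b_output, activation=math.tanh):
    """Forward pass of a single-hidden-layer perceptron with an identity output."""
    # Hidden layer: Z_j = f(sum_i w_input[j][i] * x[i] + b_hidden[j])
    z = [
        activation(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j)
        for row, b_j in zip(w_input, b_hidden)
    ]
    # Output layer (identity transfer): sum_j w_hidden[j] * Z_j + b_output
    return sum(w_j * z_j for w_j, z_j in zip(w_hidden, z)) + b_output


# Hypothetical scaled inputs (e.g. TMP, HUM, WSP) and weights for 2 hidden neurons.
x = [0.5, -0.2, 0.1]
w_input = [[0.4, -0.3, 0.2], [0.1, 0.6, -0.5]]
b_hidden = [0.0, 0.1]
w_hidden = [1.5, -0.7]
b_output = 0.2
ghi_scaled = mlp_forward(x, w_input, b_hidden, w_hidden, b_output)
```

In practice the weights and biases would come from training (the paper uses the BFGS quasi-Newton algorithm), and the scaled output would be mapped back to Wh/m².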

Support Vector Machines (SVM)
The support vector machine (SVM) is a supervised learning method that has gained significant popularity recently for its ability to predict meteorological data such as temperature [17] and wind speed [18]. Its ease of use and adaptability make it suitable for a wide range of classification and regression problems across various sectors, including mechanical engineering, energy, and finance [19]. Even in studies with small sample sets, the SVM method has been shown to provide balanced predictive performance due to its unique characteristics [20]. In an SVM model, the regression function can model nonlinear relationships between input and output. The output of an SVM model is given by [21]:

$$f(x_i) = w^{T}\phi(x_i) + b$$

$f(x_i)$: The predicted data.
$\phi(x_i)$: The implicitly constructed nonlinear function.
$w$: The weight vector.
$b$: The SVM model's bias.

The dataset has a D-dimensional input vector $x_i \in \mathbb{R}^{D}$ and a scalar output $y_i \in \mathbb{R}$.

The following equations provide the SVM optimization model (for the training set):

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right)$$

$$\text{subject to} \quad
\begin{cases}
y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i \\
w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i,\ \xi_i^{*} \ge 0
\end{cases}$$

$C$: The factor that balances model complexity, $\lVert w \rVert^{2}$, with empirical risk.
$\xi_i$, $\xi_i^{*}$: The slack variables that represent the sample's distance from the $\varepsilon$-tube.

The problem above can be solved in the same manner as a standard nonlinear constrained optimization problem by using Lagrange multipliers to generate a dual optimization problem:

$$\max_{a,\,a^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (a_i - a_i^{*})(a_j - a_j^{*})\, K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (a_i + a_i^{*}) + \sum_{i=1}^{n} y_i\,(a_i - a_i^{*})$$

$$\text{subject to} \quad \sum_{i=1}^{n} (a_i - a_i^{*}) = 0, \qquad 0 \le a_i,\ a_i^{*} \le C$$

$K(x_i, x_j)$: The kernel function satisfying Mercer's condition.
$a_i$ and $a_i^{*}$: The non-negative Lagrange multipliers.
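Once the dual problem is solved, a prediction only requires the kernel values between the new input and the support vectors. The sketch below assumes the standard support vector regression expansion f(x) = Σ (a_i − a_i*) K(x_i, x) + b with a Gaussian RBF kernel; the support vectors, coefficients, and bias are hypothetical placeholders rather than values from the trained model, and only gamma = 13.93 comes from the paper.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i (a_i - a_i*) K(x_i, x) + b."""
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical support vectors and coefficients (a_i - a_i*); gamma as reported.
support_vectors = [[0.2, 0.4], [0.6, 0.1]]
dual_coefs = [0.8, -0.3]
prediction = svr_predict([0.2, 0.4], support_vectors, dual_coefs, bias=0.5, gamma=13.93)
```

With a gamma this large the kernel decays quickly with distance, so each prediction is dominated by the nearest support vectors.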

Model Development
Two models were developed to predict hourly global solar irradiance accurately: the FNN-MLP and the SVM. The process used to evaluate and improve the structure of the FNN-MLP and SVM models is shown in detail in Fig. 3. The datasets were divided into different subsets for each model. For the FNN-MLP model, the data were divided into three subsets: training, validation, and testing. For the SVM model, the data were divided into two subsets: training and testing. All subsets were drawn from the entire dataset. An iterative testing process was performed to determine the best-performing FNN-MLP model and obtain the most accurate model possible. For the SVM technique, an optimal model was developed using the support vector machine learning strategy. The selection of an appropriate kernel function is critical to the success of the SVM model, and the STATISTICA software provides a wide range of kernel functions for SVM models. For the Gaussian radial basis function kernel, the parameters were set to nu = 1.0, C = 10.0, and gamma = 13.93. This process determined the optimal values for the target parameters of the SVM model.
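The partitioning step can be illustrated with a small helper. This is a sketch under stated assumptions: the paper does not say whether records were shuffled or split chronologically, so the seeded random shuffle, the function name `split_dataset`, and the stand-in data are hypothetical. The fractions follow the split reported in the Results section for division 3 (60/20/20 for the FNN-MLP and 60/40 for the SVM).

```python
import random


def split_dataset(records, fractions, seed=0):
    """Shuffle records and cut them into consecutive subsets sized by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    subsets, start = [], 0
    for i, fraction in enumerate(fractions):
        # The last subset takes the remainder so rounding never drops a record.
        is_last = i == len(fractions) - 1
        end = len(shuffled) if is_last else start + round(fraction * len(shuffled))
        subsets.append(shuffled[start:end])
        start = end
    return subsets


hours = list(range(1000))  # stand-in for hourly records
train, validation, test_set = split_dataset(hours, (0.6, 0.2, 0.2))  # FNN-MLP style
svm_train, svm_test = split_dataset(hours, (0.6, 0.4))               # SVM style
```

For time series with strong seasonality, a chronological split may be preferable to shuffling; which one matches the authors' procedure is not stated.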

Evaluation Criteria
In this study, various error measures were employed to determine the accuracy of the prediction models. These error measures include the correlation coefficient (R), the mean percentage error (MPE), and the root mean squared error (RMSE). They are mathematically represented by Eqs. (8)-(10), as described in [22-32]. Together, these measures allow a comprehensive evaluation of the prediction models and give a clear picture of their strengths and weaknesses.
where $n$ is the number of data points; $Y_{i,exp}$ and $Y_{i,cal}$ are the experimental and calculated data points of global solar radiation, respectively; and $\bar{Y}_{exp}$ is the mean of the experimental data.
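The three measures can be computed directly from paired experimental and calculated values. The sketch below uses common textbook definitions, which are assumptions here: R is taken as the Pearson correlation coefficient and MPE as the mean of the signed relative errors in percent, and the paper's own Eqs. (8)-(10) may differ in form or sign convention. The sample values are hypothetical.

```python
import math


def rmse(y_exp, y_cal):
    """Root mean squared error, in the units of the data (e.g. Wh/m^2)."""
    n = len(y_exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(y_exp, y_cal)) / n)


def mpe(y_exp, y_cal):
    """Mean percentage error; assumes no experimental value is zero."""
    n = len(y_exp)
    return 100.0 / n * sum((e - c) / e for e, c in zip(y_exp, y_cal))


def correlation(y_exp, y_cal):
    """Pearson correlation coefficient R between experimental and calculated data."""
    n = len(y_exp)
    mean_e, mean_c = sum(y_exp) / n, sum(y_cal) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(y_exp, y_cal))
    var_e = sum((e - mean_e) ** 2 for e in y_exp)
    var_c = sum((c - mean_c) ** 2 for c in y_cal)
    return cov / math.sqrt(var_e * var_c)


# Hypothetical experimental and calculated GHI values in Wh/m^2.
y_exp = [100.0, 200.0, 400.0]
y_cal = [110.0, 190.0, 400.0]
scores = {"R": correlation(y_exp, y_cal), "MPE": mpe(y_exp, y_cal), "RMSE": rmse(y_exp, y_cal)}
```

Because MPE divides by the experimental value, night-time hours with zero irradiance would need to be excluded before it is evaluated.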

Results and Discussion
This section presents the results of the models developed in the study to predict hourly global irradiation. Initially, data collected at the Bouzareah station were used to build two models: the FNN-MLP and the SVM. The performance of these models was evaluated using three different data divisions for training, validation, and testing. The results are shown in Fig. 4, which depicts the correlation coefficient (R) values obtained for each division. Division 3 outperforms the other two divisions in terms of R-values for the testing phase, yielding R = 0.9567 for the FNN-MLP model and R = 0.9715 for the SVM model. For the FNN-MLP model, division 3 consisted of 60% of the data for training, 20% for validation, and 20% for testing. For the SVM model, division 3 had 60% of the data for training and 40% for testing. The results suggest that division 3 provides the most accurate predictions, making it the optimal choice for testing the FNN-MLP and SVM models.

The RBF-SVM model of the present work achieved the highest prediction accuracy (R = 0.9876), outperforming other models such as random forest, artificial multi-neural, and adaptive approach, which also achieved high prediction accuracies ranging from R = 0.95 to R = 0.96. The K-means clustering-NAR model had the lowest prediction accuracy (R = 0.93).

Reference                   Model                      Accuracy
[33]                        Random forest              R² = 0.9637
Benmouiza et al. [34]       K-means clustering-NAR     R = 0.93
Jallal et al. [35]          Artificial multi-neural    R = 0.9624
García-hinde et al. [36]    SVR-PLS                    R = 0.94
Akarslan et al. [37]        Adaptive approach          R = 0.96
Guermoui et al. [38]        Machine learning           R² = 96.68-98.52
Benali et al. [39]          Random forest              R = 0.95

The comparison results suggest that machine learning models have great potential in predicting hourly global solar radiation. However, the performance of these models can vary with factors such as the quality and quantity of input data, feature selection, and the specific algorithm used. Therefore, it is important to carefully consider and test different models to achieve the best results for a particular application. The high accuracy achieved by the present work using RBF-SVM indicates that it could be a useful model for future predictions in this area.

Conclusion
This study aims to improve the accuracy of hourly global horizontal irradiation prediction by using advanced machine learning techniques. The main objective is to develop a method that optimizes the hyperparameters of conventional machine learning models, specifically multilayer perceptron feedforward neural networks (FNN-MLP) and support vector machines (SVM). To achieve this, two models were used: the FNN-MLP and the SVM.
To create the most effective FNN-MLP model, the BFGS quasi-Newton method was used as the training algorithm, four activation functions were tested in the hidden layer, and a single transfer function was used. The dimensions of the hidden layers were also varied to obtain the most accurate model possible. In terms of performance, the SVM model with the radial basis function (RBF) kernel gives significantly better results than the SVM models with other kernel functions. The RBF kernel also shows a superior capacity to characterize hourly global solar irradiance in the SVM forecast.
The differences in statistical error between the RBF-SVM model and the FNN-MLP model are significant, indicating that the proposed RBF-SVM model predicts global solar irradiance more accurately than the FNN-MLP model. All the machine learning methods discussed in this study provide highly accurate predictions of global solar irradiance at different temporal resolutions. However, our results show that the RBF-SVM model performs better than the FNN-MLP-BFGS model in predicting hourly global solar irradiance, with an R-value of 0.99 and an RMSE of 38.70 Wh/m² over all phases. This study also investigates the performance of the proposed models in different climatic regions of Algeria, which is crucial for accurately predicting solar radiation at a specific location. In this way, it could help in the design and installation of solar energy systems as well as in the evaluation of thermal conditions in building studies.
In summary, this study provides a promising alternative to the traditional methods currently used in Algeria to predict solar radiation. With its superior accuracy and performance, the RBF-SVM model can be a valuable tool for predicting global solar irradiance at any location, thus supporting the development and implementation of renewable energy sources in the country. In addition, the study opens the possibility of using these techniques in other countries with similar climates and energy needs.

Figure 1: Graphical depiction of the GHI as a function of the relative frequency of each station

Figure 4: Effect of the division of the database in terms of the correlation coefficient (R) for the testing phase

Figure 5: Comparison of predicted and experimental data in the testing phase

The linear SVM model with C = 10 and E = 0.1 achieved an RMSE of 193.312 Wh/m² and R of 0.625 in the training phase, an RMSE of 190.394 Wh/m² and R of 0.628 in the testing phase, and an overall RMSE of 192.150 Wh/m² and R of 0.626. The polynomial SVM model with C = 10, nu = 1, Degree = 3, and Gamma = 0.125 achieved an RMSE of 118.351 Wh/m² and R of 0.884 in the training phase, an RMSE of 120.717 Wh/m² and R of 0.873 in the testing phase, and an overall RMSE of 119.304 Wh/m² and R of 0.880. The RBF-SVM model with C = 10, nu = 1, and Gamma = 13.93 achieved the lowest RMSE of 32.414 Wh/m² and highest R of 0.991 in the training phase; its testing phase gave a higher RMSE of 57.326 Wh/m² and a lower R of 0.972, resulting in an overall RMSE of 38.706 Wh/m² and R of 0.988. The sigmoid SVM model with C = 10, nu = 0.1, and Gamma = 0.125 had the highest RMSE of 229.925 Wh/m² and lowest R of 0.414 in the training phase, an RMSE of 223.432 Wh/m² and R of 0.434 in the testing phase, and an overall RMSE of 227.348 Wh/m² and R of 0.422.

Overall, the RBF-SVM model with C = 10, nu = 1, and Gamma = 13.93 outperformed the other evaluated kernels, achieving the lowest RMSE and the highest R-value in both the training and testing phases, although its testing-phase error was higher than its training-phase error. This resulted in the lowest overall RMSE and the highest overall R-value. Compared with the linear and sigmoid SVM models, the RBF-SVM model demonstrated a substantial improvement in both RMSE and R-values during the training and testing phases, indicating its efficacy in predicting hourly global horizontal irradiation. Furthermore, Fig. 6 reveals a robust alignment between the predicted and actual values gathered from the solar irradiation measurement station, further substantiating the models' dependability and precision in predicting global irradiation using the chosen input features.
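The four kernels compared above differ only in how they measure similarity between two input vectors. The sketch below writes them in the forms commonly used by SVM libraries, which is an assumption about the software's exact definitions; the `coef0` offset is not reported in the paper and is set to zero purely for illustration. The gamma and degree defaults are the values quoted in the text, and the input vectors are hypothetical.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=0.125, coef0=0.0, degree=3):
    # coef0 is an assumed value; the paper reports only gamma and degree.
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=13.93):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid_kernel(x, y, gamma=0.125, coef0=0.0):
    # coef0 is an assumed value; the paper reports only gamma.
    return math.tanh(gamma * dot(x, y) + coef0)


KERNELS = {
    "linear": linear_kernel,
    "polynomial": polynomial_kernel,
    "rbf": rbf_kernel,
    "sigmoid": sigmoid_kernel,
}

# Hypothetical input vectors used only to show the kernels side by side.
u, v = [1.0, 2.0], [2.0, 0.0]
values = {name: kernel(u, v) for name, kernel in KERNELS.items()}
```

Only the RBF kernel depends on the distance between the two points rather than on their inner product, so it responds to local structure in the inputs; this is one plausible reason for its advantage here, though the paper does not analyze the cause.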

Figure 6: Comparison of predicted and actual hourly global solar irradiation

Table 1: Geographical region and period covered by stations in this investigation

Table 2: Statistical analysis of input and output variables for solar irradiance prediction at two stations

Table 3: Climatic-output correlations for solar irradiance prediction at two stations

Table 5: Statistical evaluation of FNN-MLP model performance on solar irradiation prediction

Table 6: Performance evaluation of SVM models with various kernels

Table 7: Performance evaluation of developed models on solar irradiation prediction for BOU and TAM stations

Moreover, incorporating additional inputs such as MON, PRE, and WID did not improve the model's performance. These findings indicate that the SVM-RBF model provides a more accurate GHI prediction than the FNN-MLP model. The performance of the present model in predicting hourly global solar radiation was compared with different techniques used in previous studies. Table 8 compares machine learning models from the literature, providing insight into the effectiveness of the present model relative to other techniques. The RBF-SVM model used in the present work achieved the highest

Table 8: Comparison of the present results with the literature studies in predicting hourly global solar irradiance