Artificial Intelligence Based Solar Radiation Predictive Model Using Weather Forecasts

Solar energy has gained attention in the past two decades, since it is an effective renewable energy source that causes no harm to the environment. Solar Irradiation Prediction (SIP) is essential to plan, schedule, and manage photovoltaic power plants and grid-based power generation systems. Numerous models have been proposed for SIP in the literature, but such studies demand huge volumes of weather data about the target location over a lengthy period of time. In this scenario, commonly available Artificial Intelligence (AI) techniques can be trained over past values of irradiance as well as weather-related parameters such as temperature, humidity, wind speed, pressure


Introduction
In general, sixty percent of a building's energy is consumed by ventilation, air-conditioning, and heating functions [1,2]. This energy could be reduced through optimal control of the building's heating and air-conditioning operations. One of the major problems for the future global energy supply is the integration of renewable energy sources (mainly non-predictable ones such as solar and wind) with current or upcoming energy sources. An electrical operator must guarantee a proper balance between electricity production and consumption at all times. However, the operator often finds it challenging to preserve this balance with conventional, controllable production methods, particularly in small or isolated (i.e., non-interconnected) electricity networks such as those on an island. The reliability of an electric grid depends on the capacity of the network to absorb expected and unexpected variations in consumption and production, as well as disturbances, while preserving continuity and quality of service to consumers. Hence, the energy provider should be able to manage the network over several time horizons [3].
Integrating renewable sources into an electric system complicates network management and the balance between consumption and production, owing to their unpredictable and intermittent nature [4]. Solar energy production is non-controllable and intermittent, which leads to challenges such as local power quality problems, stability issues, and voltage fluctuations. Therefore, predicting the output power of a solar PV system is essential for the efficient functioning of the electrical network and for optimum management of the energy flows in the PV system [5]. It is also required for electric network scheduling, resource estimation, optimum management of storage under stochastic production, congestion management, reducing the cost of electricity production, and trading the generated energy in the electricity market. Predicting solar PV energy production has become highly significant given the marked rise in solar power production in recent years. To prevent large deviations in renewable electricity generation, it is essential to operate the whole system, including storage, on the basis of such predictions.

Role of Artificial Intelligence in Solar Irradiation Prediction
Artificial Intelligence (AI) techniques have been applied in recent years to improve prediction performance in SIP, owing to their capacity to model nonlinear and complex relations and to handle missing information [6,7]. Various AI methods have been proposed earlier to predict solar radiation (SR), such as data mining, Fuzzy Logic (FL), support vector regression, genetic programming, regression trees, and Artificial Neural Networks (ANN). Amongst the AI methods proposed so far, the Adaptive Neuro-Fuzzy Inference System (ANFIS), a combination of ANN and FL techniques, is considered one of the most effective models. Numerous investigations have demonstrated that ANFIS is highly effective in predicting SR [8]. For instance, classical ANFIS has been compared with hybrid ANFIS methods tuned through the Differential Evolution Algorithm (DEA), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA) techniques. These algorithms were utilized to predict monthly global SR in Kuala Terengganu, Malaysia, from distinct meteorological variables such as minimum and maximum rainfall, air temperature, sunshine time, and clearness index. The outcomes showed that the hybrid ANFIS-PSO achieved the best SR prediction in comparison with the other techniques.
Traditional methods like Multiple Linear Regression (MLR) and distinct kinds of AI techniques, including ANFIS, have been established earlier to predict daily global SR in Iraq from distinct meteorological variables. The outcomes demonstrated that ANFIS offers more precise outcomes than the other prediction methods. A comparative study of distinct AI methods for SR prediction revealed that ANFIS is one of the most appropriate methods for simulating SR. This is attributed to its capability to overcome the uncertainties related to time-series data. However, the main challenge of ANFIS is the tuning of its hyperparameters, such as the optimization of the membership function parameters. Consequently, earlier research works combined the classical ANFIS method with several optimization techniques to improve its efficiency. Although the efficiency of the present hybrid ANFIS methods is encouraging, their predictive ability still needs improvement, given the significance of accurate SR measurement. Moreover, one of the main drawbacks of present SR predictive methods is their demand for several input parameters. These parameters are not easily available where monitoring networks are lacking.

Paper Contributions
The current study introduces an effective solar irradiance prediction model that integrates big data analytics and AI models (BDAAI-SIP) and is driven by weather forecast data. To manage the long-term collection of weather data, the Hadoop MapReduce tool is utilized. At the beginning, the presented BDAAI-SIP model performs data preprocessing to boost the quality of the weather-related data. Then, the Elman Neural Network (ENN), a recurrent extension of the Feedforward Neural Network (FFNN), is applied for predictive analysis. It is composed of an input layer, a hidden layer, a load-bearing (context) layer, and an output layer. To optimize the parameters, the Mayfly Optimization (MFO) algorithm is used. In order to validate the efficacy of the proposed BDAAI-SIP model, a set of simulations was conducted. In short, the contributions of the paper are summarized herewith.
• A novel BDAAI-SIP model is proposed to predict solar irradiation with the help of weather forecasting data.
• AI-based preprocessing is performed in three stages, namely data conversion, missing value replacement, and data normalization.
• An ENN model, comprising a load-bearing layer, is employed for prediction purposes.
• In order to tune the weights and the hidden layer neuron count of the ENN model, the MFO algorithm is applied.
• Parameter optimization of the ENN model further improves the predictive results of the proposed BDAAI-SIP model.
• The performance of the BDAAI-SIP model was validated under several aspects and a comparative analysis was made.

Paper Organization
The rest of the paper is organized as follows. Section 2 reviews the existing works related to SIP. Section 3 introduces the methodology of the proposed BDAAI-SIP model. Section 4 validates the performance of the BDAAI-SIP model, and Section 5 concludes the paper.

Prior Works on Solar Irradiation Prediction Models
Several investigations have been conducted earlier on Model Predictive Control (MPC), an optimal control approach introduced to assure effective system operation and to control the air-conditioning process. Numerous studies have established that MPC decreases the energy consumption of a building. The efficiency of MPC is influenced by the accuracy of the hourly load prediction of the building, and this load is in turn influenced by the climate of the upcoming day. Thus, most of the methods require weather forecasting data. The main factors that influence the load are solar irradiance and outside air temperature. Though it is easy to predict the outside air temperature due to its small hourly variations, it is challenging to predict the real hourly values of solar irradiance.
In prior MPC investigations, the solar irradiance prediction technique has rarely been stated. Several investigations in the literature utilized the information offered by energy analysis programs. Otherwise, the studies took fully forecasted values of solar irradiance as given in the predictive method [9]. In general, solar irradiance prediction techniques are either data-based or physics-based [10]. Physical methods are commonly built on solar geometry to create an empirical relation between solar irradiance data and the meteorological variables measured at existing monitoring sites [11]. Black [12] established a method to predict solar irradiance by examining the relation between sky cover and solar irradiance data collected over a period of 3 years in an area. Likewise, Samimi [13] established a highly accurate solar irradiance method using climate data of Iran collected over 17 years.
Paltridge et al. [14] proposed a physics-based solar irradiance prediction method utilizing long-term accumulated data on several climate variables like humidity, wind, and precipitation. However, according to Premalatha et al. [15], to define the solar irradiance coefficients for an area under study, physics-based prediction methods need long-term measured data or information that is difficult to obtain from typical weather forecast data. Thus, such methods cannot be employed to predict the solar irradiance of the upcoming day. Solar irradiance methods that depend upon physical models are designed to calculate annual or monthly overall solar irradiance, rather than to serve real-time predictive applications like MPC [16].
Lago et al. [17] stated that the NN framework is beneficial for the prediction and evaluation of time-series data with high randomness. Jiang [18] stated that ANN predictive methods demonstrate higher accuracy than empirical physical solar irradiance predictive methods. Data-driven solar irradiance predictive methods have been introduced earlier, and they integrate several learning techniques depending on the objective. Sharma et al. [19] proposed a method that captures the conditions at 15-min intervals, whereas Kemoku et al. [20] used an FFNN to predict the solar irradiance of the upcoming day by learning from the solar irradiance data of the previous 6 years in Japan. Ahmad et al. [21] investigated the optimal combination of input variables for solar irradiance prediction in New Zealand, considering 12 combinations of climate conditions.
Benmouiza et al. [22] presented a learning technique different from existing ANN models to predict solar irradiance. Investigations on ANN-based solar irradiance prediction are widely performed and, in recent times, the predictive methods do not utilize information obtained on developed land. In order to predict the local solar irradiance, Rodríguez et al. [23] proposed an ANN method that learns from information obtained from a satellite. This study considered six years of data collected from various places. Srivastava et al. [24] introduced a solar irradiance predictive method which analyzes a large quantity of satellite data collected from different European countries. This method was utilized for studying nine years of climate conditions in 21 cities across Europe and the US.

The Proposed BDAAI-SIP Model
The overall system architecture is shown in Fig. 1. As shown in the figure, the proposed BDAAI-SIP model involves three major processes, namely data preprocessing, predictive analysis, and parameter optimization. Besides, the Hadoop MapReduce tool is applied to handle the massive collection of weather forecast data.

Overall System Methodology
The processes involved in overall system methodology are briefed herewith.
• Initially, weather-related data is fed as input to the BDAAI-SIP model and analyzed in a big data analytics environment.
• Then, preprocessing is performed in three stages, namely data conversion, missing value replacement, and data normalization.
• Next, the ENN-based predictive model is applied for prediction. This model makes use of a load-bearing layer that transmits the state information and memory.
• Afterwards, the parameters of the ENN model are optimized using the MFO algorithm to determine the optimal values of the weights and the hidden layer neuron count.
• Lastly, the performance of the BDAAI-SIP model is validated on a benchmark dataset and the results are investigated in terms of different aspects.

Hadoop MapReduce
In order to manage big data, the Hadoop ecosystem and its components are widely applied. Hadoop is an open-source framework that allows a user to process big data in a distributed environment on computer clusters with the help of simple programming models. It is designed to scale from a single server to thousands of nodes, which provides scalability as well as fault tolerance. The three major components of Hadoop are MapReduce, Hadoop Distributed File System (HDFS), and Hadoop YARN.

Hadoop Distributed File System (HDFS)
HDFS, which is modeled on the Google File System (GFS), follows a master/slave architecture. The master is the name node, which holds the metadata, whereas one or more data nodes store the actual data.

Hadoop MapReduce
Hadoop MapReduce, the programming framework at the heart of Apache Hadoop, offers massive scalability across thousands of nodes in a Hadoop cluster. MapReduce is utilized to process huge volumes of information on massive clusters. A MapReduce job involves two essential stages, namely the Map stage and the Reduce stage. Both stages take key-value pairs as input and produce key-value pairs as output, and the input and output of the job are stored in the file system. The framework handles different tasks such as task scheduling, monitoring of tasks, and re-execution of failed tasks. The MapReduce framework contains a single master resource manager and one slave node manager in every cluster node.
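The two stages can be illustrated with a small in-memory sketch. It is not Hadoop code and does not reflect the paper's actual job: the records, the choice of hour of day as the key, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are assumptions made only to show how key-value pairs flow from Map to Reduce.

```python
from collections import defaultdict

# Hypothetical weather records: (hour of day, irradiance in W/m^2).
RECORDS = [(8, 20.0), (8, 30.0), (12, 500.0), (12, 540.0), (14, 250.0)]


def map_phase(record):
    """Map stage: emit one (key, value) pair per input record."""
    hour, irradiance = record
    yield hour, irradiance


def shuffle(pairs):
    """Group all values that share the same key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce stage: collapse the values of one key into a single pair."""
    return key, sum(values) / len(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


mean_by_hour = run_job(RECORDS)
print(mean_by_hour)
```

On a real cluster the map and reduce functions would run in parallel on different nodes, while the grouping step is handled by the framework.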

Hadoop YARN
Hadoop YARN is the technology utilized to manage the clusters. It was introduced as an essential feature of the second Hadoop generation, based on the experience obtained from the initial Hadoop generation. YARN functions as the central architecture and resource manager of Hadoop clusters and provides security, reliable operations, and data governance tools. For big data management, other platform tools and components are installed on top of the Hadoop framework.

Data Preprocessing
Data preprocessing is an important part of AI techniques and can considerably enhance the efficiency of the model. During data preprocessing in the BDAAI-SIP model, the data initially undergoes a conversion process in which categorical values are transformed into numerical values. Then, missing values are replaced with substitute values. Finally, min-max data normalization is applied to adjust the dataset to a uniform scale. In this technique, the maximal and minimal values of a set of data are identified, and every other value is normalized with respect to them. The purpose of normalization is to map the minimum value to zero and the maximum value to one, so that every other value lies in the range of 0 to 1. Eq. (1) provides the equation for min-max normalization:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

where $x$ is an original value, $x_{\min}$ and $x_{\max}$ are the minimal and maximal values of the data, and $x'$ is the normalized value.
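The three preprocessing stages can be sketched as follows. The text does not specify how categories are encoded or which value replaces a missing entry, so the integer coding and the mean replacement used here are assumptions, as are the function names and the sample values.

```python
def encode_categories(values):
    """Data conversion: map each distinct category to an integer code
    (assumed encoding; codes follow order of first appearance)."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in values]


def fill_missing(values):
    """Missing value replacement: substitute the mean of the observed
    values (assumed strategy; missing entries are marked with None)."""
    observed = [value for value in values if value is not None]
    mean = sum(observed) / len(observed)
    return [mean if value is None else value for value in values]


def min_max(values):
    """Min-max normalization: minimum maps to 0 and maximum maps to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Constant column: avoid division by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Hypothetical columns, for illustration only.
wind_sector = encode_categories(["N", "S", "N", "E"])
temperature = min_max(fill_missing([70.0, None, 50.0]))
print(wind_sector, temperature)
```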

Elman Neural Network (ENN)-Based Predictive Model
ENN was presented by J. L. Elman in 1990 to solve speech signal processing problems [25]. ENN is a dynamic recurrent network. In contrast to the conventional BPNN, ENN has a specific layer called the context layer, which gives the network the capability to learn time-varying patterns. Due to this feature, ENN is highly appropriate for discrete time-series problems. The ENN framework is given in Fig. 2. Excluding the context layer, the remaining portions can be regarded as a conventional multi-layer network. The context layer shown in Fig. 2 is obtained from the outputs of the hidden layer. The outputs of the context layer are then fed as input to the hidden layer along with the next set of external input data. In this way, the data from earlier time steps is stored and reused. The ENN shown in Fig. 2 has an n-dimensional external input layer, which is denoted by $x_1(t) = [x_{1,1}(t), x_{1,2}(t), \ldots, x_{1,n}(t)]^T$, where t denotes the t-th input in the series. For simplicity, the output layer is also implemented with n neurons, and the output vector of the layer is given by $y(t) = [y_1(t), y_2(t), \ldots, y_n(t)]^T$. The neurons of the context layer and the hidden layer correspond one to one. Hence, the number of context layer neurons, denoted by m, is the same as that of the hidden layer. The hidden layer input from the context layer is given by $x_2(t) = c(t-1) = [c_1(t-1), c_2(t-1), \ldots, c_m(t-1)]^T$. The whole input vector of the network is given by

$$x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \in \mathbb{R}^{k}$$

where k = m + n. The weight matrices among the three layers are denoted by $W_{hi}(t)$, $W_{hc}(t)$, and $W_{oh}(t)$ respectively. It is necessary to identify the sizes of these matrices [26]. From the dimension of every layer, $W_{hi}(t) \in \mathbb{R}^{m \times n}$, $W_{hc}(t) \in \mathbb{R}^{m \times m}$, and $W_{oh}(t) \in \mathbb{R}^{n \times m}$ are obtained.
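Here, y(t) denotes the actual output of the network and d(t) represents the desired output vector. When the activation function f is selected as the sigmoid function, y(t) is calculated as follows

$$y(t) = f\big(W_{oh}(t)\, c(t)\big)$$

where c(t) is the output of the hidden layer. The input of the hidden layer comprises two portions, namely the external and context inputs, so $W_h(t) = [W_{hi}(t)\; W_{hc}(t)] \in \mathbb{R}^{m \times k}$. With the whole input vector x(t) and the sigmoid activation function, the output of the hidden layer is given by

$$c(t) = f\big(W_h(t)\, x(t)\big)$$

The aim of this network is to reduce the error:

$$E(t) = \frac{1}{2}\, \big(d(t) - y(t)\big)^T \big(d(t) - y(t)\big)$$

To reduce E(t), the update of every weight matrix $W \in \{W_{oh}, W_{hi}, W_{hc}\}$ is calculated by the formula given below.

$$\Delta W(t) = -\mu\, \frac{\partial E(t)}{\partial W(t)}$$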
here, μ represents the learning rate, and
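The forward pass described above can be sketched in plain Python. The class name `ElmanNetwork`, the uniform weight initialization, the zero initial context, and the sample input are assumptions for illustration; training with the weight update rule is omitted.

```python
import math
import random


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ElmanNetwork:
    """Forward pass of an Elman network with n inputs, m hidden and
    m context neurons, and n outputs."""

    def __init__(self, n, m, seed=0):
        rng = random.Random(seed)
        k = n + m
        # W_h = [W_hi W_hc] has shape m x k; W_oh has shape n x m.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(m)]
        self.w_oh = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(n)]
        # Context c(t-1); assumed to be zero before the first input.
        self.context = [0.0] * m

    def step(self, x1):
        """Process one external input x1(t) and return the output y(t)."""
        x = list(x1) + self.context            # whole input x(t), length k
        hidden = [sigmoid(z) for z in matvec(self.w_h, x)]
        output = [sigmoid(z) for z in matvec(self.w_oh, hidden)]
        self.context = hidden                  # stored for the next time step
        return output


net = ElmanNetwork(n=2, m=3)
y_first = net.step([0.1, 0.9])
y_second = net.step([0.1, 0.9])
print(y_first, y_second)
```

Feeding the same external input twice generally yields two different outputs, because the second step also receives the hidden state stored in the context layer.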

Mayfly Optimization (MFO) Algorithm Based Parameter Optimization
The choice of parameters in the ENN model is crucial to attain an effective prediction outcome. Most ML models include multiple parameters that need to be optimized. Since the trial-and-error method is infeasible, the metaheuristic MFO algorithm is applied to select the parameters. In general, the prediction error function acts as the objective function of the MFO algorithm [27]. In the MFO technique, the swarm is divided into separate male and female mayflies. Male mayflies are assumed to be stronger and therefore perform better in optimization. As with the individuals of a swarm in the PSO technique, the individuals in the MFO technique update their location based on the present location $p_i(t)$ and the velocity, which is itself updated from its present value $v_i(t)$:

$$p_i(t+1) = p_i(t) + v_i(t+1) \tag{13}$$

Every male and female mayfly updates its location using Eq. (13). However, their velocities are updated in different ways.

Movements of Male Mayflies
Male mayfly swarms perform exploration or exploitation during the iterations. The velocity is updated based on the present fitness value $f(x_i)$ and the historical optimal fitness value along the path, $f(x_{h_i})$. When $f(x_i) > f(x_{h_i})$, the male mayfly updates its velocity based on its current velocity combined with its distances to its historical optimal position and to the global optimal position:

$$v_i(t+1) = g \cdot v_i(t) + a_1 e^{-\beta r_p^2}\,[x_{h_i} - x_i(t)] + a_2 e^{-\beta r_g^2}\,[x_g - x_i(t)]$$

where $x_{h_i}$ is the historical optimal position of the individual, $x_g$ is the global optimal position, and g is a variable that is reduced linearly from a maximal value to a smaller one. $a_1$, $a_2$, and β are three constants that balance the terms. $r_p$ and $r_g$ are the Cartesian distances between the individual and its historical optimal position and the global optimal position of the swarm, respectively. The Cartesian distance is the second norm of the distance array and is given below.

$$\|x_i - x_j\| = \sqrt{\sum_{k} (x_{ik} - x_{jk})^2}$$

Conversely, when $f(x_i) < f(x_{h_i})$, the male mayfly updates its velocity from the present one with a random dance coefficient d:

$$v_i(t+1) = g \cdot v_i(t) + d \cdot r_1$$

where $r_1$ is a random number drawn from a uniform distribution over the range between −1 and 1.
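A minimal sketch of the male update is given below. It uses the standard mayfly velocity form (attraction terms weighted by an exponential of the squared distance), which matches the symbols defined in the text; the default values of g, a1, a2, β, and d, the function names, and the handling of equal fitness values (treated as the dance branch) are assumptions.

```python
import math
import random


def distance(a, b):
    """Cartesian (Euclidean) distance between two position vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def male_velocity(v, x, x_hist, x_best, f_x, f_hist,
                  g=0.8, a1=1.0, a2=1.5, beta=2.0, d=0.1, rng=random):
    """Return the next velocity of one male mayfly.

    The branch condition f_x > f_hist is taken as stated in the text."""
    if f_x > f_hist:
        # Attraction towards the historical best and the global best.
        w_p = a1 * math.exp(-beta * distance(x, x_hist) ** 2)
        w_g = a2 * math.exp(-beta * distance(x, x_best) ** 2)
        return [g * vj + w_p * (hj - xj) + w_g * (bj - xj)
                for vj, xj, hj, bj in zip(v, x, x_hist, x_best)]
    # Random dance with coefficient d and r1 drawn from [-1, 1].
    r1 = rng.uniform(-1.0, 1.0)
    return [g * vj + d * r1 for vj in v]


def move(p, v_next):
    """Location update: new location = present location + new velocity."""
    return [pj + vj for pj, vj in zip(p, v_next)]


v_next = male_velocity(v=[1.0, -1.0], x=[0.0, 0.0], x_hist=[0.0, 0.0],
                       x_best=[1.0, 0.0], f_x=2.0, f_hist=1.0)
p_next = move([0.0, 0.0], v_next)
print(v_next, p_next)
```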

Movements of Female Mayflies
Female mayflies update their velocities in a different manner. Biologically speaking, winged female mayflies live only for a time span of 1-7 days. Thus, the female mayflies rush to find the male mayflies for mating and reproduction. So, the velocities of the female mayflies are updated according to the male mayflies they mate with. In the MFO technique, the best female and the best male mayflies are defined as the first mates, the second best female and male mayflies are defined as the second mates, and so on. Therefore, when $f(y_i) < f(x_i)$, the velocity of the i-th female mayfly at position $y_i$ is updated by

$$v_i(t+1) = g \cdot v_i(t) + a_3 e^{-\beta r_m^2}\,[x_i(t) - y_i(t)]$$

where $a_3$ is another constant utilized for balancing the velocities, and $r_m$ is the Cartesian distance between the female and her mate. In contrast, when $f(y_i) \geq f(x_i)$, the female mayfly updates her velocity from the present one with another random dance coefficient fl:

$$v_i(t+1) = g \cdot v_i(t) + fl \cdot r_2$$

where $r_2$ is a random number drawn from a uniform distribution over the range between −1 and 1.
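The female update can be sketched in the same way. The attraction term follows the standard mayfly form, and the default values of g, a3, β, and fl as well as the function names are assumptions for illustration.

```python
import math
import random


def distance(a, b):
    """Cartesian (Euclidean) distance between two position vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def female_velocity(v, y, x_mate, f_y, f_x,
                    g=0.8, a3=1.5, beta=2.0, fl=0.1, rng=random):
    """Return the next velocity of one female mayfly at position y,
    whose mate is the male at position x_mate."""
    if f_y < f_x:
        # Attraction towards the mate, damped by the squared distance r_m.
        pull = a3 * math.exp(-beta * distance(y, x_mate) ** 2)
        return [g * vj + pull * (xj - yj) for vj, yj, xj in zip(v, y, x_mate)]
    # Random dance with coefficient fl and r2 drawn from [-1, 1].
    r2 = rng.uniform(-1.0, 1.0)
    return [g * vj + fl * r2 for vj in v]


attracted = female_velocity(v=[0.0, 0.0], y=[0.0, 0.0], x_mate=[1.0, 0.0],
                            f_y=1.0, f_x=2.0)
print(attracted)
```

With zero initial velocity, the attracted female moves only along the axis on which she differs from her mate.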

Mating of Mayflies
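The top half of the female and male mayflies are mated, and each pair produces a pair of children. The offspring evolve randomly from their parents:

$$\text{offspring}_1 = L \cdot \text{male} + (1 - L) \cdot \text{female}$$
$$\text{offspring}_2 = L \cdot \text{female} + (1 - L) \cdot \text{male}$$

where L represents a random number drawn from a Gaussian distribution.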
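The mating step can be sketched as a random blend of the two parents. The standard normal distribution for L, the ranking by ascending fitness (i.e., treating the objective as an error to be minimized), and the names `mate` and `breed` are assumptions.

```python
import random


def mate(male, female, rng=random):
    """Produce two children as complementary blends of the parents."""
    blend = rng.gauss(0.0, 1.0)  # L, assumed standard normal
    child_1 = [blend * m + (1.0 - blend) * f for m, f in zip(male, female)]
    child_2 = [blend * f + (1.0 - blend) * m for m, f in zip(male, female)]
    return child_1, child_2


def breed(males, females, fitness, rng=random):
    """Pair the best male with the best female, the second best with the
    second best, and so on over the top half of both swarms."""
    ranked_males = sorted(males, key=fitness)
    ranked_females = sorted(females, key=fitness)
    half = min(len(males), len(females)) // 2
    children = []
    for male, female in zip(ranked_males[:half], ranked_females[:half]):
        children.extend(mate(male, female, rng))
    return children


def sphere(position):
    """Hypothetical objective, used only to rank the mayflies."""
    return sum(x * x for x in position)


males = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [0.1, 0.2]]
females = [[2.0, 2.0], [0.3, 0.1], [1.5, 1.5], [0.2, 0.9]]
children = breed(males, females, sphere, rng=random.Random(7))
print(children)
```

Because the two blends are complementary, the sum of the two children equals the sum of their parents in every dimension.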

Performance Validation
In order to assess the predictive performance of the BDAAI-SIP model, a set of simulations was conducted using the HI-SEAS Solar Irradiance Prediction dataset sourced from the Kaggle repository [28]. The dataset holds weather-related details from the HI-SEAS Habitat in Hawaii. Particularly, the dataset comprises the following parameters: Solar Irradiance (W/m²), Temperature (°F), Barometric Pressure (Hg), Humidity (%), Wind Direction (°), Wind Speed (mph), and Sun Rise/Set Time. The results were validated using two measures, namely Mean Square Error (MSE) and Root Mean Square Error (RMSE).
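For reference, the two measures can be computed as in the sketch below. The sample values are hypothetical; the last line only checks that the RMSE is the square root of the MSE, using the fold-1 training MSE reported in this section.

```python
import math


def mse(actual, predicted):
    """Mean Square Error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical irradiance values in W/m^2.
actual = [3.0, 5.0]
predicted = [1.0, 5.0]
print(mse(actual, predicted), rmse(actual, predicted))

# Consistency check on a reported pair: MSE 8596.998 corresponds to RMSE 92.72.
reported_rmse = round(math.sqrt(8596.998), 2)
```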
Tab. 1 and Fig. 3 show the results achieved by the BDAAI-SIP model in terms of MSE and RMSE on the applied dataset. From the table, it is evident that the BDAAI-SIP model attained low MSE and RMSE values on the training, testing, and validation datasets. For instance, on fold-1, the BDAAI-SIP model achieved MSE values of 8596.998, 8596.998, and 8951.052 on the training, testing, and validation datasets respectively, with RMSE values of 92.72, 93.68, and 94.61. Likewise, on fold-3, the BDAAI-SIP model achieved MSE values of 8794.688, 8998.420, and 8777.816 and RMSE values of 93.78, 94.86, and 93.69 on the training, testing, and validation datasets respectively. On fold-5, the model reached MSE values of 8253.723, 8675.060, and 8738.510 and RMSE values of 90.85, 93.14, and 93.48. In addition, on fold-7, the model yielded MSE values of 8326.563, 8535.912, and 8764.704 and RMSE values of 91.25, 92.39, and 93.62. Moreover, on fold-10, the model obtained MSE values of 8753.474, 8222.862, and 8796.564 and RMSE values of 93.56, 90.68, and 93.79 on the training, testing, and validation datasets respectively.

Tab. 2 and Fig. 5 show the irradiance predicted by the BDAAI-SIP model at different times. From the table, it can be understood that the BDAAI-SIP model predicted the irradiance appropriately. In other terms, its difference from the true value is considerably lower than that of the compared methods. For instance, for a time period of 8 h, with a true value of 23.380 W/m², the BDAAI-SIP model predicted the irradiance to be 42.380 W/m², whereas the LSTM, BPNN, Persistence, and LR models predicted inferior values of 58.640, 148.145, 80.338, and 112.886 W/m² respectively. Besides, for a time period of 10 h, with a true value of 183.405 W/m², the BDAAI-SIP model predicted the irradiance to be 253.405 W/m², while the LSTM, BPNN, Persistence, and LR approaches showcased inferior results with predicted irradiance values of 397.675, 400.387, 492.605, and 335.293 W/m² respectively. For a time period of 12 h, with a true value of 530.577 W/m², the BDAAI-SIP model predicted the irradiance to be 528.489 W/m², whereas the LSTM, BPNN, Persistence, and LR approaches produced predicted irradiance values of 530.577, 449.208, 571.261, and 177.980 W/m² respectively. Meanwhile, for a time period of 14 h, with a true value of 256.637 W/m², the BDAAI-SIP model predicted the irradiance to be 331.637 W/m², whereas the LSTM, BPNN, Persistence, and LR techniques portrayed inferior results with predicted irradiance values of 479.044, 454.633, 809.942, and 80.338 W/m² correspondingly. From the above discussed results of the analysis, it is evident that the presented BDAAI-SIP model is an effective predictive tool for solar irradiation.

Conclusion
The current research article designed a novel BDAAI-SIP model to predict the solar irradiance using weather forecast data. Initially, weather-related data is fed as input to the BDAAI-SIP model and analyzed in a big data environment. Then, preprocessing is conducted in three stages, namely data conversion, missing value replacement, and data normalization. Next, the ENN-based predictive model is applied for prediction. This model makes use of a load-bearing layer that transmits state information and memory. Afterwards, the parameter optimization of the ENN model takes place through the MFO algorithm in order to optimally determine the values

Figure 1: The overall working process of BDAAI-SIP model

Figure 2: Structure of ENN model

Figure 4: Average RMSE analysis of BDAAI-SIP model

Figure 5: Results of the analysis of BDAAI-SIP model under distinct irradiance

Table 1: Results of the analysis of the proposed BDAAI-SIP model on applied dataset in terms of MSE and RMSE under training, testing and validation datasets

Table 2: Results of the analysis of irradiance variation on different methods with true values

Table 3: Results of the analysis of existing methods and the proposed BDAAI-SIP method on training and testing dataset