Electrical load forecasting is crucial for the planning and operation of electrical power systems. Both the electrical load demand of buildings and meteorological datasets may contain hidden patterns that need to be investigated to show their potential impact on load forecasting. In this study, meteorological data are analyzed through different data mining techniques to predict the electrical load demand of a factory located in Riyadh, Saudi Arabia. The factory load and meteorological data were recorded hourly between 2016 and 2017 and were provided by King Abdullah City for Atomic and Renewable Energy (K.A.CARE) and the Saudi Electricity Company (SEC) at a site located in Riyadh. After the data are pre-processed, two machine learning algorithms, namely Artificial Neural Network (ANN) and Support Vector Regression (SVR), are applied and compared for predicting the factory load. In addition, to select the optimal set of features, 13 different combinations of features are investigated. The outcomes emphasize the importance of selecting the optimal set of features, as more features may add complexity to the learning process. Finally, the SVR algorithm with six features provides the most accurate predictions of the factory load.

Global warming and energy security are the most critical problems that face the world today [

In electrical power systems, machine learning algorithms have been used to tackle a variety of forecasting applications, including load forecasting, market price forecasting, photovoltaic power forecasting, and fault detection [

In Saudi Arabia, for example, the electricity consumption of residential, governmental, commercial, educational, industrial, and mosque buildings has increased with the steady growth in the population. According to the Saudi Energy Efficiency Center, 47% of Saudi Arabia's primary energy is consumed by the industry sector [

To forecast the electricity load, three main steps need to be fulfilled to ensure accurate forecasting outcomes [

Different load types have been forecasted in the literature, including commercial, residential, industrial, educational buildings, and distribution feeders. In [

The study in [

The authors in [

The electrical load is predicted for the next day in urban areas using weather data in [

The researchers in [

In [^{2}

Furthermore, the researchers in [

From the aforementioned discussion, the meteorological and historical load data are the two types of features that may have an impact on forecasting the electrical load. In addition, the machine learning algorithms prove their efficacy in a variety of forecasting applications. Hence, the main contributions of this study are summarized as follows:

To develop a data-driven forecasting model of the electrical load of a factory located in Riyadh, Saudi Arabia. A short-term (24 h ahead with one-hour time step) load forecast is accomplished in this study utilizing different machine learning algorithms, namely ANN and SVR based on Radial Basis, Polynomial, and Linear Kernel functions.

To study the impact of historical load and meteorological data on load forecast. The weather data considered in this study are Air temperature, Cloud Capacity, Global Horizontal Irradiance, Relative Humidity, Barometric Pressure, Wind Direction, and Wind Speed. The primary goal of considering such variables is to investigate their influence on the electrical load, which may help the decision-makers solve problems in an electrical load before they occur.

To compare the performance of ANN and SVR in forecasting the electrical load of the factory.

To build an electrical load forecasting framework that may help other researchers, who are interested in load forecast, to start with.

This study uses different feature schemes to evaluate the impact of the meteorological variables on the load forecast. All feature schemes are compared with one another to obtain the optimal set of features, that is, the set that leads to the most accurate prediction results. The selection of these features is based on Pearson correlation values. Finally, the prediction results for the factory load are interpreted, compared, and discussed. A total of 52 forecasting models (four algorithms, each with 13 sets of features) are created to forecast the factory load. The forecasting objective and the forecasting models generated in this study are as follows:

Forecasting Objective:

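ANN with 13 different sets of features (F_{i}), denoted ANN-F_{i}

SVR based on the Radial Basis function with 13 different sets of features (F_{i}), denoted SVR-RB-F_{i}

SVR based on a Polynomial function with 13 sets of features (F_{i}), denoted SVR-Poly-F_{i}

SVR based on a Linear function with 13 sets of features (F_{i}), denoted SVR-Linear-F_{i}

where F_{i} is the i-th set of features (i = 1, 2, …, 13).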
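The count of 52 models follows from pairing each of the four algorithms with each of the 13 feature sets. A minimal sketch of that enumeration is given below; the naming pattern follows the SVR-RB-F_{6} label used later in the text, and its extension to the other algorithms is an assumption made here for illustration.

```python
from itertools import product

# Algorithm labels as used in the results; the "<algorithm>-F<i>" naming
# pattern for every model is an assumption for illustration.
ALGORITHMS = ["ANN", "SVR-RB", "SVR-Poly", "SVR-Linear"]
NUM_FEATURE_SETS = 13

# One forecasting model per (algorithm, feature set) pair.
model_names = [
    f"{algorithm}-F{i}"
    for algorithm, i in product(ALGORITHMS, range(1, NUM_FEATURE_SETS + 1))
]

print(len(model_names))  # 4 algorithms x 13 feature sets
```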

This study uses MATLAB R2021 and the

In the first step of the framework, the data, comprising the factory load and the meteorological datasets, are collected and organized. In the second step, data pre-processing techniques are applied to prepare the data. These include data collection, data cleaning, data monitoring, and data normalization, and they are discussed in detail in Section 4. In the third step, 13 different sets of features, as shown in

Feature description | Unit | Abbreviation | Input feature |
---|---|---|---|
Month | month | M | _{1} |
Day | day | D | _{2} |
Hour | hour | H | _{3} |
Power at the previous hour | kW | P-1 | _{4} |
Power on last day at the same hour | kW | P-24 | _{5} |
Power last week at the same hour | kW | P-168 | _{6} |
Air temperature | °C | Temp | _{7} |
Cloud capacity | CU | CC | _{8} |
Global horizontal irradiance | Wh/m^{2} | GHI | _{9} |
Relative humidity | % | Hum | _{10} |
Surface pressure | hPa | Pre | _{11} |
Wind direction | °N | WD | _{12} |
Wind speed | m/s | WS | _{13} |
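The three historical load features in the table (P-1, P-24, and P-168) are lagged copies of the hourly load series. The following sketch shows one way such features could be assembled; the function name and the synthetic series are hypothetical and stand in for the actual factory data, which are not reproduced here.

```python
def build_lag_features(load, lags=(1, 24, 168)):
    """Build lagged-load inputs (P-1, P-24, P-168) and matching targets.

    `load` is an hourly series; the first usable hour is the one that has
    a full week (168 h) of history behind it.
    """
    start = max(lags)
    rows, targets = [], []
    for t in range(start, len(load)):
        rows.append([load[t - lag] for lag in lags])
        targets.append(load[t])
    return rows, targets


# Synthetic placeholder series (not the factory data): load[h] = h.
hourly_load = [float(h) for h in range(200)]
X, y = build_lag_features(hourly_load)
print(X[0], y[0])
```

With lags of 1, 24, and 168 hours, the first 168 readings serve only as history, so a one-year hourly record loses its first week of targets.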

In the fourth step, the data are divided into training and test sets. The training set is used to train the machine learning algorithms, and the test set is used to examine the accuracy of the prediction models. In the fifth step, SVR and ANN are applied to build the forecasting models for each forecasting objective using the formulated feature schemes; all models are trained on the same data. Lastly, in the sixth step, the test set is used to evaluate each prediction model, and the best forecasting model is selected. The evaluation is based on the following metrics: Root Mean Square Error (RMSE), normalized Root Mean Square Error (nRMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE).
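The train/test division in the fourth step can be sketched as follows. The study does not state the split ratio or whether the split is chronological; the 80/20 chronological split below is an assumption chosen because it keeps the test hours strictly after the training hours, which is a common choice for time series.

```python
def chronological_split(rows, targets, train_fraction=0.8):
    """Split time-ordered samples so the test set follows the training set.

    The 0.8 default is an assumption; the study does not report its ratio.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return (rows[:cut], targets[:cut]), (rows[cut:], targets[cut:])


# Ten placeholder samples in time order.
samples = [[float(i)] for i in range(10)]
labels = [float(i) for i in range(10)]
(train_X, train_y), (test_X, test_y) = chronological_split(samples, labels)
print(len(train_X), len(test_X))
```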

Machine learning algorithms need data pre-processing and cleaning stages to prepare the data for the learning algorithm. For instance, the data used in this research work are obtained from different sources. Therefore, the data are required to be organized to create the set of data that are fed to the learning algorithms. In the field of data analysis, three main observations need to be acquired: data, information, and knowledge [

The data to be analyzed can come from different data providers. Data from unreliable sources are harder to work with and less trustworthy. In addition, such data may be imprecise, which makes them more difficult to handle. Therefore, collecting data from reliable sources is essential for accurate data analysis [

Once reliable data sources are identified, the data need to be extracted. Some data providers make their data publicly available; in other words, the data can be downloaded freely. Other data sources require prior permission because the data are confidential, and some providers offer data for a fee that varies with the amount of data. In this study, the data are collected from SEC and K.A.CARE with prior permission to use these data.

In this stage, the data are converted to a consistent form. That is, they are transformed from the form of the extraction to the structure of data to be utilized [

Data integration is a crucial step in data processing. The data need to be integrated uniquely and uniformly [

Data analysis is the step in which the data are examined in depth. In this step, many hidden features in the dataset are revealed, which helps in forecasting the future values of the factory load [

Data may be missing or imprecise. The data used to forecast the factory load are usually based on past readings of different variables, and faulty input data will degrade the accuracy of the forecasting algorithms [

Data cleaning serves several objectives in building a precise load forecasting model. Some of the benefits of applying the data cleaning process are as follows [

In data extraction, a variety of observations is expected to be obtained. These observations may be redundant or irrelevant to the problem at hand. The data cleaning helps in removing such observations to create a set of data that are manageable and meaningful.

As mentioned previously, the data are transformed into one structure which could create structural errors. Data cleaning can help in capturing such errors that could be fixed in the data transformation step.
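The two cleaning tasks described above, removing redundant observations and handling missing readings, can be illustrated with a short sketch. The study does not specify how gaps were filled; linear interpolation is used here only as one plausible choice, and both function names are hypothetical.

```python
def drop_duplicate_timestamps(records):
    """Keep the first (timestamp, value) record for each timestamp."""
    seen = set()
    unique = []
    for timestamp, value in records:
        if timestamp not in seen:
            seen.add(timestamp)
            unique.append((timestamp, value))
    return unique


def fill_missing_linear(values):
    """Replace None entries by linear interpolation between known readings.

    Gaps at either end of the series take the nearest known reading.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid readings")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Placeholder readings: one duplicated hour and two missing values.
records = [
    ("2016-01-01 00:00", 10.0),
    ("2016-01-01 00:00", 10.0),
    ("2016-01-01 01:00", None),
    ("2016-01-01 02:00", None),
    ("2016-01-01 03:00", 16.0),
]
unique_records = drop_duplicate_timestamps(records)
cleaned = fill_missing_linear([value for _, value in unique_records])
print(cleaned)
```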

Both the electrical load of buildings and meteorological datasets may contain hidden patterns that need to be investigated. In this study, the meteorological data are analyzed through different data mining techniques to predict the electrical load of a factory located in Riyadh, Saudi Arabia. The meteorological data are therefore gathered from a station located in Riyadh; they were recorded hourly from 2016 to 2017 by K.A.CARE. The factory load was also recorded hourly from 2016 to 2017 and is collected from the Saudi Electricity Company (SEC). One year of data is sufficient because it covers all seasons (fall, winter, spring, and summer), so the load variation due to seasonality can be captured by the machine learning algorithms.

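Obtaining the correlation values is a critical step for comprehending and visualizing the datasets. The Pearson correlation coefficient is used to determine the degree of correlation between two variables. It is given by

$$ r = \frac{\sum_{i=1}^{N}\left(X_{\mathrm{Weather},i}-\bar{X}_{\mathrm{Weather}}\right)\left(X_{\mathrm{Load},i}-\bar{X}_{\mathrm{Load}}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{\mathrm{Weather},i}-\bar{X}_{\mathrm{Weather}}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(X_{\mathrm{Load},i}-\bar{X}_{\mathrm{Load}}\right)^{2}}} $$

where $X_{\mathrm{Weather}}$ and $X_{\mathrm{Load}}$ are the weather variable and the factory load, respectively, $\bar{X}_{\mathrm{Weather}}$ and $\bar{X}_{\mathrm{Load}}$ are their mean values, and $N$ is the number of hourly observations.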
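As an illustration of how the Pearson correlation coefficient between one input variable and the load can be computed, a minimal standard-library sketch follows. The sample values are placeholders, not measurements from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Placeholder series: a perfectly linear and a perfectly inverse relation.
print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

A value near +1 or -1 indicates a strong linear relation with the load, while a value near 0 indicates that the variable carries little linear information about it.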

As mentioned previously, some of these variables have no impact on the forecasting outcomes and may add complexity to the forecasting process. For this reason, a total of 13 sets of features, each containing a different combination of features, are formulated.

Set of features | Input features |
---|---|
F_{1} | _{4} |
F_{2} | _{4}, _{6} |
F_{3} | _{4}, _{6}, _{5} |
F_{4} | _{4}, _{6}, _{5}, _{7} |
F_{5} | _{4}, _{6}, _{5}, _{7}, _{3} |
F_{6} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9} |
F_{7} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9}, _{10} |
F_{8} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9}, _{10}, _{11} |
F_{9} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9}, _{10}, _{11}, _{13} |
F_{10} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9}, _{10}, _{11}, _{13}, _{12} |
F_{11} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9}, _{10}, _{11}, _{13}, _{12}, _{1} |
F_{12} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9}, _{10}, _{11}, _{13}, _{12}, _{1}, _{2} |
F_{13} | _{4}, _{6}, _{5}, _{7}, _{3}, _{9}, _{10}, _{11}, _{13}, _{12}, _{1}, _{2}, _{8} |
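The nested structure of the table, where each set adds the next-best feature to the previous set, can be sketched as a ranking followed by a cumulative selection. Ranking by the absolute value of the correlation is an assumption, and the correlation values below are hypothetical placeholders rather than the values obtained in the study.

```python
def nested_feature_sets(correlations):
    """Return cumulative feature sets ordered by decreasing |correlation|.

    Set k contains the k features most strongly correlated with the load.
    Using the absolute value for ranking is an assumption.
    """
    ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Hypothetical correlation values, for illustration only.
example_correlations = {"Temp": -0.50, "P-1": 0.90, "P-24": 0.70, "P-168": 0.80}
feature_sets = nested_feature_sets(example_correlations)
for index, features in enumerate(feature_sets, start=1):
    print(f"F{index}: {features}")
```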

In this study, 13 sets of features are created to predict the factory load. These sets are formulated based on the correlation of each feature with the factory load. The first set (F_{1}) contains the single feature with the highest correlation value, namely the power at the previous hour in kW (P–1). Similarly, F_{2} contains the feature of F_{1} plus the feature with the second-highest correlation value, which is the power last week at the same hour in kW (P–168). The remaining sets are built in the same way.

The variables used in this study to forecast the factory load have different numerical scales. For example, GHI is expressed in Wh/m^{2} and takes values on a very different scale from a variable such as the hour of the day. This variation in numerical values among the input variables negatively affects the learning process: the machine learning algorithms assign higher weight to the variables with greater numerical values, which eventually impacts the prediction outcomes. To avoid this obstacle, all the prediction variables need to be normalized to a common range. Through data normalization (feature scaling), the variables carry the same weight without losing the information in the input data. In this study, the input data are normalized as

$$ x_{i}^{n} = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}} $$

where $x_{i}$ is the original value, $x_{i}^{n}$ is the normalized value, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the variable, respectively.
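A minimal sketch of min-max normalization is given below. Taking the minimum and maximum from the training data and reusing them on the test data is an assumption about good practice, not a detail reported in the study, and the function names are hypothetical.

```python
def fit_min_max(training_values):
    """Return the (minimum, maximum) of a variable from the training data."""
    return min(training_values), max(training_values)


def apply_min_max(values, lowest, highest):
    """Scale values to [0, 1] using a previously fitted minimum and maximum."""
    if highest == lowest:
        # A constant variable carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Placeholder irradiance-like readings on a 0 to 1000 scale.
train_values = [0.0, 500.0, 1000.0]
lowest, highest = fit_min_max(train_values)
print(apply_min_max(train_values, lowest, highest))
print(apply_min_max([250.0], lowest, highest))
```

After scaling, a variable that originally ranged over hundreds of units and one that ranged over a few units both lie in the same interval, so neither dominates the learning process through magnitude alone.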

ANN is an information processing system that mimics the way the human brain analyzes information. Like the human brain, an ANN consists of a large number of interconnected neuron nodes that work together to solve problems.

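In this research, the size of the hidden layer is selected by trial and error until the ANN model with the highest accuracy is obtained. The best configuration is obtained when the number of hidden nodes equals the number of input features. For example, three nodes are used with F_{3}, while seven nodes are used with F_{7}. The model output can therefore be calculated as

$$ y = \alpha_{0} + \sum_{j=1}^{n} \alpha_{j}\, g\!\left(\beta_{0j} + \sum_{i=1}^{m} \beta_{ij}\, x_{i}\right) $$

where $x_{i}$ ($i = 1, 2, \ldots, m$) are the input features, $g(\cdot)$ is the activation function of the hidden nodes, $\alpha_{j}$ ($j = 1, 2, \ldots, n$) and $\beta_{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) are the connection weights, and $\alpha_{0}$ and $\beta_{0j}$ are the bias terms.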
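The output of a network with a single hidden layer can be illustrated with a short forward-pass sketch. The sigmoid activation, the parameter names, and the weight values are assumptions for illustration; the study does not report its activation function or trained weights.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed; the study does not name its activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def ann_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a single-hidden-layer network with a linear output.

    hidden_weights[j][i] connects input i to hidden node j.
    """
    hidden = [
        sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(a * h for a, h in zip(output_weights, hidden))


# Three inputs and three hidden nodes, mirroring the F3 configuration.
# All-zero hidden weights make every hidden activation exactly 0.5.
inputs = [1.0, 2.0, 3.0]
hidden_w = [[0.0, 0.0, 0.0] for _ in range(3)]
hidden_b = [0.0, 0.0, 0.0]
prediction = ann_forward(inputs, hidden_w, hidden_b, [1.0, 1.0, 2.0], 0.5)
print(prediction)
```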

Support vector machine (SVM) is a supervised learning approach used for classification, regression, and outlier detection. When two classes cannot be separated linearly, a kernel function is employed to map the input space to a higher-dimensional space. In that new space, the data may be separated linearly [

In this study, the performance of the three kernel functions to predict the factory load is compared with ANN. The leading optimization can be formulated in

K(x_{i}, x_{j}) = ϕ(x_{i})^{T}ϕ(x_{j})
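The three kernel functions compared in this study can be written in their commonly used forms, as sketched below. The hyperparameters (`gamma`, `degree`, `coef0`) and their default values are assumptions; the study does not report the settings it used.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: the inner product of the two input vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree; parameters are assumed."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """Radial Basis kernel exp(-gamma * ||x - z||^2); gamma is assumed."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, degree=2))
print(rbf_kernel(u, v))
```

The Radial Basis kernel equals 1 for identical inputs and decays toward 0 as the inputs move apart, which lets the regression respond to local structure in the data.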

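To evaluate the models that are built in this study, different statistical indicators are used. These include Root Mean Square Error (RMSE), normalized Root Mean Square Error (nRMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). These metrics can be expressed by

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}} $$

$$ \mathrm{nRMSE} = \frac{\mathrm{RMSE}}{y_{\max}} \times 100\% $$

$$ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{i}-\hat{y}_{i}\right| $$

$$ \mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_{i}-\hat{y}_{i}}{y_{i}}\right| $$

where $y_{i}$ is the measured load, $\hat{y}_{i}$ is the predicted load, $N$ is the number of test samples, and $y_{\max}$ is the maximum measured load.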
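The four statistical indicators can be computed with a few lines of code, as sketched below. Normalizing the RMSE by the maximum measured load is an assumption about the nRMSE definition, and the sample values are placeholders rather than results from the study.

```python
import math


def evaluate(actual, predicted):
    """Return RMSE, nRMSE (%), MAE, and MAPE (%) for paired series.

    nRMSE divides RMSE by the maximum measured value (assumed definition).
    MAPE requires every measured value to be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {
        "RMSE": rmse,
        "nRMSE": 100.0 * rmse / max(actual),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Placeholder measured and predicted loads in kW.
metrics = evaluate([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```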

In this subsection, the results of forecasting the factory load are presented. This includes the results of the best forecasting algorithm and the best set of features that result in the best forecasting values for the considered load.

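In addition, the statistical errors of the four forecasting algorithms are listed below for two sets of features (F_{6} and F_{10}).

Feature set | Metric | ANN | SVR-RB | SVR-Poly | SVR-Linear |
---|---|---|---|---|---|
F_{6} | RMSE (kW) | 12.3961 | 11.63 | 13.8013 | 15.7639 |
F_{6} | nRMSE (%) | 3.4251 | 3.21 | 3.8134 | 4.3556 |
F_{6} | MAE (kW) | 8.6882 | 8.21 | 9.5621 | 10.6447 |
F_{6} | MAPE (%) | 6.5594 | 6.02 | 6.9861 | 7.8331 |
F_{10} | RMSE (kW) | 13.3891 | 11.66 | 14.2854 | 15.8223 |
F_{10} | nRMSE (%) | 3.6995 | | 3.9471 | 4.3718 |
F_{10} | MAE (kW) | 9.6872 | 8.25 | 9.8615 | 10.6500 |
F_{10} | MAPE (%) | 7.0207 | | 7.0984 | 7.8341 |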

As mentioned earlier, four forecasting algorithms are built to forecast the factory load. According to the results for F_{10}, the SVR-RB forecasting model has an RMSE of 11.66 kW and an MAE of 8.25 kW, while the ANN model has an RMSE of 13.39 kW and an MAE of 9.69 kW. Similarly, across all other forecasting models, SVR-RB leads to the best forecasting results, followed by the ANN and SVR-Poly models. The SVR-Linear model performs worst, as it leads to the largest statistical errors.
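To put the gap between SVR-RB and ANN in relative terms, the reported RMSE values can be compared as a percentage reduction. The short calculation below is derived arithmetic on the reported figures (12.3961 kW and 11.63 kW for F_{6}, 13.3891 kW and 11.66 kW for F_{10}); the percentages themselves are not stated in the study.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Reported RMSE values in kW: ANN as the baseline, SVR-RB as the improved model.
rmse_reduction_f6 = relative_reduction(12.3961, 11.63)
rmse_reduction_f10 = relative_reduction(13.3891, 11.66)
print(f"F6: {rmse_reduction_f6:.2f} %")    # roughly 6.2 %
print(f"F10: {rmse_reduction_f10:.2f} %")  # roughly 12.9 %
```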

According to the results, the feature set F_{6} led to the best forecasting results compared with the other sets. As listed in

The regression plots for F_{1}, F_{6}, F_{10}, and F_{13} compare the predicted values with the measured factory load readings using the SVR-RB forecasting algorithm. SVR-RB with F_{6} has the best regression plot, as the data concentrate around the regression line.

The best forecasting model for predicting the factory load is the SVR-RB algorithm with the feature set F_{6} (SVR-RB-F_{6}). The forecasting models with F_{6} are compared on three different test days. SVR-RB has the best performance, as it tracks the measured factory readings more closely than the other forecasting models.

In this study, the load of a factory located in Riyadh, Saudi Arabia, is forecasted using four machine learning algorithms, namely Artificial Neural Network (ANN), Support Vector Regression based on the Radial Basis function (SVR-RB), Support Vector Regression based on a Polynomial function (SVR-Poly), and Support Vector Regression based on a Linear function (SVR-Linear). To predict the factory load, 13 independent variables are used. However, because more features do not always yield more accurate forecasts, 13 sets of features are formulated to identify the set that provides the most accurate forecasting values. These sets are selected based on the correlation of each feature with the actual readings of the factory load. To evaluate the performance of the forecasting models, four statistical indicators are used, namely Root Mean Square Error (RMSE), normalized Root Mean Square Error (nRMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE).

The results show that SVR-RB is the best algorithm for forecasting the factory load. SVR-RB with six features (SVR-RB-F_{6}) resulted in the lowest statistical errors, with an RMSE of 11.63 kW, an nRMSE of 3.21%, an MAE of 8.21 kW, and a MAPE of 6.02%. SVR-RB thus proves its ability to forecast the factory load with high accuracy. Moreover, selecting the best features is very important for creating forecasting models that best predict the factory load. Finally, only four machine learning algorithms are investigated in this study. Other algorithms, such as random forests, decision trees, or deep learning algorithms, can also be investigated to forecast the load, and their forecasting results can be compared with the results of this study.

The researchers would like to thank the Deanship of Scientific Research, Qassim University for funding the publication of this project.