Spectral Vacancy Prediction Using Time Series Forecasting for Cognitive Radio Applications

This paper presents a novel method for identifying unoccupied primary user spectrum. Cooperation among users with the aid of machine learning methods is analyzed. Learning methods are used to construct a classifier that selects the fusion algorithm suited to the considered environment, so that out-of-band sensing is performed efficiently. Sensing performance is examined in the presence of fading and is observed to degrade with fading, which agrees with earlier findings. The simulations indicate that sensing over the Weibull fading channel outperforms sensing over all the other fading models considered. To achieve a missed detection probability of 1% in the Rayleigh channel, the false alarm probability obtained is almost 0.8, whereas the same missed detection probability in the Weibull channel is obtained with a false alarm probability of less than 0.1, which is very favorable for both indoor and outdoor scenarios. Numerical analyses are carried out to predict the Primary User (PU) channel condition using a Hidden Markov Model with the help of a time series forecasting method. The prediction accuracy reaches 100% with the Weibull fading model over a period of 200 ms, compared with only 84.5% for the Rayleigh model.

acquired energy is more than the threshold. The estimated energy is forwarded to the fusion center (FC), which forms an improved decision using several hard-decision [5][6][7][8] and soft-decision algorithms. The FC produces a global result by merging the individual decisions from each CR.
In [9], fading channels are modeled with the extended generalized-K distribution for spectrum sensing using ED. The cooperative spectrum sensing (CSS) model with an Improved ED (IED) scheme reviewed in [10] uses several antennas at each CR. The CSS model with one antenna at every CR over the Rician channel is explored in [11]. The performance of system parameters with an IED scheme over channel models is examined in [12]. Analogous studies in [13][14][15][16] use the ED method over Nakagami-m and Weibull fading channels, since ED does not require prior information about the channel. A classifier [16] is trained with the energy statistics of the PU together with decisions about the presence of the PU. Depending on the learning method, classifiers can be divided into supervised and unsupervised methods. The unsupervised machine learning technique we use is the K-means algorithm, which studies and discovers the primary user's data and patterns. The supervised machine learning technique we use is SVM, which trains the model with the data labeled in the previous step. The trained classifier is then used to forecast decisions on unobserved PU channel conditions. Evaluating the PU channel state [17][18][19][20][21] in CR is a tedious job since the state is time varying. Many works approach this problem [22][23][24] using the Fast Fourier Transform (FFT), which has high complexity because of the large number of computations. Here we introduce a time series forecasting method for identifying the occupancy of the PU channel. A time series [25] is used to study the PU channel state (i.e., "idle" or "occupied"), and a Hidden Markov Model (HMM) is then used to capture the PU channel status [26,27].
Here, we propose a novel model based on Machine Learning (ML) techniques to detect the unused spectrum of primary users effectively. We use the "energy vector", in which the energy level is calculated at each CR, as the feature vector. The classifier then assigns the vector to either the "channel available class" or the "channel unavailable class". We investigate the impact of small-scale fading effects such as multipath fading, delay spread and Doppler spread over different fading channels. We examine the performance of an unsupervised as well as a supervised machine learning algorithm for constructing the decision-making classifier. A qualitative performance evaluation of various fusion methods with learning is presented. To exploit multipath propagation, energy detection with receiver diversity is considered and analyzed in terms of missed detection and false alarm probabilities over various fading channels. Using an HMM, primary user channel state prediction is carried out. To our knowledge, no prior work models both the sensing and reporting channels as fading channels, includes ML methods for classification, and further forecasts the next state of the PU channel using the time series forecasting method.
The rest of the paper is organized as follows. Section 2 introduces the signal model and the proposed cooperative spectrum sensing scheme, and presents the different classification and fusion methods. Section 3 presents and discusses the simulation results. Finally, conclusions are given in Section 4.

System Model
A CR network with a single PU and seven SUs is considered. The PU alternates between the idle and busy states. Each SU detects the presence of the PU by sensing signals using an ED technique, with the signal energy serving as the indicator of PU presence. We consider a cooperative CR network with K nodes, N test points for the ED and M frames for training the ML classifier, as illustrated in Fig. 2.
In the expression for $Y_{ij}(n)$, the $i$-th frame of the received signal at the $j$-th cooperative node, $S_{ij}(n)$ is the primary user's signal, a zero-mean i.i.d. Gaussian random process, and $w_{ij}(n)$ is the noise. The nodes sense each frame and obtain its statistics. For the $i$-th frame at the fusion center, with $i$ between 1 and $M$, the energy of the samples $Y_{ij}$ is expressed in terms of $|Y_{ij}(n)|^2$, where $\gamma_i$ is a random variable with a chi-square pdf, $\gamma_{ij}$ is the signal-to-noise ratio of the $i$-th frame observed at the $j$-th cooperative node, and $\sigma_j^2$ is the variance of the noise samples $w_{ij}(n)$. We assume that the noise variance and the SNR remain unchanged for all frames during the training process. For each frame in the training set a threshold $\lambda_j$ is selected, from which the probability of false alarm $P_{fa}$ and the probability of detection $P_d$ are obtained; in these expressions $Q(\cdot)$ is the complementary distribution function.
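As an illustration of the energy statistic described above, the following minimal Python sketch simulates one sensing frame and applies a threshold test. The 1/N normalization, the real-valued Gaussian samples, and the function names (`simulate_frame`, `frame_energy`, `energy_detect`) are assumptions made for this example and are not taken from the paper.

```python
import random


def simulate_frame(n_samples, snr_linear, noise_var, pu_active, rng):
    """Return one frame of real-valued samples.

    ASSUMPTION: zero-mean Gaussian noise of variance noise_var plus, when the
    PU is active, a zero-mean Gaussian signal of variance snr_linear * noise_var.
    """
    noise_std = noise_var ** 0.5
    signal_std = (snr_linear * noise_var) ** 0.5
    frame = []
    for _ in range(n_samples):
        sample = rng.gauss(0.0, noise_std)
        if pu_active:
            sample += rng.gauss(0.0, signal_std)
        frame.append(sample)
    return frame


def frame_energy(samples):
    """Average energy of the frame: (1/N) * sum |y(n)|^2 (normalization assumed)."""
    return sum(abs(y) ** 2 for y in samples) / len(samples)


def energy_detect(samples, threshold):
    """Return 1 (PU busy) if the frame energy exceeds the threshold, else -1 (PU idle)."""
    return 1 if frame_energy(samples) > threshold else -1


if __name__ == "__main__":
    rng = random.Random(0)
    idle = simulate_frame(20000, 1.0, 1.0, False, rng)
    busy = simulate_frame(20000, 1.0, 1.0, True, rng)
    print(frame_energy(idle), frame_energy(busy))
    print(energy_detect(idle, 1.5), energy_detect(busy, 1.5))
```

With unit noise variance and a linear SNR of 1, the idle-frame energy concentrates near the noise variance and the busy-frame energy near twice that value, so any threshold between the two separates the hypotheses in this toy setting.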

Fusion Center Threshold Calculation for Various Data Fusion Rules
To obtain the optimal threshold λ, several fusion rules are considered here. The K nodes work together to compute the threshold on which the overall sensing decision is built. The following hard fusion rules are used in the decision center of the secondary users.

AND Fusion Rule
Under this rule, the FC decides that the signal is present only when all nodes detect the signal and report it to the FC. The threshold [2] and the detection probability $P_{d,AND}$ [7] follow from this rule.

OR Fusion Rule
Under this rule, the FC decides that the signal is present if at least one of the users detects it. The threshold [2] and the detection probability $P_{d,OR}$ [7] follow from this rule.
The soft fusion rules considered here are maximum ratio combining (MRC) and square law selection (SLS).
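The two hard-decision rules can be sketched in a few lines of Python. The decision labels follow the paper's convention (1 for a busy PU, −1 for an idle PU). The closed-form cooperative detection probabilities are the standard textbook expressions for K independent nodes with an identical per-node detection probability; that independence assumption and the function names are ours, not the paper's.

```python
def and_rule(decisions):
    """Declare the PU present (1) only if every node reports 1."""
    return 1 if all(d == 1 for d in decisions) else -1


def or_rule(decisions):
    """Declare the PU present (1) if at least one node reports 1."""
    return 1 if any(d == 1 for d in decisions) else -1


def pd_and(pd_node, k_nodes):
    """Cooperative detection probability under AND.

    ASSUMPTION: independent nodes with identical per-node probability pd_node.
    """
    return pd_node ** k_nodes


def pd_or(pd_node, k_nodes):
    """Cooperative detection probability under OR (same assumption as pd_and)."""
    return 1.0 - (1.0 - pd_node) ** k_nodes


if __name__ == "__main__":
    reports = [1, 1, -1, 1, 1, 1, 1]  # seven SUs, one of them misses the PU
    print(and_rule(reports), or_rule(reports))
    print(pd_and(0.9, 7), pd_or(0.9, 7))
```

The example makes the trade-off visible: a single missed detection is enough to flip the AND decision, while the OR decision only needs one positive report and is therefore more exposed to false alarms.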

Maximum Ratio Combining (MRC)
All CRs transmit their corresponding energy vectors to the FC, which gathers the data, combines them, and creates a global decision by comparing the combined value with the detection threshold. After receiving these energy statistics, the fusion center adds them as $\gamma_{si} = \sum_{j=1}^{K} w_j \gamma_{ij}$ for $i = 1, \ldots, M$, where $\gamma_{ij}$ is the energy test statistic of the $i$-th frame. To maximize the detection probability, the optimum weight vector $w_j$ has to be obtained. It is given by $w_j = \mathrm{sign}(g^T w_0)\, w_0$, where $g^T = [\sigma_1^2 \gamma_1, \sigma_2^2 \gamma_2, \ldots, \sigma_K^2 \gamma_K]$ collects the instantaneous SNRs and noise variances. The fusion threshold and the detection probability $P_{d,MRC}$ then follow.

Square Law Selection Rule (SLS)
The FC chooses the user with the maximum SNR, $\gamma_{SLS}$ [7], with noise variance $\sigma_u^2$. The threshold and the detection probability $P_{d,SLS}$ are then determined for this user.
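The two soft fusion rules can be illustrated with a short Python sketch. The weighted sum mirrors the MRC combining of per-node energies, and the selection rule keeps the energy of the node with the largest SNR, as described in the text. The unit-norm, SNR-proportional weights are a simplifying assumption made here in place of the optimum weight vector, and all function names are hypothetical.

```python
import math


def snr_proportional_weights(snrs):
    """ASSUMPTION: unit-norm weights proportional to each node's linear SNR.

    This stands in for the optimum weight vector, which is not reproduced here.
    """
    norm = math.sqrt(sum(s * s for s in snrs))
    return [s / norm for s in snrs]


def mrc_statistic(energies, weights):
    """Weighted sum of the per-node energies for one frame."""
    return sum(w * e for w, e in zip(weights, energies))


def sls_statistic(energies, snrs):
    """Keep only the energy reported by the node with the largest SNR."""
    best = max(range(len(snrs)), key=lambda j: snrs[j])
    return energies[best]


def soft_decision(statistic, threshold):
    """Return 1 (PU busy) if the fused statistic exceeds the threshold, else -1."""
    return 1 if statistic > threshold else -1


if __name__ == "__main__":
    energies = [1.2, 2.1, 1.6]
    snrs = [0.1, 0.5, 0.3]
    weights = snr_proportional_weights(snrs)
    print(mrc_statistic(energies, weights))
    print(sls_statistic(energies, snrs))
```

MRC uses every report and weights the reliable nodes more heavily, whereas SLS discards all but one report, which keeps the reporting overhead low at the cost of diversity.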

ML Based Classification Methods
Each energy value obtained is compared with the threshold to obtain the decision $d_i$ associated with the $i$-th frame: $d_i = 1$ if $\gamma_c \ge \lambda$ and $d_i = -1$ otherwise (12), where $\gamma_c \in \{\gamma_i, \gamma_{si}\}$ and $\lambda \in \{\lambda_{AND}, \lambda_{OR}, \lambda_{MRC}, \lambda_{SLS}\}$. Here "−1" denotes an idle PU, and "1" denotes a PU that is busy communicating on the channel, as stated in Eq. (12). For this classification problem, an ML based classifier can be used to infer the decision associated with a new frame. Initially, K-means is applied to find the primary users' transmission patterns and statistics. Then, to differentiate between the two states of the primary user signal, i.e., active or inactive, the Support Vector Machine (SVM) method is adopted.
In the first step, unsupervised machine learning techniques are used to analyze the primary user's data and patterns. Further, supervised machine learning techniques are used to train the model with the data labeled in the previous step.

K-Means Clustering
This method divides the set of training energy vectors into clusters. The set of vectors belonging to cluster k is denoted by $C_k$, and its centroid is $\alpha_k$. The method aims to select K clusters, $C_1, \ldots, C_K$, that solve the minimization over $C_1, \ldots, C_K$ in Eq. (13). The steps involved in finding clusters that satisfy Eq. (13) are:
1) The centroids of the clusters are initialized, and the centroid of cluster 1 is fixed as $\alpha_1 = \mu_{Y|S}$. Cluster 1 is the cluster in which the PU is idle (i.e., S = 0), so it belongs to the channel available class, whereas the remaining clusters belong to the channel unavailable class.
2) From the energy vectors assigned to each cluster, the centroids are recalculated as the average of the assigned vectors, except for cluster 1.
3) The iterations are repeated until convergence.
With the K-means algorithm, the centroid of cluster k is obtained as $\alpha_k^* = |C_k|^{-1} \sum_{y_l \in C_k} y_l$, $\forall k = 2, \ldots, K$. After training, the classifier receives the test energy vector $y^*$ for classification. It decides whether $y^*$ belongs to cluster 1 or not by computing the distances from the test vector to the centroids, where the parameter β is a threshold that controls the trade-off between the misdetection and false alarm probabilities.
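The clustering steps and the distance-based test described above can be sketched as follows. The centroid of cluster 1 stays fixed while the other centroids are averaged, as in the text. The classification rule used here (declare the channel unavailable when the distance to centroid 1 divided by the distance to the nearest other centroid is at least β) is an assumed form chosen so that β = 1 gives a nearest-centroid decision; the paper's exact rule is not reproduced, and all names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two energy vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vec(vectors):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans_fixed_first(vectors, idle_centroid, init_others, max_iter=50):
    """K-means in which centroid 0 (PU idle) is never updated."""
    centroids = [list(idle_centroid)] + [list(c) for c in init_others]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
            clusters[nearest].append(v)
        updated = [centroids[0]]
        for k in range(1, len(centroids)):
            # Keep the old centroid if no vector was assigned to this cluster.
            updated.append(mean_vec(clusters[k]) if clusters[k] else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return centroids


def classify(y, centroids, beta=1.0):
    """Return -1 (channel available) or 1 (channel unavailable).

    ASSUMPTION: ratio test d(y, c_0) / min_k>=1 d(y, c_k) >= beta.
    """
    d_idle = dist(y, centroids[0])
    d_busy = min(dist(y, c) for c in centroids[1:])
    if d_busy == 0.0:
        return 1
    return 1 if d_idle / d_busy >= beta else -1


if __name__ == "__main__":
    train = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [3.0, 3.1], [2.9, 3.0], [3.1, 2.9]]
    cents = kmeans_fixed_first(train, [1.0, 1.0], [[2.5, 2.5]])
    print(cents)
    print(classify([1.1, 0.9], cents), classify([3.0, 3.1], cents))
```

Raising β above 1 makes the classifier more reluctant to declare the channel unavailable, which lowers the false alarm probability at the cost of more missed detections.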

Support Vector Machine
SVMs are supervised learning models used to produce the best decision boundary separating the feature space into classes. This boundary is called a hyperplane, and the extreme vectors that help create it are called support vectors. The dataset of n points is given in the form $(x_1, y_1), \ldots, (x_n, y_n)$, and the final decision is either 1 or −1, denoting the class to which $x_i$ belongs [16], based on the hyperplane equation, where w is the normal vector to the hyperplane and b is a constant. For the training set $(\gamma_i, d_i)$, $i = 1, \ldots, M$, the optimization maximizes the margin between the two classes (i.e., $w \gamma_i + b = \pm 1$), and the corresponding hyperplane equation $w \gamma_i + b = 0$ can be found using this classifier.
Using the Lagrangian function, the solution of the quadratic optimization problem is given by Eq. (18). We can find α from Eq. (18), and w can then be computed using $w = \sum_{i=1}^{M} \alpha_i d_i \gamma_i$. Selecting a vector with $\alpha_j > 0$, b is found from the expression $b = d_j - \sum_{i=1}^{M} \alpha_i d_i (\gamma_i \gamma_j)$. The decision for a new vector $\gamma_x$ is then obtained from the classification function.
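To make the recovery of w and b from the dual variables concrete, the sketch below evaluates the two expressions above on a toy problem with two training vectors whose dual solution is known by hand (α = 0.5 for both). The linear sign-based decision function and the function names are assumptions for this illustration; solving the quadratic program itself is outside the scope of the sketch.

```python
def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def svm_from_dual(alphas, labels, vectors):
    """Compute w = sum_i alpha_i d_i gamma_i and b = d_j - sum_i alpha_i d_i (gamma_i . gamma_j).

    j is the index of the first support vector (alpha_j > 0).
    """
    dim = len(vectors[0])
    w = [sum(a * d * v[k] for a, d, v in zip(alphas, labels, vectors)) for k in range(dim)]
    j = next(i for i, a in enumerate(alphas) if a > 0)
    b = labels[j] - sum(a * d * dot(v, vectors[j]) for a, d, v in zip(alphas, labels, vectors))
    return w, b


def svm_decide(x, w, b):
    """ASSUMED linear decision function: sign of w . x + b (1 = busy, -1 = idle)."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    # Toy data: one idle vector at the origin, one busy vector at (2, 0).
    vectors = [[0.0, 0.0], [2.0, 0.0]]
    labels = [-1, 1]
    alphas = [0.5, 0.5]  # hand-derived dual solution for this two-point problem
    w, b = svm_from_dual(alphas, labels, vectors)
    print(w, b)
    print(svm_decide([1.8, 0.3], w, b), svm_decide([0.2, 5.0], w, b))
```

For this toy problem the hyperplane is the vertical line through x = 1, midway between the two training vectors, and both vectors lie exactly on the margin.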

Channel Classification
The fusion rules are applied to the received energy vectors, which are then given as inputs to the above-mentioned algorithms. The ML algorithms classify the channel into the available or the unavailable class. The former means that the PU is idle on the channel, so an SU can be admitted. The latter means that the PU is present on the channel, so the SU cannot access it and is redirected by the cognitive radio to other available channels. The decision center assigns the channel to the SU based on priority and observations.

Fading Models

Log Normal Shadowing
This model describes the path loss over distance that a signal experiences inside buildings or in densely populated areas. In the Probability Density Function (PDF) expression, μ is the mean and σ is the standard deviation. In the approximated detection probability [14], α is the shape parameter and β denotes the expectation of γ, $K_v(\cdot)$ is the modified Bessel function of the second kind, $\Gamma(\cdot)$ is the gamma function and $\Gamma(\cdot, \cdot)$ is its complementary incomplete function.

Rayleigh Fading
This model assumes that the signal-to-noise ratio γ follows an exponential PDF. In the approximated detection probability [14], λ denotes the threshold of the energy detector, $\bar{\gamma}$ denotes the average Signal to Noise Ratio (SNR) and ${}_1F_1(\cdot;\cdot;\cdot)$ denotes the confluent hypergeometric function.

Rician Fading/Nakagami-n Fading Model
In Rician fading, a strong line-of-sight wave is dominant. In the corresponding PDF, K denotes the Rician factor and $I_n(\cdot)$ is the $n$-th order modified Bessel function of the first kind. The approximated detection probability of this model involves the ratio $\Gamma(u+n, \lambda/2)/\Gamma(u+n)$, where u is the time-bandwidth product.

Weibull Fading
The Weibull fading model was introduced to analyze rapid signal fluctuations in non-line-of-sight scenarios. In the PDF expression of the instantaneous SNR γ, $c = v/2$ and $p = 1 + 1/c$. The factor v, always greater than zero, is the Weibull fading parameter, which indicates the fading severity; for v = 2, Eq. (26) reduces to the Rayleigh PDF. The approximated detection probability for the Weibull fading channel [14] is expressed in terms of $G^{m,n}_{p,q}(x \mid \cdot)$, the Meijer G-function.

Nakagami-m Fading Model
Here the random variable γ obeys a Gamma PDF, in which m is the fading severity parameter, which lies between 0.5 and ∞. The approximated detection probability is given in [14]. For m = 1 this distribution becomes the Rayleigh distribution and the detection probability reduces to Eq. (23).

Hoyt Fading / Nakagami-q Fading Model
This model describes fading more severe than Rayleigh fading, where q (0 ≤ q ≤ 1) is the severity parameter. In the PDF, $p = 4q^2/(1+q^2)^2$, so p lies between 0 and 1. When p and q are equal to one, the distribution becomes the Rayleigh PDF. The approximated detection probability [14] involves the ratio $\Gamma(u+n, \lambda/2)/\Gamma(u+n)$.
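The relationships between the SNR distributions of the fading models above can be checked numerically with Python's standard `random` module. This is a sketch under stated assumptions: the Weibull SNR is drawn with shape c = v/2 and scale $\bar{\gamma}/\Gamma(p)$, p = 1 + 1/c (a common parameterization that makes the mean equal to $\bar{\gamma}$), and the Nakagami-m SNR is drawn from a Gamma distribution with shape m and scale $\bar{\gamma}/m$. The function names are hypothetical.

```python
import math
import random


def rayleigh_snr(mean_snr, rng):
    """Rayleigh fading: the instantaneous SNR is exponentially distributed."""
    return rng.expovariate(1.0 / mean_snr)


def weibull_snr(mean_snr, v, rng):
    """Weibull fading SNR with shape c = v/2 (assumed parameterization).

    The scale mean_snr / Gamma(p), p = 1 + 1/c, gives an average SNR of mean_snr.
    """
    c = v / 2.0
    p = 1.0 + 1.0 / c
    return rng.weibullvariate(mean_snr / math.gamma(p), c)


def nakagami_snr(mean_snr, m, rng):
    """Nakagami-m fading: the SNR follows a Gamma law with shape m and scale mean_snr / m."""
    return rng.gammavariate(m, mean_snr / m)


def sample_mean(draw, n):
    """Average of n draws from a zero-argument sampler."""
    return sum(draw() for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(1)
    n = 20000
    print(sample_mean(lambda: rayleigh_snr(10.0, rng), n))
    print(sample_mean(lambda: weibull_snr(10.0, 6.0, rng), n))
    print(sample_mean(lambda: nakagami_snr(10.0, 3.0, rng), n))
```

Setting v = 2 in `weibull_snr` or m = 1 in `nakagami_snr` yields an exponential SNR, which is the Rayleigh special case mentioned in the text.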

Spectrum Prediction Model

Time Series Generation
The state of the PU is to be predicted over time, i.e., whether it is in the "Idle state" or the "Occupied state", together with the transitions from one state to another. For that purpose, a time series is created that assigns every state of the detection series to a different sample space with random variables, using the Autoregressive Integrated Moving Average (ARIMA) model as shown in Fig. 3. The time series $g_t$ is expressed in (32) as
$g_t \in \{u_1, u_2, \ldots, u_l\}$ if $g_t < \lambda$, and $g_t \in \{u_{l+1}, \ldots, u_m\}$ if $g_t \ge \lambda$, (32)
where $g_t \in \{u_1, u_2, \ldots, u_l\}$ represents PU absent and $g_t \in \{u_{l+1}, \ldots, u_m\}$ represents PU present. We want to estimate the next state $X_{t+1}$ of a PU channel currently in state $X_t$.
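The thresholding step of the time series generation can be sketched in Python as below. The sketch maps a detection series to binary states (0 for PU absent, 1 for PU present) and counts the observed state transitions; it does not fit an ARIMA model. The binary state encoding, the uniform fallback for unseen states, and the function names are assumptions for illustration.

```python
def to_state_series(series, threshold):
    """Map each detection value to 0 (PU absent, value < threshold) or 1 (PU present)."""
    return [0 if g < threshold else 1 for g in series]


def transition_matrix(states, n_states=2):
    """Estimate state transition probabilities from consecutive pairs in the series.

    ASSUMPTION: a state that is never left in the data gets a uniform row.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 / n_states] * n_states)
    return matrix


if __name__ == "__main__":
    detections = [0.2, 0.9, 1.4, 0.3, 1.7, 1.9, 0.4]
    states = to_state_series(detections, 1.0)
    print(states)
    print(transition_matrix(states))
```

The resulting state sequence and empirical transition frequencies are the kind of input that a state-prediction model such as an HMM can be initialized with.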

Hidden Markov Model for Primary User Channel Condition Forecasting
In the Hidden Markov Model the system outcome is known and the state change probabilities are unknown. Let $X = \{X_1, \ldots, X_t, \ldots, X_T\}$ represent the hidden state series, where $X_t \in \{s_i\}$, $i = 1, 2, \ldots, K$, and $s_i$ represents the states of the PU, which can take the values 0 and 1. The number of hidden states is denoted by K. Let $O = \{O_1, \ldots, O_t, \ldots, O_T\}$ be the observation sequence, where $O_t \in \{u_1, u_2, \ldots, u_M\}$ and the number of observations is M. The transition and emission probability matrices are A and B, respectively, and the initial state probability vector is π. The HMM is represented as θ = (π, A, B). Two hidden states are considered, and $\pi = (\pi_1, \pi_2)$ is used for calculating the above matrices.
In the transition matrix, $a_{ij}$ is the probability that, for the present state $s_i$, the next state is $s_j$. In the emission probability matrix, $b_{jm}$ denotes the probability of the current observation $u_m$ when the current state is $s_j$. Let $\delta_t(i)$ be the probability of the state series up to time t that ends at state i, obtained as a maximum in Eq. (37). To maximize Eq. (37) we use a vector $\varphi_t(i)$ which stores the argument values. The steps used in the Viterbi algorithm to estimate the channel state are as follows.
Step 1: Initialize $\delta_t(i)$ and $\varphi_t(i)$.
Step 2: Repeat to update $\delta_t(i)$ and $\varphi_t(i)$.
Step 3: Compute the likelihood probability $P^*$ as a maximum, and from it the estimated state $q_T^*$.
To obtain the HMM parameters θ = (π, A, B) statistically, the Baum-Welch algorithm with maximum likelihood estimation is used. This estimation proceeds as follows.
Step 1: Obtain the training observation vectors $O_1, O_2, \ldots, O_t, \ldots, O_L$ with length L.
Step 2: Obtain θ(1) when k = 1 and update k = k + 1.
Step 3: Let the probability of the current state $s_i$ be $\gamma_t(i)$ and the probability of its transition to the next state $s_j$ be $\zeta_t(i, j)$.
Step 4: Obtain the expected values of state $s_i$ and of its transitions to $s_j$ as $E(\gamma_t(i))$ and $E(\zeta_t(i, j))$.
Step 5: Get the new estimates $\hat{a}_{ij}$, $\hat{b}_i(k)$ and $\hat{\pi}_i$, then label them as θ(k + 1).
Step 6: If not converged, go to Step 3.
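The Viterbi steps listed above can be sketched for a two-state PU channel as follows. The implementation is the standard log-domain Viterbi recursion; the one-step prediction rule (take the most probable successor of the last decoded state) and the numerical values of π, A and B in the usage example are illustrative assumptions, not the parameters used in the paper.

```python
import math


def _log(x):
    """Natural log that maps 0 to -infinity instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs, pi, A, B):
    """Return the most likely hidden state path and its log-probability.

    obs: list of observation indices; pi: initial probabilities;
    A[i][j]: transition probability i -> j; B[j][m]: probability of symbol m in state j.
    """
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        delta, back = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + _log(A[i][j]))
            back.append(best)
            delta.append(prev[best] + _log(A[best][j]) + _log(B[j][o]))
        backpointers.append(back)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]


def predict_next(path, A):
    """ASSUMED prediction rule: most probable successor of the last decoded state."""
    row = A[path[-1]]
    return max(range(len(row)), key=lambda j: row[j])


if __name__ == "__main__":
    pi = [0.5, 0.5]                      # state 0 = idle, state 1 = occupied
    A = [[0.9, 0.1], [0.1, 0.9]]         # illustrative transition matrix
    B = [[0.8, 0.2], [0.2, 0.8]]         # illustrative emission matrix
    path, logp = viterbi([0, 0, 1, 1, 1], pi, A, B)
    print(path, math.exp(logp), predict_next(path, A))
```

Working in the log domain avoids numerical underflow when the observation sequence is long, which matters for sequences spanning many sensing frames.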

Results and Discussions
We assume that the PU and SUs are stationary. In our scenario, we considered two PUs and seven cooperative SUs. The simulation uses the parameters listed in Tab. 1.

Training Duration of Classifiers
500 energy vectors were chosen for training each classifier. Unsupervised learning does not require information about channel availability, whereas supervised learning requires the channel availability states for training. K-means clustering divides the vectors into clusters and calculates their centroids, whereas in SVM the primary task is locating the hyperplane that separates the training energy vectors distinctly. After the classifier has been trained, the test energy vector is provided for classification. The training duration of the classifiers as a function of the number of energy vectors used for training is shown in Tab. 2. As the number of training samples is increased up to 1000, the unsupervised K-Means technique shows a high training duration (0.117 s for 1000 samples). The time taken to decide the channel availability is also calculated for 1000 samples and obtained as $1.8 \times 10^{-5}$ for K-Means and $5.5 \times 10^{-5}$ for SVM.

Machine Learning Classification
We present the scatter plot of the energy vectors here. In Fig. 4a the K-means method defines a threshold margin between the energy vector points by calculating centroids and classifies them into the channel available and unavailable classes. Fig. 4b shows that the SVM algorithm with a linear kernel separates the energy points based on channel availability and draws a decision boundary that determines the threshold for classification. We consider SUs located at different places that are 1 km apart. Fig. 5 presents the accuracy of various CSS methods in terms of Receiver Operating Characteristic (ROC) curves. It shows that the SVM method is better than the other methods, so this method is suited to multiple primary user cases with high accuracy. The SVM classifier with a linear kernel achieves a high detection probability even in the presence of seven secondary users. The unsupervised K-Means method has a performance comparable to that of the SVM method. Bringing intelligence to the system improves the classification process. Although SVM is more complex, supervised learning is more suitable because the CR scenario is prone to losing connectivity. The AND and OR methods do not produce exact outcomes, since their outcome depends on all of the CR decisions or on a single CR decision, respectively.

Numerical Analysis of Spectrum Sensing Using Multiple CR
Numerical outcomes for the decision fusion methods are illustrated here. The average SNR $\bar{\gamma}$ is assumed to be 15 dB. Fig. 6 shows the probability of missed detection ($P_{md}$) vs. the probability of false alarm ($P_{fa}$) under several fading scenarios. Fading channels such as Rician, Nakagami-m, Hoyt, Weibull and log-normal shadowing are considered, with parameters chosen as K = 3, m = 3, q = 0.25, v = 6, and σ = 5 dB after running simulations. These parameters are found to be optimal values and also agree with previous literature. Fading introduces missed detection. We also observe that for v = 2 the performance of the Weibull fading channel is identical to that of the Rayleigh channel. Weibull fading shows the best result among all fading models, as the slope of its curve drops quickly. To achieve a $P_{md}$ of 1% in a Rayleigh channel, $P_{fa}$ is found to be 0.8, while in the Weibull channel the same $P_{md}$ is obtained with a lower $P_{fa}$, which is highly desirable. It is observed that $P_{md}$ decreases as $P_{fa}$ increases, which increases the detection probability $P_d$. This reduction in false alarm probability is required in cognitive radar applications to increase the detection probability in cluttered environments, so that spectrum utilization can be enhanced. From Fig. 7 it is seen that a detection probability $P_d$ of 75% is achieved even for very low $P_{fa}$ in the Weibull fading model, compared with the Rayleigh model, which reaches only up to 55%.
As Weibull fading has shown better performance, we use it to improve the channel sensing and the fusion decision approach of the cognitive radio network. The Weibull fading follows a chi-square distribution function. Fig. 8 shows the ROC of various decision fusion methods. We consider seven cooperative nodes, and for the soft fusion rules the assumed SNR values vary from −25 to −15 dB.
We also keep the probability of false alarm low. It is evident that MRC outperforms the other schemes, owing to the right selection of fading channels and learning methods. Detection is done with greater accuracy because the weight vector is chosen optimally. MRC achieves a $P_d$ of 70% for a $P_{fa}$ of 0.5 and works well even at low SNR. Once the energy vectors are obtained, they are classified into the channel available and unavailable classes using the SVM-Linear algorithm. The ROC plot in Fig. 9 shows that the performance of the decision rules is greatly improved and reaches an accuracy of almost 100%. MRC reaches the best accuracy of 99.9%, whereas the AND rule has the lowest accuracy of 97.8%. Over a thousand frames, the SVM method predicted efficiently when various thresholds were fed into the classifier. The true positive rate obtained for the various methods is shown here. Hence it is evident that Weibull fading along with the chi-square distribution has improved spectrum sensing, and that the SVM classifier with MRC can classify the channels with great accuracy. We use 1000 testing frames for classification. Fig. 10 shows that when the thresholds of the above rules are fed into the SVM classifier, it achieves a detection rate of 99.9%. It depicts the identification of holes in the spectrum, which can be effectively utilized. Among the various methods, MRC shows the best performance.

Prediction of Primary User Channel State Using HMM Model
To estimate the next state of the PU we examine a method that is divided into three steps. First, we find the PU channel state; then we obtain a model based on the detection that creates a time series describing its present condition; finally, we propose a model for predicting its next state using an HMM. For a time period T = 200 ms, Fig. 10a represents the random distribution of PU channel conditions. It presents the state of the PU in a particular channel, which has both the idle state (channel available state) and the occupied state (channel unavailable state). The idle states are termed spectrum holes and can be effectively used for SU communication in the CRN. In Fig. 10b a margin is drawn to differentiate the activity states of the PU channel. The points below the margin show the amount of spectrum holes in the channel, while the points above the margin show the instances at which the channel is occupied by the PU. This sequenced time series is used as the input to the HMM prediction model. Once the sequenced time series is generated, it is used along with the emission and transition probabilities for the prediction of the channel. The Hidden Markov Model along with the Viterbi and Baum-Welch algorithms is used for the spectrum prediction. From Fig. 10c it is evident that the prediction accuracy reaches 100% with the Weibull fading model over a period of 200 ms, compared with only 84.5% for the Rayleigh model. Hence the spectrum prediction of the PU channel activity is done with maximum accuracy.

Conclusion
Thus an efficient machine learning algorithm for spectrum sensing and prediction is developed and simulated in the wireless cognitive environment. The spectrum sensing was initially done with the Rayleigh fading model and the channel was classified with the help of machine learning, but it performed poorly in terms of detection. Hence, as an improvement, the other channel fading models were considered and tested in the wireless environment. As a result, the Weibull fading model is found to be the best fading model to use in both indoor and outdoor wireless environments. Further, to find the fusion rule best suited to this scenario, we compared the performance of soft and hard fusion rules and found that the maximal ratio combining algorithm performs well by making use of receiver diversity across multiple CRs. We therefore combined the Weibull fading model with the MRC fusion rule and tested it in the cognitive environment under the chi-square distribution for multiple secondary users, which resulted in better detection characteristics. An HMM with the Baum-Welch and Viterbi algorithms is then used for the spectrum prediction by generating a time series that captures the states of the PU. This has given a high accuracy in forecasting the next state of the channel.
The chosen energy vectors correctly train the classifier to find the occupancy of the primary user. The duration of training is found to be very short, so the SVM method can be a good choice for resolving ambiguity about the channel conditions. Among the various fading models, the Rayleigh model is due to multipath reception and all the other models are its variants. Through simulations we obtain the best possible values of the severity factor for all the models and find that detection is good with the Weibull model. It is observed that $P_{md}$ decreases as $P_{fa}$ increases, which increases the detection probability. This reduction in missed detection is required in cognitive radar applications to identify targets in a cluttered environment. Thus spectrum utilization can be enhanced. The simulation results reveal that the classifier trained with the ML algorithm helps the fusion center perform well in terms of sensing accuracy. From the received statistics, the MRC method performs best because of the accurate choice of the weight parameter based on energy data obtained from all CR users, so the fusion center can determine the channel status accurately. The Hidden Markov Model is then used to study the occupancy of the primary user channel and to predict its near future so that the spectrum is used efficiently, and it achieves high accuracy in predicting the occupancy of the channel. Using this model, prediction is carried out for various fading models, and it is found that the time series forecasting model fits the past data well and uses them to predict future behavior. This work finds application in the field of 5G technology, where the wireless spectrum must be shared among many IoT machines and smart devices for communication.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.