Intelligent Machine Learning with Metaheuristics Based Sentiment Analysis and Classification

Sentiment Analysis (SA) is a subfield of Natural Language Processing (NLP) that focuses on identifying and extracting the opinions expressed in text from reviews, social media, blogs, news, and so on. SA can handle the rapidly growing volume of unstructured text by transforming it into structured data with the help of NLP and open-source tools. The current research work designs a novel Modified Red Deer Algorithm (MRDA) Extreme Learning Machine Sparse Autoencoder (ELMSAE) model for SA and classification. The proposed MRDA-ELMSAE technique initially performs preprocessing to transform the data into a compatible format. Moreover, a TF-IDF vectorizer is employed for feature extraction, while the ELMSAE model is applied to classify sentiments. Furthermore, the parameters of the ELMSAE model are optimally tuned using the MRDA technique. A wide range of simulation analyses was carried out, and the results of the comparative analysis establish the enhanced efficiency of the MRDA-ELMSAE technique over other recent techniques.


Introduction
The continuous growth of the internet has increased the number of opinions expressed by users on social media and other digital platforms. Many users express their emotions and views through comments on internet platforms [1,2]. For example, product-related comments are posted on e-commerce websites such as Taobao and Jingdong, while hospitality-related comments are posted on travel websites such as ELong and Ctrip. These comments convey the views and emotions of internet users about hot events, products, and so on. Merchants can gauge customer satisfaction from such product comments, and prospective buyers and potential users can assess a product by looking at its comments or reviews. With the dramatic increase in the number of comments, it is challenging to investigate them manually. Therefore, Information Technology (IT) is exploited to mine the sentiment tendency contained in a text, an approach known as text sentiment analysis. Based on the granularity of the text, text sentiment analysis can be divided into three levels, namely chapter level, sentence level, and word level. SA aims to provide information quickly by processing the reviews posted on web platforms, which record users' previous experiences, through Machine Learning (ML) methods rather than manual reading, which is a difficult practice to follow [3].
Both supervised and unsupervised ML approaches are utilized in SA to extract basic information from structured and unstructured text and thereby assist the decision maker [4]. Supervised techniques are efficient at determining the polarity of sentiments. However, they need a huge amount of labelled data, which is not easy to obtain. Unsupervised techniques, on the other hand, remain useful for handling unlabelled data, although they cannot be said to perform better. In the classification process, the major phase is selecting the appropriate features from the data. Parts of speech, Word2vec, Term Frequency (TF), and TF-IDF are the features most widely employed in SA [5]. Since a given set of features yields different results with different classification models, the use of particular features with distinct classifiers should be analyzed and the resulting performance examined. The literature reports high sentiment classification performance for unigram and bigram features. Nonetheless, a single dedicated classifier is not well suited to Twitter data, whereas a voting/integration ensemble of various classification methods shows remarkable performance for SA [6]. ML techniques for SA are broadly categorized into two types, namely supervised and unsupervised learning models [7]. The supervised learning approach depends on labelled datasets, which are used to train the classification model that then performs the classification. Generally, partitional and hierarchical clustering methods are utilized in unsupervised models [8]. In ML techniques, the precision of SA mainly depends on the training database on which the classification is based. When the size of the database increases, the precision decreases if a comparable training database is used. The lexicon method does not depend on a training database, which makes it appropriate for SA [9,10].
In the literature, researchers have developed an effective pattern-based technique for feature extraction and opinion phrase extraction, with SA performed using a lexicon-based method. Feature-level SA involves several stages, including review extraction, pre-processing, feature selection, POS tagging, and classification of reviews as positive or negative.
Li et al. [11] presented a Sentiment Information-based Network Model (SINM) in which Transformer encoding and LSTM are utilized as model components. A Chinese emotional dictionary was used to automatically identify sentiment knowledge in Chinese comments. In SINM, a hybrid task learning technique was designed to learn valuable emotional terms and predict sentiment tendency. SINM first learns the sentiment knowledge from the text. Then, under the auxiliary guidance of this emotional information, SINM pays more attention to sentiment information than to irrelevant information. Wang et al. [12] examined sentiment diffusion by analyzing a phenomenon named sentiment reversal and identified some interesting characteristics of it. Afterwards, they considered the interrelationship between the textual data of Twitter messages and sentiment diffusion patterns, and presented an iterative technique named SentiDiff for predicting the sentiment polarity expressed in Twitter messages. To the best of the authors' knowledge, that work is the first to utilize sentiment diffusion patterns to improve Twitter SA.
A comprehensive sentiment dictionary was constructed in an earlier study [13]. This extensive sentiment dictionary comprises basic sentiment words, field sentiment words, and polysemic sentiment words, which together enhance the accuracy of SA. The Naive Bayesian (NB) technique was utilized to determine the polysemic sentiment and the field of the text. Based on this model, the sentiment value of polysemic sentiment words in each field is obtained. By employing the comprehensive sentiment dictionary and the designed sentiment score rule, the sentiment of the text is scored. In [14], a Multi-Attention Fusion Modeling (Multi-AFM) approach was presented that combines global and local attention with gating unit control so as to generate a reasonable contextual representation and achieve enhanced classification outcomes. The experimental outcomes demonstrate that the Multi-AFM technique is superior to existing approaches in the field of education and other such fields.
Li et al. [15] built a danmaku sentiment dictionary and proposed a novel technique that applies the sentiment dictionary and NB to the SA of danmaku reviews. This technique is useful for monitoring the overall emotional orientation of danmaku videos and forecasting their popularity. Through procedures that involve extracting emotional information from danmaku videos, categorizing sentiment, and visualizing the information, the time distribution of seven sentiment dimensions was obtained. Wu et al. [16] presented a technique for Chinese micro-blog sentiment computation that handles complex sentences, moving from sentences to clauses and from clauses to words, and also accounts for emoji. This technique accurately categorized the comments in Chinese micro-blogs as positive, negative, or neutral. Liang et al. [17] introduced a Gaussian Process Dynamic Bayesian Network for modelling dynamic and interactive sentiments about issues on social media such as Twitter. It utilizes a Dynamic Bayesian Network to model the time series of sentiments expressed on related issues and learns the connections among them. The network itself uses Gaussian Process Regression to model the sentiments at a given time point based on the related issues at the preceding time point.
The current research work proposes a new Modified Red Deer Algorithm (MRDA) Extreme Learning Machine Sparse Autoencoder (ELMSAE) model for Sentiment Analysis and classification. The proposed MRDA-ELMSAE technique initially performs pre-processing to transform the data into a compatible format. In addition, a TF-IDF vectorizer is employed for feature extraction. Subsequently, the ELMSAE model is applied for the classification of sentiments. Besides, optimal parameter tuning of the ELMSAE model is conducted with the help of the MRDA technique. An extensive experimental validation was performed on a benchmark dataset and the results were examined under varying aspects. In short, the contributions of the paper are summarized as follows.
A novel MRDA-ELMSAE technique is proposed to detect and classify sentiments under different classes. The technique encompasses different sub-processes, namely pre-processing, feature extraction, classification, and parameter optimization. An effective TF-IDF-based feature vector transformation and ELMSAE-based classification for SA are designed. A novel MRDA-based parameter optimization technique is developed for optimal adjustment of the parameters of the ELMSAE model. The performance of the proposed MRDA-ELMSAE technique is validated using a benchmark dataset.
The rest of the paper is organized as follows. Section 2 elaborates the proposed MRDA-ELMSAE technique, while Section 3 offers a detailed overview of the performance validation of the proposed model. Lastly, Section 4 draws the conclusion.

The Proposed Model
In this study, a new MRDA-ELMSAE technique is designed for the classification of sentiments under positive and negative polarities. The proposed MRDA-ELMSAE technique encompasses four different stages namely, pre-processing, feature extraction using TF-IDF technique, classification using ELMSAE, and parameter tuning using MRDA technique. Fig. 1 exhibits the overall working process of the proposed MRDA-ELMSAE technique.

Pre-Processing
Data pre-processing is the first step in this model and is employed to remove incomplete and noisy data. The dataset utilized in this case contains a large amount of redundant data that plays no role in prediction. Since both training and testing time increase with the size of the dataset, removing unwanted data accelerates model training. The first phase analyzes missing values: records with missing information are identified and removed, since they reduce classification performance. Afterwards, numeric values are removed from the text, as they carry no information for training the classifier; this decreases the complexity of the trained classifier. In a few instances, the analysis process may encounter special symbols such as the thumb sign, heart sign, and so on. These symbols are also removed to decrease the feature dimensions and improve performance. Then, punctuation marks such as [ ] ( ) / | ; . ' are removed, as they add no meaning to text analysis and impair the ability of the model to distinguish punctuation from other characters. In the succeeding stage, the words are converted to lowercase. If this phase is not implemented, the ML model treats the same word written in different cases as distinct terms. Lastly, stemming, a vital pre-processing phase, is applied. During this phase, the affixes of a word are removed. Stemming reduces each word to its original/root form, which increases classification performance.
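The pre-processing steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names `preprocess` and `toy_stem` and the suffix list are hypothetical, and the crude suffix stripper merely stands in for an established stemmer (for example, a Porter stemmer), which a real pipeline would presumably use.

```python
import re

# Hypothetical suffix list for illustration only; a real pipeline would
# rely on an established stemming algorithm instead.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(review):
    """Return a list of cleaned tokens, or None for a missing/empty review."""
    # Step 1: records with missing values are dropped (signalled here by None).
    if review is None or not review.strip():
        return None
    # Step 2: lowercase so that "Good" and "good" map to the same term.
    text = review.lower()
    # Step 3: remove numeric values.
    text = re.sub(r"\d+", " ", text)
    # Step 4: remove punctuation, emoji and any other non-letter symbols.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Step 5: tokenize on whitespace and stem each token.
    return [toy_stem(token) for token in text.split()]


if __name__ == "__main__":
    print(preprocess("Loved it!! 5 stars, keeps crashing :("))
```

The order of operations matters in this sketch: lowercasing is applied before the symbol filter so that uppercase letters are not discarded along with punctuation.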

TF-IDF Based Feature Extraction
At this stage, the preprocessed data is provided as input to the TF-IDF model. The Python library scikit-learn is utilized [18], as in the literature. This library is a suitable choice for executing tasks with the TF-IDF vectorizer method. The TF-IDF vector denotes the relative importance of a term in a document and in the corpus as a whole. The first component of this model is Term Frequency (TF), which measures how frequently a term appears in a document of the dataset. In its standard form, the TF of a term $t$ in a document $d$ is given by $TF(t, d) = f_{t,d} / \sum_{t'} f_{t',d}$, where $f_{t,d}$ is the number of occurrences of $t$ in $d$. Inverse Document Frequency (IDF) is the next entity that should be defined to ensure appropriate functioning of the algorithm. It is utilized to measure the importance of a word in the whole dataset. In its standard form, $IDF(t) = \log(N / n_t)$, where $N$ is the number of documents and $n_t$ is the number of documents that contain $t$.
TF-IDF is then defined as the product of TF and IDF, that is, $TFIDF(t, d) = TF(t, d) \times IDF(t)$.
The TF-IDF method removes the need for manual feature engineering and identifies the relevant terms in this dataset, which helps in attaining high efficiency. The TF-IDF vectorizer method is the next step in this working mechanism. The TF-IDF vectorizer employs an in-memory vocabulary (a Python dictionary) to map the most frequent words to feature indices and hence compute a word occurrence frequency (sparse) matrix. Using the TF-IDF vectorizer, tokenized records with frequency weightings are achieved.
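To make the weighting concrete, the sketch below computes TF-IDF from scratch under one common textbook formulation, TF(t, d) = count of t in d divided by the length of d, and IDF(t) = log(N / n_t). This formulation is an assumption for illustration: the exact variant used in the paper is not recoverable from the text, and scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers would differ. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a corpus.

    docs: list of token lists (one list per document).
    Returns a list of {term: weight} dictionaries, one per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # TF (relative frequency in this document) times IDF (log N / n_t).
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    corpus = [["good", "app"], ["bad", "app"], ["good", "good", "game"]]
    for row in tf_idf(corpus):
        print(row)
```

Under this unsmoothed formulation, a term that occurs in every document receives a weight of zero, which is the intended effect of IDF: terms that do not discriminate between documents contribute nothing to the feature vector.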

ELMSAE-Based Classification
The ELMSAE model receives the extracted features as input and performs the classification process. In this stage, the universal approximation capacity of the ELM is used for the purpose of the autoencoder (AE). Furthermore, sparse constraints are included in the AE optimization, due to which it is called the ELM sparse autoencoder. Unlike the AE used in conventional DL methods (viz., the BP-related NN), the input weights of the presented ELM sparse AE are determined by seeking the path back from a random space. ELM theory has shown that an ELM trained with randomly mapped input weights is highly effective in approximating the input data. In other words, if the AE is trained following the ELM concept, then once the AE is initialized there is no need to fine-tune its parameters. Additionally, to generate compact and sparse features of the input, $\ell_1$ optimization is performed to establish the ELM-AE. This differs from the ELM-AE, in which $\ell_2$-norm singular values are estimated for feature representation. Fig. 2 depicts the architecture of ELM. The optimization method for the presented ELM sparse AE is given herewith.
$$O_\beta = \arg\min_{\beta} \left\{ \|H\beta - X\|^2 + \|\beta\|_{\ell_1} \right\}$$
Here, $X$ represents the input data, $H$ denotes the random mapping output, and $\beta$ indicates the hidden layer weights to be obtained. In existing DL methods, $X$ is the encoding output of the base $\beta$, which has to be adjusted during the iterations. In the presented AE, however, random mapping is employed for hidden layer feature representation: $X$ represents the actual data and $H$ indicates the randomly initialized output, which needs no optimization [19].
Moreover, this helps to improve both the training time and the learning accuracy. Hereafter, the optimization method for the $\ell_1$ optimization problem is explained. For a clear representation, the objective function is rewritten as given herewith.
$$O_\beta = p(\beta) + q(\beta)$$
where $p(\beta) = \|H\beta - X\|^2$, and $q(\beta) = \|\beta\|_{\ell_1}$ is the $\ell_1$ penalty term of the training model.
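The FISTA method is adopted to solve this problem. FISTA minimizes a smooth convex function with a complexity of $O(1/j^2)$, where $j$ indicates the iteration number. The procedure of FISTA is given herewith.
(1) Estimate the Lipschitz constant $\gamma$ of the gradient of the smooth convex function, $\nabla p$.
(2) Begin the iteration by considering $y_1 = \beta_0 \in R^n$, $t_1 = 1$ as the initial point. Next, for $j$ $(j \geq 1)$:
(a) $\beta_j = s_\gamma(y_j)$, where, in the standard form of FISTA, $s_\gamma$ denotes the following equation.
$$s_\gamma(y) = \arg\min_{\beta} \left\{ \frac{\gamma}{2} \left\| \beta - \left( y - \frac{1}{\gamma} \nabla p(y) \right) \right\|^2 + q(\beta) \right\}$$
(b) $t_{j+1} = \dfrac{1 + \sqrt{1 + 4 t_j^2}}{2}$
(c) $y_{j+1} = \beta_j + \dfrac{t_j - 1}{t_{j+1}} \left( \beta_j - \beta_{j-1} \right)$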
Once the iterations are completed, the data points can be fully recovered from the corrupted ones. With the resulting base $\beta$ as the weights of the presented AE, the inner product of the input and the learned features yields a compact representation of the actual data.
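The FISTA iteration for the $\ell_1$-regularized least-squares objective can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it treats a single target column `x` rather than the full matrix X, the penalty weight `lam` is a hypothetical parameter (set to 1 to match an unweighted penalty), and the Lipschitz constant is replaced by the upper bound 2·‖H‖_F², which is always valid but may be looser than the exact value. For an $\ell_1$ penalty, the proximal step reduces to element-wise soft thresholding.

```python
import math


def soft_threshold(vec, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry towards zero."""
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in vec]


def grad_p(H, beta, x):
    """Gradient of p(beta) = ||H beta - x||^2, i.e. 2 H^T (H beta - x)."""
    residual = [sum(h * b for h, b in zip(row, beta)) - xi
                for row, xi in zip(H, x)]
    return [2.0 * sum(row[k] * r for row, r in zip(H, residual))
            for k in range(len(beta))]


def fista(H, x, lam=1.0, n_iter=200):
    """Minimize ||H beta - x||^2 + lam * ||beta||_1 with FISTA.

    H: list of rows (m x n), x: list of length m. Returns beta (length n).
    """
    n = len(H[0])
    # Upper bound on the Lipschitz constant of grad p (assumes H is non-zero):
    # 2 * ||H||_F^2 >= 2 * lambda_max(H^T H).
    gamma = 2.0 * sum(h * h for row in H for h in row)
    beta_prev = [0.0] * n
    y = beta_prev[:]
    t = 1.0
    for _ in range(n_iter):
        g = grad_p(H, y, x)
        # Gradient step on p followed by the proximal (shrinkage) step on q.
        beta = soft_threshold([yk - gk / gamma for yk, gk in zip(y, g)],
                              lam / gamma)
        # Momentum parameter update and extrapolation.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [b + ((t - 1.0) / t_next) * (b - bp)
             for b, bp in zip(beta, beta_prev)]
        beta_prev, t = beta, t_next
    return beta_prev


if __name__ == "__main__":
    # Toy data (hypothetical): identity mapping, so each coordinate of the
    # solution is x shrunk towards zero by lam / 2.
    print(fista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2]))
```

In the toy run, the small second coordinate is driven exactly to zero while the large first coordinate is only shrunk, which illustrates how the $\ell_1$ penalty produces sparse weights.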

MRDA-Based Parameter Tuning
For optimal adjustment of the parameters involved in the ELMSAE model, the MRDA technique is applied. The mathematical modeling of the MRDA technique and the distinct phases of the modified method are discussed in this section. Similar to the original RDA, this modified form is directed towards the identification of the global or a local optimum solution of the problem considered. In the solution space (P), each possible solution is regarded as a Red Deer (RD), represented as a vector of D elements, where D indicates the dimension of the solutions. The quality of a solution is defined by a well-defined fitness function.
Initially, the population is generated by creating NP RD solutions. According to the fitness values, the best RDs are regarded as males ($N_{male}$), whereas the remaining solutions are considered hinds ($N_{hind}$). The elitism property is observed in choosing the RDs for these two sets. According to this property, the number of males is regarded as follows.
Here, $\alpha_1$ represents an arbitrary constant value, i.e., $\alpha_1 \in [0.2, 0.5]$. The number of hinds ($N_{hind}$) is given by $N_{hind} = NP - N_{male}$. The male RDs, regarded as the best solutions, attempt to improve on the best results in their neighborhood as well. The natural observations of the original method are preserved, while randomness in updating the locations of male RDs is also integrated. Instead of the two arbitrarily generated constant values used in the original form of the approach, a single arbitrary constant value is employed at this point. In order to provide a gradient in adaptability and support a progressively lower rate of change in the parameter value, the arithmetic equation is designed as given herewith.
Here, $\alpha$ represents a randomly generated real number ($\alpha \in [0, 1]$), and the present generation and the total number of generations are represented as $i$ and $n$, respectively. Different kinds of males exist in nature. As in the original RD method, two kinds of male RD, namely stags and commanders, are considered in this modified form. The number of commander males ($N_{cm}$) is determined as follows.
where $\alpha_2$ represents an arbitrary constant value, i.e., $\alpha_2 \in [0.2, 0.5]$. So, 20% to 50% of the males are chosen as commanders. The rest of the males are regarded as stags, and the number of stags ($N_{stag}$) is calculated as given herewith.
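The population split described above can be illustrated with a short sketch. The relation N_hind = NP − N_male and the statement that 20% to 50% of the males become commanders come from the text; the specific form used here, a rounded product of the constant and the group size, is an assumption, since the original equations are not reproduced in the text. The function name `partition_population` and the dictionary keys are hypothetical.

```python
def partition_population(np_size, alpha1, alpha2):
    """Split a population of NP red deer into males, hinds, commanders, stags.

    Assumed forms (not confirmed by the source):
        N_male = round(alpha1 * NP)
        N_cm   = round(alpha2 * N_male)
    Stated in the text:
        N_hind = NP - N_male
        N_stag = N_male - N_cm (the rest of the males)
    """
    if not (0.2 <= alpha1 <= 0.5 and 0.2 <= alpha2 <= 0.5):
        raise ValueError("alpha1 and alpha2 are expected to lie in [0.2, 0.5]")
    n_male = int(round(alpha1 * np_size))
    n_hind = np_size - n_male
    n_commander = int(round(alpha2 * n_male))
    n_stag = n_male - n_commander
    return {
        "n_male": n_male,
        "n_hind": n_hind,
        "n_commander": n_commander,
        "n_stag": n_stag,
    }


if __name__ == "__main__":
    # Hypothetical settings: 50 candidate solutions, 30% males, 40% commanders.
    print(partition_population(50, alpha1=0.3, alpha2=0.4))
```

Whatever rounding rule is used, the four group sizes stay consistent by construction: males plus hinds equal NP, and commanders plus stags equal the number of males.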
In the original RD method, the fighting process between stags and commanders is based on a significant randomization number, and the two arbitrary constant parameters play a significant part. In the original RDA, the set of hinds belonging to a male commander is called a 'harem', and the size of a harem is based on the strength of its commander. During the harem formation process in the original RDA, the fundamental concept is to allocate the hinds according to the objective fitness of stags and commanders. This occurs in ascending order of magnitude, based on their natural and domain-specific behavior. In MRDA, the allocation of hinds is instead conceived based on the number of commanders within the population and a randomization parameter, R. In order to form the harems, the following steps are performed to select the hinds under the control of a commander.
P = Total number of hinds / Number of unallocated commanders
At first, N hinds are allocated to the first commander. In the following generation, N hinds are allocated to the following commanders.
In the original RDA, a commander mates with specific hinds in his harem. The arithmetical equation is given below.
ÞÃhn À a Ã ub À lb ð ÞÃ n À i n i 2 Ã n ; a > 0: 5 (13)
The arithmetical model is designed to conserve fewer recessive features and more dominant features, with the goal of achieving a gradual change in the value. Stags, however, are permitted to mate randomly with any hind in the population, although a specific stag does not take part in mating several times in a given generation. In order to choose the population for upcoming generations, the quality of the offspring RDs is estimated based on the fitness value. According to the fitness values of the present offspring and the RDs of the preceding generation, the best RDs are chosen for the upcoming generation. The roulette wheel selection [20] approach is utilized for this selection procedure. This step is repeated until a specific number of iterations or a time limit is reached, or until the optimal solution has not changed for a long period.
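Roulette wheel selection, which is used above to pick the individuals for the next generation, can be sketched as follows. The sketch assumes non-negative fitness values where larger is better; if the tuning objective is an error to be minimized, the values would first need to be transformed (for example, inverted), and how the authors handle this is not stated. The function name `roulette_select` is hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_select(fitness, rng):
    """Return the index of one individual, chosen with probability
    proportional to its (non-negative) fitness value."""
    total = sum(fitness)
    if total <= 0 or any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative with a positive sum")
    # Spin the wheel: a uniform point on [0, total).
    spin = rng.random() * total
    # Each individual owns a slice of the wheel as wide as its fitness.
    cumulative = list(accumulate(fitness))
    index = bisect_right(cumulative, spin)
    # Guard against floating-point round-off at the upper edge of the wheel.
    return min(index, len(fitness) - 1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the run is repeatable
    picks = [roulette_select([1.0, 3.0], rng) for _ in range(10000)]
    print("share of index 1:", picks.count(1) / len(picks))
```

With fitness values of 1 and 3, the second individual is expected to be selected in roughly three out of four draws, so fitter solutions are favoured while weaker ones still retain a chance of surviving.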

Experimental Analysis Results
The current section discusses the results of the experimental investigation conducted on the MRDA-ELMSAE technique using a benchmark dataset of mobile application reviews for Google apps. The dataset encompasses 64,295 instances with the attributes 'App', 'Translated_Reviews', and 'Sentiments'. It includes three kinds of sentiments, namely positive, negative, and neutral.
Tab. 1 presents the results of the comparative analysis of the MRDA-ELMSAE model using distinct types of features under different measures. Fig. 3 provides the results achieved by the MRDA-ELMSAE technique under TF-IDF features. The figure shows that the proposed MRDA-ELMSAE technique achieved improved outcomes over existing methods. For instance, the MRDA-ELMSAE technique classified the negative label with precision, recall, and F-score values of 98.7%, 92.3%, and 95.4%, respectively. Moreover, it classified the neutral label with precision, recall, and F-score values of 93.6%, 95%, and 94.2%, respectively. Furthermore, it classified the positive label with precision, recall, and F-score values of 99%, 99.5%, and 99.1%, respectively. In contrast, the RF model performed poorly in the classification of sentiments. The RF model classified the negative label with precision, recall, and F-score values of 87%, 71%, and 78%, respectively. Meanwhile, it classified the neutral label with values of 79%, 82%, and 81%, and the positive label with values of 89%, 94%, and 92%, respectively.
Fig. 4 gives the results achieved by the MRDA-ELMSAE technique under TF-IDF bi-gram features. The figure shows that the proposed MRDA-ELMSAE approach achieved the best results among the compared models. For instance, the MRDA-ELMSAE approach classified the negative label with precision, recall, and F-score values of 88%, 64.8%, and 73.1%, respectively. Furthermore, it classified the neutral label with precision, recall, and F-score values of 69.3%, 46.4%, and 42.9%, respectively. Moreover, it classified the positive label with precision, recall, and F-score values of 85.9%, 99.7%, and 93.3%, respectively.
Likewise, the RF approach performed poorly in the classification of sentiments. The RF method classified the negative label with precision, recall, and F-score values of 78%, 49%, and 60%, respectively. Meanwhile, it classified the neutral label with values of 55%, 26%, and 35%, and the positive label with values of 75%, 93%, and 83%, respectively.
Moreover, the MRDA-ELMSAE method classified the positive label with precision, recall, and F-score values of 80.8%, 100%, and 88.1%, respectively. Likewise, the RF system showed poor efficiency in the classification of sentiments. The RF method classified the negative label with precision, recall, and F-score values of 41%, 32%, and 47%, respectively. Meanwhile, it classified the neutral label with values of 84%, 18%, and 30%, respectively. Finally, the RF method identified the positive class with precision, recall, and F-score values of 70%, 99%, and 82%, respectively.
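For reference, the per-class precision, recall, and F-score values reported above, as well as overall accuracy, can be computed from true and predicted labels as sketched below. This is a generic illustration of the standard definitions, not the authors' evaluation script; the function name `per_class_metrics` and the toy labels are hypothetical.

```python
def per_class_metrics(y_true, y_pred):
    """Return ({label: (precision, recall, f_score)}, accuracy)."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Precision: share of predictions of this class that are correct.
        precision = tp / (tp + fp) if tp + fp else 0.0
        # Recall: share of true instances of this class that are found.
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F-score: harmonic mean of precision and recall.
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        report[label] = (precision, recall, f_score)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return report, accuracy


if __name__ == "__main__":
    # Hypothetical toy labels, not taken from the benchmark dataset.
    y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
    y_pred = ["pos", "neg", "neg", "neu", "neg", "pos"]
    report, accuracy = per_class_metrics(y_true, y_pred)
    for label, (prec, rec, f1) in report.items():
        print(label, round(prec, 3), round(rec, 3), round(f1, 3))
    print("accuracy", round(accuracy, 3))
```

Because the F-score is the harmonic mean of precision and recall, it always lies between the two, which offers a quick sanity check when reading per-class results.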
Finally, a comparative accuracy analysis was conducted between the proposed MRDA-ELMSAE technique and existing approaches, and the results are shown in Tab. 2 and Fig. 6. The figure shows that the R_NB_KNN technique produced poor outcomes with a low accuracy of 73%. At the same time, the Stochastic GDCLR, GCN, SGCN, and NABoE techniques obtained moderate performance with accuracy values of 88%, 88%, 89%, and 86%, respectively. Moreover, the BBSO-FCM and Gradient Boosted SVM techniques accomplished near-optimal accuracy values of 96% and 93%, respectively. However, the proposed MRDA-ELMSAE technique achieved the highest accuracy of 98%. Based on the results and discussion so far, it can be confirmed that the proposed MRDA-ELMSAE technique is an effective tool for Sentiment Analysis.

Conclusion
In the current research, a novel MRDA-ELMSAE technique is derived for the classification of sentiments under positive and negative polarities. The proposed MRDA-ELMSAE technique encompasses four different stages, namely pre-processing, feature extraction using the TF-IDF technique, classification using ELMSAE, and parameter tuning using the MRDA technique. The application of the MRDA technique to adjust the weights and biases of the ELMSAE model helps in achieving enhanced classification performance. A wide range of simulation analyses was carried out, and the results of the comparative analysis established the enhanced performance of the MRDA-ELMSAE technique over other recent approaches. In future, feature selection approaches can be included to boost the overall classification results.
Funding Statement: We acknowledge Taif University for supporting this study through Taif University Researchers Supporting Project number (TURSP-2020/173), Taif University, Taif, Saudi Arabia.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.