<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">52666</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.052666</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Sentiment Analysis Using E-Commerce Review Keyword-Generated Image with a Hybrid Machine Learning-Based Model</article-title>
<alt-title alt-title-type="left-running-head">Sentiment Analysis Using E-Commerce Review Keyword-Generated Image with a Hybrid Machine Learning-Based Model</alt-title>
<alt-title alt-title-type="right-running-head">Sentiment Analysis Using E-Commerce Review Keyword-Generated Image with a Hybrid Machine Learning-Based Model</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Li</surname><given-names>Jiawen</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Huang</surname><given-names>Yuesheng</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Lu</surname><given-names>Yayi</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-4" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Wang</surname><given-names>Leijun</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>wangleijun@gpnu.edu.cn</email></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Ren</surname><given-names>Yongqi</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-6" contrib-type="author">
<name name-style="western"><surname>Chen</surname><given-names>Rongjun</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<aff id="aff-1"><label>1</label><institution>School of Computer Science, Guangdong Polytechnic Normal University</institution>, <addr-line>Guangzhou, 510665</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology</institution>, <addr-line>Wuhan, 430065</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Leijun Wang. Email: <email>wangleijun@gpnu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>18</day><month>7</month><year>2024</year></pub-date>
<volume>80</volume>
<issue>1</issue>
<fpage>1581</fpage>
<lpage>1599</lpage>
<history>
<date date-type="received">
<day>10</day>
<month>4</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>6</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 Li et al.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Li et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_52666.pdf"></self-uri>
<abstract>
<p>In the context of the accelerated pace of daily life and the development of e-commerce, online shopping has become a mainstream way for consumers to access products and services. To understand consumers&#x2019; emotional expressions across different shopping experience scenarios, this paper presents a sentiment analysis method that combines e-commerce review keyword-generated images with a hybrid machine learning-based model, in which Word2Vec-TextRank is used to extract keywords that act as the inputs for generating related images by generative Artificial Intelligence (AI). Subsequently, a hybrid Convolutional Neural Network and Support Vector Machine (CNN-SVM) model is applied for sentiment classification of those keyword-generated images. For method validation, 5000 reviews randomly sampled from Amazon were analyzed. With superior keyword extraction capability, the proposed method achieves impressive sentiment classification results with a remarkable accuracy of up to 97.13%. Such performance demonstrates the advantages of the text-to-image approach, providing a unique perspective for sentiment analysis of e-commerce review data compared to existing works. Thus, the proposed method enhances the reliability and insights of customer feedback surveys and establishes a novel direction for similar cases, such as social media monitoring and market trend research.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Sentiment analysis</kwd>
<kwd>keyword-generated image</kwd>
<kwd>machine learning</kwd>
<kwd>Word2Vec-TextRank</kwd>
<kwd>CNN-SVM</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Guangzhou Science and Technology Plan Project</funding-source>
<award-id>2024B03J1361</award-id>
<award-id>2023B03J1327</award-id>
<award-id>2023A04J0361</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Hubei Province Key Laboratory of Occupational Hazard Identification and Control</funding-source>
<award-id>OHIC2023Y10</award-id>
</award-group>
<award-group id="awg3">
<funding-source>Guangdong Province Ordinary Colleges and Universities Young Innovative Talents</funding-source>
<award-id>2023KQNCX036</award-id>
</award-group>
<award-group id="awg4">
<funding-source>Science and Technology Innovation Strategy of Guangdong Province</funding-source>
<award-id>pdjh2024a226</award-id>
</award-group>
<award-group id="awg5">
<funding-source>Improvement Project of Guangdong Province</funding-source>
<award-id>2022ZDJS015</award-id>
</award-group>
<award-group id="awg6">
<funding-source>Guangdong Polytechnic Normal University</funding-source>
<award-id>22GPNUZDJS17</award-id>
<award-id>2022SDKYA015</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>The development of e-commerce and the accelerating pace of daily life have established online shopping as a mainstream way to access services and products [<xref ref-type="bibr" rid="ref-1">1</xref>], as e-commerce platforms facilitate convenient shopping from home. Despite this convenience, challenges persist, such as discrepancies between actual items and their descriptions, poor quality, and insufficient after-sales service [<xref ref-type="bibr" rid="ref-2">2</xref>]. On e-commerce platforms, consumers take a keen interest in product reviews when evaluating items. These reviews contain valuable insights into consumer experiences and serve as a vital source of information: shoppers consult them before making purchasing decisions and weigh the reported advantages and disadvantages to refine their intentions [<xref ref-type="bibr" rid="ref-3">3</xref>]. Previous studies [<xref ref-type="bibr" rid="ref-4">4</xref>&#x2013;<xref ref-type="bibr" rid="ref-6">6</xref>] have emphasized the importance of these reviews in shaping consumer behavior, as they offer firsthand accounts of product performance and satisfaction. However, a research gap remains in the depth and accuracy with which consumer emotional expressions are understood, especially in how to effectively utilize consumer-generated content to gain insights into consumer behavior. In this regard, sentiment analysis of e-commerce reviews provides deep insights into consumer emotions, helping to understand their impact on purchasing decisions by mitigating uncertainty and risk. Meanwhile, businesses can obtain more information concerning consumer emotional expressions, which in turn can improve customer satisfaction, enhance product offerings, and enable a more responsive approach to market dynamics. Therefore, the employment of Artificial Intelligence (AI) enables timely insights by analyzing e-commerce reviews with machine learning and deep learning techniques [<xref ref-type="bibr" rid="ref-7">7</xref>], which help to leverage positive sentiments for marketing, optimize advertising strategies, and predict trends in advance, contributing to a customer-centric mode of e-commerce [<xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;<xref ref-type="bibr" rid="ref-10">10</xref>].</p>
<p>Currently, it is considered that analyzing contextual semantic information is beneficial for interpreting emotions from online reviews [<xref ref-type="bibr" rid="ref-11">11</xref>]. To this end, two typical approaches have been developed: the supervised method [<xref ref-type="bibr" rid="ref-12">12</xref>] and the lexicon-based method [<xref ref-type="bibr" rid="ref-13">13</xref>]. The first employs labeled training data to establish a well-fitted model and adopts a machine learning classifier to test on the dataset. The second uses predefined lists of words, referred to as lexicons or sentiment dictionaries, to recognize the emotions in a piece of text [<xref ref-type="bibr" rid="ref-14">14</xref>]. Each word in the lexicon is associated with an emotional score, such as negative or positive, so the sentiment of the entire text can be computed from the cumulative scores of its component words [<xref ref-type="bibr" rid="ref-15">15</xref>]. Besides, the topic commonly represents the essential properties conveyed in the text data, meaning that topic analysis aids in obtaining valuable clues. Hence, statistics-based approaches are also widely applied, such as the word frequency co-occurrence matrix, word frequency statistics, and the synonym forest [<xref ref-type="bibr" rid="ref-16">16</xref>&#x2013;<xref ref-type="bibr" rid="ref-19">19</xref>].</p>
<p>Undoubtedly, the key to accurate sentiment analysis lies in feature extraction, which directly influences both the understanding of an online review and the classification ability. However, the main limitation of conventional methods is that text analysis alone cannot adequately capture the complex emotions and subtleties within the text. This challenge has been highlighted in several previous studies [<xref ref-type="bibr" rid="ref-12">12</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>], which reveal that incorporating high-level characteristics can enhance performance. In this regard, if the properties of review data are further enriched, more details that benefit classification can be found; with the help of a machine learning-based model, sentiment analysis can then be accomplished through such valuable features. Therefore, the motivation of this paper is to propose a sentiment analysis method that uses a text-to-image approach, in which visual data derived from e-commerce review keywords capture additional layers of emotional context that pure text analysis might miss. This offers a novel perspective for customer experience analysis, addressing the limitations of conventional methods and paving the way for accurate classification. To achieve this goal, inspired by existing image-based studies, we first apply Word2Vec-TextRank to generate sentence representations from the reviews and extract keyword features. These keywords then act as the inputs for generating the corresponding images with generative AI, yielding keyword-generated images that enrich the text into the image domain. Finally, by feeding these images into a hybrid Convolutional Neural Network and Support Vector Machine (CNN-SVM) model, sentiment classification can be realized, which brings an innovative solution for analyzing sentiment in online product reviews and enhances the reliability of feedback analysis in similar cases such as social media monitoring and market trend research. For better illustration, <xref ref-type="fig" rid="fig-1">Fig. 1</xref> presents the overview of the proposed method, including data collection, text preprocessing, keyword feature extraction, text-to-image generation, and sentiment classification. In particular, this paper makes the following contributions:</p>

<p><list list-type="simple">
<list-item><label>1)</label><p>Novel text-to-image approach: proposing a sentiment analysis method that converts e-commerce reviews into images using generative AI, which helps to capture additional layers of emotional context that pure text analysis might miss.</p></list-item>
<list-item><label>2)</label><p>Enhanced feature extraction: utilizing Word2Vec-TextRank for generating sentence representations and extracting keyword features, which enrich the reviews with visual properties.</p></list-item>
<list-item><label>3)</label><p>Hybrid classification model: implementing a hybrid CNN-SVM model that processes the keyword-generated images for sentiment classification, offering a reliable solution for analyzing sentiment in online product reviews with a remarkable accuracy of up to 97.13%.</p></list-item>
</list></p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Overview of the proposed sentiment analysis method</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_52666-fig-1.tif"/>
</fig>
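<p>To make the flow in Fig. 1 concrete, the following minimal sketch chains the pipeline stages with stand-in implementations. All function names, the frequency-based keyword ranking, and the trivial keyword rule in the classifier are illustrative assumptions, not the paper&#x2019;s actual code; the real method uses Word2Vec-TextRank, generative AI, and a trained CNN-SVM.</p>

```python
from collections import Counter

# Illustrative stand-ins for the pipeline stages of Fig. 1; the paper's
# actual stages (Word2Vec-TextRank, generative AI, CNN-SVM) are far richer.

def preprocess(review: str) -> list[str]:
    """Lowercase, strip punctuation, drop a few common stop words."""
    stop = {"a", "an", "the", "is", "was", "it", "and", "i"}
    tokens = [t.strip(".,!?").lower() for t in review.split()]
    return [t for t in tokens if t and t not in stop]

def extract_keywords(tokens: list[str], k: int = 3) -> list[str]:
    """Stand-in for Word2Vec-TextRank: rank tokens by raw frequency."""
    return [w for w, _ in Counter(tokens).most_common(k)]

def generate_image(keywords: list[str]) -> str:
    """Stand-in for the text-to-image step: return a prompt identifier."""
    return "image(" + " ".join(keywords) + ")"

def classify(image: str) -> str:
    """Stand-in for the CNN-SVM classifier: a trivial keyword rule."""
    return "positive" if ("love" in image or "great" in image) else "negative"

def analyze(review: str) -> str:
    """Full pipeline: review text -> keywords -> image -> sentiment label."""
    return classify(generate_image(extract_keywords(preprocess(review))))
```

Under these toy rules, a review such as "I love this phone, the battery is great!" flows through all five stages and comes out labeled positive.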
<p>The rest of this paper is organized as follows: <xref ref-type="sec" rid="s2">Section 2</xref> presents the related works. <xref ref-type="sec" rid="s3">Section 3</xref> describes the data collection and the proposed method. <xref ref-type="sec" rid="s4">Section 4</xref> shows the experimental results from the online product reviews on Amazon. <xref ref-type="sec" rid="s5">Section 5</xref> discusses the properties of the proposed method based on the results and a comparative study. Finally, the conclusions are drawn in <xref ref-type="sec" rid="s6">Section 6</xref>.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Works</title>
<p>Previously, numerous studies have integrated various computational models employing Natural Language Processing (NLP) and text mining to detect individual emotions and opinions. Mihalcea et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] designed the TextRank algorithm, a graph-based text summarization method that represents words or phrases as nodes in a graph, with edge weights capturing semantic similarity. This algorithm has laid a significant foundation in NLP. However, it lacks true semantic understanding, struggles with varying text lengths and structures, and has difficulty handling synonyms or polysemy effectively due to its reliance on preprocessing quality and statistical co-occurrence without deeper linguistic analysis. Mikolov et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] introduced Word2Vec, a method for learning distributed word representations by capturing semantic relationships within a continuous vector space. While powerful for generating word embeddings, Word2Vec requires substantial computational resources and large amounts of training data to produce high-quality embeddings. Besides, it can be challenging to capture long-range dependencies in text. Huang et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] developed a polymerization topic sentiment model for analyzing text in online product reviews, where semantic information is extracted and filtered from the reviews. By integrating this with machine learning-based classifiers, precise classification of emotions can be achieved, demonstrating that the topics hidden in the review data significantly influence sentiment analysis. Nevertheless, this method relies heavily on high-quality data and involves complex and computationally intensive semantic extraction. Zhang et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] applied cognitive appraisal theory for sentiment classification by combining SVM with latent semantic analysis, while Obiedat et al. 
[<xref ref-type="bibr" rid="ref-24">24</xref>] presented a hybrid model that combines Particle Swarm Optimization (PSO) with SVM, alongside diverse oversampling methods to address imbalanced data issues in sentiment analysis. These two approaches enhance the performance of sentiment classification but introduce additional complexity and computational demands. Comparing the above techniques makes it clear that while advancements in NLP and text mining have significantly improved sentiment classification, challenges remain in achieving semantic understanding, managing computational complexity, and ensuring the scalability of these models across varied domains.</p>
<p>On the other hand, deep learning-based approaches have been extensively employed in sentiment analysis, including Deep Neural Network (DNN) [<xref ref-type="bibr" rid="ref-25">25</xref>], CNN [<xref ref-type="bibr" rid="ref-26">26</xref>], Recurrent Neural Network (RNN) [<xref ref-type="bibr" rid="ref-27">27</xref>], and attention mechanism-based network [<xref ref-type="bibr" rid="ref-28">28</xref>]. These models excel at learning intricate patterns from online review data, making them well-suited for sentiment classification tasks. CNN, in particular, has demonstrated exceptional capability in segmenting and classifying images and achieved significant success in NLP tasks due to its proficiency in processing spatial data. One well-known application of CNN in sentiment analysis was proposed by Meena et al. [<xref ref-type="bibr" rid="ref-29">29</xref>], who aimed to classify sentiment polarity in social media data. They categorized comments preferred by people of different ethnicities into sentiment polarities such as positive, negative, and neutral, achieving an impressive accuracy of 95.4%. Similarly, Kruspe et al. [<xref ref-type="bibr" rid="ref-30">30</xref>] conducted sentiment analysis on European COVID-19-related Twitter messages using a neural network with pre-trained word and sentence embeddings, incorporating skip-gram Word2Vec and multilingual Bidirectional Encoder Representations from Transformers (BERT). This model analyzed a total of 4.6 million tweets and identified 79,000 of them containing COVID-19 keywords with semantic information. Alharbi et al. [<xref ref-type="bibr" rid="ref-31">31</xref>] performed sentiment analysis on Amazon reviews using various RNN variants, including Long Short-Term Memory (LSTM), group LSTM, Gated Recurrent Unit (GRU), and Update Recurrent Unit (URU), to classify customer sentiment as negative, neutral, or positive. 
These RNNs were combined with different word embeddings (e.g., GloVe, Word2Vec, FastText) for feature extraction, with the group LSTM-based model and FastText achieving the highest accuracy of 93.75%. Bansal et al. [<xref ref-type="bibr" rid="ref-32">32</xref>] proposed a hybrid attribute-based sentiment classification method that integrates Optical Character Recognition (OCR) sentiment orientation by investigating implicit keyword relationships and domain-specific characteristics, validated on Amazon mobile phone reviews and TripAdvisor hotel reviews. Alzahrani et al. [<xref ref-type="bibr" rid="ref-33">33</xref>] studied LSTM and CNN-LSTM for sentiment analysis on Amazon reviews. After preprocessing the data through lowercase conversion, stop-word and punctuation removal, and tokenization, they employed these two models to train on the cleaned data for sentiment classification. Mohbey [<xref ref-type="bibr" rid="ref-34">34</xref>] utilized an LSTM model to predict customer review sentiments, achieving an accuracy of 93.66%, which reveals the superiority of deep learning-based approaches owing to their ability to handle extensive real-time data and extract robust features.</p>
<p>Furthermore, aspect-based sentiment analysis has a wide range of applications in consumer behavior analysis, market research, and product development. It focuses on identifying and extracting specific emotions or opinions related to particular entities or attributes, known as &#x201C;aspects&#x201D;, within the text. Unlike traditional sentiment analysis, which typically only determines whether the overall sentiment of the text is positive, negative, or neutral, aspect-based analysis delves deeper into recognizing the specific sentiments directed at different aspects of an entity. For instance, Hajek et al. [<xref ref-type="bibr" rid="ref-35">35</xref>] developed a fake review detection model using an aspect-based method while considering the impact of product types. Utilizing a dataset of Amazon reviews, this model revealed that two aspects, the product category and the verified purchase attribute, are useful for detecting fake reviews, with the greatest contribution observed for credence and experience product types. In another work, Chen et al. [<xref ref-type="bibr" rid="ref-36">36</xref>] presented an attention-based deep learning approach to capture semantic information. They achieved good results by incorporating syntactic information, which is noteworthy for understanding the structure of sentences. This model integrates a Graph Convolutional Network (GCN) and a co-attention mechanism to handle aspect-based information and eliminate noise from irrelevant contextual words. It allows both semantic and syntactic information to be conveyed to the sentiment analysis, improving the overall accuracy. The aforementioned works indicate that aspect-based methods, particularly when combined with advanced deep learning techniques like the attention mechanism and GCN, can add insightful depth to sentiment analysis.</p>
<p>Both machine learning-based and deep learning-based methods find extensive applications in sentiment analysis, each catering to different needs and scenarios. Machine learning-based methods are well-suited for rapid development and situations where interpretability is vital, making them appropriate for smaller datasets or limited computational resources. Deep learning-based methods excel in handling large-scale datasets and intricate pattern recognition tasks, particularly in capturing the nuanced semantic information within text. However, they require substantial labeled data and computational resources, and the interpretability of deep learning models remains an ongoing challenge. The choice between these methods depends on the specific project requirements, including data size and complexity, resource availability, and the importance of model interpretability. In this regard, a hybrid model is preferred, combining machine learning for feature extraction with deep learning for intricate pattern recognition and comprehensive sentiment analysis, which is also the main purpose of this paper.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Materials and Methods</title>
<sec id="s3_1">
<label>3.1</label>
<title>Data Collection</title>
<p>The data applied in this paper was collected from Amazon between September 2021 and April 2023, totaling 179,673 reviews, each including the customer ID, review time, overall rating, customer name, product review text, and ASIN, where the ASIN is a unique identifier that Amazon assigns to its products, consisting of 10 characters (letters and numbers). For sentiment analysis, the aim is to categorize the review data as positive or negative. To this end, reviews with ratings of 5 and 4 were regarded as positive, indicating that customers perceive the product as satisfying their expectations and needs. Conversely, reviews with ratings of 1 and 2 were marked as negative, meaning that customers found the product unsatisfactory. Based on that, 162,243 reviews were labeled. To make the subsequent training and testing time-efficient, 2500 reviews were randomly chosen from the positive and negative labels, respectively, i.e., 5000 reviews were applied for method validation in this paper.</p>
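<p>The rating-to-label mapping and the balanced random sampling described above can be sketched as follows. The field names <monospace>overall</monospace> and <monospace>reviewText</monospace> are assumptions modeled on typical Amazon review dumps and are not guaranteed to match the exact schema used in this paper.</p>

```python
import random

def label_reviews(reviews):
    """Map star ratings to sentiment labels as in Section 3.1:
    ratings 4-5 -> positive, ratings 1-2 -> negative, rating 3 excluded.
    Each review is assumed to be a dict with 'overall' and 'reviewText' keys."""
    labeled = []
    for r in reviews:
        if r["overall"] >= 4:
            labeled.append((r["reviewText"], "positive"))
        elif r["overall"] <= 2:
            labeled.append((r["reviewText"], "negative"))
        # rating 3 is neutral and dropped
    return labeled

def balanced_sample(labeled, n_per_class, seed=0):
    """Randomly draw n_per_class reviews per label (2500 each in the paper)."""
    rng = random.Random(seed)
    out = []
    for lab in ("positive", "negative"):
        pool = [x for x in labeled if x[1] == lab]
        out.extend(rng.sample(pool, min(n_per_class, len(pool))))
    return out
```

With <monospace>n_per_class=2500</monospace>, this yields the balanced 5000-review validation set described above.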
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Text Preprocessing</title>
<p>Text preprocessing is a vital step to eliminate redundant vocabulary and noise interference. In this paper, the first stage is text segmentation, which involves parsing the acquired data, stored in JavaScript Object Notation (JSON) format, to extract the required review text. To this end, we utilize Python&#x2019;s built-in JSON module: the JSON string is converted to a Python dictionary, from which the review text is segmented using the slicing method. The second stage is case folding, which converts all uppercase letters to lowercase; we employ Python&#x2019;s built-in lower() method to achieve this goal. The third stage is stop-word filtering, which removes common words that regularly occur in the text but are meaningless for sentiment analysis. For instance, words like &#x201C;a&#x201D;, &#x201C;the&#x201D;, and &#x201C;is&#x201D; often lack meaningful information. Hence, the size of the review data can be reduced to improve the sentence representation, enabling the selection of more valuable keyword features from the text. We use the list of English stopwords provided by the nltk library for this operation. The next stage is tokenization, which segments the text into tokens, providing finer-grained input for text-processing tasks and aiding in better understanding and processing of the review data. The last stage is lexical stemming, which reduces words to their base forms, addressing morphological variations such as plurals and tenses; for example, &#x201C;dogs&#x201D; is reduced to &#x201C;dog&#x201D;. WordNet, an English vocabulary database accessed through the nltk library, is used to perform this step.</p>
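<p>The stages above can be condensed into a single function. To keep the sketch self-contained, a tiny stopword set and a naive plural-stripping rule stand in for nltk&#x2019;s stopword list and the WordNet lemmatizer used in the paper, and the <monospace>reviewText</monospace> field name is an assumption about the JSON schema.</p>

```python
import json
import re

# Tiny stand-in for nltk's English stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "and", "it", "this", "to"}

def preprocess(raw_json: str) -> list[str]:
    """Apply the preprocessing stages of Section 3.2 to one raw JSON record."""
    # Stage 1: text segmentation - parse the JSON and slice out the review text.
    record = json.loads(raw_json)
    text = record["reviewText"]
    # Stage 2: case folding.
    text = text.lower()
    # Stage 4: tokenization (done before filtering here for simplicity).
    tokens = re.findall(r"[a-z']+", text)
    # Stage 3: stop-word filtering.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stage 5: lexical stemming - crude plural reduction ("dogs" -> "dog"),
    # standing in for WordNet lemmatization.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
```

For the record <monospace>{"reviewText": "The dogs are great!"}</monospace>, the function returns the cleaned tokens for &#x201C;dog&#x201D; and &#x201C;great&#x201D;, matching the paper&#x2019;s &#x201C;dogs&#x201D; &#x2192; &#x201C;dog&#x201D; example.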
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Keyword Feature Extraction</title>
<p>Word2Vec is a neural word embedding model that learns representations by predicting target words from surrounding words. It can be accomplished in two ways: the Continuous Bag of Words (CBOW) model and the continuous skip-gram model. Both focus on reducing the dimensionality of the data and producing dense word vectors [<xref ref-type="bibr" rid="ref-37">37</xref>]. In addition, the skip-gram model assigns more weight to nearer context words than to more distant ones. Therefore, it can predict the center word with the help of a weighted window of surrounding words [<xref ref-type="bibr" rid="ref-38">38</xref>]. Given a sequence of words <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, the learning objective function can be expressed by maximizing the log-likelihood probability of <xref ref-type="disp-formula" rid="eqn-1">(1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mi>B</mml:mi><mml:mi>O</mml:mi><mml:mi>W</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>W</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mrow><mml:mtext>log Pr</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>k</mml:mi><mml:mi>i</mml:mi><mml:mi>p</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>g</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>W</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:munder><mml:mrow><mml:mrow><mml:mtext>log Pr</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the context word of the current word <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <italic>c</italic> refers to the window size of the context, and the conditional probability <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mo movablelimits="true" 
form="prefix">Pr</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> can be obtained by <xref ref-type="disp-formula" rid="eqn-3">(3)</xref>:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mo movablelimits="true" form="prefix">Pr</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>W</mml:mi></mml:mrow></mml:munder><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>w</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
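<p>As a toy numerical illustration of the skip-gram objective (2) and the softmax conditional probability (3), the two quantities can be computed directly from a small matrix of made-up word vectors (the embeddings here are fixed illustrative values, not trained Word2Vec outputs):</p>

```python
import numpy as np

def softmax_prob(i, j, W):
    """Pr(w_i | w_j) from Eq. (3): softmax over dot products of word vectors.
    W is a (vocab_size, dim) array of toy embeddings."""
    scores = W @ W[j]            # w . w_j for every word w in the vocabulary
    scores -= scores.max()       # shift for numerical stability
    p = np.exp(scores)
    return p[i] / p.sum()

def skip_gram_log_likelihood(seq, W, c=2):
    """Objective of Eq. (2): sum of log Pr(w_{i+j} | w_i) over a window of
    size c around each position, normalized by the sequence length N."""
    total = 0.0
    for i, wi in enumerate(seq):
        for k in range(max(0, i - c), min(len(seq), i + c + 1)):
            if k != i:
                total += np.log(softmax_prob(seq[k], wi, W))
    return total / len(seq)
```

Since each conditional distribution is a softmax, the probabilities over the vocabulary sum to one, and the log-likelihood of any sequence is non-positive; training maximizes it by adjusting the embedding matrix.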
<p>However, a main limitation of Word2Vec is that it treats a word as semantically similar to its neighboring words, even though the surrounding words may not actually exhibit semantic similarity to the current word. To address this, the TextRank algorithm, an unsupervised automatic keyword extraction technique, is applied. Its primary aim is to identify the most relevant keywords by assessing the importance of each vocabulary node in the text. During the construction of the weighted undirected graph, the edge weights between vocabulary nodes are computed using various methods, such as the co-occurrence matrix and the cosine similarity [<xref ref-type="bibr" rid="ref-39">39</xref>]. Meanwhile, through the iterative calculation process of PageRank, the score of each vocabulary node is continuously updated until it converges to a stable state. Mathematically, TextRank is represented as <xref ref-type="disp-formula" rid="eqn-4">(4)</xref>:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>W</mml:mi><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>d</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mi>ln</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mfrac><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mi>O</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mi>W</mml:mi><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>W</mml:mi><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the weight of node 
<inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> means the set of nodes pointing to <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the weight of the edge between the two nodes, <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>W</mml:mi><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> refers to the weight of node <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <italic>d</italic> is the damping coefficient, which is usually set to 0.85.</p>
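<p>The iteration in (4) on a weighted undirected word graph can be sketched as follows (a minimal toy implementation for illustration, assuming every node has at least one edge; the damping coefficient defaults to 0.85 as above):</p>

```python
import numpy as np

def textrank_scores(W, d=0.85, tol=1e-6, max_iter=200):
    """Iterate WS(V_i) = (1 - d) + d * sum_j W[j, i] / sum_k W[j, k] * WS(V_j), Eq. (4).

    W is a symmetric matrix of non-negative edge weights for the undirected
    word graph; every node is assumed to have at least one incident edge.
    """
    n = W.shape[0]
    out_sum = W.sum(axis=1)          # sum_k w_jk for every node j
    ws = np.ones(n)                  # initial node scores
    for _ in range(max_iter):
        new = (1 - d) + d * (W / out_sum[:, None]).T @ ws
        if np.abs(new - ws).max() < tol:
            break                    # converged to a stable state
        ws = new
    return ws
```

<p>The damping factor keeps the iteration a contraction, so the scores converge regardless of the (positive) initialization.</p>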
<p>When a keyword appears in the vocabulary list, its weight is multiplied by 1.5 on top of the original weight. The construction of the edge set in the graph considers both the positional information of the words and the semantic similarity between them to construct the transition matrix. The weight transfer between any two nodes in the graph is realized by <xref ref-type="disp-formula" rid="eqn-5">(5)</xref>:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where the eigenvalue <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is shown as <xref ref-type="disp-formula" rid="eqn-6">(6)</xref>:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>v</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>with <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> refers to the co-occurrence frequency of two words, <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>v</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> indicates the semantic similarity between <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, which can be computed using the cosine similarity presented in <xref ref-type="disp-formula" rid="eqn-7">(7)</xref>:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>v</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>cos</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mtable columnalign="left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mo symmetric="true">&#x2016;</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo symmetric="true">&#x2016;</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo symmetric="true">&#x2016;</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo symmetric="true">&#x2016;</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfrac></mml:math></disp-formula></p>
<p>In constructing a word graph model, the co-occurrence frequency is an important parameter. The word graph model organizes the words in a text into a network structure, with words as nodes and the semantic relationships between them represented by edges. Typically, the co-occurrence frequency measures the connection weights between words based on the number of co-occurrences, which helps to construct a more accurate word graph model.</p>
<p>Besides, <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> in <xref ref-type="disp-formula" rid="eqn-5">(5)</xref> represents the importance of the words in the text, which can be calculated by the following <xref ref-type="disp-formula" rid="eqn-8">(8)</xref> and <xref ref-type="disp-formula" rid="eqn-9">(9)</xref>:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>O</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p><disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mn>1.0</mml:mn><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mi>j</mml:mi><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mrow><mml:mtext>appears at the beginning of the paragraph</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0.8</mml:mn><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mtext>else</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the weight of the occurrence position. When <italic>j</italic> appears at the beginning of a paragraph, its weight will be assigned as 1.0, and in other positions, it will be 0.8. This is because the opening word is likely to be the beginning of a paragraph topic or keyword. Conversely, in other positions, it is relatively less important.</p>
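<p>Putting (5) through (9) together, the edge weight between two word nodes can be sketched as follows; the toy word vectors, co-occurrence counts, and position weights used below are illustrative assumptions, not values from the paper:</p>

```python
import numpy as np

def cosine_sim(vi, vj):
    """vsim(i, j) = v_i . v_j / (||v_i|| ||v_j||), Eq. (7)."""
    return float(vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)))

def edge_weight(i, j, vectors, coc, pos_weight, out_nodes):
    """w_ij = ft(i, j) + rev(i, j), combining Eqs. (5), (6), (8), and (9):
    ft(i, j)  = vsim(i, j) * coc(i, j)
    rev(i, j) = p_j / sum_{k in Out(v_j)} p_k
    p_j       = 1.0 at a paragraph start, 0.8 elsewhere."""
    ft = cosine_sim(vectors[i], vectors[j]) * coc[i][j]
    rev = pos_weight[j] / sum(pos_weight[k] for k in out_nodes[j])
    return ft + rev
```

<p>These weights then populate the matrix over which the TextRank scores are iterated.</p>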
<p>The Word2Vec-TextRank approach integrates the word-vector representations from Word2Vec with the graph-based algorithm of TextRank to discern pivotal concepts within texts. In particular, Word2Vec generates dense vectors that encapsulate the contextual meaning of words through extensive learning from textual data. Then, TextRank constructs a graph in which nodes denote words and edge weights are determined by the similarity between these vectors. Through iterative computation, TextRank assigns an importance score to each word and thereby identifies the keywords. As a result, the main advantage of Word2Vec-TextRank for keyword extraction is that it takes into account both semantic relationships and contextual information. This combination goes beyond mere word frequency, capturing the semantic roles and contextual significance of words within the text, which improves the precision of keyword identification by incorporating semantic and contextual cues.</p>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Text-to-Image Generation</title>
<p>After the feature extraction, we obtain a series of keywords related to emotions and products that convey the consumer&#x2019;s point of view. To enrich the properties of these keywords and represent them as image data, a Stable Diffusion generative AI model is trained using pre-labeled images derived from the keywords. Subsequently, we input the extracted keywords into this generative AI to obtain a series of one-to-one corresponding images, realizing text-to-image generation.</p>
<p>To this end, first, the generative AI model gradually upscales low-resolution images into high-resolution ones, fine-tuning the details of the generated images according to the input keywords so that they correctly express the meanings the keywords represent. Second, we compare the generated images with the input keywords to assess the text-to-image generation performance, i.e., whether the images are well matched with the keywords. This step requires not only examining the correspondence between the images and the keywords but also assessing the quality of the generated images. Finally, we accomplish the transformation from natural language to visualized images, providing the text-to-image source for subsequent sentiment classification. In short, the image generation phase is the core of the proposed method, where the keywords extracted by Word2Vec-TextRank serve as the inputs and the generative AI produces images associated with them.</p>
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Sentiment Classification</title>
<p>CNN is a feed-forward neural network inspired by biological visual cognitive mechanisms. Because it accepts raw image data and avoids complex preprocessing steps, CNN has a wide range of applications. In this paper, it is the fundamental component of the neural network used for processing the keyword-generated images, as displayed in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The CNN used for processing the keyword-generated images</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_52666-fig-2.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, the keyword-generated images act as inputs to the neural network. In the convolutional layer, a kernel performs a convolution operation on the input image to extract its features. The output of the convolutional layer then passes through a nonlinear activation function; we adopt the ReLU function to increase the nonlinearity of the network. Next, in the pooling layer, downsampling with max pooling is performed to reduce the dimensionality of the feature map while retaining vital semantic information. Finally, the feature maps are flattened and fed into the fully connected layer, whose outputs are the feature vectors used to train the subsequent SVM model for sentiment classification.</p>
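<p>The convolution, ReLU, max-pooling, and flattening pipeline described above can be sketched in plain NumPy (a didactic single-channel version for illustration, not the trained network itself):</p>

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d(img, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling that halves each spatial dimension."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def cnn_features(img, kernels):
    """Convolution -> ReLU -> max pooling -> flatten, one feature vector per image."""
    maps = [max_pool(relu(conv2d(img, k))) for k in kernels]
    return np.concatenate([m.ravel() for m in maps])
```

<p>The resulting feature vector is what the downstream SVM consumes in place of the raw pixels.</p>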
<p>Regarding the SVM, its principle is to determine an optimal hyperplane that maximizes the separation between samples of different classes. For linearly separable datasets, infinitely many separating hyperplanes exist (which is why perceptron solutions are not unique), but the separating hyperplane with the largest geometric margin is unique. Therefore, SVM identifies the support vectors, i.e., the sample points closest to the hyperplane; these support vectors determine the location and orientation of the hyperplane and play a vital role in determining the optimal one. This makes hyperparameter optimization crucial for SVM. To this end, it is essential to identify the optimal combination of hyperparameters tailored to the characteristics of different feature vectors, ensuring a better fit to the training data and improving predictive capability. Meanwhile, optimizing hyperparameters maximizes model performance while mitigating the risks of overfitting or underfitting, which enhances generalization to unseen data and saves time and computational resources. In this regard, fine-tuning key hyperparameters, such as the penalty parameter (<italic>C</italic> value) and kernel function parameters, can lead to optimal results, improving robustness and classification performance in sentiment analysis [<xref ref-type="bibr" rid="ref-40">40</xref>]. To achieve this goal, we implement hyperparameter optimization with GridSearchCV from the sklearn library, which forms a parameter grid by exhaustively enumerating the given candidate hyperparameters and then cross-validates each set of hyperparameters to assess performance. By comparing the different combinations, the best one is selected as the optimal configuration.</p>
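<p>A minimal sketch of the GridSearchCV-based tuning described above, using scikit-learn; the candidate grid values below are assumptions for illustration:</p>

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_svm(features, labels):
    """Cross-validated grid search over SVM hyperparameters with GridSearchCV."""
    param_grid = {"C": [0.1, 1, 10, 100],   # candidate penalty parameters
                  "kernel": ["rbf"],
                  "gamma": ["scale"]}
    search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```

<p>Each grid point is scored by cross-validation, so the selected <italic>C</italic> reflects held-out performance rather than training fit alone.</p>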
<p>In short, we capitalize on the image feature extraction capabilities from CNN and the discriminative power from SVM to enhance accuracy. This hybrid CNN-SVM architecture offers a synergistic approach to sentiment classification, combining the strengths of both models to realize improved performance.</p>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experimental Results</title>
<sec id="s4_1">
<label>4.1</label>
<title>Review Data Results</title>
<p>We initially collected 179,673 product reviews and then performed preprocessing, in which reviews with a rating of 3 were removed, leaving 162,243 reviews. These reviews were subsequently labeled as positive or negative. For time-efficient training and testing, we randomly selected 2500 reviews each for the positive and negative sentiments, resulting in 5000 reviews for method validation, which also guarantees a balanced sample distribution. <xref ref-type="table" rid="table-1">Table 1</xref> lists examples of the product reviews collected from Amazon.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>The examples of the collected product reviews from Amazon</title>
</caption>
<table frame="hsides" >
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Customer ID</th>
<th>Review data</th>
<th>Overall rating</th>
<th>Category</th>
</tr>
</thead>
<tbody>
<tr>
<td>A14BTJRH9VNLJJ</td>
<td>One CD,&#x2026;, I think that you will like this one as well. Buy it!</td>
<td>5</td>
<td>Positive</td>
</tr>
<tr>
<td>A3DL686B8JEM8A</td>
<td>People tend to glorify their rock and roll idols,&#x2026;, Overall 10 out of 10.</td>
<td>5</td>
<td>Positive</td>
</tr>
<tr>
<td>AVOKN4NDZCN78</td>
<td>Don&#x2019;t last but a couple of weeks,..., Will never buy it again even if only 5 bucks.</td>
<td>1</td>
<td>Negative</td>
</tr>
<tr>
<td>AHF7SNSZPRNCE</td>
<td>Second one I bought,&#x2026;, but these are junk.</td>
<td>1</td>
<td>Negative</td>
</tr>
</tbody>
</table>
</table-wrap>
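<p>The preprocessing steps above (dropping rating-3 reviews, labeling by rating, and drawing a balanced sample) can be sketched as follows; the (rating, text) input format is an assumption for illustration:</p>

```python
import random

def preprocess_reviews(reviews, per_class=2500, seed=42):
    """Drop neutral (rating 3) reviews, label the rest by rating, and draw a
    balanced random sample of per_class reviews from each sentiment."""
    pos = [(text, "positive") for rating, text in reviews if rating > 3]
    neg = [(text, "negative") for rating, text in reviews if rating < 3]
    rng = random.Random(seed)
    sample = (rng.sample(pos, min(per_class, len(pos)))
              + rng.sample(neg, min(per_class, len(neg))))
    rng.shuffle(sample)
    return sample
```

<p>Balancing the two classes at sampling time avoids the skew that would otherwise bias accuracy as an evaluation metric.</p>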
<p>Word clouds are commonly used as a tool for analyzing qualitative sentiment data, helping to visualize which words are most frequent in consumer reviews. Here, we concentrate on the preprocessed review data (i.e., with stop words removed) and count the frequencies of the tokenized words. The resulting word clouds are depicted in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Word clouds of sentiment data from the online product review on Amazon</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_52666-fig-3.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, common words in the word clouds such as &#x201C;show&#x201D;, &#x201C;season&#x201D;, &#x201C;one&#x201D;, &#x201C;like&#x201D;, &#x201C;good&#x201D;, &#x201C;great&#x201D;, and &#x201C;series&#x201D; suggest that consumers focus on aspects like product presentation, seasonal demand, and individual product ratings, as well as their satisfaction with and liking of the product. This indicates that the collected data are suitable for sentiment classification.</p>
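<p>The frequency counts that a word cloud visualizes can be reproduced with a simple tokenizer and counter (a sketch; the stop-word list below is a small illustrative subset, not the full list used in preprocessing):</p>

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "it", "is", "i", "this"}

def top_words(reviews, k=5):
    """Tokenize, drop stop words, and count term frequencies -- the counts a
    word cloud renders as font sizes."""
    counts = Counter()
    for text in reviews:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(k)
```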

</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Keyword Extraction Results</title>
<p>Of the 5000 extracted reviews, 70% is used as the training set and the remaining 30% as the test set, i.e., 3500 reviews for training and 1500 for testing. Then, keywords are extracted by Word2Vec-TextRank, which is beneficial for acquiring the most representative keywords from the reviews. These keywords are used to establish the word vectors, facilitating a comprehensive reflection of contextual semantic information. This process provides each review with its own representative keyword list. The keywords extracted from the example online product reviews are displayed in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>The examples of online product reviews after keyword extraction</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Customer ID</th>
<th>Review data</th>
<th>Overall rating</th>
<th>Keyword extraction</th>
</tr>
</thead>
<tbody>
<tr>
<td>A2I6KBFBFIORH7</td>
<td>We bought,..., them continually ever since,..., have been well worth the cost.</td>
<td>Positive</td>
<td>&#x2018;good&#x2019;, &#x2018;product&#x2019;, &#x2018;excellent&#x2019;, &#x2018;customer&#x2019;, &#x2018;service&#x2019;</td>
</tr>
<tr>
<td>A30S5MTFNNNOH0</td>
<td>My daughter,..., the kids, and the dog are now getting their own.</td>
<td>Positive</td>
<td>&#x2018;great&#x2019;, &#x2018;idea&#x2019;, &#x2018;keep&#x2019;, &#x2018;animals&#x2019;, &#x2018;kids&#x2019;</td>
</tr>
<tr>
<td>A1TJ393OFP21I8</td>
<td>Total flimsy junk,&#x2026;, DO NOT BUY THIS PIECE OF JUNK.</td>
<td>Negative</td>
<td>&#x2018;terrible&#x2019;, &#x2018;wouldn&#x2019;t&#x2019;, &#x2018;guitar&#x2019;, &#x2018;stand&#x2019;, &#x2018;forget&#x2019;</td>
</tr>
<tr>
<td>A34O0KQV4QXWNQ</td>
<td>I don&#x2019;t know who designed the leg feature of this thing,..., just don&#x2019;t like the features of this stand at all.</td>
<td>Negative</td>
<td>&#x2018;locking&#x2019;, &#x2018;legs&#x2019;, &#x2018;place&#x2019;, &#x2018;caused&#x2019;, &#x2018;scuffed&#x2019;</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Sentiment Analysis Results</title>
<p>The performance evaluation is presented via the confusion matrix, assessed by True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP). From these counts, we calculate the typical evaluation metrics used in classification tasks, including accuracy, precision, recall, F1-score, and specificity. It is improper to evaluate the robustness of a classification model by a single metric; for example, on an unbalanced dataset, relying only on accuracy fails to indicate the performance adequately. In this regard, hyperparameter tuning becomes imperative to identify an optimal scenario across the various evaluation metrics, which is why we perform hyperparameter optimization. Besides, we evaluate various classifiers, including Random Forest (RF), Logistic Regression Classification (Logi), Decision Tree (DT), Naive Bayesian (NB), and SVM, to validate the advantage of using the hybrid CNN-SVM model as the optimized classifier in the proposed method. <xref ref-type="table" rid="table-3">Table 3</xref> shows the evaluation metrics of each classifier, and <xref ref-type="fig" rid="fig-4">Fig. 4</xref> illustrates the resulting confusion matrices.</p>
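<p>For reference, all five metrics follow directly from the confusion-matrix counts; a minimal helper:</p>

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1-score, and specificity from
    confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                        # sensitivity / true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)                   # true negative rate
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "specificity": specificity}
```

<p>Reporting recall and specificity together shows how a classifier treats the positive and negative classes separately, which accuracy alone hides.</p>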
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>The evaluation metrics using various classifiers and the proposed hybrid CNN-SVM model</title>
</caption>
<table frame="hsides" >
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Classifier</th>
<th>Accuracy (%)</th>
<th>Precision (%)</th>
<th>Recall (%)</th>
<th>F1-score (%)</th>
<th>Specificity (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>RF</td>
<td>94.00</td>
<td>94.84</td>
<td>93.36</td>
<td>94.09</td>
<td>94.67</td>
</tr>
<tr>
<td>Logi</td>
<td>94.07</td>
<td>95.57</td>
<td>92.71</td>
<td>94.12</td>
<td>95.49</td>
</tr>
<tr>
<td>DT</td>
<td>92.27</td>
<td>90.95</td>
<td>94.27</td>
<td>92.58</td>
<td>90.16</td>
</tr>
<tr>
<td>NB</td>
<td>88.53</td>
<td>84.49</td>
<td>95.05</td>
<td>89.46</td>
<td>81.69</td>
</tr>
<tr>
<td>SVM</td>
<td>94.93</td>
<td>95.77</td>
<td>94.27</td>
<td>95.01</td>
<td>95.63</td>
</tr>
<tr>
<td><bold>CNN-SVM</bold></td>
<td><bold>97.13</bold></td>
<td><bold>97.73</bold></td>
<td><bold>96.57</bold></td>
<td><bold>97.15</bold></td>
<td><bold>97.71</bold></td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The confusion matrices using different classification methods: (a) RF; (b) Logi; (c) DT; (d) NB; (e) SVM; (f) CNN-SVM</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_52666-fig-4.tif"/>
</fig>
<p>In this evaluation, various classifiers adopt consistent feature extraction procedures, and the comparisons in <xref ref-type="table" rid="table-3">Table 3</xref> and <xref ref-type="fig" rid="fig-4">Fig. 4</xref> demonstrate that our method performs best, achieving 97.13%, 97.73%, 96.57%, 97.15%, and 97.71% on the accuracy, precision, recall, F1-score, and specificity, respectively. Such performances indicate that all the metrics of sentiment analysis are improved when CNN-SVM is employed in conjunction with the Word2Vec-TextRank.</p>

<p>Next, it is crucial to emphasize that hyperparameter tuning significantly impacts the results: hyperparameter optimization can enhance predictive capability and generalizability across various cases, so it is vital to assess it in detail. The hyperparameter optimization implemented in Python yields different results as the penalty parameter (<italic>C</italic> value) varies, as shown in <xref ref-type="table" rid="table-4">Table 4</xref>. By adjusting the <italic>C</italic> value, we can trade off between overfitting and underfitting the model. <xref ref-type="table" rid="table-4">Table 4</xref> shows that the CNN-SVM model performs best when the <italic>C</italic> value is 1, indicating that the proposed method offers remarkable ability in sentiment analysis for both the positive and negative categories. It also reveals the importance of the keyword-generated images, which supply trustworthy features for sentiment classification.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>The results of CNN-SVM under various penalty parameters (<italic>C</italic> values)</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Penalty parameters</th>
<th>Accuracy (%)</th>
<th>Precision (%)</th>
<th>Recall (%)</th>
<th>F1-score (%)</th>
<th>Specificity (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.1</td>
<td>96.67</td>
<td>97.20</td>
<td>96.17</td>
<td>96.68</td>
<td>97.17</td>
</tr>
<tr>
<td><bold>1</bold></td>
<td><bold>97.13</bold></td>
<td><bold>97.73</bold></td>
<td><bold>96.57</bold></td>
<td><bold>97.15</bold></td>
<td><bold>97.71</bold></td>
</tr>
<tr>
<td>10</td>
<td>96.27</td>
<td>96.80</td>
<td>95.78</td>
<td>96.29</td>
<td>96.77</td>
</tr>
<tr>
<td>100</td>
<td>94.87</td>
<td>95.47</td>
<td>94.33</td>
<td>94.90</td>
<td>95.41</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Training Loss</title>
<p>The hybrid CNN-SVM model is trained on the keyword-generated images derived from Word2Vec-TextRank and the generative AI. For the SVM, the kernel function is the Radial Basis Function (RBF) with penalty parameter <italic>C</italic> &#x003D; 1. After 100 rounds of training, the loss values stabilize. The loss and accuracy iteration curves are drawn in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>The loss and accuracy iteration curves of the proposed model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_52666-fig-5.tif"/>
</fig>
<p>After the optimization, the parameters are determined, and the variation in the loss value is mostly affected by the structure of the model. In particular, the dynamic variation of the loss value reveals how well the model fits the training data: a higher loss value implies that the model fails to describe the data accurately, while a lower loss value indicates higher accuracy. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> demonstrates that the hybrid CNN-SVM model stabilizes when the training iterations approach 50 and remains nearly unchanged through 100. Consequently, it reaches convergence on the current training data, demonstrating its effectiveness for sentiment analysis.</p>

</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Discussion</title>
<p>First, the choice of the hybrid CNN-SVM architecture for sentiment classification stems from the complementary strengths of both models: CNN is adept at capturing local patterns and spatial relationships within the keyword-generated image data, while SVM excels at handling high-dimensional feature spaces and non-linear decision boundaries. By integrating CNN and SVM, we employ the feature extraction capabilities of CNN to capture meaningful data representations, and SVM offers a robust mechanism for classifying the feature vectors into sentiment categories. The hierarchical feature extraction of CNN followed by the discriminative classification of SVM is beneficial for finding the sentiment patterns within the e-commerce review keyword-generated images, supplying more trustworthy features and accordingly improving sentiment classification accuracy. That is why the CNN-SVM model performs better than the others.</p>
<p>Second, a comparative study with previous works is presented in <xref ref-type="table" rid="table-5">Table 5</xref>. It indicates that the proposed model yields impressive performance on the relevant tasks compared to traditional text-based sentiment analysis approaches, almost all of which performed sentiment analysis using NLP and text mining techniques. The reason may be the limitation of text features for sentiment analysis of online product review data, owing to their susceptibility to the noise and ambiguity inherent in natural language. Online product reviews often contain informal language, spelling errors, sarcasm, and other linguistic nuances that can affect the accuracy of sentiment analysis. Besides, text features may struggle to capture the sentiment expressed in images, videos, or other non-textual elements present in online reviews. In contrast, we convert text into images, which provides deeper insight into the emotions within online reviews. This advance can be attributed to the keyword-generated images based on Word2Vec-TextRank and generative AI, as the text-to-image features effectively represent valuable details from large amounts of review data and reflect their semantic information during training, thereby improving the understanding and prediction of online product reviews.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>A comparative study with previous works</title>
</caption>
<table frame="hsides" >
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th>Work</th>
<th>Main methodology</th>
<th>Dataset</th>
<th>Highest<break/> accuracy (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hammou et al. [<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td>FastText with LSTM, BiLSTM, and GRU</td>
<td>Yelp and Twitter</td>
<td>93.28</td>
</tr>
<tr>
<td>Meena et al. [<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td>Keras embedding with CNN</td>
<td>Twitter</td>
<td>95.40</td>
</tr>
<tr>
<td>Alharbi et al. [<xref ref-type="bibr" rid="ref-31">31</xref>]</td>
<td>FastText with LSTM</td>
<td>Amazon reviews</td>
<td>93.75</td>
</tr>
<tr>
<td>Alzahrani et al. [<xref ref-type="bibr" rid="ref-33">33</xref>]</td>
<td>Keras text tokenizer with LSTM, and CNN-LSTM</td>
<td>Amazon reviews</td>
<td>94.00</td>
</tr>
<tr>
<td>Mohbey [<xref ref-type="bibr" rid="ref-34">34</xref>]</td>
<td>NB, SVM, decision tree, Logistic regression, and LSTM</td>
<td>Amazon reviews</td>
<td>93.66</td>
</tr>
<tr>
<td>Hajek et al. [<xref ref-type="bibr" rid="ref-35">35</xref>]</td>
<td>W2VLDA with DNN</td>
<td>Amazon reviews</td>
<td>82.89</td>
</tr>
<tr>
<td>Chen et al. [<xref ref-type="bibr" rid="ref-36">36</xref>]</td>
<td>GCN with co-attention mechanism</td>
<td>Twitter, Lap14, Rest14, Rest15, and Rest16</td>
<td>89.94</td>
</tr>
<tr>
<td>Labhsetwar [<xref ref-type="bibr" rid="ref-41">41</xref>]</td>
<td>Extra trees classifier, XGBoosting, and SVM</td>
<td>Telecommunication reviews</td>
<td>89.87</td>
</tr>
<tr>
<td>Li et al. [<xref ref-type="bibr" rid="ref-42">42</xref>]</td>
<td>Word2Vec with BERT &#x002B; Fully connected (FC) layer</td>
<td>Stock reviews</td>
<td>92.65</td>
</tr>
<tr>
<td>Gaye et al. [<xref ref-type="bibr" rid="ref-43">43</xref>]</td>
<td>Regression Vector-stochastic gradient descent classifier (RV-SGDC)</td>
<td>Employee reviews</td>
<td>97.00</td>
</tr>
<tr>
<td><bold>This work</bold></td>
<td><bold>Word2Vec-TextRank with hybrid CNN-SVM model</bold></td>
<td><bold>Amazon reviews</bold></td>
<td><bold>97.13</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Furthermore, to demonstrate the superiority of the proposed approach, benchmarking against other state-of-the-art sentiment analysis methods has been conducted on the same 5000 Amazon reviews evaluated in this paper, as summarized in <xref ref-type="table" rid="table-6">Table 6</xref>. As shown, the proposed approach achieves superior results by combining lightweight feature extraction with text-to-image generation and the hybrid model. The Word2Vec-TextRank applied in the feature extraction not only accomplishes lightweight text processing but also captures the core semantic information of the text. Compared with large-scale language models such as BERT [<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-42">42</xref>] and Sentiment Knowledge Enhanced Pre-training (SKEP) [<xref ref-type="bibr" rid="ref-44">44</xref>], Word2Vec-TextRank is resource-saving and performs well in keyword extraction, as evidenced by the proposed method outperforming both models across the evaluation metrics. Besides, SenticNet-based GCN [<xref ref-type="bibr" rid="ref-45">45</xref>] and the Graph Attention Network Model Incorporating Syntactic, Semantic, and Knowledge (SSK-GAT) [<xref ref-type="bibr" rid="ref-46">46</xref>] have previously achieved impressive results: SenticNet-based GCN performs well in aspect-level sentiment analysis by utilizing sentiment knowledge enhancement through an augmented GCN, while SSK-GAT employs graph attention networks and focuses on aspect-level sentiment classification. The proposed approach outperforms both, mainly due to the innovative feature extraction performed on the keyword-generated images, which helps the model concentrate on learning the most representative and distinguishing information on both the visual and textual sides. Consequently, text-to-image conversion is a key factor in improving sentiment analysis in the proposed method, especially when the dataset is as small as 5000 samples.</p>
<table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Benchmarking using other state-of-the-art sentiment analysis methods</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th></th>
<th>Accuracy (%)</th>
<th>Precision (%)</th>
<th>Recall (%)</th>
<th>F1-score (%)</th>
<th>Specificity (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>BERT [<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-42">42</xref>]</td>
<td>90.68</td>
<td>90.37</td>
<td>89.32</td>
<td>89.47</td>
<td>90.68</td>
</tr>
<tr>
<td>SKEP [<xref ref-type="bibr" rid="ref-44">44</xref>]</td>
<td>88.52</td>
<td>88.97</td>
<td>87.84</td>
<td>88.12</td>
<td>89.20</td>
</tr>
<tr>
<td>SenticNet-based GCN [<xref ref-type="bibr" rid="ref-45">45</xref>]</td>
<td>90.84</td>
<td>91.43</td>
<td>90.16</td>
<td>90.43</td>
<td>91.52</td>
</tr>
<tr>
<td>SSK-GAT [<xref ref-type="bibr" rid="ref-46">46</xref>]</td>
<td>92.08</td>
<td>92.65</td>
<td>91.40</td>
<td>91.97</td>
<td>92.76</td>
</tr>
<tr>
<td><bold>This work</bold></td>
<td><bold>97.13</bold></td>
<td><bold>97.73</bold></td>
<td><bold>96.57</bold></td>
<td><bold>97.15</bold></td>
<td><bold>97.71</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Finally, although we have achieved relatively remarkable results, the proposed method still has room for improvement. On one hand, the use of generative AI models to convert keywords into images faces challenges in consistency and quality control, as the process depends on the set of pre-labeled images derived from the keywords. On the other hand, while the dataset of 5000 Amazon reviews provides a solid foundation for this study, its specificity limits the generalizability of the findings. Thus, we plan to collect a larger dataset of reviews from a variety of e-commerce platforms, such as Flipkart, Alibaba, Shopify, and eBay, to comprehensively assess the results in recognizing more complex sentiments in online product review feedback.</p>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusions</title>
<p>In this paper, we propose a sentiment analysis method for online product reviews using a hybrid machine learning-based model with text-to-image features, where TextRank is applied to rank the importance of sentences throughout the text, and Word2Vec is used to extract word representations by predicting the target word from its surrounding words. The keyword features are then used to generate images through generative AI, so that the keyword-generated images serve as inputs for training the CNN-SVM model for sentiment classification, with the SVM determining an optimal hyperplane that maximizes the margins between samples of different sentiment categories.</p>
<p>As for the experiments, reviews of various products on Amazon from September 2021 to April 2023, totaling 179,673 reviews, are collected first. Considering sample balance and time efficiency, we randomly selected 2500 positive and 2500 negative reviews, totaling 5000 samples for method validation. Python is used for programming, based on the keras and sklearn libraries, to process the review data and classify it into positive and negative categories. The results demonstrate that the proposed method holds the advantage of recognizing the keyword-generated images, achieving 97.13%, 97.73%, 96.57%, 97.15%, and 97.71% in accuracy, precision, recall, F1-score, and specificity, respectively, demonstrating its robustness in analyzing e-commerce reviews. As a result, it can be concluded that the text-to-image approach is beneficial for data representation of e-commerce reviews, and the hybrid CNN-SVM model is well suited to improving classification performance through the keyword-generated images. This brings an innovative solution that enhances the accuracy and depth of sentiment analysis, and also establishes a direction for similar applications, such as social media monitoring and market trend research. In the future, several state-of-the-art emotion recognition models will be employed to test the keyword-generated images, with the expectation of high performance at reduced computational time and cost.</p>
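<p>The five evaluation metrics reported above follow directly from the confusion-matrix counts. A brief sketch, using small illustrative label vectors rather than the experimental predictions, shows how each is derived:</p>

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

# For binary labels {0, 1}, ravel() yields counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)                      # sensitivity (true-positive rate)
f1          = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)                      # true-negative rate
# Here each metric evaluates to 0.8 (4 TP, 4 TN, 1 FP, 1 FN)
```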
</sec>
</body>
<back>
<ack>
<p>The authors would like to express their gratitude for the special support from the Digital Content Processing and Security Technology of Guangzhou Key Laboratory.</p>
</ack>
<sec><title>Funding Statement</title>
<p>This work was supported in part by the Guangzhou Science and Technology Plan Project under Grants 2024B03J1361, 2023B03J1327, and 2023A04J0361, in part by the Open Fund Project of Hubei Province Key Laboratory of Occupational Hazard Identification and Control under Grant OHIC2023Y10, in part by the Guangdong Province Ordinary Colleges and Universities Young Innovative Talents Project under Grant 2023KQNCX036, in part by the Special Fund for Science and Technology Innovation Strategy of Guangdong Province (Climbing Plan) under Grant pdjh2024a226, in part by the Key Discipline Improvement Project of Guangdong Province under Grant 2022ZDJS015, and in part by the Research Fund of Guangdong Polytechnic Normal University under Grants 22GPNUZDJS17 and 2022SDKYA015.</p>
</sec>
<sec><title>Author Contributions</title>
<p>Study conception and design: Jiawen Li, Yuesheng Huang, Yayi Lu; data collection: Yuesheng Huang, Yongqi Ren; analysis and interpretation of results: Jiawen Li, Leijun Wang, Rongjun Chen; draft manuscript preparation: Jiawen Li, Yuesheng Huang, Yayi Lu. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The data that support the findings of this study are openly available in <ext-link ext-link-type="uri" xlink:href="https://github.com/Yorkson-huang/Text-to-image-SA">https://github.com/Yorkson-huang/Text-to-image-SA</ext-link> (accessed on 17/05/2024).</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>R. S.</given-names> <surname>Sherratt</surname></string-name></person-group>, &#x201C;<article-title>Sentiment analysis for e-commerce product reviews in Chinese based on sentiment lexicon and deep learning</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>23522</fpage>&#x2013;<lpage>23530</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2020.2969854</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Liang</surname></string-name> and <string-name><given-names>J. Q.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>A linguistic intuitionistic cloud decision support model with sentiment analysis for product selection in e-commerce</article-title>,&#x201D; <source>Int. J. Fuzzy Syst.</source>, vol. <volume>21</volume>, no. <issue>3</issue>, pp. <fpage>963</fpage>&#x2013;<lpage>977</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1007/s40815-019-00606-0</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Ji</surname></string-name>, <string-name><given-names>H. Y.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>J. Q.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>A fuzzy decision support model with sentiment analysis for items comparison in e-commerce: The case study of PConline.com</article-title>,&#x201D; <source>IEEE Trans. Syst. Man Cybern. Syst.</source>, vol. <volume>49</volume>, no. <issue>10</issue>, pp. <fpage>1993</fpage>&#x2013;<lpage>2004</lpage>, <year>2018</year>. doi: <pub-id pub-id-type="doi">10.1109/TSMC.2018.2875163</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Puengwattanapong</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Leelasantitham</surname></string-name></person-group>, &#x201C;<article-title>A holistic perspective model of plenary online consumer behaviors for sustainable guidelines of the electronic business platforms</article-title>,&#x201D; <source>Sustainability</source>, vol. <volume>14</volume>, no. <issue>10</issue>, pp. <fpage>6131</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.3390/su14106131</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E. G.</given-names> <surname>Dias</surname></string-name>, <string-name><given-names>L. K.</given-names> <surname>de Oliveira</surname></string-name>, and <string-name><given-names>C. A.</given-names> <surname>Isler</surname></string-name></person-group>, &#x201C;<article-title>Assessing the effects of delivery attributes on e-shopping consumer behaviour</article-title>,&#x201D; <source>Sustainability</source>, vol. <volume>14</volume>, no. <issue>1</issue>, pp. <fpage>13</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.3390/su14010013</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Punetha</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Jain</surname></string-name></person-group>, &#x201C;<article-title>Bayesian game model based unsupervised sentiment analysis of product reviews</article-title>,&#x201D; <source>Expert. Syst. Appl.</source>, vol. <volume>214</volume>, no. <issue>4</issue>, pp. <fpage>119128</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.eswa.2022.119128</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>U.</given-names> <surname>Singh</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Saraswat</surname></string-name>, <string-name><given-names>H. K.</given-names> <surname>Azad</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Abhishek</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Shitharth</surname></string-name></person-group>, &#x201C;<article-title>Towards improving e-commerce customer review analysis for sentiment detection</article-title>,&#x201D; <source>Sci. Rep.</source>, vol. <volume>12</volume>, no. <issue>1</issue>, pp. <fpage>21983</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1038/s41598-022-26432-3</pub-id>; <pub-id pub-id-type="pmid">36539524</pub-id></mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Saeed</surname></string-name></person-group>, &#x201C;<article-title>A customer-centric view of e-commerce security and privacy</article-title>,&#x201D; <source>Appl. Sci.</source>, vol. <volume>13</volume>, no. <issue>2</issue>, pp. <fpage>1020</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.3390/app13021020</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Deniz</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Erbay</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Co&#x015F;ar</surname></string-name></person-group>, &#x201C;<article-title>Multi-label classification of e-commerce customer reviews via machine learning</article-title>,&#x201D; <source>Axioms</source>, vol. <volume>11</volume>, no. <issue>9</issue>, pp. <fpage>436</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.3390/axioms11090436</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Shafiabady</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Hadjinicolaou</surname></string-name>, <string-name><given-names>F. U.</given-names> <surname>Din</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Bhandari</surname></string-name>, <string-name><given-names>R. M.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Vakilian</surname></string-name></person-group>, &#x201C;<article-title>Using artificial intelligence (AI) to predict organizational agility</article-title>,&#x201D; <source>PLoS One</source>, vol. <volume>18</volume>, no. <issue>5</issue>, pp. <fpage>e0283066</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0283066</pub-id>; <pub-id pub-id-type="pmid">37163532</pub-id></mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name></person-group>, &#x201C;<article-title>Contextual semantics using hierarchical attention network for sentiment classification in social internet-of-things</article-title>,&#x201D; <source>Multimed. Tools Appl.</source>, vol. <volume>81</volume>, no. <issue>26</issue>, pp. <fpage>36967</fpage>&#x2013;<lpage>36982</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1007/s11042-021-11262-8</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Wankhade</surname></string-name>, <string-name><given-names>A. C. S.</given-names> <surname>Rao</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Kulkarni</surname></string-name></person-group>, &#x201C;<article-title>A survey on sentiment analysis methods, applications, and challenges</article-title>,&#x201D; <source>Artif. Intell. Rev.</source>, vol. <volume>55</volume>, no. <issue>7</issue>, pp. <fpage>5731</fpage>&#x2013;<lpage>5780</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1007/s10462-022-10144-1</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Mukhtar</surname></string-name> and <string-name><given-names>M. A.</given-names> <surname>Khan</surname></string-name></person-group>, &#x201C;<article-title>Effective lexicon-based approach for Urdu sentiment analysis</article-title>,&#x201D; <source>Artif. Intell. Rev.</source>, vol. <volume>53</volume>, no. <issue>4</issue>, pp. <fpage>2521</fpage>&#x2013;<lpage>2548</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1007/s10462-019-09740-5</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Kamyab</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Liu</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Adjeisah</surname></string-name></person-group>, &#x201C;<article-title>Attention-based CNN and Bi-LSTM model based on TF-IDF and GloVe word embedding for sentiment analysis</article-title>,&#x201D; <source>Appl. Sci.</source>, vol. <volume>11</volume>, no. <issue>23</issue>, pp. <fpage>11255</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.3390/app112311255</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Hu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>A domain keyword analysis approach extending term frequency-keyword active index with Google Word2Vec model</article-title>,&#x201D; <source>Scientometrics</source>, vol. <volume>114</volume>, no. <issue>3</issue>, pp. <fpage>1031</fpage>&#x2013;<lpage>1068</lpage>, <year>2018</year>. doi: <pub-id pub-id-type="doi">10.1007/s11192-017-2574-9</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Mehta</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Pandya</surname></string-name></person-group>, &#x201C;<article-title>A review on sentiment analysis methodologies, practices and applications</article-title>,&#x201D; <source>Int. J. Sci. Technol. Res.</source>, vol. <volume>9</volume>, no. <issue>2</issue>, pp. <fpage>601</fpage>&#x2013;<lpage>609</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Jaiswal</surname></string-name></person-group>, &#x201C;<article-title>Systematic literature review of sentiment analysis on Twitter using soft computing techniques</article-title>,&#x201D; <source>Concurr. Comput. Pract. Exp.</source>, vol. <volume>32</volume>, no. <issue>1</issue>, pp. <fpage>e5107</fpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1002/cpe.5107</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. D.</given-names> <surname>Cardoso</surname></string-name>, <string-name><given-names>M.</given-names> <surname>da Silveira</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Pruski</surname></string-name></person-group>, &#x201C;<article-title>Construction and exploitation of an historical knowledge graph to deal with the evolution of ontologies</article-title>,&#x201D; <source>Knowl. Based Syst.</source>, vol. <volume>194</volume>, no. <issue>2</issue>, pp. <fpage>105508</fpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1016/j.knosys.2020.105508</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Birjali</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Kasri</surname></string-name>, and <string-name><given-names>A. B.</given-names> <surname>Hssane</surname></string-name></person-group>, &#x201C;<article-title>A comprehensive survey on sentiment analysis: Approaches, challenges and trends</article-title>,&#x201D; <source>Knowl. Based Syst.</source>, vol. <volume>226</volume>, pp. <fpage>107134</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1016/j.knosys.2021.107134</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Mihalcea</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Tarau</surname></string-name></person-group>, &#x201C;<article-title>Textrank: Bringing order into text</article-title>,&#x201D; in <conf-name>Proc. EMNLP</conf-name>, <publisher-loc>Barcelona, Spain</publisher-loc>, <year>2004</year>, pp. <fpage>404</fpage>&#x2013;<lpage>411</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Mikolov</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Corrado</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Dean</surname></string-name></person-group>, &#x201C;<article-title>Efficient estimation of word representations in vector space</article-title>,&#x201D; <comment>arXiv preprint arXiv:1301.3781</comment>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Dou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Hu</surname></string-name>, and <string-name><given-names>R.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>Textual analysis for online reviews: A polymerization topic sentiment model</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>7</volume>, pp. <fpage>91940</fpage>&#x2013;<lpage>91945</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2920091</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Kong</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhu</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Sentiment classification and computing for online reviews by a hybrid SVM and LSA based approach</article-title>,&#x201D; <source>Clust. Comput.</source>, vol. <volume>22</volume>, no. <issue>S5</issue>, pp. <fpage>12619</fpage>&#x2013;<lpage>12632</lpage>, <year>2018</year>. doi: <pub-id pub-id-type="doi">10.1007/s10586-017-1693-7</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Obiedat</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Sentiment analysis of customers&#x2019; reviews using a hybrid evolutionary SVM-based approach in an imbalanced data distribution</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>10</volume>, no. <issue>1</issue>, pp. <fpage>22260</fpage>&#x2013;<lpage>22273</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2022.3149482</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Onan</surname></string-name></person-group>, &#x201C;<article-title>Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks</article-title>,&#x201D; <source>Concurr. Comput. Pract. Exp.</source>, vol. <volume>33</volume>, no. <issue>23</issue>, pp. <fpage>e5909</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1002/cpe.5909</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Gim&#x00E9;nez</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Palanca</surname></string-name>, and <string-name><given-names>V.</given-names> <surname>Botti</surname></string-name></person-group>, &#x201C;<article-title>Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. A case of study in sentiment analysis</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>378</volume>, no. <issue>7</issue>, pp. <fpage>315</fpage>&#x2013;<lpage>323</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1016/j.neucom.2019.08.096</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B. A.</given-names> <surname>Hammou</surname></string-name>, <string-name><given-names>A. A.</given-names> <surname>Lahcen</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Mouline</surname></string-name></person-group>, &#x201C;<article-title>Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics</article-title>,&#x201D; <source>Inf. Process. Manag.</source>, vol. <volume>57</volume>, no. <issue>1</issue>, pp. <fpage>102122</fpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1016/j.ipm.2019.102122</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. E.</given-names> <surname>Basiri</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Nemati</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Abdar</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Cambria</surname></string-name>, and <string-name><given-names>U. R.</given-names> <surname>Acharrya</surname></string-name></person-group>, &#x201C;<article-title>ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis</article-title>,&#x201D; <source>Future Gener. Comput. Syst.</source>, vol. <volume>115</volume>, no. <issue>3</issue>, pp. <fpage>279</fpage>&#x2013;<lpage>294</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1016/j.future.2020.08.005</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Meena</surname></string-name>, <string-name><given-names>K. K.</given-names> <surname>Mohbey</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Indian</surname></string-name></person-group>, &#x201C;<article-title>Categorizing sentiment polarities in social networks data using convolutional neural network</article-title>,&#x201D; <source>SN Comput. Sci.</source>, vol. <volume>3</volume>, no. <issue>2</issue>, pp. <fpage>116</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1007/s42979-021-00993-y</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Kruspe</surname></string-name>, <string-name><given-names>M.</given-names> <surname>H&#x00E4;berle</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Kuhn</surname></string-name>, and <string-name><given-names>X. X.</given-names> <surname>Zhu</surname></string-name></person-group>, &#x201C;<article-title>Cross-language sentiment analysis of European Twitter messages during the COVID-19 pandemic</article-title>,&#x201D; <comment>arXiv preprint arXiv:2008.12172</comment>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N. M.</given-names> <surname>Alharbi</surname></string-name>, <string-name><given-names>N. S.</given-names> <surname>Alghamdi</surname></string-name>, <string-name><given-names>E. H.</given-names> <surname>Alkhammash</surname></string-name>, and <string-name><given-names>J. F.</given-names> <surname>Al Amri</surname></string-name></person-group>, &#x201C;<article-title>Evaluation of sentiment analysis via word embedding and RNN variants for Amazon online reviews</article-title>,&#x201D; <source>Math. Probl. Eng.</source>, vol. <volume>2021</volume>, pp. <fpage>5536560</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1155/2021/5536560</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Bansal</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Srivastava</surname></string-name></person-group>, &#x201C;<article-title>Hybrid attribute based sentiment classification of online reviews for consumer intelligence</article-title>,&#x201D; <source>Appl. Intell.</source>, vol. <volume>49</volume>, no. <issue>1</issue>, pp. <fpage>137</fpage>&#x2013;<lpage>149</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1007/s10489-018-1299-7</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. E.</given-names> <surname>Alzahrani</surname></string-name>, <string-name><given-names>T. H.</given-names> <surname>Aldhyani</surname></string-name>, <string-name><given-names>S. N.</given-names> <surname>Alsubari</surname></string-name>, <string-name><given-names>M. M.</given-names> <surname>Althobaiti</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Fahad</surname></string-name></person-group>, &#x201C;<article-title>Developing an intelligent system with deep learning algorithms for sentiment analysis of e-commerce product reviews</article-title>,&#x201D; <source>Comput. Intell. Neurosci.</source>, vol. <volume>2022</volume>, pp. <fpage>3840071</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1155/2022/3840071</pub-id>; <pub-id pub-id-type="pmid">35669644</pub-id></mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K. K.</given-names> <surname>Mohbey</surname></string-name></person-group>, &#x201C;<article-title>Sentiment analysis for product rating using a deep learning approach</article-title>,&#x201D; in <conf-name>Proc. ICAIS</conf-name>, <publisher-loc>Coimbatore, India</publisher-loc>, <year>2021</year>, pp. <fpage>121</fpage>&#x2013;<lpage>126</lpage>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Hajek</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Hikkerova</surname></string-name>, and <string-name><given-names>J. M.</given-names> <surname>Sahut</surname></string-name></person-group>, &#x201C;<article-title>Fake review detection in e-commerce platforms using aspect-based sentiment analysis</article-title>,&#x201D; <source>J. Bus. Res.</source>, vol. <volume>167</volume>, no. <issue>5</issue>, pp. <fpage>114143</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.jbusres.2023.114143</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xue</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Xiao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Aspect-based sentiment analysis using graph convolutional networks and co-attention mechanism</article-title>,&#x201D; in <conf-name>Proc. ICONIP</conf-name>, <publisher-loc>Bali, Indonesia</publisher-loc>, <year>2021</year>, pp. <fpage>441</fpage>&#x2013;<lpage>448</lpage>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Mehmood</surname></string-name>, <string-name><given-names>M. U.</given-names> <surname>Ghani</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Ibrahim</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Shahzadi</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Mahmood</surname></string-name> and <string-name><given-names>M. N.</given-names> <surname>Asim</surname></string-name></person-group>, &#x201C;<article-title>A precisely xtreme-multi channel hybrid approach for Roman Urdu sentiment analysis</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>192740</fpage>&#x2013;<lpage>192759</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3030885</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Baishya</surname></string-name>, <string-name><given-names>J. J.</given-names> <surname>Deka</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Dey</surname></string-name>, and <string-name><given-names>P. K.</given-names> <surname>Singh</surname></string-name></person-group>, &#x201C;<article-title>SAFER: Sentiment analysis-based fake review detection in e-commerce using deep learning</article-title>,&#x201D; <source>SN Comput. Sci.</source>, vol. <volume>2</volume>, no. <issue>6</issue>, pp. <fpage>479</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1007/s42979-021-00918-9</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. B. A.</given-names> <surname>Miah</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Awang</surname></string-name>, <string-name><given-names>M. M.</given-names> <surname>Rahman</surname></string-name>, <string-name><given-names>A. S. M. S.</given-names> <surname>Hosen</surname></string-name>, and <string-name><given-names>I. H.</given-names> <surname>Ra</surname></string-name></person-group>, &#x201C;<article-title>A new unsupervised technique to analyze the centroid and frequency of key phrases from academic articles</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>11</volume>, no. <issue>17</issue>, pp. <fpage>2773</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.3390/electronics11172773</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I.</given-names> <surname>Jamaleddyn</surname></string-name>, <string-name><given-names>R.</given-names> <surname>El Ayachi</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Biniz</surname></string-name></person-group>, &#x201C;<article-title>An improved approach to Arabic news classification based on hyperparameter tuning of machine learning algorithms</article-title>,&#x201D; <source>J. Eng. Res.</source>, vol. <volume>11</volume>, no. <issue>2</issue>, pp. <fpage>100061</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.jer.2023.100061</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. R.</given-names> <surname>Labhsetwar</surname></string-name></person-group>, &#x201C;<article-title>Predictive analysis of customer churn in telecom industry using supervised learning</article-title>,&#x201D; <source>ICTACT J. Soft Comput.</source>, vol. <volume>10</volume>, no. <issue>2</issue>, pp. <fpage>2054</fpage>&#x2013;<lpage>2060</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.21917/ijsc.2020.0291</pub-id>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhao</surname></string-name>, and <string-name><given-names>Q.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Sentiment analysis of Chinese stock reviews based on BERT model</article-title>,&#x201D; <source>Appl. Intell.</source>, vol. <volume>51</volume>, no. <issue>7</issue>, pp. <fpage>5016</fpage>&#x2013;<lpage>5024</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1007/s10489-020-02101-8</pub-id>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Gaye</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Wulamu</surname></string-name></person-group>, &#x201C;<article-title>Sentiment classification for employees reviews using regression vector-stochastic gradient descent classifier (RV-SGDC)</article-title>,&#x201D; <source>PeerJ Comput. Sci.</source>, vol. <volume>7</volume>, pp. <fpage>e712</fpage>, <year>2021</year>; <pub-id pub-id-type="pmid">34712795</pub-id></mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Savci</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Das</surname></string-name></person-group>, &#x201C;<article-title>Prediction of the customers&#x2019; interests using sentiment analysis in e-commerce data for comparison of Arabic, English, and Turkish languages</article-title>,&#x201D; <source>J. King Saud Univ.&#x2013;Comput. Inf. Sci.</source>, vol. <volume>35</volume>, pp. <fpage>227</fpage>&#x2013;<lpage>237</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H. T.</given-names> <surname>Phan</surname></string-name> and <string-name><given-names>N. T.</given-names> <surname>Nguyen</surname></string-name></person-group>, &#x201C;<article-title>A fuzzy graph convolutional network model for sentence-level sentiment analysis</article-title>,&#x201D; <source>IEEE Trans. Fuzzy Syst.</source>, vol. <volume>32</volume>, no. <issue>5</issue>, pp. <fpage>2953</fpage>&#x2013;<lpage>2965</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/TFUZZ.2024.3364694</pub-id>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Gong</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>She</surname></string-name></person-group>, &#x201C;<article-title>An aspect sentiment classification model for graph attention networks incorporating syntactic, semantic, and knowledge</article-title>,&#x201D; <source>Knowl. Based Syst.</source>, vol. <volume>275</volume>, pp. <fpage>110662</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.knosys.2023.110662</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>