<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">16896</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2021.016896</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Machine Learning Approach for COVID-19 Detection on Twitter</article-title>
<alt-title alt-title-type="left-running-head">Machine Learning Approach for COVID-19 Detection on Twitter</alt-title>
<alt-title alt-title-type="right-running-head">Machine Learning Approach for COVID-19 Detection on Twitter</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western">
<surname>Amin</surname>
<given-names>Samina</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
<email>kustsameena@gmail.com</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western">
<surname>Uddin</surname>
<given-names>M. Irfan</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western">
<surname>Al-Baity</surname>
<given-names>Heyam H.</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western">
<surname>Zeb</surname>
<given-names>M. Ali</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western">
<surname>Khan</surname>
<given-names>M. Abrar</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Institute of Computing, Kohat University of Science and Technology</institution>, <addr-line>Kohat, 26000</addr-line>, <country>Pakistan</country></aff>
<aff id="aff-2"><label>2</label><institution>Department of Information Technology, College of Computer and Information Sciences, King Saud University</institution>, <addr-line>Riyadh, 11543</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes><corresp id="cor1">&#x002A;Corresponding Author: Samina Amin. Email: <email>kustsameena@gmail.com</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-03-01">
<day>01</day>
<month>03</month>
<year>2021</year>
</pub-date>
<volume>68</volume>
<issue>2</issue>
<fpage>2231</fpage>
<lpage>2247</lpage>
<history>
<date date-type="received">
<day>14</day>
<month>01</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>02</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2021 Amin et al.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Amin et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_16896.pdf"></self-uri>
<abstract>
<p>Social networking services (SNSs) provide massive data that can be a very influential source of information during pandemic outbreaks. This study shows that social media analysis can be used as a crisis detector (e.g., understanding the sentiment of social media users regarding various pandemic outbreaks). The novel Coronavirus Disease-19 (COVID-19), commonly known as coronavirus, affected people worldwide in 2020. Streaming Twitter data have revealed the status of the COVID-19 outbreak in the most affected regions. This study focuses on identifying COVID-19 patients from Twitter messages (tweets) without requiring medical records. For this purpose, we propose herein an intelligent model using traditional machine learning-based approaches, such as support vector machine (SVM), logistic regression (LR), na&#x00EF;ve Bayes (NB), random forest (RF), and decision tree (DT), with the help of term frequency-inverse document frequency (TF-IDF) to detect the COVID-19 pandemic in Twitter messages. The proposed model classifies Twitter messages into four categories, namely, confirmed, death, recovered, and suspected. For the experimental analysis, tweet data on the COVID-19 pandemic are analyzed to evaluate the results of the traditional machine learning approaches. A benchmark dataset for COVID-19 on Twitter messages is developed and can be used for future research studies. The experiments show that the proposed approach is promising in detecting the COVID-19 pandemic in Twitter messages, with overall accuracy, precision, recall, and F1 score between 70% and 80% for the machine learning approaches (i.e., SVM, NB, LR, RF, and DT) with the TF-IDF feature extraction technique; confusion matrices are also reported.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Artificial intelligence</kwd>
<kwd>coronavirus</kwd>
<kwd>COVID-19</kwd>
<kwd>pandemic</kwd>
<kwd>social network</kwd>
<kwd>Twitter</kwd>
<kwd>machine learning</kwd>
<kwd>natural language processing</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Online social network sites (SNSs), such as online blogs, Facebook, Instagram, and microblogging services (e.g., Tumblr and Twitter), are web forums or online platforms used all around the world. Millions of people worldwide currently use SNSs to share images and videos, update their current status, and post regular comments on various topics. SNSs can also provide massive data that can be a very influential source of information during pandemic outbreaks [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>]. Early warning on outbreak detection can decrease the influence of epidemic outbreaks on public health. SNSs can now be used for disease surveillance to monitor the rate of epidemic outbreaks more quickly than health care specialists and health organizations [<xref ref-type="bibr" rid="ref-2">2</xref>&#x2013;<xref ref-type="bibr" rid="ref-4">4</xref>].</p>
<p>The COVID-19 pandemic has been spreading around the globe since the start of 2020. The disease is contagious and is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel human virus that epidemiologists and virologists consider to have originated in bats and been transferred to humans through an intermediary host [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. Due to its rapid spread, the COVID-19 outbreak was deemed a &#x201C;Public Health Emergency of International Concern&#x201D; by the World Health Organization (WHO) on January 30, 2020 [<xref ref-type="bibr" rid="ref-6">6</xref>]. The disease has influenza-like symptoms (pneumonia) and has become a major challenge for healthcare professionals in terms of system development and diagnosis for monitoring the pandemic. The early detection of COVID-19 is essential in monitoring and tracking its future dissemination. SNSs can be considered as a quick detection and monitoring tool for COVID-19 to provide awareness and limit the dissemination of the coronavirus pandemic.</p>
<p>Information on COVID-19 and the coronavirus pandemic has not been promptly circulated by healthcare organizations. By contrast, SNSs have gained great attention for spreading awareness about COVID-19 [<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>]. The massive proliferation of COVID-19 has created a strong need for reliable analytical methods to understand information dissemination and pandemic crisis formation in social media. Various research studies have examined epidemic outbreaks and monitored healthcare using SNS data so that healthcare organizations can make informed decisions more rapidly and efficiently [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>].</p>
<p>Therefore, emphasis is placed on techniques that would enable SNSs to track and detect early warnings relevant to pandemic outbreaks and thereby realize a real-time analysis [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>]. Through SNSs, health care practitioners can be informed to deliver basic resources to monitor pandemic outbreaks. Nowadays, people regularly use SNSs to upload images and videos, update their current status, and post regular comments on their health status, specifically during a pandemic in a region. Social media analysis (SMA) provides useful information for outbreak tracking and a convenient approach for communicating with the public to decrease pandemic outbreaks with machine learning approaches [<xref ref-type="bibr" rid="ref-10">10</xref>&#x2013;<xref ref-type="bibr" rid="ref-17">17</xref>].</p>
<p>Artificial intelligence plays an important role in the new wave of approaches for public health. From a methodological point of view, machine learning is one of the most applicable branches of artificial intelligence. This study proposes an intelligent model that will retrieve text related to COVID-19 or the coronavirus pandemic from Twitter messages (tweets) using machine learning approaches, such as SVM, NB, LR, RF, and DT [<xref ref-type="bibr" rid="ref-18">18</xref>&#x2013;<xref ref-type="bibr" rid="ref-23">23</xref>] with TF-IDF [<xref ref-type="bibr" rid="ref-24">24</xref>], GloVe [<xref ref-type="bibr" rid="ref-25">25</xref>], and n-grams [<xref ref-type="bibr" rid="ref-26">26</xref>]. The tweets are categorized into four groups of COVID-19, namely confirmed (a tweet about a person with coronavirus), death (a tweet expressing death from COVID-19), recovered (a tweet expressing a person&#x2019;s recovery from COVID-19), and suspected (a tweet expressing COVID-19 symptoms). The main contributions of the proposed work are as follows:
<list list-type="bullet">
<list-item><p>provide awareness about COVID-19 by identifying the dissemination of the latest information on COVID-19 from online social media to help prevent the dissemination of COVID-19;</p></list-item>
<list-item><p>automate COVID-19 analysis by detecting the COVID-19 pandemic from SNSs to perform a real-time analysis;</p></list-item>
<list-item><p>categorize Twitter messages related to the coronavirus and COVID-19 pandemic into four groups as &#x201C;confirmed,&#x201D; &#x201C;death,&#x201D; &#x201C;recovered,&#x201D; and &#x201C;suspected;&#x201D;</p></list-item>
<list-item><p>explore traditional machine learning approaches, namely, SVM, LR, NB, RF, and DT, for tweet identification with the help of TF-IDF with the n-grams approach (e.g., with unigrams, the spread of COVID-19 is detected using individual words in tweets); and</p></list-item>
<list-item><p>build a benchmark dataset for COVID-19 from Twitter messages that will be available online for future research studies.</p></list-item>
</list></p>
<p>This study aims to evaluate COVID-19-related tweets with &#x201C;confirmed,&#x201D; &#x201C;death,&#x201D; &#x201C;recovered,&#x201D; and &#x201C;suspected&#x201D; patients to analyze the pandemic outbreak from the SMA. The proposed traditional machine learning-based approach is tested and evaluated on various domains to measure its performance, accuracy, and efficiency (Section 4).</p>
<p>The remainder of this paper is structured as follows: Section 2 provides a brief overview of the related work in the literature; Section 3 describes the approach followed to obtain the experimental results; Section 4 presents the analysis and evaluation; and Section 5 concludes the research and outlines directions for further research.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Several disease detection approaches for coronavirus and the COVID-19 pandemic are used by researchers around the globe to make informed decisions and develop appropriate monitoring systems [<xref ref-type="bibr" rid="ref-27">27</xref>&#x2013;<xref ref-type="bibr" rid="ref-29">29</xref>]. Kouzy et al. [<xref ref-type="bibr" rid="ref-30">30</xref>] and Singh et al. [<xref ref-type="bibr" rid="ref-31">31</xref>] proposed intelligent models for the dissemination of information and measurements relevant to COVID-19 using online social media data.</p>
<p>Early detection and public awareness about outbreaks, especially the COVID-19 outbreak, and the techniques for monitoring the COVID-19 pandemic are major concerns [<xref ref-type="bibr" rid="ref-32">32</xref>,<xref ref-type="bibr" rid="ref-33">33</xref>]. Kabir et al. [<xref ref-type="bibr" rid="ref-34">34</xref>] presented a method that discovers the user sentiment and posts shared by the public on COVID-19 in social media and modeled public opinion using machine learning and topic modeling techniques. They mainly investigated the psychology and actions of the public, which can help in handling financial and social crises during the current outbreak of COVID-19 and its major side effects.</p>
<p>Hung et al. [<xref ref-type="bibr" rid="ref-1">1</xref>] developed an artificial intelligence-based model to analyze Twitter discussion associated with public sentiment on the COVID-19 pandemic. Khanday et al. [<xref ref-type="bibr" rid="ref-32">32</xref>] developed an effective model for textual clinical data classification by employing machine learning approaches. They classified clinical textual data into three classes, namely COVID, severe acute respiratory syndrome, and acute respiratory distress syndrome. In addition, they presented a comparative analysis among machine learning techniques and showed that the multinomial na&#x00EF;ve Bayes model outperformed the other models.</p>
<p>Mistrust of social media affects the propagation of disaster information: it involves not only changes in the interpretation and sharing of media but also variations in the way individuals and administrations interpret information in crisis circumstances [<xref ref-type="bibr" rid="ref-35">35</xref>]. In their work, Mirbabaie et al. [<xref ref-type="bibr" rid="ref-35">35</xref>] used Twitter data to understand the crises created during the COVID-19 pandemic, as well as the potential circumstances, to decrease the mistrust of SNS content and promote the context (sense-making) of the SMA.</p>
<p>Aggarwal et al. [<xref ref-type="bibr" rid="ref-36">36</xref>] developed a model for a multi-criterion decision support system for COVID-19 and used the COVID-19 dataset from the government official link for result validation. Similarly, Yun et al. [<xref ref-type="bibr" rid="ref-37">37</xref>] performed a COVID-19 screening laboratory data analysis. From plasmid acid and hematology data, they gathered 2510 cases for a cumulative examination for COVID-19 infection detection. They conducted the results on influenza infections and planned to explore the effect of fecal matter. Based on the 2510 cases, they suggested clinical and medical actions. However, the data could vary from one place to another because immunity and several other factors inside the body differ from one area to another.</p>
<p>SNSs can be efficiently used to classify disease-related information and to influence health campaigns and interventions that improve public health [<xref ref-type="bibr" rid="ref-9">9</xref>]. The reviewed studies suggest that the SMA can detect patterns of early warnings on pandemic outbreaks, consequently reducing the time that passes between onset and detection. To the best of our knowledge, previous studies have not considered the alarming situation of COVID-19 and important features such as the categorization of COVID-19 patients into &#x201C;confirmed,&#x201D; &#x201C;death,&#x201D; &#x201C;recovered,&#x201D; and &#x201C;suspected&#x201D; to analyze the pandemic outbreak from the SMA. Furthermore, no benchmark dataset that supports the analysis of public sentiment on the COVID-19 pandemic has been made available. This study performs a textual analysis of Twitter data by identifying information from social sensors (i.e., tweets). Specifically, awareness related to the rapid dissemination of the COVID-19 pandemic is tracked and analyzed. To find information on the COVID-19 pandemic in tweets, the proposed work focuses on the problem of identifying COVID-19 patients using tweets without requiring medical records. Accordingly, this work proposes an intelligent model using traditional machine learning-based approaches. It also outlines an artificial intelligence approach for analyzing Twitter data in detail to identify and track keyword associations and trends for disaster situations similar to the COVID-19 pandemic.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Proposed Approach</title>
<p><xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the methodology adopted to build an intelligent approach for detecting the spread of the COVID-19 pandemic in Twitter messages using machine learning techniques. The proposed model incorporates various components, including data gathering, preprocessing, data visualization, classification, and result evaluation. The pseudocode for the proposed approach is also presented at the end of this section. The component details are presented below.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Proposed COVID-19 detection approach</title>
</caption><graphic mimetype="image" mime-subtype="png" xlink:href="fig-1.png"/>
</fig>
<sec id="s3_1">
<label>3.1</label>
<title>Data Gathering</title>
<p>We used the Twitter streaming application programming interface (API) to retrieve tweets from Twitter [<xref ref-type="bibr" rid="ref-38">38</xref>]. We gathered about 900,000 tweets between May 13, 2020 and September 30, 2020. We selected keywords, including #covid-19, #coronavirus, #corona, covid19, and #covid, to collect the relevant tweets. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> depicts the other most commonly discussed words about COVID-19 found in the COVID-19 corpus.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Commonly discussed keywords about COVID-19 found in the corpus</title>
</caption><graphic mimetype="image" mime-subtype="png" xlink:href="fig-2.png"/>
</fig>
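<p>The keyword selection described above can be illustrated offline with a small sketch. It does not call the Twitter API; it only shows a simple, case-insensitive substring test against the five keywords listed in the text. The function name <monospace>matches_keywords</monospace> and the substring-matching rule are assumptions made for illustration, and the matching performed by the streaming API itself may differ.</p>
<preformat><![CDATA[
```python
# Keywords listed in the text; the matching rule below is an assumption.
KEYWORDS = ("#covid-19", "#coronavirus", "#corona", "covid19", "#covid")

def matches_keywords(text):
    """Return True if the tweet text contains any tracked keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

sample = ["Stay safe everyone #Coronavirus", "Nice weather today"]
relevant = [t for t in sample if matches_keywords(t)]
print(relevant)  # ['Stay safe everyone #Coronavirus']
```
]]></preformat>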
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Data Preprocessing</title>
<p>After the tweet data are collected from Twitter, they are subjected to the following natural language processing (NLP) preprocessing steps [<xref ref-type="bibr" rid="ref-39">39</xref>]:
<list list-type="bullet">
<list-item><p>eliminating non-English tweets (i.e., only tweets written in English are considered);</p></list-item>
<list-item><p>eliminating stop words: stop words, such as &#x201C;a,&#x201D; &#x201C;is,&#x201D; &#x201C;be,&#x201D; and &#x201C;the,&#x201D; do not convey meaningful information;</p></list-item>
<list-item><p>eliminating retweet entities: meaningful analytics would be affected by redundant (repetitive) tweets;</p></list-item>
<list-item><p>eliminating punctuation marks, special characters, and numbers: they do not express an opinion regarding the disease outbreak;</p></list-item>
<list-item><p>eliminating URLs or hyperlinks: only tweets containing text are considered herein;</p></list-item>
<list-item><p>eliminating people in @mention: the names of people reported in @mention are irrelevant for the disease exploration;</p></list-item>
<list-item><p>stemming: to transform the words into base or root words utilizing stemming techniques [<xref ref-type="bibr" rid="ref-40">40</xref>]; and</p></list-item>
<list-item><p>tokenizing: to break a sentence or phrase into tokens, such as words, by using Natural Language Toolkit (NLTK) modules [<xref ref-type="bibr" rid="ref-40">40</xref>].</p></list-item>
</list></p>
<p>These preprocessing steps were incorporated to enhance the performance of the proposed model and improve the processing speed. The tweet data were stored in a comma-separated values (CSV) file after preprocessing.</p>
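<p>A minimal sketch of several of these steps is given below, using only the Python standard library. It is not the implementation used in this study: the stop-word list, the &#x201C;RT @&#x201D; retweet heuristic, and the whitespace tokenizer are assumptions made for illustration. Language filtering and stemming are omitted; a stemmer such as the NLTK Porter stemmer could be applied to the returned tokens.</p>
<preformat><![CDATA[
```python
import re

# Illustrative stop-word list (an assumption; the list used in the study is not given).
STOP_WORDS = {"a", "an", "and", "be", "for", "have", "i", "is", "of", "the", "to"}

def is_retweet(tweet):
    """Heuristic (assumed): retweets start with the marker 'RT @'."""
    return tweet.lstrip().lower().startswith("rt @")

def preprocess(tweet):
    """Return the cleaned, lowercase word tokens of one tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                   # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # remove punctuation, digits, symbols
    tokens = text.split()                               # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]   # remove stop words

tweets = [
    "I have tested positive for #COVID19! https://example.com/x @who",
    "RT @someone: I have tested positive for #COVID19!",
]
cleaned = [preprocess(t) for t in tweets if not is_retweet(t)]
print(cleaned)  # [['tested', 'positive', 'covid']]
```
]]></preformat>
<p>Note that removing digits turns &#x201C;covid19&#x201D; into &#x201C;covid&#x201D; in this sketch, which merges the two spellings into one token.</p>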
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Data Annotation and Building a Benchmark Dataset</title>
<p>A total of 3102 sample tweets on COVID-19 are selected for tagging after the preprocessing step. The sample tweets are tagged by three annotators to reduce gaps or bias in the annotation. The tweets are categorized into four groups of COVID-19, namely confirmed, death, recovered, and suspected. For instance, the label confirmed is assigned when a tweet reports that someone is infected with COVID-19, whereas the label suspected is assigned when a tweet describes COVID-19 symptoms in people. In the annotation phase, the tagged tweets are validated by measuring the inter-annotator agreement level using Cohen&#x2019;s Kappa test [<xref ref-type="bibr" rid="ref-41">41</xref>], which indicates strong agreement (i.e., <inline-formula id="ieqn-1"><!--<alternatives><inline-graphic xlink:href="ieqn-1.png"/>--><!--<tex-math id="tex-ieqn-1"><![CDATA[$\mathrm{kappa}= 0.841$]]></tex-math>--><mml:math id="mml-ieqn-1"><mml:mstyle mathvariant="normal"><mml:mi>k</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mi>p</mml:mi><mml:mi>a</mml:mi></mml:mstyle><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>841</mml:mn></mml:math><!--</alternatives>--></inline-formula>) [<xref ref-type="bibr" rid="ref-42">42</xref>]. <xref ref-type="table" rid="table-1">Tab. 1</xref> shows representative tweets with their assigned categories.</p>
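<p>For readers unfamiliar with the statistic, the sketch below computes Cohen&#x2019;s kappa for two annotators as the observed agreement corrected for the agreement expected by chance. The label sequences are made-up toy data, not taken from the dataset, and the text does not state how the scores of the three annotators were combined; computing the statistic for each pair of annotators is one possible reading and is an assumption here.</p>
<preformat><![CDATA[
```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy labels (hypothetical), one entry per tweet.
annotator_1 = ["confirmed", "confirmed", "death", "recovered", "suspected", "suspected"]
annotator_2 = ["confirmed", "suspected", "death", "recovered", "suspected", "suspected"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.769
```
]]></preformat>
<p>The function is undefined when the expected agreement equals 1 (both annotators use a single identical label throughout), a case this sketch does not handle.</p>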
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Feature Engineering</title>
<p>Machine learning approaches cannot directly process text data. For this purpose, different features are retrieved from the preprocessed annotated tweet data and transformed into numeric values. To retrieve the related features, the TF-IDF [<xref ref-type="bibr" rid="ref-24">24</xref>] feature extraction approach is utilized, and unigrams and bigrams are extracted. The proposed approach is trained on approximately 5000 feature weights; that is, we have 5000 features for the whole training set, set as <inline-formula id="ieqn-2"><!--<alternatives><inline-graphic xlink:href="ieqn-2.png"/>--><!--<tex-math id="tex-ieqn-2"><![CDATA[$\text{max\_features}= 5000$]]></tex-math>--><mml:math id="mml-ieqn-2"><mml:mstyle class="text"><mml:mtext>max_features</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:mn>5000</mml:mn></mml:math><!--</alternatives>--></inline-formula>. After the appropriate weights are assigned to the features, the numeric feature values are passed to the machine learning approaches for further analysis. The TF-IDF weight is computed as follows:</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-1.png"/>-->
<!--<tex-math id="tex-eqn-1"><![CDATA[$$\begin{equation}
\boldsymbol{W}_{{\boldsymbol{m}},\,{\boldsymbol{n}}}=\boldsymbol{tf}_{{\boldsymbol{m}},\,{\boldsymbol{n}}}\times \boldsymbol{\log} \left(\frac{\boldsymbol{N}}{\boldsymbol{tf}_{{\boldsymbol{m}}}}\right)
 \label{eqn-1}
\end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="0.3em"/><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>t</mml:mi><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="0.3em"/><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mo>log</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></disp-formula></p>
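<p>where <italic>W<sub>m,n</sub></italic> is the weight of term <italic>m</italic> in document <italic>n</italic>, <italic>tf<sub>m,n</sub></italic> is the number of occurrences of <italic>m</italic> in <italic>n</italic>, <italic>tf<sub>m</sub></italic> is the number of documents containing <italic>m</italic>, and <italic>N</italic> is the total number of documents.</p>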
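<p>Eq. (1) can be illustrated with a short, dependency-free sketch that computes the weight of every term in every document of a toy corpus. It follows Eq. (1) literally, with the per-term document count (written <italic>tf<sub>m</sub></italic> in Eq. (1)) in the denominator and the natural logarithm, whose base is an assumption because Eq. (1) does not specify it. Library implementations such as the scikit-learn <monospace>TfidfVectorizer</monospace> apply smoothing and normalization by default, so their values differ from this sketch. The toy documents are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        term_freq = Counter(tokens)   # occurrences of each term in this document
        weights.append({
            term: tf * math.log(n_docs / doc_freq[term])
            for term, tf in term_freq.items()
        })
    return weights

# Toy corpus (hypothetical), already tokenized.
docs = [["tested", "positive", "corona"], ["recovered", "corona"], ["died", "corona"]]
weights = tfidf(docs)
print(weights[0]["corona"])            # 0.0, the term occurs in every document
print(round(weights[0]["tested"], 3))  # 1.099, i.e., 1 * log(3 / 1)
```
]]></preformat>
<p>A term that appears in every document receives zero weight, which is why frequent but uninformative words contribute little to the feature vector.</p>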
<table-wrap id="table-1"> 
<label>Table 1</label>
<caption>
<title>Representation of COVID-19 tweets tagged by three annotators. The annotations are validated via an inter-annotator agreement level</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>S#</th>
<th>Tweet</th>
<th>Category</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>Unfortunately one of our team has tested positive for Coronavirus.</td>
<td>Confirmed</td>
</tr>
<tr>
<td>2.</td>
<td>Jaehyun tested positive for coronavirus I hope he feels good and I hope he will recover fast. Get well soon.</td>
<td>Confirmed</td>
</tr>
<tr>
<td>3.</td>
<td>Please can you keep my mum in your prayers? She has tested positive for corona.</td>
<td>Confirmed</td>
</tr>
<tr>
<td>4.</td>
<td>People are really testing positive and not telling anyone like Corona is a private party.</td>
<td>Suspected</td>
</tr>
<tr>
<td>5.</td>
<td>114986 recovered from corona so far.</td>
<td>Recovered</td>
</tr>
<tr>
<td>6.</td>
<td>1 in 200 Americans over age 65 have died from #COVID19. My god.</td>
<td>Death</td>
</tr>
<tr>
<td>7.</td>
<td>A 41 year-old, healthy man with a young family just died from COVID.</td>
<td>Death</td>
</tr>
<tr>
<td>8.</td>
<td>I felt equally positive after both of my parents recovered from COVID19-knowing that recovering from the disease produces the same immunity as the vaccine.</td>
<td>Recovered</td>
</tr>
<tr>
<td>9.</td>
<td>I have now officially recovered from #COVID19 and have been cleared to come to work today.</td>
<td>Recovered</td>
</tr>
<tr>
<td>10.</td>
<td>Many with suspected COVID19 (number not provided) ICU census unknown.</td>
<td>Suspected</td>
</tr>
<tr>
<td>11.</td>
<td>This is so utterly sad. COVID claims the life of someone so young, age 38, who just got elected to serve the country.</td>
<td>Death</td>
</tr>
<tr>
<td>12.</td>
<td>Three of my friends had Corona Vaccination and they are down with fever and body aches.</td>
<td>Suspected</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Another technique adopted herein for feature extraction is the n-gram [<xref ref-type="bibr" rid="ref-20">20</xref>]. With a unigram (1-gram), each individual word in a tweet is considered to detect the spread of COVID-19, whereas a bigram (2-gram) considers two consecutive words, so that each word is taken together with the one word that precedes it (<inline-formula id="ieqn-3"><!--<alternatives><inline-graphic xlink:href="ieqn-3.png"/>--><!--<tex-math id="tex-ieqn-3"><![CDATA[$\text{N}- 1 = 1$]]></tex-math>--><mml:math id="mml-ieqn-3"><mml:mstyle class="text"><mml:mtext>N</mml:mtext></mml:mstyle><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math><!--</alternatives>--></inline-formula>). Consider the following example tweet to understand the n-gram approach: &#x201C;I have tested positive for COVID-19.&#x201D; The 2-gram formulation (2 &#x2212; 1 = 1, i.e., each word depends on the one previous word) transforms this example into &#x201C;I have,&#x201D; &#x201C;have tested,&#x201D; &#x201C;tested positive,&#x201D; &#x201C;positive for,&#x201D; and &#x201C;for COVID-19.&#x201D;</p>
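<p>The n-gram construction can be sketched in a few lines of Python. The helper name <monospace>ngrams</monospace> and the whitespace tokenization are assumptions made for illustration; the sketch reproduces the unigrams and bigrams of the example tweet.</p>
<preformat><![CDATA[
```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I have tested positive for COVID-19".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
print(unigrams)  # ['I', 'have', 'tested', 'positive', 'for', 'COVID-19']
print(bigrams)   # ['I have', 'have tested', 'tested positive', 'positive for', 'for COVID-19']
```
]]></preformat>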
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Data Splitting</title>
<p>A random split approach is adopted to split the data into training and testing sets. In random splitting, a pre-specified proportion of the data set is assigned to the training and test samples. For instance, in the 80:20 split, the samples are randomly selected. Compared with the other approaches, the random split approach was more stable because the dataset was divided more appropriately. With the 80:20 ratio, 80% of the data samples were used to train the model. The remaining 20% of the data samples were kept to test the model performance using performance evaluation metrics.</p>
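<p>A random 80:20 split can be sketched as follows. The function name, the fixed seed (used only to make the shuffle reproducible), and the rounding rule for the split point are assumptions; the integers stand in for the 3102 annotated tweets.</p>
<preformat><![CDATA[
```python
import random

def random_split(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)            # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))  # index of the split point
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for the annotated tweets.
train_set, test_set = random_split(range(3102))
print(len(train_set), len(test_set))  # 2482 620
```
]]></preformat>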
</sec>
<sec id="s3_6">
<label>3.6</label>
<title>Machine Learning Approaches</title>
<p>Different machine learning approaches are used to detect the COVID-19 tweets and classify them into four categories of COVID-19 (i.e., confirmed, death, recovered, and suspected). In this work, machine learning approaches like LR, SVM, NB, DT, and RF are employed to achieve the proposed objectives.</p>
<sec id="s3_6_1">
<label>3.6.1</label>
<title>Support Vector Machine</title>
<p>SVM is a machine learning-based approach most commonly used for classification tasks [<xref ref-type="bibr" rid="ref-18">18</xref>]. The SVM operates by finding a decision boundary, often called a hyperplane, that separates the data set into groups. The vectors on each side of the boundary are assigned to a specific class. In two dimensions, the boundary is a line, mathematically defined as follows:</p>
<p><disp-formula id="eqn-2">
<label>(2)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-2.png"/>-->
<!--<tex-math id="tex-eqn-2"><![CDATA[$$\begin{equation}{\boldsymbol{y}}={\boldsymbol{a}}.{\boldsymbol{x}}+{\boldsymbol{b}}
 \label{eqn-2} \end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-2" display="block"><mml:mrow></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mo>.</mml:mo><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p><disp-formula id="eqn-3">
<label>(3)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-3.png"/>-->
<!--<tex-math id="tex-eqn-3"><![CDATA[$$\begin{equation}{\boldsymbol{a}}.{\boldsymbol{x}}+{\boldsymbol{b}}-{\boldsymbol{y}}=\mathbf{0}
 \label{eqn-3}\end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-3" display="block"><mml:mrow></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>.</mml:mo><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>-</mml:mo><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mstyle mathvariant="bold"><mml:mn>0</mml:mn></mml:mstyle></mml:mrow><mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Suppose that the vectors <inline-formula id="ieqn-4"><!--<alternatives><inline-graphic xlink:href="ieqn-4.png"/>--><!--<tex-math id="tex-ieqn-4"><![CDATA[$X= \left(x,  y\right)$]]></tex-math>--><mml:math id="mml-ieqn-4"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> and <inline-formula id="ieqn-5"><!--<alternatives><inline-graphic xlink:href="ieqn-5.png"/>--><!--<tex-math id="tex-ieqn-5"><![CDATA[$W= \left(a,  -1\right)$]]></tex-math>--><mml:math id="mml-ieqn-5"><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> are defined. The hyperplane can then be written in vector form as follows:</p>
<p><disp-formula id="eqn-4">
<label>(4)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-4.png"/>-->
<!--<tex-math id="tex-eqn-4"><![CDATA[$$\begin{equation}
\boldsymbol{W}.\boldsymbol{X}+{\boldsymbol{b}}=\mathbf{0}
 \label{eqn-4}
\end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-4" display="block"><mml:mi>W</mml:mi><mml:mo>.</mml:mo><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mstyle mathvariant="bold"><mml:mn>0</mml:mn></mml:mstyle></mml:math><!--</alternatives>--></disp-formula></p>
<p>where <inline-formula id="ieqn-6"><!--<alternatives><inline-graphic xlink:href="ieqn-6.png"/>--><!--<tex-math id="tex-ieqn-6"><![CDATA[${\boldsymbol{X}}$]]></tex-math>--><mml:math id="mml-ieqn-6"><mml:mi>X</mml:mi></mml:math><!--</alternatives>--></inline-formula> denotes the input feature vector; <inline-formula id="ieqn-7"><!--<alternatives><inline-graphic xlink:href="ieqn-7.png"/>--><!--<tex-math id="tex-ieqn-7"><![CDATA[${\boldsymbol{W}}$]]></tex-math>--><mml:math id="mml-ieqn-7"><mml:mi>W</mml:mi></mml:math><!--</alternatives>--></inline-formula> is the weight vector; and <inline-formula id="ieqn-8"><!--<alternatives><inline-graphic xlink:href="ieqn-8.png"/>--><!--<tex-math id="tex-ieqn-8"><![CDATA[${\boldsymbol{b}}$]]></tex-math>--><mml:math id="mml-ieqn-8"><mml:mi>b</mml:mi></mml:math><!--</alternatives>--></inline-formula> is a bias term.</p>
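<p>As a minimal illustrative sketch of the hyperplane W.X + b = 0, not the authors' implementation, the following Python snippet evaluates W.X + b for a two-dimensional point and assigns a class from the sign of the result. The function names and the example line y = 2x + 1 (so that W = (2, -1) and b = 1) are assumptions chosen only for illustration.</p>
<code language="python"><![CDATA[
```python
def decision_value(w, x, b):
    """Return W.X + b, the signed score of point x relative to the hyperplane."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


def classify(w, x, b):
    """Assign +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical boundary y = 2x + 1, i.e., W = (a, -1) = (2, -1) and b = 1.
W = (2.0, -1.0)
B = 1.0

on_boundary = decision_value(W, (1.0, 3.0), B)  # the point (1, 3) lies on the line
below_line = classify(W, (0.0, 0.0), B)         # point below the line: positive side
above_line = classify(W, (0.0, 3.0), B)         # point above the line: negative side
```
]]></code>
<p>A trained SVM additionally chooses W and b so that the margin between the classes is as large as possible; that optimization step is not shown in this sketch.</p>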
</sec>
<sec id="s3_6_2">
<label>3.6.2</label>
<title>Na&#x00EF;ve Bayes</title>
<p><italic>NB</italic> [<xref ref-type="bibr" rid="ref-22">22</xref>] is a probabilistic supervised learning model based on the Bayes&#x2019; theorem. The fundamental concept of the NB method is to calculate the probabilities of categories allocated to the corpus and classify the test data. The Bayes algorithm presents a methodology that computes the posterior probability <inline-formula id="ieqn-9"><!--<alternatives><inline-graphic xlink:href="ieqn-9.png"/>--><!--<tex-math id="tex-ieqn-9"><![CDATA[$\mathrm{p} \left(\mathrm{c}/\mathrm{x}\right)$]]></tex-math>--><mml:math id="mml-ieqn-9"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> by <inline-formula id="ieqn-10"><!--<alternatives><inline-graphic xlink:href="ieqn-10.png"/>--><!--<tex-math id="tex-ieqn-10"><![CDATA[$\mathrm{p} \left(\mathrm{c}\right)$]]></tex-math>--><mml:math id="mml-ieqn-10"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> and <inline-formula id="ieqn-11"><!--<alternatives><inline-graphic xlink:href="ieqn-11.png"/>--><!--<tex-math id="tex-ieqn-11"><![CDATA[$\mathrm{p} \left(\mathrm{x}/\mathrm{c}\right)$]]></tex-math>--><mml:math id="mml-ieqn-11"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> written as follows:</p>
<p><disp-formula id="eqn-5">
<label>(5)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-5.png"/>-->
<!--<tex-math id="tex-eqn-5"><![CDATA[$$\begin{equation}
{\boldsymbol{p}} \left({\boldsymbol{c}}/{\boldsymbol{x}}\right)=\frac{{\boldsymbol{p}} \left({\boldsymbol{x}}/
{\boldsymbol{c}}\right){\boldsymbol{p}} \left({\boldsymbol{c}}\right)}{{\boldsymbol{p}} \left({\boldsymbol{x}}\right)}
 \label{eqn-5}
\end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-5" display="block"><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>/</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>/</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math><!--</alternatives>--></disp-formula></p>
<p>where <inline-formula id="ieqn-12"><!--<alternatives><inline-graphic xlink:href="ieqn-12.png"/>--><!--<tex-math id="tex-ieqn-12"><![CDATA[$\mathrm{p} \left(\mathrm{x}/\mathrm{c}\right)=\mathrm{p} \left(\mathrm{x}_{1}/\mathrm{c}\right).\mathrm{p} \left(\mathrm{x}_{2}/\mathrm{c}\right).\mathrm{p} \left(\mathrm{x}_{3}/ \mathrm{c}\right)\ldots \mathrm{p} \left(\mathrm{x}_{\mathrm{n}}/\mathrm{c}\right)$]]></tex-math>--><mml:math id="mml-ieqn-12"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2026;</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>n</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula>, assuming that the features are conditionally independent given the class. <inline-formula id="ieqn-13"><!--<alternatives><inline-graphic xlink:href="ieqn-13.png"/>--><!--<tex-math id="tex-ieqn-13"><![CDATA[$\mathrm{p} \left(\mathrm{c}/\mathrm{x}\right)$]]></tex-math>--><mml:math id="mml-ieqn-13"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> is the posterior probability of the class <inline-formula id="ieqn-14"><!--<alternatives><inline-graphic xlink:href="ieqn-14.png"/>--><!--<tex-math id="tex-ieqn-14"><![CDATA[$ \left(\mathrm{c}, \text{ source}\right)$]]></tex-math>--><mml:math id="mml-ieqn-14"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle class="text"><mml:mtext>&#x00A0;source</mml:mtext></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> given the predictor <inline-formula id="ieqn-15"><!--<alternatives><inline-graphic xlink:href="ieqn-15.png"/>--><!--<tex-math id="tex-ieqn-15"><![CDATA[$ \left(\mathrm{x}, \text{ parameters}\right)$]]></tex-math>--><mml:math id="mml-ieqn-15"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle class="text"><mml:mtext>&#x00A0;parameters</mml:mtext></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula>, and <inline-formula id="ieqn-16"><!--<alternatives><inline-graphic xlink:href="ieqn-16.png"/>--><!--<tex-math id="tex-ieqn-16"><![CDATA[$\mathrm{p} \left(\mathrm{c}\right)$]]></tex-math>--><mml:math id="mml-ieqn-16"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> is the prior probability of the class. The probability <inline-formula id="ieqn-17"><!--<alternatives><inline-graphic xlink:href="ieqn-17.png"/>--><!--<tex-math id="tex-ieqn-17"><![CDATA[$\mathrm{p} \left(\mathrm{x}/\mathrm{c}\right)$]]></tex-math>--><mml:math id="mml-ieqn-17"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>/</mml:mo><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> is the likelihood of the predictor given the class, and <inline-formula id="ieqn-18"><!--<alternatives><inline-graphic xlink:href="ieqn-18.png"/>--><!--<tex-math id="tex-ieqn-18"><![CDATA[$\mathrm{p} \left(\mathrm{x}\right)$]]></tex-math>--><mml:math id="mml-ieqn-18"><mml:mstyle mathvariant="normal"><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> is the prior probability of the predictor. In the training process, the multinomial variant of NB (MultinomialNB), which is commonly used for text classification, is adopted in the proposed work.</p>
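<p>The following Python snippet is a small worked sketch of Bayes' theorem combined with the naive independence assumption, in which the likelihood of a tweet is the product of the per-word likelihoods. It is not the MultinomialNB implementation used in the experiments; the function name, the two-class setup, and all probability values are hypothetical and serve only to show how the posterior is obtained from the prior and the likelihood.</p>
<code language="python"><![CDATA[
```python
from math import prod


def posterior(priors, likelihoods, words):
    """Apply Bayes' theorem with the naive independence assumption.

    priors[c] is p(c); likelihoods[c][w] is p(w/c) for word w.
    Returns p(c/x) for every class c, where x is the list of words.
    """
    # Numerator of Bayes' theorem: p(c) times the product of p(x_i/c).
    joint = {
        c: priors[c] * prod(likelihoods[c][w] for w in words)
        for c in priors
    }
    # Denominator p(x): sum of the numerators over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical toy probabilities for two of the four categories.
priors = {"confirmed": 0.5, "death": 0.5}
likelihoods = {
    "confirmed": {"positive": 0.4, "died": 0.1},
    "death": {"positive": 0.1, "died": 0.4},
}

result = posterior(priors, likelihoods, ["positive"])
predicted = max(result, key=result.get)  # class with the highest posterior
```
]]></code>
<p>With these toy values, the numerators are 0.2 for the confirmed class and 0.05 for the death class, so the posterior of the confirmed class is 0.2/0.25 = 0.8.</p>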
</sec>
<sec id="s3_6_3">
<label>3.6.3</label>
<title>Logistic Regression</title>
<p>LR [<xref ref-type="bibr" rid="ref-21">21</xref>] is one of the most commonly used supervised methods for predicting a categorical variable from independent variables. For instance, consider a situation where it is required to classify whether a person is infected by COVID-19 or not. If linear regression is used for this scenario, a threshold must be set on its output to perform the classification. Suppose that the true class is positive (confirmed in our case), the threshold value is 0.5, and the predicted value is 0.4. The feature vector would then be classified as COVID-19 negative, which could lead to severe consequences in practice. LR is used to overcome this limitation of linear regression because its output is bounded between 0 and 1 and can be interpreted as a class probability. It can be mathematically denoted as follows:</p>
<p><disp-formula id="eqn-6">
<label>(6)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-6.png"/>-->
<!--<tex-math id="tex-eqn-6"><![CDATA[$$\begin{equation}y=\frac{1}{1+e^{-z}}
 \label{eqn-6} \end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-6" display="block"><mml:mrow></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mi>z</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p><disp-formula id="eqn-7">
<label>(7)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-7.png"/>-->
<!--<tex-math id="tex-eqn-7"><![CDATA[$$\begin{equation}{\boldsymbol{z}}={\boldsymbol{w}}.{\boldsymbol{x}}+{\boldsymbol{b}}
 \label{eqn-7}\end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-7" display="block"><mml:mrow></mml:mrow><mml:mrow><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:mi>w</mml:mi><mml:mo>.</mml:mo><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>where <inline-formula id="ieqn-19"><!--<alternatives><inline-graphic xlink:href="ieqn-19.png"/>--><!--<tex-math id="tex-ieqn-19"><![CDATA[${\boldsymbol{b}}$]]></tex-math>--><mml:math id="mml-ieqn-19"><mml:mi>b</mml:mi></mml:math><!--</alternatives>--></inline-formula> is a bias term; <inline-formula id="ieqn-20"><!--<alternatives><inline-graphic xlink:href="ieqn-20.png"/>--><!--<tex-math id="tex-ieqn-20"><![CDATA[${\boldsymbol{w}}$]]></tex-math>--><mml:math id="mml-ieqn-20"><mml:mi>w</mml:mi></mml:math><!--</alternatives>--></inline-formula> is the weight value; and <inline-formula id="ieqn-21"><!--<alternatives><inline-graphic xlink:href="ieqn-21.png"/>--><!--<tex-math id="tex-ieqn-21"><![CDATA[${\boldsymbol{x}}$]]></tex-math>--><mml:math id="mml-ieqn-21"><mml:mi>x</mml:mi></mml:math><!--</alternatives>--></inline-formula> denotes the continuous input values (e.g., the number of words in a tweet in our case). The model produces an output in the range between 0 and 1, which is used to classify the data into the four categories.</p>
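<p>As a hedged illustration, the snippet below computes z = w.x + b and maps it through the standard logistic (sigmoid) function 1/(1 + e^(-z)), which yields a value between 0 and 1. The weights, bias, input vector, and the 0.5 decision threshold are hypothetical values chosen for the example; the sigmoid is written in a numerically stable form so that large negative values of z do not overflow.</p>
<code language="python"><![CDATA[
```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_probability(w, x, b):
    """Compute z = w.x + b and map it to a probability."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)


# Hypothetical weights and a two-feature input; here z is approximately 1.
p = predict_probability(w=(0.8, -0.4), x=(2.0, 1.0), b=-0.2)
label = "positive" if p >= 0.5 else "negative"
```
]]></code>
<p>The sketch covers the binary case only. For the four categories considered here, one common option, stated as an assumption rather than as the configuration used in the experiments, is to train one such model per class (one-vs-rest) or to use the softmax generalization.</p>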
</sec>
<sec id="s3_6_4">
<label>3.6.4</label>
<title>Decision Tree</title>
<p>DT [<xref ref-type="bibr" rid="ref-23">23</xref>] is a simple supervised learning model used for classification problems, in which the data are split based on certain features. DT classifies a sample by sorting it down the tree from the root node to a terminal (leaf) node, which provides its class. Each node in the tree tests a certain attribute, and each edge descending from that node corresponds to one of the possible outcomes of the test. This mechanism is repeated for the subtree rooted at each new node. In the first phase, the entropy of the target and the entropy for each attribute are determined. The information gain (IG) is then determined for all the attributes, as defined in the following equations. This procedure is reiterated until all attributes have been assigned to nodes.</p>
<p><disp-formula id="eqn-8">
<label>(8)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-8.png"/>-->
<!--<tex-math id="tex-eqn-8"><![CDATA[$$\begin{equation}\boldsymbol{E} \left(\boldsymbol{Y}/\boldsymbol{X}\right)=\sum_{\boldsymbol{c}\in \boldsymbol{X}}\boldsymbol{P} \left({\boldsymbol{c}}\right) \boldsymbol{E} \left({\boldsymbol{c}}\right)
 \label{eqn-8} \end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-8" display="block"><mml:mrow></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mo>/</mml:mo><mml:mi>X</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>X</mml:mi></mml:mrow></mml:munder><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p><disp-formula id="eqn-9">
<label>(9)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-9.png"/>-->
<!--<tex-math id="tex-eqn-9"><![CDATA[$$\begin{equation}\boldsymbol{IG} \left(\boldsymbol{Y},\, \boldsymbol{X}\right)=\boldsymbol{E} \left(\boldsymbol{Y}\right)-\boldsymbol{E} \left(\boldsymbol{Y}/\boldsymbol{X}\right)
 \label{eqn-9}\end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-9" display="block"><mml:mrow></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>G</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mo>,</mml:mo><mml:mspace width="0.3em"/><mml:mi>X</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mo>/</mml:mo><mml:mi>X</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>where <italic>Y</italic> is the target variable, <italic>X</italic> is the attribute being tested, <italic>c</italic> is a value of <italic>X</italic>, <italic>P</italic>(<italic>c</italic>) is the proportion of samples taking that value, and <italic>E</italic>(<italic>c</italic>) is the entropy of the corresponding subset. DT employs different techniques to decide whether a node is divided into two or more sub-nodes. The formation of sub-nodes increases their homogeneity; in other words, the purity of each node with respect to the target variable increases. The DT splits the nodes on all available attributes and selects the split that results in the most homogeneous sub-nodes.</p>
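<p>The entropy and information gain computations can be sketched in Python as follows. The snippet computes the entropy of the class labels, the weighted entropy of the subsets produced by splitting on an attribute, and their difference (the IG). It is an illustrative sketch rather than the tree-building code used in the experiments, and the function names and toy data are hypothetical.</p>
<code language="python"><![CDATA[
```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def information_gain(labels, attribute_values):
    """IG(Y, X) = E(Y) minus the sum over values c of X of P(c) * E(c)."""
    n = len(labels)
    # Group the labels by the value that the attribute takes.
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four tweets, two classes, two candidate attributes.
y = ["confirmed", "confirmed", "death", "death"]
informative = ["yes", "yes", "no", "no"]    # splits the classes perfectly
uninformative = ["yes", "no", "yes", "no"]  # leaves each subset mixed

gain_informative = information_gain(y, informative)
gain_uninformative = information_gain(y, uninformative)
```
]]></code>
<p>In this toy example, the first attribute yields an IG of 1 bit and the second an IG of 0, so a tree built with this criterion would split on the first attribute.</p>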
</sec>
<sec id="s3_6_5">
<label>3.6.5</label>
<title>Random Forest</title>
<p>RF [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-43">43</xref>] is a traditional ensemble machine learning model that comprises a large number of DTs, each trained on a randomly chosen subset of the training set. It collates the votes of the individual DTs to determine the class of each test sample. RF uses the Gini index as the attribute selection measure, which calculates the impurity of an attribute with respect to the classes. For a certain training set <italic>x</italic>, one sample (a tweet in our case) is randomly picked and said to belong to some class <italic>c<sub>i</sub></italic>. The Gini index is defined as:</p>
<p><disp-formula id="eqn-10">
<label>(10)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-10.png"/>-->
<!--<tex-math id="tex-eqn-10"><![CDATA[$$\begin{equation}
\sum \sum_{j\neq i} \left(f \left(c_{i},\, x\right)/x\right) \left(f \left(c_{j},\,x\right)/x\right)
 \label{eqn-10}
\end{equation}$$]]></tex-math>-->
<mml:math id="mml-eqn-10" display="block"><mml:mo>&#x2211;</mml:mo><mml:munder><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo lspace='0pt' rspace='0pt'>&#x2260;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.3em"/><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.3em"/><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></disp-formula></p>
<p>where <inline-formula id="ieqn-22"><!--<alternatives><inline-graphic xlink:href="ieqn-22.png"/>--><!--<tex-math id="tex-ieqn-22"><![CDATA[$ \left(f \left(c_{i},  x\right)/x\right)$]]></tex-math>--><mml:math id="mml-ieqn-22"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math><!--</alternatives>--></inline-formula> is the probability that the selected sample belongs to the class category <inline-formula id="ieqn-23"><!--<alternatives><inline-graphic xlink:href="ieqn-23.png"/>--><!--<tex-math id="tex-ieqn-23"><![CDATA[$\mathrm{c}_{\mathrm{i}}$]]></tex-math>--><mml:math id="mml-ieqn-23"><mml:msub><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle mathvariant="normal"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow></mml:msub></mml:math><!--</alternatives>--></inline-formula>. Thus, <italic>x</italic> represents the input values, and <italic>c</italic> is the targeted category.</p>
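<p>The snippet below is a minimal sketch of the two ingredients described in this subsection: the Gini index, computed as the sum of the products of the class probabilities over all pairs of distinct classes, and the majority vote over the predictions of the individual trees. The class probabilities are estimated from label frequencies, and the function names and toy labels are assumptions made for illustration only.</p>
<code language="python"><![CDATA[
```python
from collections import Counter


def gini_index(labels):
    """Sum of p_i * p_j over all ordered pairs of distinct classes."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return sum(
        p_i * p_j
        for i, p_i in enumerate(probs)
        for j, p_j in enumerate(probs)
        if i != j
    )


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical node contents and tree outputs.
pure = gini_index(["death", "death", "death"])
mixed = gini_index(["confirmed", "confirmed", "death", "death"])
vote = majority_vote(["confirmed", "death", "confirmed"])
```
]]></code>
<p>A node that contains a single class has a Gini index of 0, whereas an evenly mixed two-class node reaches 0.5, so lower values indicate purer nodes.</p>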
</sec>
</sec>
<sec id="s3_7">
<label>3.7</label>
<title>Pseudocode for the Proposed Approach</title>
<fig id="fig-8">
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-8.png"/>
</fig>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experiments and Results</title>
<p>This section presents the experimental results for the proposed approach. The empirical analysis was conducted using the Anaconda distribution (Python 3.8) [<xref ref-type="bibr" rid="ref-44">44</xref>] with the open-source Python modules Scikit-Learn [<xref ref-type="bibr" rid="ref-45">45</xref>], NumPy [<xref ref-type="bibr" rid="ref-46">46</xref>], and Keras [<xref ref-type="bibr" rid="ref-47">47</xref>]. The performance of the proposed approach was evaluated using these modules.</p>
<p>The proposed approach was trained using the machine learning approaches described above. The performance of each approach was evaluated on the test set by utilizing performance evaluation metrics [<xref ref-type="bibr" rid="ref-48">48</xref>]. Moreover, the performance of each model was visualized using a confusion matrix. A confusion matrix is suitable for presenting the results of supervised learning problems because it compares the labels predicted (detected) by a classification model on the test set against their true class labels.</p>
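<p>To make the structure of a confusion matrix concrete, the following Python sketch counts, for each true class, how often each class was predicted, and derives the accuracy from the diagonal. The labels are hypothetical and do not come from the dataset used in this work; the row and column convention (rows for true classes, columns for predicted classes) is also an assumption of the sketch.</p>
<code language="python"><![CDATA[
```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


CLASSES = ["confirmed", "death", "recovered", "suspected"]

# Hypothetical labels for six test tweets.
y_true = ["confirmed", "confirmed", "death", "recovered", "suspected", "suspected"]
y_pred = ["confirmed", "suspected", "death", "recovered", "suspected", "confirmed"]

cm = confusion_matrix(y_true, y_pred, CLASSES)
# Accuracy is the share of samples on the diagonal (correct predictions).
accuracy = sum(cm[i][i] for i in range(len(CLASSES))) / len(y_true)
```
]]></code>
<p>Scikit-Learn provides an equivalent function, <monospace>sklearn.metrics.confusion_matrix</monospace>; the pure-Python version is shown here only to expose the counting logic.</p>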
<p>The obtained results show that the SVM model achieved slightly better results than the other classifiers, with test accuracies of 80% with unigrams and 79% with bigrams. The NB classifier also performed well (77% and 78%, respectively), as illustrated in the given figures and tables. The slight improvement in the results could be related to the length of the tweet summaries in our dataset. <xref ref-type="table" rid="table-2">Tab. 2</xref> only considers the classifiers that obtained the highest performance results with n-gram approaches.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Train and test accuracy for machine learning approaches with <inline-formula id="ieqn-24"><!--<alternatives><inline-graphic xlink:href="ieqn-24.png"/>--><!--<tex-math id="tex-ieqn-24"><![CDATA[$\text{TF-IDF}+ \text{u}\text{n}\text{i}\text{g}\text{r}\text{am}+ \text{b}\text{i}\text{g}\text{r}\text{a}\text{m}$]]></tex-math>--><mml:math id="mml-ieqn-24"><mml:mstyle class="text"><mml:mtext>TF-IDF</mml:mtext></mml:mstyle><mml:mo>+</mml:mo><mml:mstyle class="text"><mml:mtext>u</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>n</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>g</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>am</mml:mtext></mml:mstyle><mml:mo>+</mml:mo><mml:mstyle class="text"><mml:mtext>b</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>g</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>a</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>m</mml:mtext></mml:mstyle></mml:math><!--</alternatives>--></inline-formula></title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Approach</th>
<th><inline-formula id="ieqn-25"><!--<alternatives><inline-graphic xlink:href="ieqn-25.png"/>--><!--<tex-math id="tex-ieqn-25"><![CDATA[$\text{T}\text{r}\text{a}\text{i}\text{n}+ \text{u}\text{n}\text{i}\text{g}\text{r}\text{am}$]]></tex-math>--><mml:math id="mml-ieqn-25"><mml:mstyle class="text"><mml:mtext>T</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>a</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>n</mml:mtext></mml:mstyle><mml:mo>+</mml:mo><mml:mstyle class="text"><mml:mtext>u</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>n</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>g</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>am</mml:mtext></mml:mstyle></mml:math><!--</alternatives>--></inline-formula> (%)</th>
<th><inline-formula id="ieqn-26"><!--<alternatives><inline-graphic xlink:href="ieqn-26.png"/>--><!--<tex-math id="tex-ieqn-26"><![CDATA[$\text{T}\text{e}\text{s}\text{t}+ \text{u}\text{n}\text{i}\text{g}\text{r}\text{am}$]]></tex-math>--><mml:math id="mml-ieqn-26"><mml:mstyle class="text"><mml:mtext>T</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>e</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>s</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>t</mml:mtext></mml:mstyle><mml:mo>+</mml:mo><mml:mstyle class="text"><mml:mtext>u</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>n</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>g</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>am</mml:mtext></mml:mstyle></mml:math><!--</alternatives>--></inline-formula> (%)</th>
<th><inline-formula id="ieqn-27"><!--<alternatives><inline-graphic xlink:href="ieqn-27.png"/>--><!--<tex-math id="tex-ieqn-27"><![CDATA[$\text{T}\text{r}\text{a}\text{i}\text{n}+ \text{b}\text{i}\text{g}\text{r}\text{a}\text{m}$]]></tex-math>--><mml:math id="mml-ieqn-27"><mml:mstyle class="text"><mml:mtext>T</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>a</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>n</mml:mtext></mml:mstyle><mml:mo>+</mml:mo><mml:mstyle class="text"><mml:mtext>b</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>g</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>a</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>m</mml:mtext></mml:mstyle></mml:math><!--</alternatives>--></inline-formula> (%)</th>
<th><inline-formula id="ieqn-28"><!--<alternatives><inline-graphic xlink:href="ieqn-28.png"/>--><!--<tex-math id="tex-ieqn-28"><![CDATA[$\text{T}\text{e}\text{s}\text{t}+ \text{b}\text{i}\text{g}\text{r}\text{a}\text{m}$]]></tex-math>--><mml:math id="mml-ieqn-28"><mml:mstyle class="text"><mml:mtext>T</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>e</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>s</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>t</mml:mtext></mml:mstyle><mml:mo>+</mml:mo><mml:mstyle class="text"><mml:mtext>b</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>i</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>g</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>a</mml:mtext></mml:mstyle><mml:mstyle class="text"><mml:mtext>m</mml:mtext></mml:mstyle></mml:math><!--</alternatives>--></inline-formula> (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>SVM</td>
<td>82</td>
<td><bold>80</bold></td>
<td>80</td>
<td><bold>79</bold></td>
</tr>
<tr>
<td>NB</td>
<td>79</td>
<td>77</td>
<td>79</td>
<td><bold>78</bold></td>
</tr>
<tr>
<td>LR</td>
<td>76</td>
<td>75</td>
<td>76</td>
<td>76</td>
</tr>
<tr>
<td>DT</td>
<td>75</td>
<td>75</td>
<td>76</td>
<td>75</td>
</tr>
<tr>
<td>RF</td>
<td>74</td>
<td>72</td>
<td>75</td>
<td>73</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="table-3">Tab. 3</xref> shows the performance results of the five machine learning approaches trained with the TF-IDF feature extraction approach. For better comprehension, the precision, recall, and F1-score for each COVID-19 category (i.e., confirmed, death, recovered, and suspected) are reported separately. <xref ref-type="table" rid="table-4">Tab. 4</xref> presents the average scores of precision, recall, and F1-score for each approach. The classifiers that obtained the highest accuracy were SVM and NB. Compared to the other categories, the death class showed a low F1-score, possibly because it is the minority category.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Performance results</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Approach</th>
<th colspan="3">Confirmed (%)</th>
<th colspan="3">Death (%)</th>
<th colspan="3">Suspected (%)</th>
<th colspan="3">Recovered (%)</th>
</tr>
<tr>
<th></th>
<th>Precision</th>
<th>Recall</th>
<th>F1-score</th>
<th>Precision</th>
<th>Recall</th>
<th>F1-score</th>
<th>Precision</th>
<th>Recall</th>
<th>F1-score</th>
<th>Precision</th>
<th>Recall</th>
<th>F1-score</th>
</tr>
</thead>
<tbody>
<tr>
<td>SVM</td>
<td><bold>80</bold></td>
<td><bold>79</bold></td>
<td><bold>80</bold></td>
<td>72</td>
<td>72</td>
<td>72</td>
<td>76</td>
<td>77</td>
<td>78</td>
<td>78</td>
<td>79</td>
<td>79</td>
</tr>
<tr>
<td>NB</td>
<td>75</td>
<td>78</td>
<td>77</td>
<td>72</td>
<td>76</td>
<td>75</td>
<td>74</td>
<td>77</td>
<td>70</td>
<td>78</td>
<td>79</td>
<td>77</td>
</tr>
<tr>
<td>LR</td>
<td>74</td>
<td>76</td>
<td>78</td>
<td>70</td>
<td>70</td>
<td>71</td>
<td>72</td>
<td>70</td>
<td>72</td>
<td>74</td>
<td>76</td>
<td>71</td>
</tr>
<tr>
<td>DT</td>
<td>71</td>
<td>76</td>
<td>74</td>
<td>70</td>
<td>70</td>
<td>70</td>
<td>70</td>
<td>70</td>
<td>71</td>
<td>71</td>
<td>70</td>
<td>70</td>
</tr>
<tr>
<td>RF</td>
<td>70</td>
<td>71</td>
<td>71</td>
<td>70</td>
<td>71</td>
<td>70</td>
<td>71</td>
<td>70</td>
<td>70</td>
<td>70</td>
<td>71</td>
<td>70</td>
</tr>
</tbody>
</table>
</table-wrap>
 
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Performance measure with an average total</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Approach</th>
<th>Precision (%)</th>
<th>Recall (%)</th>
<th>F1-score (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>SVM</td>
<td><bold>80</bold></td>
<td><bold>81</bold></td>
<td><bold>81</bold></td>
</tr>
<tr>
<td>NB</td>
<td>78</td>
<td>77</td>
<td>79</td>
</tr>
<tr>
<td>LR</td>
<td>76</td>
<td>76</td>
<td>78</td>
</tr>
<tr>
<td>DT</td>
<td>71</td>
<td>73</td>
<td>76</td>
</tr>
<tr>
<td>RF</td>
<td>70</td>
<td>72</td>
<td>74</td>
</tr>
</tbody>
</table>
</table-wrap>
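<p>The per-class scores in <xref ref-type="table" rid="table-3">Tab. 3</xref> and the averages in <xref ref-type="table" rid="table-4">Tab. 4</xref> rely on precision, recall, and F1-score. The following minimal Python sketch illustrates how such scores may be computed one-vs-rest from true and predicted labels. The toy labels, the function names <monospace>per_class_scores</monospace> and <monospace>macro_average</monospace>, and the use of an unweighted (macro) average are assumptions made for illustration only; the averaging scheme used for Tab. 4 is not specified here, and the sketch does not reproduce the reported values.</p>
<preformat>
```python
from collections import Counter

# Category names taken from the paper; everything else below is illustrative.
CLASSES = ["confirmed", "death", "recovered", "suspected"]


def per_class_scores(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} computed one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative
    scores = {}
    for c in classes:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical toy labels, not data from the study.
y_true = ["confirmed", "confirmed", "death", "recovered", "suspected", "suspected"]
y_pred = ["confirmed", "suspected", "death", "recovered", "suspected", "confirmed"]

scores = per_class_scores(y_true, y_pred)
macro = macro_average(scores)
for name, (p, r, f) in scores.items():
    print(f"{name:10s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("macro average:", macro)
```
</preformat>
<p>For these toy labels, the death and recovered classes are classified perfectly, whereas the confirmed and suspected classes each have one false positive and one false negative, which gives 0.5 on all three measures for those classes and a macro average of 0.75.</p>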
<p>Moreover, the confusion matrix results were generated for the selected approaches (i.e., SVM (<xref ref-type="fig" rid="fig-3">Fig. 3</xref>), NB (<xref ref-type="fig" rid="fig-4">Fig. 4</xref>), LR (<xref ref-type="fig" rid="fig-5">Fig. 5</xref>), DT (<xref ref-type="fig" rid="fig-6">Fig. 6</xref>), and RF (<xref ref-type="fig" rid="fig-7">Fig. 7</xref>)).</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Confusion matrix for the SVM</title>
</caption><graphic mimetype="image" mime-subtype="png" xlink:href="fig-3.png"/>
</fig>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Confusion matrix for the NB</title>
</caption><graphic mimetype="image" mime-subtype="png" xlink:href="fig-4.png"/>
</fig>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Confusion matrix for the LR</title>
</caption><graphic mimetype="image" mime-subtype="png" xlink:href="fig-5.png"/>
</fig>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Confusion matrix for the DT</title>
</caption><graphic mimetype="image" mime-subtype="png" xlink:href="fig-6.png"/>
</fig>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Confusion matrix for the RF</title>
</caption><graphic mimetype="image" mime-subtype="png" xlink:href="fig-7.png"/>
</fig>
<p>The confusion matrices presented above indicate that 77% of the confirmed instances were detected as confirmed; 76% of the suspected instances were detected as suspected; 70% of the death instances were detected as death; and 74% of the recovered instances were detected as recovered. These are not the best detection rates, but they provide a useful baseline or benchmark for future approaches based on deep learning techniques.</p>
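<p>Per-class detection percentages of this kind correspond to the diagonal of a row-normalized confusion matrix, i.e., the share of instances of each true category that received the same predicted category. The following minimal Python sketch illustrates the computation. The toy labels and the function names <monospace>confusion_matrix</monospace> and <monospace>detection_rates</monospace> are hypothetical and are not taken from the experiments reported here.</p>
<preformat>
```python
def confusion_matrix(y_true, y_pred, classes):
    """Count matrix with rows = true category, columns = predicted category."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def detection_rates(matrix, classes):
    """Percentage of each true category that was predicted as that category."""
    rates = {}
    for i, c in enumerate(classes):
        row_total = sum(matrix[i])
        rates[c] = 100.0 * matrix[i][i] / row_total if row_total else 0.0
    return rates


classes = ["confirmed", "death", "recovered", "suspected"]

# Hypothetical toy labels, not data from the study.
y_true = ["confirmed"] * 4 + ["death"] * 2 + ["recovered"] * 2 + ["suspected"] * 2
y_pred = [
    "confirmed", "confirmed", "confirmed", "suspected",  # true: confirmed
    "death", "confirmed",                                # true: death
    "recovered", "recovered",                            # true: recovered
    "suspected", "death",                                # true: suspected
]

matrix = confusion_matrix(y_true, y_pred, classes)
rates = detection_rates(matrix, classes)
for name in classes:
    print(f"{name:10s} detected as {name}: {rates[name]:.0f}%")
```
</preformat>
<p>Each rate is a per-class recall expressed as a percentage, so a minority category with few instances can show a noticeably lower value after only a small number of misclassifications.</p>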
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>Artificial intelligence is now a consistent feature of today&#x2019;s technology and plays an important role in the new wave of approaches for public health. From a methodological point of view, machine learning approaches are among the most widely applicable forms of artificial intelligence. This study addressed the problem of identifying COVID-19 patients from Twitter messages without requiring medical records. The resulting framework can be used as a surveillance system for observing the COVID-19 pandemic in real time. The experimental setup, results, and evaluation of the proposed approach were presented to show how COVID-19-infected people can be detected on microblogging services, to address several challenges, and to validate the proposed objectives.</p>
<p>The proposed traditional machine learning-based model classifies Twitter messages into four categories (i.e., confirmed, death, recovered, and suspected). For this purpose, a novel dataset was collected using the Twitter streaming API to build a benchmark dataset of COVID-19 Twitter messages that can be used in future research studies. The data were also visualized graphically to understand their attributes; the visualization revealed the most frequently occurring keywords in the dataset. For the experimental analysis, Twitter data on the COVID-19 pandemic were analyzed to evaluate the traditional machine learning approaches. The results were obtained using SVM, LR, NB, RF, and DT with the TF-IDF feature extraction technique. The performance of the proposed approach was evaluated using accuracy, precision, recall, F1-score, and confusion matrices, and the results were then visualized graphically.</p>
<p>In the future, we aim to improve the performance of the proposed approach by using deep learning techniques to analyze the novel coronavirus and the COVID-19 pandemic outbreak.</p>
</sec>
</body>
<back>
<ack><p>This work has been supported by a grant from the Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University.</p></ack>
<fn-group><fn fn-type="other"><p><bold>Funding Statement:</bold> This work has been supported by a grant from the Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University.</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p></fn></fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Hung</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Lauren</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Hon</surname></string-name>, <string-name><given-names>E. S.</given-names> <surname>Birmingham</surname></string-name>, <string-name><given-names>W. C.</given-names> <surname>Xu</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Social network analysis of COVID-19 sentiments: Application of artificial intelligence</article-title>,&#x201D; <source>Journal of Medical Internet Research</source>, vol. <volume>22</volume>, no. <issue>8</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>13</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Amin</surname></string-name>, <string-name><given-names>M. I.</given-names> <surname>Uddin</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Zeb</surname></string-name>, <string-name><given-names>A. A.</given-names> <surname>Alarood</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Mahmoud</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Detecting information on the spread of dengue on Twitter using artificial neural networks</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>67</volume>, no. <issue>1</issue>, pp. <fpage>1317</fpage>&#x2013;<lpage>1332</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. R.</given-names> <surname>Ahmad</surname></string-name> and <string-name><given-names>H. R.</given-names> <surname>Murad</surname></string-name></person-group>, &#x201C;<article-title>The impact of social media on panic during the COVID-19 pandemic in Iraqi Kurdistan: Online questionnaire study</article-title>,&#x201D; <source>Journal of Medical Internet Research</source>, vol. <volume>22</volume>, no. <issue>5</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>11</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Amin</surname></string-name>, <string-name><given-names>M. I.</given-names> <surname>Uddin</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Zeb</surname></string-name>, <string-name><given-names>A. A.</given-names> <surname>Alarood</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Mahmoud</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Detecting dengue/flu infections based on tweets using LSTM and word embedding</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>189054</fpage>&#x2013;<lpage>189068</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Samuel</surname></string-name>, <string-name><given-names>G. G. M. N.</given-names> <surname>Ali</surname></string-name>, <string-name><given-names>M. M.</given-names> <surname>Rahman</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Esawi</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Samuel</surname></string-name></person-group>, &#x201C;<article-title>COVID-19 public sentiment insights and machine learning for tweets classification</article-title>,&#x201D; <source>Information&#x2014;An International Interdisciplinary Journal</source>, vol. <volume>11</volume>, no. <issue>6</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>22</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. L.</given-names> <surname>Holshue</surname></string-name>, <string-name><given-names>C.</given-names> <surname>DeBolt</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Lindquist</surname></string-name>, <string-name><given-names>H. K.</given-names> <surname>Lofy</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wiesman</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>First case of 2019 novel coronavirus in the United States</article-title>,&#x201D; <source>New England Journal of Medicine</source>, vol. <volume>382</volume>, no. <issue>10</issue>, pp. <fpage>929</fpage>&#x2013;<lpage>936</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ranjan</surname></string-name> and <string-name><given-names>B. B.</given-names> <surname>Gupta</surname></string-name></person-group>, &#x201C;<article-title>Multiple features based approach for automatic fake news detection on social networks using deep learning</article-title>,&#x201D; <source>Applied Soft Computing Journal</source>, vol. <volume>100</volume>, no. <issue>3</issue>, pp. <fpage>106983</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Chora</surname></string-name></person-group>, &#x201C;<article-title>Advanced machine learning techniques for fake news (online disinformation) detection: A systematic mapping study</article-title>,&#x201D; <source>Applied Soft Computing Journal</source>, vol. <volume>101</volume>, pp. <fpage>107050</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Amin</surname></string-name>, <string-name><given-names>M. I.</given-names> <surname>Uddin</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Hassan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Niddal</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Recurrent neural networks with TF-IDF embedding technique for detection and classification in tweets of dengue disease</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, no. <issue>July</issue>, pp. <fpage>131522</fpage>&#x2013;<lpage>131533</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. J.</given-names> <surname>Paul</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Dredze</surname></string-name></person-group>, &#x201C;<article-title>Social monitoring for public health</article-title>,&#x201D; <source>Synthesis Lectures on Information Concepts, Retrieval, and Services</source>, vol. <volume>9</volume>, no. <issue>5</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>183</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M. J.</given-names> <surname>Paul</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sarker</surname></string-name>, <string-name><given-names>J. S.</given-names> <surname>Brownstein</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Nikfarjam</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Social media mining for public health monitoring and surveillance</article-title>,&#x201D; in <conf-name>Biocomputing 2016: Proc. of the Pacific Sym. Fairmont Orchid</conf-name>, <publisher-loc>Big Island of Hawaii</publisher-loc>, pp. <fpage>468</fpage>&#x2013;<lpage>479</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Ding</surname></string-name></person-group>, &#x201C;<article-title>Sparse vector coding-based multi-carrier NOMA for in-home health networks</article-title>,&#x201D; <source>IEEE Journal on Selected Areas in Communications</source>, vol. <volume>39</volume>, no. <issue>2</issue>, pp. <fpage>325</fpage>&#x2013;<lpage>337</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Zhiwei</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Shen</surname></string-name>, <string-name><given-names>A. K.</given-names> <surname>Bashir</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Imran</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Kumar</surname></string-name></person-group>, &#x201C;<article-title>Robust spammer detection using collaborative neural network in Internet of Things applications</article-title>,&#x201D; <source>IEEE Internet of Things Journal</source>, pp. <fpage>1</fpage>, <year>2020</year>. <uri>https://doi.org/10.1109/JIOT.2020.3003802</uri>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K. P.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Aloqaily</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Jararweh</surname></string-name></person-group>, &#x201C;<article-title>Blockchain-enhanced data sharing with traceable and direct revocation in IIoT</article-title>,&#x201D; <source>IEEE Transactions on Industrial Informatics</source>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Alazab</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Tan</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Gu</surname></string-name></person-group>, &#x201C;<article-title>Deep learning-based traffic safety solution for a mixture of autonomous and manual vehicles in a 5G-enabled intelligent transportation system</article-title>,&#x201D; <source>IEEE Transactions on Vehicular Technology</source>, vol. <volume>69</volume>, no. <issue>11</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>11</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Aloqaily</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Alazab</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Lv</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Attribute-based encryption with parallel outsourced decryption for edge intelligent IoV</article-title>,&#x201D; <source>IEEE Transactions on Vehicular Technology</source>, vol. <volume>69</volume>, no. <issue>11</issue>, pp. <fpage>13784</fpage>&#x2013;<lpage>13795</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Shang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Srivastava</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Efficient and privacy-preserving medical research support platform against COVID-19: A blockchain-based approach</article-title>,&#x201D; <source>IEEE Consumer Electronics Magazine</source>, vol. <volume>3</volume>, no. <issue>11</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Tong</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Koller</surname></string-name></person-group>, &#x201C;<article-title>Support vector machine active learning with applications to text classification</article-title>,&#x201D; <source>Journal of Machine Learning Research</source>, vol. <volume>2</volume>, no. <issue>11</issue>, pp. <fpage>45</fpage>&#x2013;<lpage>66</lpage>, <year>2001</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. Al</given-names> <surname>Amrani</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Lazaar</surname></string-name> and <string-name><given-names>K. E. El</given-names> <surname>Kadiri</surname></string-name></person-group>, &#x201C;<article-title>Random forest and support vector machine based hybrid approach to sentiment analysis</article-title>,&#x201D; <source>Procedia Computer Science</source>, vol. <volume>127</volume>, pp. <fpage>511</fpage>&#x2013;<lpage>520</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V. F.</given-names> <surname>Rodriguez-Galiano</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Ghimire</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Rogan</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Chica-Olmo</surname></string-name> and <string-name><given-names>J. P.</given-names> <surname>Rigol-Sanchez</surname></string-name></person-group>, &#x201C;<article-title>An assessment of the effectiveness of a random forest classifier for land-cover classification</article-title>,&#x201D; <source>ISPRS Journal of Photogrammetry and Remote Sensing</source>, vol. <volume>67</volume>, no. <issue>1</issue>, pp. <fpage>93</fpage>&#x2013;<lpage>104</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. Y. J.</given-names> <surname>Peng</surname></string-name>, <string-name><given-names>K. L.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>G. M.</given-names> <surname>Ingersoll</surname></string-name></person-group>, &#x201C;<article-title>An introduction to logistic regression analysis and reporting</article-title>,&#x201D; <source>Journal of Educational Research</source>, vol. <volume>96</volume>, no. <issue>1</issue>, pp. <fpage>3</fpage>&#x2013;<lpage>14</lpage>, <year>2002</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Blasch</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Shen</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Scalable sentiment classification for big data analysis using na&#x00EF;ve bayes classifier</article-title>,&#x201D; in <conf-name>2013 IEEE Int. Conf. on Big Data</conf-name>, <publisher-loc>Santa Clara, CA, USA</publisher-loc>, pp. <fpage>99</fpage>&#x2013;<lpage>104</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I. D.</given-names> <surname>Mienye</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Sun</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Prediction performance of improved decision tree-based algorithms: A review</article-title>,&#x201D; <source>Procedia Manufacturing</source>, vol. <volume>35</volume>, pp. <fpage>698</fpage>&#x2013;<lpage>703</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C. P.</given-names> <surname>Medina</surname></string-name> and <string-name><given-names>M. R. R.</given-names> <surname>Ramon</surname></string-name></person-group>, &#x201C;<article-title>Using TF-IDF to determine word relevance in document queries</article-title>,&#x201D; in <conf-name>Proc. of the First Instructional Conf. on Machine Learning</conf-name>, <publisher-loc>Piscataway, NJ USA</publisher-loc>, pp. <fpage>133</fpage>&#x2013;<lpage>142</lpage>, <year>2003</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Pennington</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Socher</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Manning</surname></string-name></person-group>, &#x201C;<article-title>GloVe: Global vectors for word representation</article-title>,&#x201D; in <conf-name>Proc. of the 2014 Conf. on Empirical Methods in Natural Language Processing</conf-name>, <publisher-loc>Doha, Qatar</publisher-loc>, pp. <fpage>1532</fpage>&#x2013;<lpage>1543</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Violos</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Tserpes</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Varlamis</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Varvarigou</surname></string-name></person-group>, &#x201C;<article-title>Text classification using the n-gram graph representation model over high frequency data streams</article-title>,&#x201D; <source>Frontiers in Applied Mathematics and Statistics</source>, vol. <volume>4</volume>, no. <issue>9</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>19</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. Y.</given-names> <surname>Dai</surname></string-name>, <string-name><given-names>M. L.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>Y. J.</given-names> <surname>Jong</surname></string-name> and <string-name><given-names>C. K.</given-names> <surname>Ho</surname></string-name></person-group>, &#x201C;<article-title>Familial clusters of the 2019 novel coronavirus diseases in Taiwan</article-title>,&#x201D; <source>Travel Medicine and Infectious Disease</source>, vol. <volume>36</volume>, no. <issue>382</issue>, pp. <fpage>101813</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. T.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Garcia-Carreras</surname></string-name>, <string-name><given-names>M. D. T.</given-names> <surname>Hitchings</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>L. C.</given-names> <surname>Katzelnick</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>A systematic review of antibody mediated immunity to coronaviruses: Kinetics, correlates of protection, and association with severity</article-title>,&#x201D; <source>Nature Communications</source>, vol. <volume>11</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Ianevski</surname></string-name>, <string-name><given-names>M. H.</given-names> <surname>Fenstad</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Biza</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Zusinaite</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Reisberg</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Potential antiviral options against SARS-CoV-2 infection</article-title>,&#x201D; <source>Viruses</source>, vol. <volume>12</volume>, no. <issue>6</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>19</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Kouzy</surname></string-name>, <string-name><given-names>J. Abi</given-names> <surname>Jaoude</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kraitem</surname></string-name>, <string-name><given-names>M. B. E.</given-names> <surname>Alam</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Karam</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Coronavirus goes viral: Quantifying the COVID-19 misinformation epidemic on Twitter</article-title>,&#x201D; <source>Cureus</source>, vol. <volume>12</volume>, no. <issue>3</issue>, <comment>Preprint</comment>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Singh</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bansal</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Bode</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Budak</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Chi</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>A first look at COVID-19 information and misinformation sharing on Twitter</article-title>,&#x201D; <comment>arXiv preprint arXiv: 2003.13907</comment>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. M. U. D.</given-names> <surname>Khanday</surname></string-name>, <string-name><given-names>S. T.</given-names> <surname>Rabani</surname></string-name>, <string-name><given-names>Q. R.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Rouf</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Mohi Ud Din</surname></string-name></person-group>, &#x201C;<article-title>Machine learning based approaches for detecting COVID-19 using clinical text data</article-title>,&#x201D; <source>International Journal of Information Technology</source>, vol. <volume>12</volume>, no. <issue>3</issue>, pp. <fpage>731</fpage>&#x2013;<lpage>739</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. P.</given-names> <surname>Hossain</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Junus</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Jia</surname></string-name>, <string-name><given-names>T. H.</given-names> <surname>Wen</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>The effects of border control and quarantine measures on the spread of COVID-19</article-title>,&#x201D; <source>Epidemics</source>, vol. <volume>32</volume>, no. <issue>5</issue>, pp. <fpage>100397</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>M. Y.</given-names> <surname>Kabir</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Madria</surname></string-name></person-group>, &#x201C;<article-title>CoronaVis: A real-time COVID-19 tweets data analyzer and data repository</article-title>,&#x201D; <comment>arXiv preprint arXiv:2004.13932v2</comment>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Mirbabaie</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Bunker</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Stieglitz</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Marx</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Ehnis</surname></string-name></person-group>, &#x201C;<article-title>Social media in times of crisis: Learning from Hurricane Harvey for the coronavirus disease 2019 pandemic response</article-title>,&#x201D; <source>Journal of Information Technology</source>, vol. <volume>35</volume>, no. <issue>3</issue>, pp. <fpage>195</fpage>&#x2013;<lpage>213</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Aggarwal</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Goswami</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Sahdeva</surname></string-name></person-group>, &#x201C;<article-title>Multi-criterion intelligent decision support system for COVID-19</article-title>,&#x201D; <source>Applied Soft Computing Journal</source>, vol. <volume>101</volume>, pp. <fpage>107056</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Yun</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Hu</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Laboratory data analysis of novel coronavirus (COVID-19) screening in 2510 patients</article-title>,&#x201D; <source>Clinica Chimica Acta</source>, vol. <volume>509</volume>, no. <issue>8</issue>, pp. <fpage>94</fpage>&#x2013;<lpage>97</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="other">&#x201C;<article-title>Twitter scraper</article-title>,&#x201D; [Online]. Available: https://github.com/taspinar/twitterscraper <comment>(Accessed 05 August 2018)</comment>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="other">&#x201C;<article-title>Processing raw text</article-title>,&#x201D; [Online]. Available: https://www.nltk.org/book/ch03.html.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="other">&#x201C;<article-title>NLTK 3.5 documentation</article-title>,&#x201D; [Online]. Available: https://www.nltk.org/_modules/nltk/stem/porter.html <comment>(Accessed 24 July 2019)</comment>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. L.</given-names> <surname>McHugh</surname></string-name></person-group>, &#x201C;<article-title>Interrater reliability: The kappa statistic</article-title>,&#x201D; <source>Biochemia Medica</source>, vol. <volume>22</volume>, no. <issue>3</issue>, pp. <fpage>276</fpage>&#x2013;<lpage>282</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. L.</given-names> <surname>Fleiss</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Levin</surname></string-name> and <string-name><given-names>M. C.</given-names> <surname>Paik</surname></string-name></person-group>, &#x201C;<article-title>The measurement of interrater agreement</article-title>,&#x201D; <source>Statistical Methods for Rates and Proportions</source>, vol. <volume>2</volume>, pp. <fpage>598</fpage>&#x2013;<lpage>626</lpage>, <year>2004</year>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Breiman</surname></string-name></person-group>, &#x201C;<article-title>Random forests</article-title>,&#x201D; <source>Machine Learning</source>, vol. <volume>45</volume>, no. <issue>1</issue>, pp. <fpage>5</fpage>&#x2013;<lpage>32</lpage>, <year>2001</year>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="other">&#x201C;<article-title>Anaconda</article-title>,&#x201D; [Online]. Available: https://anaconda.org/.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="other">&#x201C;<article-title>Scikit-learn</article-title>,&#x201D; [Online]. Available: https://scikit-learn.org/stable/.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="other">&#x201C;<article-title>NumPy</article-title>,&#x201D; [Online]. Available: https://numpy.org/.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="other">&#x201C;<article-title>Keras</article-title>,&#x201D; [Online]. Available: https://keras.io/api/layers/initializers/.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. M. W.</given-names> <surname>Powers</surname></string-name></person-group>, &#x201C;<article-title>Evaluation: From precision, recall and F-measure to ROC, informedness, markedness &#x0026; correlation</article-title>,&#x201D; <source>Journal of Machine Learning Technologies</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>37</fpage>&#x2013;<lpage>63</lpage>, <year>2011</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>