<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">21968</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2022.021968</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification</article-title>
<alt-title alt-title-type="left-running-head">SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification</alt-title>
<alt-title alt-title-type="right-running-head">SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes"><name name-style="western"><surname>Haq</surname><given-names>Mohd Anul</given-names></name><email>m.anul@mu.edu.sa</email>
</contrib>
<aff><label>1</label><institution>Department of Computer Science, College of Computer and Information Sciences, Majmaah University Almajmaah</institution>, <addr-line>11952</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Mohd Anul Haq. Email: <email>m.anul@mu.edu.sa</email></corresp>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-10-18"><day>18</day>
<month>10</month>
<year>2021</year></pub-date>
<volume>71</volume>
<issue>1</issue>
<fpage>1403</fpage>
<lpage>1425</lpage>
<history>
<date date-type="received"><day>22</day><month>7</month><year>2021</year></date>
<date date-type="accepted"><day>23</day><month>8</month><year>2021</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Haq</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Haq</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_21968.pdf"></self-uri>
<abstract>
<p>Rapid industrialization and urbanization are deteriorating ambient air quality, especially in developing nations. Air pollutants pose a high risk to human health and degrade the environment. Earlier studies have used machine learning (ML) and statistical modeling to classify and forecast air pollution. However, the complexity of air pollution datasets has limited how efficiently these methods classify and forecast air pollution. ML-based models also suffer from issues with data pre-processing, class imbalance, data splitting, and hyperparameter tuning. There is therefore a gap in the existing ML-based studies on air pollution due to improper data handling and optimization. The present investigation aims to bridge these gaps and aid in effective air pollution classification and forecasting. Five ML models were developed, including one novel model named SMOTEDNN (Synthetic Minority Oversampling Technique with Deep Neural Network), to address air pollution classification. All five models utilized efficient data pre-processing and rigorous hyperparameter optimization. Three forecasting models were developed to forecast air pollution for one step-index based on statistical autoregression. All models developed in the present investigation showed high accuracy. Notably, the novel model SMOTEDNN achieved an accuracy of 99.90&#x0025;, higher than the other models from the current investigation and from previous studies.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Air pollution</kwd>
<kwd>smote</kwd>
<kwd>dnn</kwd>
<kwd>classification</kwd>
<kwd>autoregression</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Globally, air pollution is increasing due to rapid industrialization and urbanization. Air pollution poses severe risks for the environment, creating health-related hazards and worsening climate change. Notably, smog from waste products such as carbon, nitric oxide (NO), carbon monoxide (CO), hydrocarbons, and synthetic nectar from cellular sources affects the environment [<xref ref-type="bibr" rid="ref-1">1</xref>]. The threat of air pollutants, especially particulate matter (PM), is serious enough to cause a higher rate of mortality, as reported by the World Health Organization (WHO) [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-3">3</xref>]. Additionally, the increasing number of vehicles is responsible for rising levels of pollutants such as NO<sub>2</sub>, CO, NH<sub>3</sub>, PM2.5, and PM10, whereas pollutants such as SO<sub>2</sub>, CO, O<sub>3</sub>, B (Benzene), T (Toluene), and X (Xylene) come from industrial sources. India ranks second after Kuwait in severity of pollution, based on higher PM concentrations [<xref ref-type="bibr" rid="ref-4">4</xref>&#x2013;<xref ref-type="bibr" rid="ref-6">6</xref>].</p>
<p>Previous studies have used statistical models, mathematical models, and Machine Learning (ML) models to classify and forecast air pollution. Reference [<xref ref-type="bibr" rid="ref-7">7</xref>] used Recurrent Neural Networks (RNN) for classification of ozone and achieved an accuracy of 81&#x0025;. Reference [<xref ref-type="bibr" rid="ref-8">8</xref>] also used RNN to analyze PM2.5 and PM10 with 95&#x0025; accuracy. Reference [<xref ref-type="bibr" rid="ref-9">9</xref>] used Support Vector Regression and an ensemble model to classify PM10 with an accuracy of 96&#x0025;. Reference [<xref ref-type="bibr" rid="ref-10">10</xref>] used multiple ML models to classify air pollution and found that logistic regression provides an accuracy of 93&#x0025;. However, parameter tuning was missed. The majority of studies used ANN [<xref ref-type="bibr" rid="ref-9">9</xref>&#x2013;<xref ref-type="bibr" rid="ref-12">12</xref>], hybrid ANN [<xref ref-type="bibr" rid="ref-13">13</xref>&#x2013;<xref ref-type="bibr" rid="ref-15">15</xref>], and ML [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>&#x2013;<xref ref-type="bibr" rid="ref-18">18</xref>] to forecast air quality. However, because trend and seasonality make the datasets complex, most models lack efficient classification and forecasting of air pollution [<xref ref-type="bibr" rid="ref-19">19</xref>]. Given the learning ability and complex data handling capacity of ML, the use of ML models has rapidly increased [<xref ref-type="bibr" rid="ref-20">20</xref>]. However, critical issues such as data pre-processing, class imbalance, data splitting, and hyperparameter tuning have been poorly addressed when optimizing the performance of the models.
Specifically, most studies showed high accuracy for the class with more observations and low accuracy for the class with fewer observations; the accuracy reported under these conditions is clearly illusory. ML models can provide output for almost any given input based on training; however, proper data pre-processing and hyperparameter tuning can improve the model in terms of accuracy, sensitivity, and stability. There is a gap in the collective findings of the existing ML-based air pollution studies due to improper data handling and optimization [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>]. The present investigation aims to bridge the gaps for more effective air pollution classification and forecasting. Five ML models were developed, including one novel model named SMOTEDNN, to address air pollution classification. All five models utilized efficient data pre-processing and rigorous hyperparameter optimization. Three forecasting models were developed to forecast air pollution for one step-index based on statistical autoregression.</p>
<sec id="s1_1"><label>1.1</label><title>SMOTEDNN</title>
<p>Previous studies focused on achieving higher accuracy values and paid less attention to the illusory accuracy caused by class imbalance in the dataset. Classification on an imbalanced dataset yields low accuracy for the minority class (fewer observations) and high accuracy for the majority class (more observations). SMOTE was integrated with DNN in the current investigation to overcome the issue of class imbalance by oversampling the minority class. Simply duplicating minority class values would not add new information to the algorithm; instead, new values are synthesized from existing values. SMOTE randomly selects a minority class sample, randomly picks one of its k nearest minority class neighbors, and creates a new value between the two. SMOTE was combined with DNN (SMOTEDNN) to classify air pollution.</p>
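<p>The synthesis step of SMOTE can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name <monospace>smote_sketch</monospace>, the two-feature points, and the parameter values are hypothetical and chosen only to show how a synthetic value is placed between a minority class sample and one of its k nearest minority class neighbors.</p>
<code language="python"><![CDATA[
```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the chosen sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical two-feature minority class (values are illustrative only)
minority_points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_sketch(minority_points, n_new=6)
```
]]></code>
<p>Because every synthetic point lies on a segment between two existing minority class points, the new values stay within the region already occupied by that class.</p>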
<p>A neural network (NN) of the kind used here belongs to the family of feedforward artificial neural networks (ANN). A basic ANN consists of three layers: an input, a hidden, and an output layer. An ANN with a higher number of layers can be defined as a DNN. The nodes of a DNN, excluding the input nodes, mainly use a non-linear activation function. The primary benefit of deep learning (DL)-based models, automatic feature extraction, makes them a widespread choice for classification [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>]. The main components of the DNN are delineated in the ensuing sub-sections.</p>
<sec id="s1_1_1"><label>1.1.1</label><title>Activation Layer</title>
<p>A neural network requires an activation function to make predictions. The rectifier activation function (ReLU) is one of the default activation functions for DNN-based applications; it adds nonlinearity to the network. ReLU outputs 0 for negative inputs and returns the input unchanged for non-negative inputs. Softmax is an activation function used in the output layer of a neural network for classification. Softmax predicts a multinomial probability distribution over the classes and is typically used when there are more than two classes.</p>
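<p>As a small illustration of the two activation functions described above, the following sketch implements ReLU and softmax with the Python standard library only. The input scores are hypothetical; the six-element vector merely mirrors the six AQI classes.</p>
<code language="python"><![CDATA[
```python
import math


def relu(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0.0, x)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [relu(v) for v in (-2.0, 0.0, 3.5)]
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5, 0.0])  # hypothetical scores for six classes
predicted_class = probs.index(max(probs))
```
]]></code>
<p>The class with the largest softmax probability is taken as the prediction.</p>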
</sec>
<sec id="s1_1_2"><label>1.1.2</label><title>Dense Layer</title>
<p>The dense layer in a neural network is fully connected: each neuron in the dense layer receives input from all neurons of its preceding layer. It applies a linear operation that maps every input to every output.</p>
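<p>The linear operation of a dense layer can be sketched as a weighted sum of all inputs plus a bias for each output neuron. The weights, biases, and layer sizes below are hypothetical and serve only to illustrate the mapping.</p>
<code language="python"><![CDATA[
```python
def dense(inputs, weights, biases):
    """Compute one dense layer; weights[j][i] connects input i to output neuron j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


x = [1.0, 2.0, 3.0]                          # three inputs
W = [[0.5, -1.0, 0.0], [1.0, 1.0, 1.0]]      # two output neurons (hypothetical weights)
b = [0.1, -0.5]
out = dense(x, W, b)
```
]]></code>
<p>In a DNN, an activation function such as ReLU is then applied to each element of the output.</p>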
</sec>
<sec id="s1_1_3"><label>1.1.3</label><title>Training</title>
<p>Models such as neural networks use learning algorithms to minimize errors. For ANN, one of the leading learning algorithms is backpropagation, which computes the gradient of a function to fine-tune the network parameters for error minimization.</p>
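<p>The idea of using gradients to reduce the error can be shown in its simplest form with a single linear neuron trained by gradient descent. This is a sketch under stated assumptions (squared-error loss, hypothetical data, learning rate, and epoch count), not the training procedure of this study; backpropagation applies the same chain rule layer by layer in deeper networks.</p>
<code language="python"><![CDATA[
```python
def train_neuron(samples, lr=0.1, epochs=500):
    """Fit pred = w * x + b by gradient descent on the loss 0.5 * (pred - target)**2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = w * x + b        # forward pass
            error = pred - target   # derivative of the loss with respect to pred
            w -= lr * error * x     # chain rule: dLoss/dw = error * x
            b -= lr * error         # chain rule: dLoss/db = error
    return w, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # hypothetical points on y = 2x + 1
w, b = train_neuron(data)
```
]]></code>
<p>After training, the parameters approach the values that generated the data (w close to 2 and b close to 1).</p>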
</sec>
</sec>
<sec id="s1_2"><label>1.2</label><title>XGBoost</title>
<p>XGBoost (eXtreme Gradient Boosting) is a decision-tree-based ensemble ML model, an optimized gradient boosting algorithm based on gradient descent. It includes parallelization, efficient handling of missing data, tree pruning, and regularization to prevent overfitting. XGBoost is among the strongest algorithms in terms of processing time and predictive performance. Although the trees are added sequentially, XGBoost uses a parallelized implementation to speed up their construction. XGBoost and Random Forest (RF) are both decision-tree-based models; the difference between them is that XGBoost fits each new tree to reduce the errors of the trees built before it, whereas RF trains its trees independently of one another.</p>
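<p>The sequential error reduction that underlies gradient boosting can be illustrated with a minimal sketch. It is plain gradient boosting on squared error with one-split trees, not XGBoost itself (there is no regularization, pruning, or parallelization), and the data, number of rounds, and learning rate are hypothetical.</p>
<code language="python"><![CDATA[
```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) of the best one-split tree."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Add one stump per round, each fitted to the residuals of the current ensemble."""
    predictions = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return predictions


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical feature values
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]   # hypothetical targets
initial_mse = mse(ys, [0.0] * len(ys))
final_mse = mse(ys, boost(xs, ys))
```
]]></code>
<p>Each round can only lower the training error, which is the sense in which a boosted ensemble corrects the errors of its earlier trees.</p>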
</sec>
<sec id="s1_3"><label>1.3</label><title>Random Forest</title>
<p>Random forest (RF) is a supervised classification algorithm. RF builds decision trees on data samples and combines the predictions of the individual trees through voting. Bagging produces random subsets of the training data, each of which is used to train a decision tree. RF is based on the classifiers c(a&#x007C;&#x0398;<sub>1</sub>), &#x2026;, c(a&#x007C;&#x0398;<sub>k</sub>), each a classification tree with parameters &#x0398;<sub>k</sub> selected randomly from a model random vector &#x0398;. The final classification f(a) uses the ensemble {c<sub>k</sub>(a)}, where the class for input a is chosen by voting across the trees. The collection of classifiers {c<sub>k</sub>(a)} is trained on the dataset <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>d</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msubsup><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup></mml:math></inline-formula>.</p>
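<p>Bagging and voting can be sketched as follows. To keep the example short, each "tree" is replaced by a hypothetical one-threshold classifier on a single feature; the data, the seed, and the number of classifiers are assumptions made for illustration and do not reflect the RF configuration of this study.</p>
<code language="python"><![CDATA[
```python
import random
from collections import Counter


def bootstrap(data, rng):
    """Draw a sample of the same size as data, with replacement (bagging)."""
    return [rng.choice(data) for _ in data]


def fit_threshold_classifier(sample):
    """Toy stand-in for a tree: threshold halfway between the two class means."""
    lows = [a for a, b in sample if b == 0]
    highs = [a for a, b in sample if b == 1]
    if not lows or not highs:
        # Degenerate bootstrap sample: always predict the only class it contains
        only = 0 if lows else 1
        return lambda a: only
    threshold = (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2
    return lambda a: int(a > threshold)


def forest_predict(classifiers, a):
    """Final class f(a): the most frequent vote among all classifiers."""
    votes = Counter(c(a) for c in classifiers)
    return votes.most_common(1)[0][0]


rng = random.Random(42)
d = [(0.2, 0), (0.4, 0), (0.5, 0), (1.6, 1), (1.8, 1), (2.1, 1)]  # hypothetical (a, b) pairs
classifiers = [fit_threshold_classifier(bootstrap(d, rng)) for _ in range(25)]
low_vote = forest_predict(classifiers, 0.3)
high_vote = forest_predict(classifiers, 2.0)
```
]]></code>
<p>An odd number of classifiers is used so that a two-class vote cannot end in a tie.</p>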
</sec>
<sec id="s1_4"><label>1.4</label><title>Support Vector Machine (SVM)</title>
<p>SVM is an ML algorithm that carries out classification using an optimal hyperplane. SVM classifiers can be linear or non-linear, and aim to find the largest margin separating the classes in feature space [<xref ref-type="bibr" rid="ref-22">22</xref>]. SVM can be defined as [<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-23">23</xref>]: (i) Suppose a set of training vectors T, where
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mrow></mml:mrow><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mrow><mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></disp-formula>
</p>
<p>Here, <italic>a<sub>i</sub></italic> &#x2208; &#x211D;<sup>n</sup> where <italic>i &#x003D; 1, 2, 3, &#x2026;, n.</italic> (ii) The labels in T take the values <italic>b<sub>i</sub></italic> &#x2208; {&#x2212;1, 1} for two classes. (iii) T is linearly separable by a hyperplane defined through the function <italic>d(a)</italic> as:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2"><mml:mrow><mml:mi>d</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x27E8;</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>&#x27E9;</mml:mo><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mstyle mathsize='140%' displaystyle='true'><mml:mo>&#x2211;</mml:mo></mml:mstyle><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></disp-formula>
where a is an independent variable, w is the weight vector calculated by the model, c is a constant, and w<sub>i</sub> and a<sub>i</sub> inside the sum denote the i-th components of w and a.</p>
<p>The SVM algorithm maximizes the margin of the hyperplane from the training vectors, which is equivalent to
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:munder><mml:mo movablelimits="false">min</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:munder><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>w</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>
</p>
<p>SVM calculates the Lagrangian cost function as
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mo>;</mml:mo><mml:mi>&#x03B1;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>w</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x2212;</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>.</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo stretchy="false">]</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>
</p>
<p>Here, <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>&#x03B1;</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mi>T</mml:mi></mml:msup></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the vector of Lagrange multipliers.</p>
<p>To find the optimal hyperplane when the classes cannot be separated perfectly, the penalty parameter &#x2018;C&#x2019; and slack variables <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> were introduced [<xref ref-type="bibr" rid="ref-23">23</xref>,<xref ref-type="bibr" rid="ref-24">24</xref>]:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>.</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula>
</p>
<p>The equation used to maximize the margin for the optimal hyperplane is given as:</p>
<p><disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>w</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>C</mml:mi><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula>
</p>
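<p>The soft-margin objective can be evaluated numerically for a fixed hyperplane, as in the following sketch. The weight vector w, constant c, penalty C, and training points are hypothetical; each slack variable is taken as the smallest non-negative value that satisfies the constraint with slack for its point.</p>
<code language="python"><![CDATA[
```python
def soft_margin_objective(w, c, points, labels, C=1.0):
    """Return 0.5 * ||w||^2 + C * sum(slacks) together with the slack values."""
    norm_sq = sum(wj * wj for wj in w)
    slacks = []
    for a, b in zip(points, labels):
        margin = b * (sum(wj * aj for wj, aj in zip(w, a)) + c)
        slacks.append(max(0.0, 1.0 - margin))  # zero when the margin is at least 1
    return 0.5 * norm_sq + C * sum(slacks), slacks


points = [(2.0, 2.0), (0.0, 0.0), (1.5, 1.5)]  # hypothetical training vectors a_i
labels = [1, -1, -1]                           # class labels b_i
value, slacks = soft_margin_objective(w=(1.0, 1.0), c=-3.0, points=points, labels=labels)
```
]]></code>
<p>Only the third point, which lies on the hyperplane, needs a non-zero slack; a larger C would penalize that violation more heavily.</p>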
</sec>
<sec id="s1_5"><label>1.5</label><title>KNN</title>
<p>The KNN algorithm is a supervised ML algorithm that assumes the value of a point is similar to the values of its neighbors. KNN obtains the value of a point from neighboring points in the dataset based on distance. Thus, it works by choosing the K neighbors nearest to the point of interest and voting for the most frequent class among them. A high value of K reduces the effect of noise, whereas with a low value of K, local anomalies add more error near the decision boundaries.</p>
<p>The primary issue with KNN is that it becomes slow as the data grows. The Euclidean distance (d) between two points <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> can be obtained using
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>d</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula>
</p>
<p>For n-dimensional space, the Euclidean distance d<sub>e</sub> can be obtained as
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">d</mml:mi></mml:mrow><mml:mi>e</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula>
</p>
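<p>The distance computation and the voting step of KNN can be sketched as follows. The training points and their labels are hypothetical and are used only to show how the K nearest neighbors determine the predicted class.</p>
<code language="python"><![CDATA[
```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train, query, k=3):
    """Vote among the k training points closest to the query point."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points (features and labels are illustrative only)
train = [
    ((1.0, 1.0), "Good"), ((1.2, 0.8), "Good"), ((0.9, 1.1), "Good"),
    ((4.0, 4.2), "Poor"), ((4.1, 3.9), "Poor"),
]
near_good = knn_predict(train, (1.1, 1.0), k=3)
near_poor = knn_predict(train, (4.0, 4.0), k=3)
```
]]></code>
<p>Every prediction scans all training points, which is why the method slows down as the dataset grows.</p>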
</sec>
</sec>
<sec id="s2"><label>2</label><title>Dataset</title>
<p>The National Air Quality Monitoring Program (NAMP) is a nationwide program of the Central Pollution Control Board (CPCB) of India. It aims to monitor the levels of air pollutants from 793 active stations spread over 344 cities in 29 states of India. The present study used a dataset released under the NAMP program (<uri xlink:href="http://www.cpcbenvis.nic.in/air_quality_data.html">http://www.cpcbenvis.nic.in/air_quality_data.html</uri>) covering Jan 01, 2015, to July 07, 2020. The air pollutants analyzed in the present investigation are NOx, nitric oxide (NO), nitrogen dioxide (NO<sub>2</sub>), particulate matter (PM2.5 and PM10), sulphur dioxide (SO<sub>2</sub>), carbon monoxide (CO), ammonia (NH<sub>3</sub>), ozone (O<sub>3</sub>), benzene (B), toluene (T), and xylene (X) (<?A3B2 "fig1",5,"anchor"?><xref ref-type="fig" rid="fig-1">Fig. 1</xref>). In addition, the dataset contains an air quality index (AQI) parameter, an indicator used by government authorities to categorize pollution in terms of its severity. The six AQI values and their ambient concentrations with health consequences are given in <?A3B2 "fig2",5,"anchor"?><xref ref-type="fig" rid="fig-2">Fig. 2</xref> (source: <uri xlink:href="https://app.cpcbccr.com/AQI_India/">https://app.cpcbccr.com/AQI_India/</uri>).</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Air pollution parameters from 793 stations across India from Jan 2015 to July 2020 daily</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-1.png"/></fig>
<fig id="fig-2"><label>Figure 2</label><caption><title>Six AQI values and their ambient concentrations with health consequences</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-2.png"/></fig>
</sec>
<sec id="s3"><label>3</label><title>Methodology</title>
<sec id="s3_1"><label>3.1</label><title>Data Pre-Processing</title>
<p>To improve the understanding of the data, handle missing values, and make the data ready for modeling, the raw data underwent a data cleaning process. The first step was to quantify the missing values in the dataset (<?A3B2 "fig3",5,"anchor"?><xref ref-type="fig" rid="fig-3">Fig. 3</xref>). It was clear from <xref ref-type="fig" rid="fig-3">Fig. 3</xref> that B, X, T, O<sub>3</sub>, and NH<sub>3</sub> had the most missing values. On the other hand, the pollutants with fewer missing values were CO, NO, NO<sub>2</sub>, SO<sub>2</sub>, O<sub>3</sub>, NOx, PM2.5, and the AQI values. The missing values were removed using the pandas dropna function, which drops any row or column (depending on the chosen axis) that contains NA values. The fields city, date, Year_Month, and AQI_Bucket were removed to avoid redundancy, since the dataset contains substitute fields for them.</p>
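<p>The missing-value summary and the removal step can be sketched with pandas as follows. The miniature table, its column names, and its values are hypothetical and do not come from the NAMP dataset; the default row-wise behavior of <monospace>dropna</monospace> is assumed here.</p>
<code language="python"><![CDATA[
```python
import pandas as pd

# Hypothetical miniature of the pollutant table; names and values are illustrative only
frame = pd.DataFrame({
    "PM2.5": [81.4, None, 60.2, 45.0],
    "NO2": [28.7, 19.3, None, 22.1],
    "AQI": [184.0, 120.0, 101.0, 95.0],
})

missing_share = frame.isna().mean() * 100  # percentage of missing values per column
cleaned = frame.dropna()                   # default: drop every row containing any NA
```
]]></code>
<p>Passing <monospace>axis=1</monospace> to <monospace>dropna</monospace> would instead drop every column that contains an NA value.</p>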
<fig id="fig-3"><label>Figure 3</label><caption><title>The missing values and &#x0025; of pollutants from the dataset</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-3.png"/></fig>
</sec>
<sec id="s3_2"><label>3.2</label><title>Data Analysis</title>
<p>The data on air pollutants were analyzed to understand city-wise concentrations of the various pollutants (<?A3B2 "fig4",5,"anchor"?><xref ref-type="fig" rid="fig-4">Fig. 4a</xref>). Delhi showed the highest levels for PM values. The average levels of the pollutants from 25 cities are given in <?A3B2 "fig5",5,"anchor"?><xref ref-type="fig" rid="fig-5">Fig. 5</xref>. Ahmedabad showed very high concentrations for SO<sub>2</sub> and NO<sub>3</sub> values. Delhi showed high values for SO<sub>2</sub> and NO values, whereas Kochi showed the highest level of NO. The AQI and CO values for Ahmedabad were the highest among all cities for the observation period, followed by Delhi, which showed AQI and PM10 values on the higher side. The correlation between all air pollution parameters is given in <xref ref-type="fig" rid="fig-4">Fig. 4b</xref>. The correlation values between pollutants were statistically significant, emphasizing the high pollution levels and the interrelationships between pollutants. It was essential to understand the relationship of AQI with each pollutant in order to identify which ones contributed most to the index. This study found that AQI values were highly correlated with PM2.5, NO, and NO<sub>2</sub>, with a correlation value of 0.8 (CI&#x2009;&#x003D;&#x2009;95&#x0025;). SO<sub>2</sub> and O<sub>3</sub> followed, with a correlation value of 0.7 (CI&#x2009;&#x003D;&#x2009;95&#x0025;). The rest of the pollutants showed correlation values between 0.2 and 0.5.</p>
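<p>A correlation value of the kind reported above can be computed as in the following sketch, which assumes the Pearson coefficient (the correlation measure is not named in the text). The two short series are hypothetical and are not taken from the dataset.</p>
<code language="python"><![CDATA[
```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


pm25 = [35.0, 60.0, 88.0, 120.0, 150.0]   # hypothetical daily PM2.5 values
aqi = [70.0, 110.0, 160.0, 230.0, 300.0]  # hypothetical AQI values for the same days
r = pearson(pm25, aqi)
```
]]></code>
<p>Values close to 1 indicate that the pollutant and the AQI rise and fall together; applying the same function to every pair of parameters yields a correlation matrix.</p>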
<fig id="fig-4"><label>Figure 4</label><caption><title>(a) The city-wise proportion of pollution; (b) Correlation heatmap between all air pollution parameters</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-4.png"/></fig>
<fig id="fig-5"><label>Figure 5</label><caption><title>(a) Daily average pollution from 25 major cities of India from Jan 2015 to July 2020; (b) Average levels of pollution in 26 cities</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-5.png"/></fig>
</sec>
<sec id="s3_3"><label>3.3</label><title>SMOTEDNN Model Development for AQI Classification</title>
<p>The present investigation developed a novel model, SMOTEDNN, to classify air pollution. To assess its performance, the model was compared with four state-of-the-art ML models (XGBoost, Random Forest, SVM, and KNN) in classifying the AQI into the six classes shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. The performance of ML/DL models in terms of accuracy, stability, and time complexity depends on optimizing the hyperparameters. In the present investigation, hyperparameters for all the developed models were optimized rigorously to avoid illusory accuracy.</p>
<sec id="s3_3_1"><label>3.3.1</label><title>SMOTEDNN</title>
<p>SMOTE was integrated to overcome the issue of class imbalance by oversampling the minority classes. Simply duplicating minority-class values adds no new information to the algorithm; SMOTE instead synthesizes new values from the existing ones. It randomly selects a minority-class sample and creates a new value between that sample and one of its k nearest neighbors, which is also chosen at random. A Counter object was used to summarize the number of points in each class and to confirm that the dataset was created correctly. SMOTE performed well in overcoming the class imbalance, and the resulting pollution classes contained a balanced number of occurrences (i.e., 129277 in each class).</p>
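<p>The interpolation step described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the implementation used in this study: the function names <monospace>smote_sample</monospace> and <monospace>balance</monospace>, the toy data, and the choice of k are assumptions made for illustration, and the sketch assumes that every class to be oversampled has at least two samples.</p>
<preformat>
```python
import math
import random
from collections import Counter


def smote_sample(minority, k=5, rng=None):
    """Create one synthetic point from a list of minority-class vectors."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Rank the other minority points by Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbor = rng.choice(others[:k])
    # Interpolate at a random position on the segment base -> neighbor.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        minority = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(smote_sample(minority, k, rng))
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): six "Good" points and three "Severe" points.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.0, 1.2],
     [5.0, 5.0], [5.5, 4.8], [4.7, 5.3]]
y = ["Good"] * 6 + ["Severe"] * 3
X_bal, y_bal = balance(X, y, k=2)
print(Counter(y), "->", Counter(y_bal))
```
</preformat>
<p>Printing the two Counter objects mirrors the check described above: after oversampling, every class should report the same number of occurrences.</p>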
<p>The novel SMOTEDNN model was developed with five neural network layers in the present study to classify AQI based on the air pollution data. Python 3.8, the Keras 2.3.0 API with a TensorFlow 2.0 backend, and the NumPy, pandas, os, sklearn, matplotlib, and datetime libraries were used in this research (<?A3B2 "fig6",5,"anchor"?><xref ref-type="fig" rid="fig-6">Fig. 6</xref>). Data pre-processing and SMOTE were applied to the raw dataset so that the output of both steps could be used with the developed DNN models. The five neural network layers operated on all six classes of air quality. The rectifier activation function (ReLU) was used in the first four neural network layers. The ReLU activation function can be defined as <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>. ReLU acts as a linear function for all positive values and returns zero for all negative values.
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi>max</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>The fifth layer used the kernel initializer and was followed by a sixth, dense layer that applied the softmax function for classification. The softmax function can be given as <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>.
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mi>&#x03C3;</mml:mi><mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mover><mml:mi>z</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msup><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msup><mml:mi>e</mml:mi><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula>
where n is the number of classes, and e<sup>z<sub>i</sub></sup> and e<sup>z<sub>j</sub></sup> are the standard exponential functions applied to the input vector and to the output (normalization) terms, respectively.</p>
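<p>As a small worked illustration of Eqs. (9) and (10), the following sketch evaluates both functions with the Python standard library. The raw scores are hypothetical, and subtracting the maximum score before exponentiation is a common numerical safeguard that is assumed here rather than taken from the study.</p>
<preformat>
```python
import math


def relu(x):
    """Eq. (9): identity for positive inputs, zero otherwise."""
    return max(0.0, x)


def softmax(z):
    """Eq. (10): exponentiate each score and normalize so the outputs sum to 1."""
    shift = max(z)  # subtracting the maximum avoids overflow; the result is unchanged
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hidden-layer style use of ReLU on a few hypothetical pre-activations.
hidden = [relu(v) for v in [-1.5, 0.0, 2.3]]

# Hypothetical raw scores of the output layer for the six AQI classes.
scores = [2.0, 1.0, 0.1, -1.0, 0.5, 0.0]
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(hidden, predicted_class, round(sum(probs), 6))
```
</preformat>
<p>The class with the largest softmax output is taken as the prediction, which is why the output layer needs one unit per AQI class.</p>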
<p>Keras callback functions were used for early stopping and for reducing the learning rate in order to prevent overfitting. The number of epochs was chosen automatically by the early-stopping callback based on validation loss, minimum delta value, and patience. In DNN model training, parameters such as the number of iterations, learning rate, batch size, and activation function were obtained using GridSearchCV. Deep learning models such as DNNs can be complex, and data splitting is also a significant issue while tuning the parameters. The SMOTEDNN model had a total of 11,990 parameters, all of which were trainable.</p>
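<p>The stopping rule based on validation loss, minimum delta, and patience can be sketched in plain Python as follows. This is an illustration of the idea behind the Keras <monospace>EarlyStopping</monospace> callback, not the code used in the study; the function name and the validation-loss curve are hypothetical.</p>
<preformat>
```python
def early_stopping_epoch(val_losses, min_delta=0.0, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best - loss > min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # The loss kept improving often enough, so all epochs were used.
    return len(val_losses)


# Hypothetical validation-loss curve.
losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.40, 0.42, 0.39]
print(early_stopping_epoch(losses, min_delta=0.001, patience=3))
```
</preformat>
<p>A larger patience tolerates longer plateaus before stopping, whereas a larger minimum delta treats small fluctuations in the validation loss as no improvement.</p>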
<fig id="fig-6"><label>Figure 6</label><caption><title>Developed SMOTEDNN model to classify the air quality for six classes</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-6.png"/></fig>
<p>After the SMOTEDNN model was defined, it was compiled. The Adam optimizer was used with a decay of 1e-3; the learning rate was adjusted automatically and dynamically using a callback monitor. Because the present problem was a multiclass classification, the loss was measured using categorical cross-entropy, which computes the error between the actual and the predicted values for classification, as in <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref>.
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mrow><mml:mi mathvariant="normal">L</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">O</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mfrac><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">O</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>
where <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mrow><mml:msub><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> are the output size, target, and output values, respectively.</p>
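<p>Eq. (11) can be evaluated directly, as in the sketch below. The sketch reads the summation as covering both terms of the expression, clips the outputs to keep the logarithm finite, and uses hypothetical target and output vectors; all three are assumptions made for illustration.</p>
<preformat>
```python
import math


def loss_eq11(targets, outputs, eps=1e-12):
    """Negative mean of a*log(t) + (1 - a)*log(1 - t) over the outputs."""
    total = 0.0
    for a, t in zip(targets, outputs):
        t = min(max(t, eps), 1.0 - eps)  # clip to keep log() finite
        total += a * math.log(t) + (1.0 - a) * math.log(1.0 - t)
    return -total / len(targets)


# One-hot target for the third of six classes, with two hypothetical outputs.
target = [0, 0, 1, 0, 0, 0]
confident = [0.01, 0.01, 0.95, 0.01, 0.01, 0.01]
unsure = [0.10, 0.20, 0.30, 0.20, 0.10, 0.10]
print(loss_eq11(target, confident), loss_eq11(target, unsure))
```
</preformat>
<p>The loss is small when the output assigns most of the probability to the target class and grows as the output becomes less certain.</p>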
<p>The optimization of hyperparameters for the developed SMOTEDNN model is given in <?A3B2 "tbl1",5,"anchor"?><xref ref-type="table" rid="table-1">Tab. 1</xref>.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Hyperparameter optimization for SMOTEDNN</title></caption>
<table frame="hsides">
<colgroup>
<col align="left" charoff="6"/>
<col align="left" charoff="6"/>
<col align="left"/>
<col align="left" charoff="12"/>
</colgroup>
<thead>
<tr>
<th align="left">Parameter</th>
<th align="left">Value</th>
<th align="left">Optimized parameter</th>
<th align="left">Remark</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Activation function</td>
<td align="left">identity, logistic, tanh, relu</td>
<td align="left">relu</td>
<td align="left">Activation function for the hidden layer</td>
</tr>
<tr>
<td align="left">Size of hidden layers</td>
<td align="left">Different values</td>
<td align="left">22, 28, 56, 112, 28</td>
<td align="left">Neurons in the hidden layer</td>
</tr>
<tr>
<td align="left">Optimizer</td>
<td align="left">lbfgs, sgd, adam</td>
<td align="left">adam</td>
<td align="left">Used for error optimization</td>
</tr>
<tr>
<td align="left">Alpha</td>
<td align="left">0.00001, 0.0001, 0.001, 0.01</td>
<td align="left">0.001</td>
<td align="left">It refers to the regularization parameter</td>
</tr>
<tr>
<td align="left">Learning rate</td>
<td align="left">0.0001, 0.001, 0.01</td>
<td align="left">0.001</td>
<td align="left">Step-size controller to update the weights</td>
</tr>
</tbody>
</table>
</table-wrap>
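<p>An exhaustive grid search evaluates every combination of the candidate values. The sketch below only enumerates the combinations implied by <xref ref-type="table" rid="table-1">Tab. 1</xref> and does not train any model. The dictionary keys and the helper <monospace>iter_grid</monospace> are hypothetical names, and the hidden-layer sizes are left out because the table does not list their candidate values.</p>
<preformat>
```python
from itertools import product

# Candidate values listed in Tab. 1 (hidden-layer sizes are omitted because
# the table reports only "different values" for them).
grid = {
    "activation": ["identity", "logistic", "tanh", "relu"],
    "optimizer": ["lbfgs", "sgd", "adam"],
    "alpha": [0.00001, 0.0001, 0.001, 0.01],
    "learning_rate": [0.0001, 0.001, 0.01],
}


def iter_grid(param_grid):
    """Yield one dict per combination in the Cartesian product of the grid."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(iter_grid(grid))
# Optimized values reported in Tab. 1.
chosen = {"activation": "relu", "optimizer": "adam",
          "alpha": 0.001, "learning_rate": 0.001}
print(len(candidates), chosen in candidates)
```
</preformat>
<p>Because the number of combinations is the product of the list lengths, each additional hyperparameter multiplies the number of models that a grid search has to fit.</p>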
</sec>
<sec id="s3_3_2"><label>3.3.2</label><title>XGBoost</title>
<p>The XGBoost model was developed using the XGBClassifier class of the xgboost package, which provides a scikit-learn-compatible interface in Python. The XGBoost algorithm was applied to the pre-processed data (Section 3.1) to classify the AQI based on pollutant data. The hyperparameters were tuned using RandomizedSearchCV with the following parameters (<?A3B2 "tbl2",5,"anchor"?><xref ref-type="table" rid="table-2">Tab. 2</xref>).</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Hyperparameter optimization for XGBoost</title></caption>
<table frame="hsides">
<colgroup>
<col align="left" charoff="6"/>
<col align="left" charoff="6"/>
<col align="left"/>
<col align="left" charoff="12"/>
</colgroup>
<thead>
<tr>
<th align="left">Parameter</th>
<th align="left">Value</th>
<th align="left">Optimized parameter</th>
<th align="left">Remarks</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Learning rate</td>
<td align="left">0.05, 0.10, 0.15, 0.20, 0.25, 0.30</td>
<td align="left">0.20</td>
<td align="left">Step-size shrinkage applied at each boosting iteration</td>
</tr>
<tr>
<td align="left">max_depth</td>
<td align="left">[3, 4, 5, 6, 8, 10, 12, 14, 16]</td>
<td align="left">4</td>
<td align="left">Maximum depth for a tree</td>
</tr>
<tr>
<td align="left">min_child_weight</td>
<td align="left">[1, 3, 5, 7, 9, 11]</td>
<td align="left">7</td>
<td align="left">Minimum combined weights of all observations for a child node</td>
</tr>
<tr>
<td align="left">gamma</td>
<td align="left">[0.0, 0.1, 0.2, 0.3, 0.4, 0.5]</td>
<td align="left">0.0</td>
<td align="left">Minimum loss reduction required to make a split</td>
</tr>
<tr>
<td align="left">colsample_bytree</td>
<td align="left">[0.3, 0.4, 0.5, 0.6, 0.7]</td>
<td align="left">0.5</td>
<td align="left">Fraction of columns sampled randomly for each tree</td>
</tr>
<tr>
<td align="left">n_estimators</td>
<td align="left">10, 20, 50, 100, 200</td>
<td align="left">100</td>
<td align="left">Number of trees</td>
</tr>
</tbody>
</table>
</table-wrap>
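<p>Unlike a grid search, a randomized search evaluates only a fixed number of sampled combinations. The following sketch draws candidates from the value lists in <xref ref-type="table" rid="table-2">Tab. 2</xref> without fitting any model. The helper name, the number of iterations, and the seed are assumptions, and each value is drawn independently here, which is a simplification of how a library implementation may sample.</p>
<preformat>
```python
import random

# Candidate values listed in Tab. 2.
space = {
    "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "max_depth": [3, 4, 5, 6, 8, 10, 12, 14, 16],
    "min_child_weight": [1, 3, 5, 7, 9, 11],
    "gamma": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "colsample_bytree": [0.3, 0.4, 0.5, 0.6, 0.7],
    "n_estimators": [10, 20, 50, 100, 200],
}


def sample_candidates(param_space, n_iter, seed=0):
    """Draw n_iter random parameter settings (independent draws per parameter)."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in param_space.items()}
        for _ in range(n_iter)
    ]


# Size of the full grid that an exhaustive search would have to cover.
grid_size = 1
for values in space.values():
    grid_size *= len(values)

candidates = sample_candidates(space, n_iter=10)
print(grid_size, len(candidates))
```
</preformat>
<p>With this many combinations in the full grid, sampling a limited number of settings keeps the tuning cost manageable at the price of not guaranteeing that the best combination is visited.</p>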
</sec>
<sec id="s3_3_3"><label>3.3.3</label><title>Random Forest</title>
<p>The RandomForestClassifier class from the sklearn.ensemble module of the sklearn package [<xref ref-type="bibr" rid="ref-25">25</xref>] was used to develop the RF model. In addition, the Gini impurity function was used in the present investigation as it requires less computation [<xref ref-type="bibr" rid="ref-26">26</xref>]. The optimized hyperparameters for RF are given in <?A3B2 "tbl3",5,"anchor"?><xref ref-type="table" rid="table-3">Tab. 3</xref>.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Hyperparameter optimization for RF</title></caption>
<table frame="hsides">
<colgroup>
<col align="left" charoff="6"/>
<col align="left" charoff="6"/>
<col align="left"/>
<col align="left" charoff="12"/>
</colgroup>
<thead>
<tr>
<th align="left">Parameter</th>
<th align="left">Value</th>
<th align="left">Optimized parameter</th>
<th align="left">Remark</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">n_estimators</td>
<td align="left">10, 20, 50, 100, 200</td>
<td align="left">100</td>
<td align="left">Number of trees</td>
</tr>
<tr>
<td align="left">n_jobs</td>
<td align="left">1, &#x2212;1</td>
<td align="left">&#x2212;1</td>
<td align="left">Number of processors; &#x2212;1 means no restriction, and 1 means only one is used</td>
</tr>
<tr>
<td align="left">random_state</td>
<td align="left">10, 20, 30, 40, 50, 60, 100</td>
<td align="left">50</td>
<td align="left">For replication of same results with same models</td>
</tr>
<tr>
<td align="left">min_samples_leaf</td>
<td align="left">10, 20, 30, 40, 50, 60, 100</td>
<td align="left">60</td>
<td align="left">Minimum number of samples required at a leaf (end) node of the decision tree</td>
</tr>
</tbody>
</table>
</table-wrap>
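<p>The Gini impurity used as the split criterion can be computed as in the sketch below. The function names and the toy labels are hypothetical. The computation needs only squared class proportions and no logarithm, which is consistent with the lower computational cost noted above.</p>
<preformat>
```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions in a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical node with two equally frequent AQI classes.
node = ["Good"] * 4 + ["Poor"] * 4
print(gini_impurity(node))
# A split that separates the two classes perfectly.
print(split_impurity(["Good"] * 4, ["Poor"] * 4))
```
</preformat>
<p>A tree prefers the split with the lowest weighted impurity, so a split that separates the classes completely is the best possible choice for that node.</p>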
</sec>
<sec id="s3_3_4"><label>3.3.4</label><title>SVM</title>
<p>The SVM model was developed using the svm module of the sklearn package [<xref ref-type="bibr" rid="ref-25">25</xref>]. The regularization parameter (C) controls the performance of the SVM model with the radial basis function (RBF) kernel. The RBF kernel can be represented as:
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mi>K</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula>
</p>
<p>Here, &#x03B3; is the kernel coefficient, which controls the kernel width. The optimized hyperparameters for the SVM model used in the present investigation are given in <?A3B2 "tbl4",5,"anchor"?><xref ref-type="table" rid="table-4">Tab. 4</xref>.</p>
<table-wrap id="table-4"><label>Table 4</label><caption><title>Hyperparameter optimization for SVM</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Parameter</th>
<th align="left">Value</th>
<th align="left">Optimized parameter</th>
<th align="left">Remark</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">C</td>
<td align="left">10, 20, 50, 100, 200</td>
<td align="left">100</td>
<td align="left">Regularization parameter</td>
</tr>
<tr>
<td align="left">kernel</td>
<td align="left">poly, linear, rbf, sigmoid</td>
<td align="left">rbf</td>
<td align="left">Kernel type used</td>
</tr>
<tr>
<td align="left">degree</td>
<td align="left">1, 2, 3, 4, 5</td>
<td align="left">none</td>
<td align="left">Used only with the poly kernel</td>
</tr>
<tr>
<td align="left">gamma</td>
<td align="left">scale, float, auto</td>
<td align="left">auto</td>
<td align="left">Kernel coefficient</td>
</tr>
</tbody>
</table>
</table-wrap>
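<p>The RBF kernel of Eq. (12) can be evaluated with a few lines of Python, as sketched below. The feature vectors are hypothetical. The value of &#x03B3; is set to one divided by the number of features, which is what the <monospace>auto</monospace> setting of scikit-learn corresponds to.</p>
<preformat>
```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Eq. (12): exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


# Hypothetical, already scaled feature vectors.
x = [1.0, 2.0, 3.0]
near = [1.1, 2.0, 2.9]
far = [4.0, 6.0, 8.0]

gamma = 1.0 / len(x)  # 1 / n_features, the meaning of gamma="auto" in scikit-learn
print(rbf_kernel(x, x, gamma), rbf_kernel(x, near, gamma), rbf_kernel(x, far, gamma))
```
</preformat>
<p>The kernel equals one for identical points and decays toward zero as the distance grows; a larger &#x03B3; makes this decay faster and therefore narrows the kernel.</p>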
</sec>
<sec id="s3_3_5"><label>3.3.5</label><title>KNN</title>
<p>The KNN model was developed using the KNeighborsClassifier class of the sklearn package in Python. One of the most critical parameters for KNN is n_neighbors, which determines how many neighbors with known values are used to predict an unknown value. The optimized hyperparameters for the KNN model used in the present investigation are given in <?A3B2 "tbl5",5,"anchor"?><xref ref-type="table" rid="table-5">Tab. 5</xref>.</p>
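<p>The role of the number of neighbors and of the weight function can be seen in the following sketch of a KNN vote. It is a brute-force illustration with hypothetical data and a hypothetical function name, not the tree-based search used by the library.</p>
<preformat>
```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, n_neighbors=3, weights="distance"):
    """Predict a label from the n_neighbors closest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda item: math.dist(item[0], query))
    votes = defaultdict(float)
    for point, label in ranked[:n_neighbors]:
        d = math.dist(point, query)
        if weights == "distance":
            if d == 0.0:
                return label  # an exact match decides the prediction
            votes[label] += 1.0 / d  # closer neighbors count more
        else:
            votes[label] += 1.0  # "uniform": every neighbor counts equally
    return max(votes, key=votes.get)


# Hypothetical training points: one close "Good" point and two distant "Poor" points.
train_X = [[1.0, 1.0], [3.0, 3.0], [3.2, 3.0]]
train_y = ["Good", "Poor", "Poor"]
query = [1.5, 1.5]

print(knn_predict(train_X, train_y, query, weights="uniform"))
print(knn_predict(train_X, train_y, query, weights="distance"))
```
</preformat>
<p>With uniform weights the two distant points outvote the single close one, whereas distance weighting lets the closest neighbor dominate, which shows why the choice of weight function can change the predicted class.</p>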
</sec>
</sec>
<sec id="s3_4"><label>3.4</label><title>AQI Forecasting Using Linear Regression</title>
<p>For AQI time series forecasting, New Delhi was selected because of its high pollutant levels during the observation period. The primary step was to understand which pollutants affected the New Delhi AQI values the most. The correlation values with AQI for different pollutants and the percentage of null values are given in <?A3B2 "tbl6",5,"anchor"?><xref ref-type="table" rid="table-6">Tab. 6</xref>. It was evident from <xref ref-type="table" rid="table-6">Tab. 6</xref> that particulate matter (PM2.5, PM10), benzene (B), NO<sub>2</sub>, and NO showed a significantly high correlation with AQI. The combined variable B_X_O<sub>3</sub>_NH<sub>3</sub> also showed a relatively high correlation but had more null values because xylene (X) data were frequently unavailable.</p><table-wrap id="table-5"><label>Table 5</label><caption><title>Hyperparameter optimization for KNN</title></caption>
	
		
			<table frame="hsides">
				<colgroup>
					<col align="left" charoff="6"/>
						<col align="left" charoff="8"/>
							<col align="left"/>
								<col align="left" charoff="14"/>
								</colgroup>
								<thead>
									<tr>
										<th align="left">Parameter</th>
										<th align="left">Value</th>
										<th align="left">Optimized parameter</th>
										<th align="left">Remark</th>
									</tr>
								</thead>
								<tbody>
									<tr>
										<td align="left">n_neighbors</td>
										<td align="left">2, 4, 6, 8, 10, 12, 14, 16, 18, 20</td>
										<td align="left">10</td>
										<td align="left">Number of neighbors</td>
									</tr>
									<tr>
										<td align="left">n_jobs</td>
										<td align="left">1, &#x2212;1</td>
										<td align="left">&#x2212;1</td>
										<td align="left">Number of processors; &#x2212;1 means no restriction, and 1 means only one is used</td>
									</tr>
									<tr>
										<td align="left">weights</td>
										<td align="left">uniform, distance</td>
										<td align="left">distance</td>
										<td align="left">Weight function used in prediction</td>
									</tr>
									<tr>
										<td align="left">algorithm</td>
										<td align="left">auto, ball_tree, kd_tree, brute</td>
										<td align="left">kd_tree</td>
										<td align="left">The algorithm used to compute the NN</td>
									</tr>
								</tbody>
							</table>
						
					</table-wrap>
					
<table-wrap id="table-6"><label>Table 6</label><caption><title>Correlation of pollutants with AQI and &#x0025; of null values for New Delhi for observation period</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Pollutant/Variable</th>
<th align="left">Correlation value</th>
<th align="left">&#x0025; of Null values</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Particulate matters</td>
<td align="left">0.92</td>
<td align="left">3.50</td>
</tr>
<tr>
<td align="left">PM10</td>
<td align="left">0.88</td>
<td align="left">3.50</td>
</tr>
<tr>
<td align="left">PM2.5</td>
<td align="left">0.88</td>
<td align="left">0.10</td>
</tr>
<tr>
<td align="left">Benzene</td>
<td align="left">0.67</td>
<td align="left">0.00</td>
</tr>
<tr>
<td align="left">NO2</td>
<td align="left">0.67</td>
<td align="left">0.10</td>
</tr>
<tr>
<td align="left">NO</td>
<td align="left">0.64</td>
<td align="left">0.10</td>
</tr>
<tr>
<td align="left">B_X_O3_NH3</td>
<td align="left">0.63</td>
<td align="left">38.60</td>
</tr>
<tr>
<td align="left">NOx</td>
<td align="left">0.56</td>
<td align="left">0.00</td>
</tr>
<tr>
<td align="left">NH3</td>
<td align="left">0.52</td>
<td align="left">0.40</td>
</tr>
<tr>
<td align="left">SO2</td>
<td align="left">0.41</td>
<td align="left">5.10</td>
</tr>
<tr>
<td align="left">O3</td>
<td align="left">0.33</td>
<td align="left">3.80</td>
</tr>
<tr>
<td align="left">CO</td>
<td align="left">0.28</td>
<td align="left">0.00</td>
</tr>
<tr>
<td align="left">Toluene</td>
<td align="left">0.28</td>
<td align="left">0.00</td>
</tr>
<tr>
<td align="left">Xylene</td>
<td align="left">0.23</td>
<td align="left">38.60</td>
</tr>
<tr>
<td align="left">Month</td>
<td align="left">0.06</td>
<td align="left">0.00</td>
</tr>
<tr>
<td align="left">Year</td>
<td align="left">&#x2212;0.28</td>
<td align="left">0.00</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The city, date, Year_Month, and AQI_Bucket columns were not required and were deleted. The frequency distribution of AQI for New Delhi was not normal, which was taken as an indication of seasonality. The augmented Dickey&#x2013;Fuller (ADF) test was performed to check whether the time series was stationary or non-stationary. The test returned a statistic of &#x2212;3.351 with a p-value of 0.01263; because the p-value exceeds 0.01, the unit-root null hypothesis is not rejected at the 1&#x0025; significance level, and the dataset was treated as non-stationary. Since the time series was non-stationary, the original data could not be used directly to forecast values with an autoregressive model (<?A3B2 "fig7",5,"anchor"?><xref ref-type="fig" rid="fig-7">Fig. 7a</xref>). However, one-step prediction based on the previous step was possible if there was autocorrelation in the dataset. <xref ref-type="fig" rid="fig-7">Figs. 7b</xref>&#x2013;<xref ref-type="fig" rid="fig-7">7d</xref> show a significantly higher correlation for one-step forecasting based on the previous step value. The first time series forecasting model used the previous step-index to generate the next step-index (<?A3B2 "fig8",5,"anchor"?><xref ref-type="fig" rid="fig-8">Fig. 8a</xref>).</p>
<fig id="fig-7"><label>Figure 7</label><caption><title>(a) AQI time series for New Delhi, (b) Autocorrelation for first previous index, (c) Autocorrelation for second previous index, (d) Autocorrelation for third previous index, (e) Autocorrelation for the time series</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-7.png"/></fig>
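<p>The idea of predicting the next index from the previous one can be sketched with an ordinary least-squares line fitted to lagged pairs. The series below is hypothetical and the function names are illustrative; the sketch is not the model fitted in this study.</p>
<preformat>
```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lag_correlation(series, lag=1):
    """Correlation between the series and itself shifted by `lag` steps."""
    return pearson(series[:-lag], series[lag:])


def fit_one_step(series):
    """Least-squares line y_t = a + b * y_(t-1) fitted to consecutive pairs."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    mean_p, mean_n = sum(prev) / n, sum(nxt) / n
    b = (sum((p - mean_p) * (q - mean_n) for p, q in zip(prev, nxt))
         / sum((p - mean_p) ** 2 for p in prev))
    a = mean_n - b * mean_p
    return a, b


# Hypothetical daily AQI values.
aqi = [310, 325, 340, 330, 300, 280, 260, 270, 295, 320, 345, 350]
for lag in (1, 2, 3):
    print(lag, round(lag_correlation(aqi, lag), 3))

a, b = fit_one_step(aqi)
forecast = a + b * aqi[-1]  # one-step-ahead forecast from the last observation
print(round(forecast, 1))
```
</preformat>
<p>A high correlation at lag one is what makes such a one-step model useful: the previous value then carries most of the information needed for the next one.</p>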
</sec>
</sec>
<sec id="s4"><label>4</label><title>Results and Discussions</title>
<sec id="s4_1"><label>4.1</label><title>Accuracy Assessment</title>
<p>We evaluated the performance of all the developed models based on accuracy, error, sensitivity, specificity, false-positive rate, and false-negative rate. These metrics are defined as follows:</p>
<p>The accuracy of a method on a test dataset is the percentage of test occurrences that are correctly identified, and it was computed as <xref ref-type="disp-formula" rid="eqn-13">Eq. (13)</xref>. The error rate was obtained using <xref ref-type="disp-formula" rid="eqn-14">Eq. (14)</xref>. Sensitivity and specificity were used to assess the model&#x0027;s strength and stability (<xref ref-type="disp-formula" rid="eqn-15">Eqs. (15)&#x2013;(16)</xref>). The false-positive rate (FPR) and false-negative rate (FNR) were obtained using <xref ref-type="disp-formula" rid="eqn-17">Eqs. (17)&#x2013;(18)</xref>.
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mrow><mml:mi mathvariant="normal">A</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">u</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">y</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">P</mml:mi><mml:mspace width="thickmathspace" /></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">P</mml:mi><mml:mspace width="thickmathspace" /></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">N</mml:mi><mml:mspace width="thickmathspace" /></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mspace width="thickmathspace" /></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="normal">A</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">u</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">y</mml:mi></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mrow><mml:mi mathvariant="normal">S</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">v</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mi mathvariant="normal">y</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x00D7;</mml:mo><mml:mn>100</mml:mn></mml:math></disp-formula>
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:mrow><mml:mi mathvariant="normal">S</mml:mi><mml:mi mathvariant="normal">p</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">f</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mi mathvariant="normal">y</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x00D7;</mml:mo><mml:mn>100</mml:mn></mml:math></disp-formula>
<disp-formula id="eqn-17"><label>(17)</label><mml:math id="mml-eqn-17" display="block"><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">P</mml:mi><mml:mi mathvariant="normal">R</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x00D7;</mml:mo><mml:mn>100</mml:mn></mml:math></disp-formula>
<disp-formula id="eqn-18"><label>(18)</label><mml:math id="mml-eqn-18" display="block"><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">N</mml:mi><mml:mi mathvariant="normal">R</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mi mathvariant="normal">N</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">T</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x00D7;</mml:mo><mml:mn>100</mml:mn></mml:math></disp-formula>
</p>
<p>TP, TN, FP, and FN represent true positive, true negative, false positive, and false negatives, respectively.</p>
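<p>For a six-class problem, TP, TN, FP, and FN can be obtained for each class in a one-vs-rest manner from the confusion matrix. The sketch below does this for the SMOTEDNN matrix reported in <xref ref-type="table" rid="table-7">Tab. 7</xref>. It assumes that rows are actual classes and columns are predicted classes, which the table does not state explicitly, and the function names are hypothetical.</p>
<preformat>
```python
classes = ["Good", "Satisfactory", "Moderate", "Poor", "Very poor", "Severe"]

# SMOTEDNN confusion matrix from Tab. 7.
# Assumption: rows are actual classes and columns are predicted classes.
matrix = [
    [3365, 0, 0, 0, 0, 0],
    [0, 16845, 0, 6, 0, 0],
    [0, 0, 4816, 0, 0, 0],
    [0, 0, 0, 13408, 0, 0],
    [0, 0, 1, 0, 997, 1],
    [0, 1, 0, 0, 2, 3220],
]


def one_vs_rest_counts(cm, k):
    """Return (TP, TN, FP, FN) for class k against all other classes."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # class k samples assigned elsewhere
    fp = sum(row[k] for row in cm) - tp  # other samples assigned to class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


for k, name in enumerate(classes):
    tp, tn, fp, fn = one_vs_rest_counts(matrix, k)
    sensitivity = 100.0 * tp / (tp + fn)  # Eq. (15)
    specificity = 100.0 * tn / (tn + fp)  # Eq. (16)
    print(f"{name}: sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")

accuracy = overall_accuracy(matrix)
print(f"accuracy={accuracy:.5f}, error={1.0 - accuracy:.5f}")
```
</preformat>
<p>The overall accuracy does not depend on the orientation of the matrix, but the per-class sensitivity and specificity would swap roles for FP and FN if rows and columns were interchanged.</p>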
</sec>
<sec id="s4_2"><label>4.2</label><title>Air Quality Classification Models</title>
<p>In this section, the results of the air quality classification models are given. The performance of the developed SMOTEDNN model was assessed on unseen data that were kept separate during the training process, so that the developed models could be assessed and evaluated properly. The SMOTEDNN model was also checked for overfitting. The difference between validation loss and training loss was almost negligible; therefore, overfitting did not exist. As mentioned earlier, the hyperparameters of SMOTEDNN were tuned using GridSearchCV, and callbacks were used to automatically select the optimal number of epochs and prevent overfitting. The optimized SMOTEDNN results were obtained using 17 epochs, which consumed fewer computing resources and less time (<xref ref-type="fig" rid="fig-8">Fig. 8</xref>). The present investigation used a Tensor Processing Unit (TPU) v2-8. TPUs are Google&#x0027;s application-specific integrated circuits, which accelerate the training workflows of AI models. The TPU v2-8 used in the present investigation had eight cores and 64 GiB of memory. SMOTEDNN took 34&#x2005;s and 17&#x2005;ms with 17 epochs on average. It was evident from <xref ref-type="fig" rid="fig-8">Fig. 8</xref> that the accuracy and loss were optimized by selecting the number of epochs through callbacks.</p>
<fig id="fig-8"><label>Figure 8</label><caption><title>The performance of the SMOTEDNN model based on accuracy and loss</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-8.png"/></fig>
<p>Based on the performance metrics given in <?A3B2 "tbl7",5,"anchor"?><xref ref-type="table" rid="table-7">Tab. 7</xref>, it was observed that SMOTEDNN outperformed all the other models, although the other models also performed fairly well. The reasons for the better performance of Model 3 were the efficient data pre-processing and the rigorous tuning of the hyperparameters.</p>
<table-wrap id="table-7"><label>Table 7</label><caption><title>Confusion matrices for all five developed models for classifying the air quality index</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">SMOTEDNN</th>
<th align="left">Good</th>
<th align="left">Satisfactory</th>
<th align="left">Moderate</th>
<th align="left">Poor</th>
<th align="left">Very poor</th>
<th align="left">Severe</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><bold>Good</bold></td>
<td align="left">3365</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Satisfactory</bold></td>
<td align="left">0</td>
<td align="left">16845</td>
<td align="left">0</td>
<td align="left">6</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Moderate</bold></td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">4816</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Poor</bold></td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">13408</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Very poor</bold></td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">1</td>
<td align="left">0</td>
<td align="left">997</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left"><bold>Severe</bold></td>
<td align="left">0</td>
<td align="left">1</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">2</td>
<td align="left">3220</td>
</tr>
<tr>
<td align="left"><bold>XGBoost</bold></td>
<td align="left"><bold>Good</bold></td>
<td align="left"><bold>Satisfactory</bold></td>
<td align="left"><bold>Moderate</bold></td>
<td align="left"><bold>Poor</bold></td>
<td align="left"><bold>Very poor</bold></td>
<td align="left"><bold>Severe</bold></td>
</tr>
<tr>
<td align="left">Good</td>
<td align="left">3360</td>
<td align="left">0</td>
<td align="left">5</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left">Satisfactory</td>
<td align="left">0</td>
<td align="left">16845</td>
<td align="left">0</td>
<td align="left">6</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left">Moderate</td>
<td align="left">0</td>
<td align="left">2</td>
<td align="left">4814</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left">Poor</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">13408</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left">Very poor</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">1</td>
<td align="left">0</td>
<td align="left">984</td>
<td align="left">14</td>
</tr>
<tr>
<td align="left">Severe</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">20</td>
<td align="left">0</td>
<td align="left">4</td>
<td align="left">3199</td>
</tr>
<tr>
<td align="left"><bold>RF</bold></td>
<td align="left"><bold>Good</bold></td>
<td align="left"><bold>Satisfactory</bold></td>
<td align="left"><bold>Moderate</bold></td>
<td align="left"><bold>Poor</bold></td>
<td align="left"><bold>Very poor</bold></td>
<td align="left"><bold>Severe</bold></td>
</tr>
<tr>
<td align="left"><bold>Good</bold></td>
<td align="left">3285</td>
<td align="left">0</td>
<td align="left">3</td>
<td align="left">77</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Satisfactory</bold></td>
<td align="left">0</td>
<td align="left">16818</td>
<td align="left">25</td>
<td align="left">8</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Moderate</bold></td>
<td align="left">0</td>
<td align="left">26</td>
<td align="left">4717</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">73</td>
</tr>
<tr>
<td align="left"><bold>Poor</bold></td>
<td align="left">55</td>
<td align="left">113</td>
<td align="left">0</td>
<td align="left">13240</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Very poor</bold></td>
<td align="left">0</td>
<td align="left">2</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">974</td>
<td align="left">23</td>
</tr>
<tr>
<td align="left"><bold>Severe</bold></td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">7</td>
<td align="left">0</td>
<td align="left">12</td>
<td align="left">3204</td>
</tr>
<tr>
<td align="left"><bold>SVM</bold></td>
<td align="left"><bold>Good</bold></td>
<td align="left"><bold>Satisfactory</bold></td>
<td align="left"><bold>Moderate</bold></td>
<td align="left"><bold>Poor</bold></td>
<td align="left"><bold>Very poor</bold></td>
<td align="left"><bold>Severe</bold></td>
</tr>
<tr>
	<td align="left">Good</td>
	<td align="left">3288</td>
	<td align="left">0</td>
	<td align="left">0</td>
	<td align="left">77</td>
	<td align="left">0</td>
	<td align="left">0</td>
</tr>
<tr>
	<td align="left">Satisfactory</td>
	<td align="left">0</td>
	<td align="left">16818</td>
	<td align="left">25</td>
	<td align="left">8</td>
	<td align="left">0</td>
	<td align="left">0</td>
</tr>
<tr>
	<td align="left">Moderate</td>
	<td align="left">0</td>
	<td align="left">26</td>
	<td align="left">4717</td>
	<td align="left">0</td>
	<td align="left">0</td>
	<td align="left">73</td>
</tr>
<tr>
	<td align="left">Poor</td>
	<td align="left">55</td>
	<td align="left">113</td>
	<td align="left">0</td>
	<td align="left">13240</td>
	<td align="left">0</td>
	<td align="left">0</td>
</tr>
<tr>
	<td align="left">Very poor</td>
	<td align="left">0</td>
	<td align="left">0</td>
	<td align="left">0</td>
	<td align="left">0</td>
	<td align="left">976</td>
	<td align="left">23</td>
</tr>
<tr>
<td align="left">Severe</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">7</td>
<td align="left">0</td>
<td align="left">12</td>
<td align="left">3204</td>
</tr>
<tr>
<td align="left"><bold>KNN</bold></td>
<td align="left"><bold>Good</bold></td>
<td align="left"><bold>Satisfactory</bold></td>
<td align="left"><bold>Moderate</bold></td>
<td align="left"><bold>Poor</bold></td>
<td align="left"><bold>Very poor</bold></td>
<td align="left"><bold>Severe</bold></td>
</tr>
<tr>
<td align="left"><bold>Good</bold></td>
<td align="left">3195</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">170</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Satisfactory</bold></td>
<td align="left">1</td>
<td align="left">16568</td>
<td align="left">81</td>
<td align="left">201</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Moderate</bold></td>
<td align="left">0</td>
<td align="left">211</td>
<td align="left">4504</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">101</td>
</tr>
<tr>
<td align="left"><bold>Poor</bold></td>
<td align="left">245</td>
<td align="left">778</td>
<td align="left">0</td>
<td align="left">12385</td>
<td align="left">0</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left"><bold>Very poor</bold></td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">946</td>
<td align="left">53</td>
</tr>
<tr>
<td align="left"><bold>Severe</bold></td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">147</td>
<td align="left">0</td>
<td align="left">58</td>
<td align="left">3018</td>
</tr>
<tr>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
</tr>
<tr>
<td align="left"><bold>Models</bold></td>
<td align="left"><bold>SMOTEDNN</bold></td>
<td align="left"><bold>XGBoost</bold></td>
<td align="left"><bold>RF</bold></td>
<td align="left"><bold>SVM</bold></td>
<td align="left"><bold>KNN</bold></td>
<td align="left"/>
</tr>
<tr>
<td align="left"><bold>Accuracy</bold></td>
<td align="left">99.9</td>
<td align="left">99.2</td>
<td align="left">99.1</td>
<td align="left">99.01</td>
<td align="left">95.2</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Sensitivity</td>
<td align="left">98.76</td>
<td align="left">96.54</td>
<td align="left">95.42</td>
<td align="left">95.23</td>
<td align="left">91.87</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Specificity</td>
<td align="left">99.13</td>
<td align="left">97.65</td>
<td align="left">96.74</td>
<td align="left">96.58</td>
<td align="left">93.46</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Error rate</td>
<td align="left">0.1</td>
<td align="left">0.8</td>
<td align="left">0.9</td>
<td align="left">0.99</td>
<td align="left">4.8</td>
<td align="left"/>
</tr>
<tr>
<td align="left">FDR</td>
<td align="left">2.17</td>
<td align="left">1.97</td>
<td align="left">2.73</td>
<td align="left">1.96</td>
<td align="left">6.49</td>
<td align="left"/>
</tr>
<tr>
<td align="left">FOR</td>
<td align="left">0.08</td>
<td align="left">0.16</td>
<td align="left">0.47</td>
<td align="left">0.73</td>
<td align="left">5.82</td>
<td align="left"/>
</tr>
</tbody>
</table>
</table-wrap>
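<p>The way such metrics follow from a confusion matrix can be sketched as below, using the SMOTEDNN block of Table 7. The sketch assumes that rows are actual classes and columns are predicted classes, which the table does not state, and it uses one-vs-rest counts per class. The helper names <monospace>overall_accuracy</monospace> and <monospace>per_class_rates</monospace> are hypothetical. The table does not say how the summary metrics were averaged across classes, so the per-class values need not match its summary rows.</p>
<preformat preformat-type="code">
```python
LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very poor", "Severe"]

# SMOTEDNN block of Table 7. Assumption: rows are actual classes and
# columns are predicted classes (the table does not state the orientation).
SMOTEDNN = [
    [3365, 0, 0, 0, 0, 0],
    [0, 16845, 0, 6, 0, 0],
    [0, 0, 4816, 0, 0, 0],
    [0, 0, 0, 13408, 0, 0],
    [0, 0, 1, 0, 997, 1],
    [0, 1, 0, 0, 2, 3220],
]


def overall_accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_rates(cm, i):
    """One-vs-rest rates for class index i."""
    total = sum(sum(row) for row in cm)
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                 # class i samples predicted as other classes
    fp = sum(row[i] for row in cm) - tp  # other classes predicted as class i
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fdr": fp / (tp + fp),  # false discovery rate
        "for": fn / (fn + tn),  # false omission rate
    }


print("accuracy:", round(overall_accuracy(SMOTEDNN), 4))
for index, label in enumerate(LABELS):
    print(label, per_class_rates(SMOTEDNN, index))
```
</preformat>
<p>Under the stated orientation, the SMOTEDNN block contains 42662 samples, of which 42651 lie on the diagonal, giving an overall accuracy of roughly 0.9997.</p>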
</sec>
<sec id="s4_3"><label>4.3</label><title>Air Quality Forecasting Models</title>
<p>Three air quality forecasting models were developed in the current investigation. The first model (Model 1), a linear model, used one previous step as input to forecast the following step (<?A3B2 "fig9",5,"anchor"?><xref ref-type="fig" rid="fig-9">Fig. 9a</xref>). The second model (Model 2), also linear, used seven previous steps to forecast the following step (<xref ref-type="fig" rid="fig-9">Fig. 9b</xref>). The third model (Model 3), an autoregressive model, tuned the number of previous steps used to forecast the air quality for New Delhi (<xref ref-type="fig" rid="fig-9">Fig. 9c</xref>).</p>
<fig id="fig-9"><label>Figure 9</label><caption><title>(a) AQI forecasting based on Model 1, (b) AQI forecasting based on Model 2, (c) AQI forecasting based on Model 3</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_21968-fig-9.png"/></fig>
<p>Model 1 showed a correlation of 0.9 between the forecasted and actual values, with a root mean square error (RMSE) of 29.17 (<xref ref-type="fig" rid="fig-9">Fig. 9a</xref>). Given this high correlation, Model 1 can be used for forecasting (<?A3B2 "tbl8",5,"anchor"?><xref ref-type="table" rid="table-8">Tab. 8</xref>). Model 2 was developed to forecast one step ahead from k previous steps; initially, k was set to one week, i.e., seven steps (<xref ref-type="fig" rid="fig-9">Fig. 9b</xref>). The RMSE for Model 2 was 53.21, larger than that of Model 1 (<?A3B2 "tbl9",5,"anchor"?><xref ref-type="table" rid="table-9">Tab. 9</xref>). Model 3 was based on autoregression. However, as already mentioned, the time series was non-stationary and therefore unsuitable for autoregression in its raw form. The trend and seasonality were therefore removed from the time series following the approach given by [<xref ref-type="bibr" rid="ref-27">27</xref>]. The AR model from the statsmodels library in Python was used to develop Model 3. Tuning k, the trend parameter, and the seasonal parameter was crucial to obtain the optimized forecasting model. Values of k from 1 to 365 days were supplied to the model in a loop to obtain the optimized autoregression model. The optimized value of k was 14 steps, i.e., the values of the previous 14 days (<xref ref-type="fig" rid="fig-9">Fig. 9c</xref>). The trend parameter and the seasonal parameter were set to n and True, respectively. The RMSE for Model 3 was 15.48 with an R-value of 0.93, which was better than both Models 1 and 2. <xref ref-type="fig" rid="fig-9">Fig. 9c</xref> also shows that Model 3 gave a better fit between the actual and forecasted AQI values.</p>
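<p>The loop over the number of previous steps can be sketched as follows. This is not the statsmodels-based implementation used in the study. It is a self-contained illustration that assumes an ordinary least-squares autoregression, first-order differencing as a stand-in for the trend and seasonality removal of the cited approach, a synthetic series, and a small range of lags instead of 1 to 365. All function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def difference(series, lag=1):
    """Lagged differencing, a simple way to remove a trend (assumed stand-in)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def fit_ar(series, k):
    """Least-squares AR(k) coefficients: [intercept, lag 1, ..., lag k]."""
    rows = [[1.0] + [series[t - j] for j in range(1, k + 1)]
            for t in range(k, len(series))]
    targets = series[k:]
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def one_step_rmse(train, test, coef):
    """RMSE of walk-forward one-step-ahead forecasts over the test span."""
    k = len(coef) - 1
    history = list(train)
    squared = []
    for actual in test:
        pred = coef[0] + sum(coef[j] * history[-j] for j in range(1, k + 1))
        squared.append((actual - pred) ** 2)
        history.append(actual)  # the next forecast uses the observed value
    return math.sqrt(sum(squared) / len(squared))


def select_lag(series, candidate_lags, n_test):
    """Fit one AR model per candidate lag and keep the lowest hold-out RMSE."""
    train, test = series[:-n_test], series[-n_test:]
    scores = {k: one_step_rmse(train, test, fit_ar(train, k))
              for k in candidate_lags}
    return min(scores, key=scores.get), scores


# Synthetic daily series (trend + weekly cycle + noise), for illustration only.
rng = random.Random(0)
series = [100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / 7) + rng.gauss(0, 5)
          for t in range(200)]
stationary = difference(series, lag=1)
best_k, scores = select_lag(stationary, range(1, 8), n_test=30)
print("selected lag:", best_k)
```
</preformat>
<p>Scoring each lag on a held-out span, rather than on the training data, keeps the selection from always favouring the largest lag.</p>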
<table-wrap id="table-8"><label>Table 8</label><caption><title>Forecasting performance of Model 1 with k&#x2009;&#x003D;&#x2009;1 step</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Correlation matrix</th>
<th align="left">actual</th>
<th align="left">pred</th>
<th align="left">r</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">actual</td>
<td align="left">1</td>
<td align="left">0.8981</td>
<td align="left">0.9</td>
</tr>
<tr>
<td align="left">pred</td>
<td align="left">0.8989</td>
<td align="left">1</td>
<td align="left"/>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-9"><label>Table 9</label><caption><title>Forecasting performance of Model 2 with k&#x2009;&#x003D;&#x2009;7 steps</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">CM</th>
<th align="left">actual</th>
<th align="left">pred</th>
<th align="left">r</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">384</td>
<td align="left">372.427</td>
<td align="left">0.4</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">340</td>
<td align="left">353.787</td>
<td align="left"/>
</tr>
<tr>
<td align="left">3</td>
<td align="left">372</td>
<td align="left">346.997</td>
<td align="left"/>
</tr>
<tr>
<td align="left">4</td>
<td align="left">425</td>
<td align="left">343.519</td>
<td align="left"/>
</tr>
<tr>
<td align="left">5</td>
<td align="left">455</td>
<td align="left">338.261</td>
<td align="left"/>
</tr>
<tr>
<td align="left">6</td>
<td align="left">506</td>
<td align="left">332.855</td>
<td align="left"/>
</tr>
<tr>
<td align="left">7</td>
<td align="left">417</td>
<td align="left">328.223</td>
<td align="left"/>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_4"><label>4.4</label><title>Comparison with Other Studies</title>
<p>The present investigation developed five models to classify air pollution severity based on different pollutants. These models were compared with those of other studies to assess their performance (see <?A3B2 "tbl10",5,"anchor"?><xref ref-type="table" rid="table-10">Tab. 10</xref>). Overall, the SMOTEDNN model produced the highest classification accuracy among the compared models. Similarly, the autoregression-based forecasting model (Model 3) yielded higher accuracy than the other studies (<xref ref-type="table" rid="table-10">Tab. 10</xref>).</p>
<table-wrap id="table-10"><label>Table 10</label><caption><title>Comparison with other studies</title></caption>
<table frame="hsides">
<colgroup>
<col align="left" charoff="8"/>
<col align="left" charoff="8"/>
<col align="left" charoff="8"/>
<col align="left" charoff="8"/>
</colgroup>
<thead>
<tr>
<th align="left" colspan="4">AQI classification models performance comparison</th>
</tr>
<tr>
<th align="left">Author</th>
<th align="left">Method</th>
<th align="left">Accuracy %</th>
<th align="left">Remarks</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-7">7</xref>]</td>
<td align="left">Recurrent neural networks</td>
<td align="left">81.00</td>
<td align="left">Used for O<sub>3</sub> pollutant</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-8">8</xref>]</td>
<td align="left">Recurrent neural networks</td>
<td align="left">95.00</td>
<td align="left">Only for PM2.5 and PM10</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td align="left">SVR&#x002B;wavelet, Ensemble</td>
<td align="left">96.00</td>
<td align="left">Only for PM10</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">LR</td>
<td align="left">93.00</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">RF</td>
<td align="left">86.00</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">LR</td>
<td align="left">68.74</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">SGD</td>
<td align="left">66.23</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">RFR</td>
<td align="left">72.22</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">DTR</td>
<td align="left">71.08</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">MLP</td>
<td align="left">70.43</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">SVR</td>
<td align="left">70.97</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">GBR</td>
<td align="left">74.91</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">ABR</td>
<td align="left">49.63</td>
<td align="left">Lack of hyperparameters tuning</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">SMOTEDNN</td>
<td align="left">99.90</td>
<td align="left">Novel model with rigorous hyperparameters tuning</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">XGBoost</td>
<td align="left">99.20</td>
<td align="left">Rigorous hyperparameters tuning</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">RF</td>
<td align="left">99.10</td>
<td align="left">Rigorous hyperparameters tuning</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">SVM</td>
<td align="left">99.01</td>
<td align="left">Rigorous hyperparameters tuning</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">KNN</td>
<td align="left">95.20</td>
<td align="left">Hyperparameters tuning</td>
</tr>
<tr>
<td align="left" colspan="4">AQI forecasting models performance comparison</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td align="left">ANN</td>
<td align="left">0.91</td>
<td align="left">Forecasting for PM10, R-value, lack of parameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">SARIMA</td>
<td align="left">20.69</td>
<td align="left">Forecasting of AQI for 2019, RMSE</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">SARIMA</td>
<td align="left">43.95</td>
<td align="left">Forecasting of AQI for 2020, high RMSE</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">Facebook-Prophet</td>
<td align="left">22.81</td>
<td align="left">Forecasting of AQI for 2019, RMSE</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-11">11</xref>]</td>
<td align="left">ANN</td>
<td align="left">0.88</td>
<td align="left">O<sub>3</sub> peak forecasting, lack of parameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td align="left">ANN</td>
<td align="left">0.89</td>
<td align="left">Air pollution forecasting, lack of parameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-13">13</xref>]</td>
<td align="left">Hybrid ANN</td>
<td align="left">0.70&#x2013;0.83</td>
<td align="left">Forecasting of PM10, CO, NO, NO<sub>2</sub>, lack of parameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td align="left">PCA&#x002B;ANN</td>
<td align="left">0.27&#x2013;0.75</td>
<td align="left">Weak data splitting and lack of parameters tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-15">15</xref>]</td>
<td align="left">MLP&#x002B;ANN</td>
<td align="left">0.78&#x2013;0.82</td>
<td align="left">Detailed investigation</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td align="left">Decision tree and Naive Bayes</td>
<td align="left">0.91</td>
<td align="left">Time series forecasting</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">SVR&#x002B;RBF</td>
<td align="left">&#x02212;0.67</td>
<td align="left">Used one station data</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">SVR&#x002B;Linear</td>
<td align="left">&#x02212;0.61</td>
<td align="left">Used one station data</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">SVR&#x002B;Poly</td>
<td align="left">&#x02212;0.79</td>
<td align="left">Used one station data</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">RNN&#x002B;LSTM</td>
<td align="left">0.48</td>
<td align="left">Used one station data</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td align="left">Urban Airshed Model</td>
<td align="left">0.69</td>
<td align="left">Forecasting for O<sub>3</sub>, R-value</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-30">30</xref>]</td>
<td align="left">RNN</td>
<td align="left">0.35&#x2013;0.36</td>
<td align="left">Only forecast PM2.5</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-31">31</xref>]</td>
<td align="left">StackedLSTM</td>
<td align="left">10.65&#x2013;21.44</td>
<td align="left">RMSE for 12&#x2005;h forecasting for CO, O<sub>3</sub>, NO<sub>2</sub>, SO<sub>2</sub>, and PM, used data only from 2 sensor locations</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-32">32</xref>]</td>
<td align="left">GRU&#x002B;SGD</td>
<td align="left">0.82</td>
<td align="left">PM2.5 forecasting with GRU, SGD and RNN</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-33">33</xref>]</td>
<td align="left">CNN&#x002B;LSTM</td>
<td align="left">0.43</td>
<td align="left">Weak parameter tuning</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-34">34</xref>]</td>
<td align="left">LSTM&#x002B;PSO</td>
<td align="left">18</td>
<td align="left">MAPE</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">Linear regression model 1</td>
<td align="left">0.90</td>
<td align="left">Pre-processing and statistical operations were performed with one step-index forecasting</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">Linear regression model 2</td>
<td align="left">0.40</td>
<td align="left">Trial and error model for seven step-index forecasts, low accuracy</td>
</tr>
<tr>
<td align="left">Current study</td>
<td align="left">Autoregression model 3</td>
<td align="left">0.93</td>
<td align="left">Tuned number of k for autoregression, higher accuracy</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Conclusions</title>
<p>The increasing rate of industrialization and urbanization is the main reason for the worsening air pollution, especially in developing nations. Previous studies that used ML and statistical modeling to classify and forecast air pollution were limited by the nature and complexity of the datasets, which resulted in inefficient classification and forecasting of air pollution. In particular, ML-based models have suffered from improper data handling, class imbalance, weak division of data for training and testing, and, most importantly, inadequate hyperparameter tuning. The current investigation contributed toward bridging the identified gaps for both aspects of air pollution analysis, i.e., classification and forecasting. Five ML models were developed to address air pollution classification, including one novel model named SMOTEDNN. All five models utilized efficient data pre-processing and rigorous hyperparameter optimization. All the developed models showed excellent performance based on accuracy, precision, sensitivity, and specificity. Notably, the novel SMOTEDNN model showed higher accuracy (99.90&#x0025;) than the other models from the current investigation and from previous studies. The primary reason for this performance was rigorous data pre-processing and intensive hyperparameter tuning. Two of the forecasting models (Model 1 and Model 3) performed well; however, Model 2 was not efficient enough. The study indicated that the severity trends of air pollution in India from Jan 2015 to Jul 2020 were best captured with two weeks of index data as a baseline. The future scope of the present investigation is to include more datasets, such as IoT-based pollution datasets, for real-time air quality assessment and forecasting.</p>
</sec>
</body>
<back>
<ack>
<p>The author is thankful to CPCB for air pollution data.</p>
</ack>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> Mohd Anul Haq would like to thank the Deanship of Scientific Research at Majmaah University for supporting this work under Project No. R-2021-202.</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The author declares no conflicts of interest to report regarding the present study.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P. M.</given-names> <surname>Mannucci</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Franchini</surname></string-name></person-group>, &#x201C;<article-title>Health effects of ambient air pollution in developing countries</article-title>,&#x201D; <source>International Journal of Environmental Research and Public Health</source><italic>,</italic> vol. <volume>14</volume>, no. <issue>9</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Hubacek</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Ni</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Chen</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Clean air for some: Unintended spillover effects of regional air pollution policies</article-title>,&#x201D; <source>Science Advances</source><italic>,</italic> vol. <volume>5</volume>, no. <issue>4</issue>, pp. <fpage>4707</fpage>&#x2013;<lpage>4731</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. A.</given-names> <surname>Glencross</surname></string-name>, <string-name><given-names>T. R.</given-names> <surname>Ho</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Cami&#x00F1;a</surname></string-name>, <string-name><given-names>C. M.</given-names> <surname>Hawrylowicz</surname></string-name> and <string-name><given-names>P. E.</given-names> <surname>Pfeffer</surname></string-name></person-group>, &#x201C;<article-title>Air pollution and its effects on the immune system</article-title>,&#x201D; <source>Free Radical Biology and Medicine</source><italic>,</italic> vol. <volume>151</volume>, pp. <fpage>56</fpage>&#x2013;<lpage>68</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Miao</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhou</surname></string-name> and <string-name><given-names>T. Z.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>Local segmentation of images using an improved fuzzy C-means clustering algorithm based on self-adaptive dictionary learning</article-title>,&#x201D; <source>Applied Soft Computing Journal</source><italic>,</italic> vol. <volume>91</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Ghaemi</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Alimohammadi</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Farnaghi</surname></string-name></person-group>, &#x201C;<article-title>LaSVM-based big data learning system for dynamic prediction of air pollution in Tehran</article-title>,&#x201D; <source>Environmental Monitoring and Assessment</source><italic>,</italic> vol. <volume>190</volume>, no. <issue>5</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>17</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Mihara</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Wen</surname></string-name></person-group>, &#x201C;<article-title>Time series prediction of CO2, TVOC and HCHO based on machine learning at different sampling points</article-title>,&#x201D; <source>Building and Environment</source><italic>,</italic> vol. <volume>146</volume>, pp. <fpage>238</fpage>&#x2013;<lpage>246</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. A.</given-names> <surname>Esfandani</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Nematzadeh</surname></string-name></person-group>, &#x201C;<article-title>Predicting air pollution in Tehran: Genetic algorithm and back propagation neural network</article-title>,&#x201D; <source>Journal of Artificial Intelligence and Data Mining</source><italic>,</italic> vol. <volume>4</volume>, no. <issue>1</issue>, pp. <fpage>49</fpage>&#x2013;<lpage>54</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Biancofiore</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Busilacchio</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Verdecchia</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Tomassetti</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Aruffo</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Recursive neural network model for analysis and forecast of PM10 and PM2.5</article-title>,&#x201D; <source>Atmospheric Pollution Research</source><italic>,</italic> vol. <volume>8</volume>, no. <issue>4</issue>, pp. <fpage>652</fpage>&#x2013;<lpage>659</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Siwek</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Osowski</surname></string-name></person-group>, &#x201C;<article-title>Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors</article-title>,&#x201D; <source>Engineering Applications of Artificial Intelligence</source><italic>,</italic> vol. <volume>25</volume>, no. <issue>6</issue>, pp. <fpage>1246</fpage>&#x2013;<lpage>1258</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Mangayarkarasi</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Vanmathi</surname></string-name>, <string-name><given-names>M. Z.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Noorwali</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Jain</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>COVID19: Forecasting air quality index and particulate matter (PM2.5)</article-title>,&#x201D; <source>Computers, Materials and Continua</source><italic>,</italic> vol. <volume>67</volume>, no. <issue>3</issue>, pp. <fpage>3363</fpage>&#x2013;<lpage>3380</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. L.</given-names> <surname>Dutot</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Rynkiewicz</surname></string-name>, <string-name><given-names>F. E.</given-names> <surname>Steiner</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Rude</surname></string-name></person-group>, &#x201C;<article-title>A 24-h forecast of ozone peaks and exceedance levels using neural classifiers and weather predictions</article-title>,&#x201D; <source>Environmental Modelling and Software</source><italic>,</italic> vol. <volume>22</volume>, no. <issue>9</issue>, pp. <fpage>1261</fpage>&#x2013;<lpage>1269</lpage>, <year>2007</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N. H. A.</given-names> <surname>Rahman</surname></string-name>, <string-name><given-names>M. H.</given-names> <surname>Lee</surname></string-name>, <string-name><given-names>M. T.</given-names> <surname>Latif</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Suhartono</surname></string-name></person-group>, &#x201C;<article-title>Forecasting of air pollution index with artificial neural network</article-title>,&#x201D; <source>Jurnal Teknologi (Sciences and Engineering)</source><italic>,</italic> vol. <volume>63</volume>, no. <issue>2</issue>, pp. <fpage>59</fpage>&#x2013;<lpage>64</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Russo</surname></string-name> and <string-name><given-names>A. O.</given-names> <surname>Soares</surname></string-name></person-group>, &#x201C;<article-title>Hybrid model for urban air pollution forecasting: A stochastic spatio-temporal approach</article-title>,&#x201D; <source>Mathematical Geosciences</source><italic>,</italic> vol. <volume>46</volume>, no. <issue>1</issue>, pp. <fpage>75</fpage>&#x2013;<lpage>93</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Azid</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Juahir</surname></string-name>, <string-name><given-names>M. T.</given-names> <surname>Latif</surname></string-name>, <string-name><given-names>S. M.</given-names> <surname>Zain</surname></string-name> and <string-name><given-names>M. R.</given-names> <surname>Osman</surname></string-name></person-group>, &#x201C;<article-title>Feed-forward artificial neural network model for air pollutant index prediction in the southern region of Peninsular Malaysia</article-title>,&#x201D; <source>Journal of Environmental Protection</source><italic>,</italic> vol. <volume>4</volume>, no. <issue>12</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>10</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Bai</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Ma</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Lu</surname></string-name></person-group>, &#x201C;<article-title>Air pollution forecasts: An overview</article-title>,&#x201D; <source>International Journal of Environmental Research and Public Health</source><italic>,</italic> vol. <volume>15</volume>, no. <issue>4</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>44</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. J.</given-names> <surname>Cohen</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Brauer</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Burnett</surname></string-name>, <string-name><given-names>H. R.</given-names> <surname>Anderson</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Frostad</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: An analysis of data from the global burden of diseases study 2015</article-title>,&#x201D; <source>Lancet</source><italic>,</italic> vol. <volume>389</volume>, no. <issue>10082</issue>, pp. <fpage>1907</fpage>&#x2013;<lpage>1918</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R. W.</given-names> <surname>Gore</surname></string-name> and <string-name><given-names>D. S.</given-names> <surname>Deshpande</surname></string-name></person-group>, &#x201C;<article-title>An approach for classification of health risks based on air quality levels</article-title>,&#x201D; in <conf-name>Proc.-1st Int. Conf. on Intelligent Systems and Information Management, ICISIM</conf-name>, <conf-loc>Aurangabad, India</conf-loc>, vol. <volume>2017-Jan</volume>, pp. <fpage>58</fpage>&#x2013;<lpage>61</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K. S.</given-names> <surname>Rao</surname></string-name>, <string-name><given-names>G. L.</given-names> <surname>Devi</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Ramesh</surname></string-name></person-group>, &#x201C;<article-title>Air quality prediction in Visakhapatnam with LSTM based recurrent neural networks</article-title>,&#x201D; <source>International Journal of Intelligent Systems and Applications</source><italic>,</italic> vol. <volume>11</volume>, no. <issue>2</issue>, pp. <fpage>18</fpage>&#x2013;<lpage>24</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ameer</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Shah</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Song</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Maple</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Comparative analysis of machine learning techniques for predicting air quality in smart cities</article-title>,&#x201D; <source>IEEE Access</source><italic>,</italic> vol. <volume>7</volume>, pp. <fpage>128325</fpage>&#x2013;<lpage>128338</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Gu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zhao</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Prediction of air quality in Shenzhen based on neural network algorithm</article-title>,&#x201D; <source>Neural Computing and Applications</source><italic>,</italic> vol. <volume>32</volume>, no. <issue>7</issue>, pp. <fpage>1879</fpage>&#x2013;<lpage>1892</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Sahoo</surname></string-name>, <string-name><given-names>S. C. H.</given-names> <surname>Hoi</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Large scale online multiple kernel regression with application to time-series prediction</article-title>,&#x201D; <source>ACM Transactions on Knowledge Discovery from Data</source><italic>,</italic> vol. <volume>13</volume>, no. <issue>1</issue>, pp. <fpage>33</fpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>V. N.</given-names> <surname>Vapnik</surname></string-name></person-group>, &#x201C;<source>The Nature of Statistical Learning Theory</source><italic>,</italic>&#x201D; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>Springer</publisher-name>, <year>1995</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V. N.</given-names> <surname>Vapnik</surname></string-name></person-group>, &#x201C;<article-title>An overview of statistical learning theory</article-title>,&#x201D; <source>IEEE Transactions on Neural Networks</source><italic>,</italic> vol. <volume>10</volume>, no. <issue>5</issue>, pp. <fpage>988</fpage>&#x2013;<lpage>999</lpage>, <year>1999</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Cortes</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Vapnik</surname></string-name></person-group>, &#x201C;<article-title>Support-vector networks</article-title>,&#x201D; <source>Machine Learning</source><italic>,</italic> vol. <volume>20</volume>, pp. <fpage>273</fpage>&#x2013;<lpage>297</lpage>, <year>1995</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Pedregosa</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Varoquaux</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Gramfort</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Michel</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Thirion</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Scikit-learn: Machine learning in Python</article-title>,&#x201D; <source>Journal of Machine Learning Research</source><italic>,</italic> vol. <volume>12</volume>, pp. <fpage>2825</fpage>&#x2013;<lpage>2830</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L. E.</given-names> <surname>Raileanu</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Stoffel</surname></string-name></person-group>, &#x201C;<article-title>Theoretical comparison between the Gini index and information gain criteria</article-title>,&#x201D; <source>Annals of Mathematics and Artificial Intelligence</source><italic>,</italic> vol. <volume>41</volume>, no. <issue>1</issue>, pp. <fpage>77</fpage>&#x2013;<lpage>93</lpage>, <year>2004</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Sultan</surname></string-name>, <string-name><given-names>N. C.</given-names> <surname>Sturchio</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Alsefry</surname></string-name>, <string-name><given-names>M. K.</given-names> <surname>Emil</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Mohamed</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Assessment of age, origin, and sustainability of fossil aquifers: A geochemical and remote sensing-based approach</article-title>,&#x201D; <source>Journal of Hydrology</source><italic>,</italic> vol. <volume>576</volume>, no. <issue>May</issue>, pp. <fpage>325</fpage>&#x2013;<lpage>341</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Srivastava</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Singh</surname></string-name> and <string-name><given-names>A. P.</given-names> <surname>Singh</surname></string-name></person-group>, &#x201C;<article-title>Estimation of air pollution in Delhi using machine learning techniques</article-title>,&#x201D; in <conf-name>2018 Int. Conf. on Computing, Power and Communication Technologies, GUCON 2018</conf-name>, no. <issue>2018</issue>, pp. <fpage>304</fpage>&#x2013;<lpage>309</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. E.</given-names> <surname>Chang</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Cardelino</surname></string-name></person-group>, &#x201C;<article-title>Application of the urban airshed model to forecasting next-day peak ozone concentrations in Atlanta</article-title>,&#x201D; <source>Journal of the Air &#x0026; Waste Management Association</source><italic>,</italic> vol. <volume>50</volume>, no. <issue>11</issue>, pp. <fpage>2010</fpage>&#x2013;<lpage>2024</lpage>, <year>2000</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yan</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lang</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Qu</surname></string-name></person-group>, &#x201C;<article-title>A spatiotemporal recurrent neural network for prediction of atmospheric PM2.5: A case study of Beijing</article-title>,&#x201D; <source>IEEE Transactions on Computational Social Systems</source><italic>,</italic> vol. <volume>8</volume>, no. <issue>3</issue>, pp. <fpage>578</fpage>&#x2013;<lpage>588</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Chaudhary</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Deshbhratar</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Kumar</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Paul</surname></string-name></person-group>, &#x201C;<article-title>Time series based LSTM model to predict air pollutant&#x0027;s concentration for prominent cities in India</article-title>,&#x201D; in <conf-name>Proc: Udm&#x2019;18</conf-name>, <conf-loc>London</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>9</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. J.</given-names> <surname>Masinde</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gitahi</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Hahn</surname></string-name></person-group>, &#x201C;<article-title>Training recurrent neural networks for particulate matter concentration prediction</article-title>,&#x201D; <source>International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives</source><italic>,</italic> vol. <volume>43</volume>, no. <issue>b2</issue>, pp. <fpage>1575</fpage>&#x2013;<lpage>1582</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Tripathi</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Pathak</surname></string-name></person-group>, &#x201C;<article-title>Deep learning techniques for air pollution</article-title>,&#x201D; in <conf-name>Proc. Int. Conf. on Computing, Communication, and Intelligent Systems (ICCCIS)</conf-name>, no. <issue>2021</issue>, pp. <fpage>1013</fpage>&#x2013;<lpage>1020</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Heydari</surname></string-name>, <string-name><given-names>N. M.</given-names> <surname>Majidi</surname></string-name>, <string-name><given-names>D. A.</given-names> <surname>Garcia</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Keynia</surname></string-name> and <string-name><given-names>L. D.</given-names> <surname>Santoli</surname></string-name></person-group>, &#x201C;<article-title>Air pollution forecasting application based on deep learning model and optimization algorithm</article-title>,&#x201D; <source>Clean Technologies and Environmental Policy</source><italic>,</italic> pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2021</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>