<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">19323</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2022.019323</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Distributed Healthcare Framework Using MMSM-SVM and P-SVM Classification</article-title>
<alt-title alt-title-type="left-running-head">Distributed Healthcare Framework Using MMSM-SVM and P-SVM Classification</alt-title>
<alt-title alt-title-type="right-running-head">Distributed Healthcare Framework Using MMSM-SVM and P-SVM Classification</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes"><name name-style="western"><surname>Sujitha</surname><given-names>R.</given-names></name><email>rsujitharesearch1@gmail.com</email>
</contrib>
<contrib id="author-2" contrib-type="author"><name name-style="western"><surname>Paramasivan</surname><given-names>B.</given-names></name>
</contrib>
<aff><institution>Department of Information Technology, National Engineering College (Autonomous)</institution>, <addr-line>Kovilpatti, 628503, Tamilnadu</addr-line>, <country>India</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: R. Sujitha. Email: <email>rsujitharesearch1@gmail.com</email></corresp>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-08-30"><day>30</day><month>08</month><year>2021</year>
</pub-date>
<volume>70</volume>
<issue>1</issue>
<fpage>1557</fpage>
<lpage>1572</lpage>
<history>
<date date-type="received"><day>10</day><month>4</month><year>2021</year></date>
<date date-type="accepted"><day>27</day><month>5</month><year>2021</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Sujitha and Paramasivan</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Sujitha and Paramasivan</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_19323.pdf"></self-uri>
<abstract>
<p>With the modernization of machine learning techniques in healthcare, innovations such as the support vector machine (SVM) have played a major role in classifying lung cancer, predicting coronavirus disease 2019, and other diseases. In particular, our algorithm focuses on integrated datasets, in contrast with other existing works. In this study, parallel-based SVM (P-SVM) and multiclass-based multiple-submodel SVM (MMSM-SVM) were used to find the optimal classification of lung diseases. The analysis aimed to retrieve the disease id and stage as key-value pairs in MapReduce, combined with P-SVM for binary classes and MMSM-SVM for multiclasses. For non-linear classification, a kernel clustering-based SVM embedded with multiple submodels was developed. Both algorithms were implemented in the Apache Spark environment, and the data for the analysis were retrieved from a microscope lab, the UCI, Kaggle, and General Thoracic Surgery databases, and electronic health records related to various lung diseases, increasing the dataset size to 5 GB. Performance was measured on the 5 GB dataset with five nodes; as the dataset size was increased, task analysis and CPU utilization were measured.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Lung cancer</kwd>
<kwd>COVID-19</kwd>
<kwd>machine learning</kwd>
<kwd>deep learning</kwd>
<kwd>parallel based support vector machine</kwd>
<kwd>multiclass-based multiple submodel</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Big data plays a vital role in analyzing extremely large datasets with reduced complexity and efficient analysis. With enhanced big data techniques, large amounts of data can be handled in parallel. In particular, data classification has been performed using salient solutions. In the real world, data with exponential growth are complex and challenging to classify [<xref ref-type="bibr" rid="ref-1">1</xref>]. Prediction of coronavirus disease 2019 (COVID-19) is mandatory to reduce the risk of spread, and pre-determination of lung cancer stages is mandatory to assess how lung cells are damaged as the stages advance [<xref ref-type="bibr" rid="ref-2">2</xref>]. In medical science, affected parts can be retrieved and used to diagnose early stages of the disease [<xref ref-type="bibr" rid="ref-3">3</xref>]. Biopsy is the initial step in diagnosis; during this process, cells are sampled and intricately examined under the microscope to identify intra-tumoral cells [<xref ref-type="bibr" rid="ref-4">4</xref>]. However, efficient methods and equipment for accurate recognition and diagnosis remain lacking.</p>
<p>Parallel classification has become appropriate for solving classification problems in big data; in particular, distributed support vector machines (SVMs) offer an iterative MapReduce framework, improved communication between nodes, and linear classification [<xref ref-type="bibr" rid="ref-5">5</xref>]. Classification and regression problems are salient in binary classification, and a MapReduce-based distributed parallel SVM (P-SVM) has been proposed to solve them [<xref ref-type="bibr" rid="ref-6">6</xref>]. P-SVM also solves optimization problems and uses statistical learning theory to predict hypotheses with improved accuracy through iterative training on split datasets.</p>
<p>Lung cancer is one of the leading diseases worldwide. To eradicate lung cancer, health practitioners should employ various methods, but processing and extracting results from many datasets is challenging [<xref ref-type="bibr" rid="ref-7">7</xref>]. A previous study [<xref ref-type="bibr" rid="ref-8">8</xref>] extracted information from several datasets by using P-SVM. This technique uses row-based, approximate matrix factorization, which loads only the essential data onto each machine to perform parallel computation; some of the computations also use big data tools. Another study [<xref ref-type="bibr" rid="ref-9">9</xref>] solved optimization problems over the cloud by using MapReduce techniques along with parallel computation. It also used statistical learning theory to predict the hypothesis that minimizes empirical risk and focused on multiclass parallel computations.</p>
<p>In [<xref ref-type="bibr" rid="ref-10">10</xref>], the author used multiple-submodel parallel SVM (MSM-SVM) on Spark to accelerate the training process with non-linear SVM. Furthermore, data splitting methods improve the performance of parallel computations and approximate the global solution with several local submodels. The author handled the multiclass case with a &#x201C;one-against-one&#x201D; strategy [<xref ref-type="bibr" rid="ref-11">11</xref>]. A new convolutional neural network-based multimodal disease risk prediction algorithm has been proposed to handle structured and unstructured data [<xref ref-type="bibr" rid="ref-12">12</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>]. In addition, a latent factor model has been developed to handle incomplete data [<xref ref-type="bibr" rid="ref-14">14</xref>]; this process also reconstructs missing data. Reference [<xref ref-type="bibr" rid="ref-15">15</xref>] analyzed the persistence of diabetes by using HUE and accurately counted the number of persons suffering from diabetes by using SVM. Reference [<xref ref-type="bibr" rid="ref-16">16</xref>] developed a tele-ECG system on a Hadoop big data framework, using mining techniques to process and classify datasets related to cardiovascular disease. Although Hadoop is well developed, handling large datasets still raises concerns in terms of server management. The most significant and essential tool in big data is MapReduce, and its efficient use improves performance [<xref ref-type="bibr" rid="ref-17">17</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>]. The authors analyzed MapReduce impacts and penalty parameters for large-scale datasets, divided the datasets into chunks, and processed them under the Hadoop framework. Another efficient submodel in MapReduce is the adjoint method [<xref ref-type="bibr" rid="ref-19">19</xref>]. The MapReduce-based adjoint method helps prevent brain disease by detecting it earlier.</p>
<p>Reference [<xref ref-type="bibr" rid="ref-20">20</xref>] implemented communication-efficient versions of parallel SVM and further developed CA-SVM. The author deployed statistical methods and algorithmic refinements to improve its communication efficiency. C-means clustering, which collects data from the UCI machine learning repository, has been proposed for analyzing patient records [<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-22">22</xref>]. The author provided a framework for predicting and prescribing drugs for specified diseases. Reference [<xref ref-type="bibr" rid="ref-23">23</xref>] provided predictive pattern matching in Hadoop MapReduce environments to predict diabetes mellitus. The developed machine learning-based prediction methodology has drawbacks in its early analysis; therefore, a new, more accurate prediction methodology is required to overcome them.</p>
<p>References [<xref ref-type="bibr" rid="ref-24">24</xref>,<xref ref-type="bibr" rid="ref-25">25</xref>] discussed the basics of predictive analytics in healthcare and showed its impact on general healthcare applications. In our system, the RBF acts as the non-linear kernel for the SVM. A study [<xref ref-type="bibr" rid="ref-26">26</xref>] deployed a parallel RMC algorithm to classify medical data; because this algorithm also works well on integrated data, we used it for comparison with our proposed model. Cascade SVM from a previous study [<xref ref-type="bibr" rid="ref-27">27</xref>] has also been updated and compared with our proposed model; the main difference is that cascade SVM classifies flower seeds, a general application.</p>
<p>In this study, datasets were classified with an underlying SVM using threshold-based techniques. The classified support vectors were then fed to MMSM-SVM with some parameter changes and passed to MapReduce to extract the id and stage from the classified vectors. In addition to the multiple submodels, kernel clustering-based SVM (KCB-SVM) was incorporated to cluster similar datasets; de-clustering was reduced so that all hidden data are covered, since most of the dataset falls near the margin of the support vectors. P-SVM and MMSM-SVM with appropriate parameter settings were used for binary classification. Finally, the id and stage were retrieved from the MapReduce framework with four nodes of parallel computation. This analysis aimed to find the optimal classification of lung diseases, with the id and stage retrieved as key-value pairs in MapReduce combined with P-SVM for binary classes and MMSM-SVM for multiclasses. In this analysis, the MMSM-SVM algorithm was developed from MSM-SVM to classify high-dimensional lung disease datasets, and the MapReduce technique was utilized to retrieve the different ids and stages from the classified support vectors. The obtained results show that the developed MMSM-SVM algorithm achieves 92&#x0025; classification accuracy with optimal datasets, higher than other learning techniques, while the P-SVM algorithm achieves 90&#x0025; classification accuracy with different parameter settings for every dataset. Both algorithms were developed in the Apache Spark environment, and the data for the analysis were retrieved from a microscope lab, the UCI, Kaggle, and General Thoracic Surgery databases, and electronic health records (EHRs) related to various lung diseases to increase the dataset size.</p>
</sec>
<sec id="s2"><label>2</label><title>Proposed Approach and Methodology</title>
<p>In big data classification, SVM models and submodels have their own architecture. The proposed classification architecture is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. Samples that are similar in nature form one cluster, while the remaining samples are more likely to become support vectors. Samples in different regions are unlikely to be trained together; instead, training uses local submodels.</p>
<sec id="s2_1"><label>2.1</label><title>Modelling of Multi Class-Based Multiple Sub Models Support Vector Machine</title>
<p>Multiclass classification constitutes the most significant part of various classification tasks because datasets reside in multiple stages or classes. The submodel approach is well suited to multiclass classification. For every class <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>C</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:math></inline-formula>, a complete multiclass decision function f<sub>i</sub>(X) is trained. The class C<sub>t</sub> is selected as the preferred class of any sample, where <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>s</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>p</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>C</mml:mi></mml:math></inline-formula>, if it wins over all other classes under the winner-takes-all strategy. The resultant model can be formed as
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mspace width="thickmathspace" /><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>f</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:math></disp-formula>
The decision function of the local submodels can be derived as
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mo>&#x221D;</mml:mo><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mi>K</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
</p>
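For illustration, the kernel decision function of Eq. (2) can be sketched in plain Python with an RBF kernel. The support vectors, multipliers &#x03B1;<sub>i</sub>, bias b, and &#x03C3; below are illustrative stand-ins, not values from the trained model.

```python
import math

def rbf_kernel(x, x_prime, sigma=1.0):
    # K(x, x') = exp(-sigma * ||x - x'||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sigma * sq_dist)

def decision(x, support_vectors, alphas, labels, b, sigma=1.0):
    # D(x) = sign( sum_i alpha_i * y_i * K(x, x_i) + b )  -- Eq. (2)
    s = sum(a * y * rbf_kernel(x, sv, sigma)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1
```

With one positive and one negative support vector, a query near the positive vector is classified as +1, and one near the negative vector as &#x2212;1.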
<p>For one-<italic>vs.</italic>-all, let the training set be T &#x003D; ((x<sub>1</sub>, y<sub>1</sub>), (x<sub>2</sub>, y<sub>2</sub>), &#x2026;, (x<sub>n</sub>, y<sub>n</sub>)), where y &#x003D; 1&#x2026;k and k is the number of classes. For each l &#x003D; 1&#x2026;k, class l is considered the positive class and the other k &#x2212; 1 classes are considered negative classes. With this representation, the decision function becomes</p>
<p>With one-<italic>vs.</italic>-all,
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mtext>&#xA0;</mml:mtext><mml:mrow><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></disp-formula>
To determine which class a sample is assigned to,
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mi>r</mml:mi></mml:msup></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="normal">m</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">x</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#xA0;</mml:mtext><mml:mrow><mml:mi mathvariant="normal">w</mml:mi><mml:mi mathvariant="normal">h</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">e</mml:mi></mml:mrow><mml:mtext>&#xA0;</mml:mtext><mml:mi>r</mml:mi><mml:mtext>&#xA0;</mml:mtext><mml:mrow><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">p</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mtext>&#xA0;</mml:mtext><mml:mi mathvariant="normal">t</mml:mi><mml:mi mathvariant="normal">h</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mtext>&#xA0;</mml:mtext><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">d</mml:mi><mml:mtext>&#xA0;</mml:mtext><mml:mi mathvariant="normal">c</mml:mi><mml:mi 
mathvariant="normal">l</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula></p>
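The one-<italic>vs.</italic>-all rule of Eqs. (3)&#x2013;(4) can be sketched as follows; the two linear stand-in decision functions are purely illustrative, in place of the k trained functions D<sup>l</sup>.

```python
def one_vs_all_predict(x, decision_fns):
    # D^r(x) = max_l D^l(x); r is the assigned class  -- Eq. (4)
    scores = {label: fn(x) for label, fn in decision_fns.items()}
    return max(scores, key=scores.get)

# Illustrative stand-ins for the trained per-class decision functions
decision_fns = {
    "stage1": lambda x: -x,       # scores high for small x
    "stage2": lambda x: x - 1.0,  # scores high for large x
}
```

For a sample with x = 3.0, "stage2" attains the maximum decision value and is assigned.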
<fig id="fig-1"><label>Figure 1</label><caption><title>Overall layout of proposed methodology</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-1.png"/></fig>
<p>In the multiple-submodel approach, the system avoids exhaustive global training by enabling the clustering and splitting models. Some of the clusters may contain classes C<sub>t</sub>, where <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:math></inline-formula>, with the effect that classification groups together classes with the largest similarity in the feature space. The clustering model forms local subsets from classes with high preference. KCB-SVM incorporates an approximate hierarchical clustering method, which scans the whole large dataset and provides boundaries for similar classes. It also estimates the best boundary with respect to limited resources and provides high scalability.</p>
<p>In the clustering stage, the clustering feature (CF) of every cluster is given by
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mi mathvariant="normal">C</mml:mi><mml:mi mathvariant="normal">F</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">c</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:math></disp-formula>
where <italic>c</italic> and <italic>r</italic> denote the center and radius of the cluster, respectively. The radius is calculated by
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mo form="prefix">max</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mspace width="thickmathspace" /><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>
Let (<italic>x<sub>i</sub></italic>, <italic>y<sub>i</sub></italic>) be the input parameters and <italic>H<sub>i</sub></italic> be cluster i in the mapped feature space. The radius is calculated from the cluster center and the distance between two data points.</p>
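A minimal sketch of Eqs. (5)&#x2013;(6): the CF of a cluster stores its center and radius. Plain Euclidean distance is used here for illustration; in the paper the radius is computed with the feature-space distance of Eq. (7).

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def clustering_feature(points):
    # CF = (c, r)  -- Eq. (5); r = max over x_i in H_i of d(x_i, c)  -- Eq. (6)
    dim = len(points[0])
    c = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    r = max(euclidean(p, c) for p in points)
    return c, r
```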
<p>For the RBF kernel, the distance measures are computed as follows:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msqrt><mml:mn>2</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn><mml:mrow><mml:mi mathvariant="normal">e</mml:mi><mml:mi mathvariant="normal">x</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow><mml:msubsup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:msqrt></mml:math></disp-formula>
Suppose some clusters are not selected; the nearest cluster center is then computed as follows:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mi>Y</mml:mi><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:munder><mml:mrow><mml:mo form="prefix">min</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>l</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>
</p>
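Eqs. (7)&#x2013;(8) can be sketched directly: the RBF-induced distance in the mapped feature space, and the assignment of an unselected cluster center to its nearest level-l center. The function names and &#x03C3; value are illustrative.

```python
import math

def rbf_feature_distance(xi, xj, sigma=1.0):
    # d(x_i, x_j) = sqrt(2 - 2 exp(-sigma * ||x_i - x_j||_2^2))  -- Eq. (7)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.sqrt(2.0 - 2.0 * math.exp(-sigma * sq_dist))

def nearest_center(center, level_centers, sigma=1.0):
    # Y = argmin_i d(c_(l-1)k, c_li)  -- Eq. (8)
    return min(range(len(level_centers)),
               key=lambda i: rbf_feature_distance(center, level_centers[i], sigma))
```

Identical points have distance 0, and the distance grows monotonically with the input-space distance, so nearest-center assignment behaves as expected.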
<p><roman>H<sub>(l&#x2212;1)k</sub></roman> is merged into cluster <roman>H<sub>ly</sub></roman>, and the CFs of cluster <roman>H<sub>ly</sub></roman> are then updated, where <roman>H<sub>(l&#x2212;1)k</sub></roman> depicts the unselected cluster and <roman>H<sub>ly</sub></roman> depicts its nearest cluster, with index Y given by Eq. (8). The new radius can be calculated as the maximum of the two cluster radii plus the distance between the two cluster centers. Here, <italic>l</italic> is the cluster level.</p>
<p>Declustering can be implemented with the condition for positive classes,
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mfrac><mml:mrow><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi>w</mml:mi><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x2264;</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi>w</mml:mi><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo></mml:mrow></mml:mfrac></mml:math></disp-formula>
for negative classes,
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mfrac><mml:mrow><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi>w</mml:mi><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x2264;</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mi>w</mml:mi><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo></mml:mrow></mml:mfrac></mml:math></disp-formula>
</p>
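One plausible reading of Eqs. (9)&#x2013;(10) is a test of whether a cluster (center c, radius r) may intersect the margin band of the hyperplane (w, b); only such clusters need to be de-clustered for local training. The sketch below, with illustrative w and b, checks that band under this assumption.

```python
import math

def may_touch_margin(w, b, center, radius):
    # Signed distance of the cluster center from the hyperplane:
    # (w^T c + b) / ||w||, compared against r + 1/||w||  -- Eqs. (9)-(10)
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    signed = (sum(wi * ci for wi, ci in zip(w, center)) + b) / norm_w
    band = radius + 1.0 / norm_w
    return -band <= signed <= band
```

A tight cluster far from the hyperplane fails the test and can stay clustered; a cluster near the hyperplane passes and is de-clustered.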
<p>Let the parameters be the number of cores; the sample size; LC, the lung cancer datasets with classes 1 to C, where C depicts the number of classes; SP, the sputum datasets, likewise with classes 1 to C; and DT, the datasets with the clustering model.</p>
<p><bold>Algorithm 1:</bold></p>
<p>1. Load support vectors of (x<sub>i</sub>, y<sub>i</sub>) with more than one class C into an RDD</p>
<p>2. Use RDD.map() to generate (id, stages) pairs with RDD</p>
<p>3. Merge the vectors with respect to id to form local subsets {LC}<sup>C</sup><sub>i &#x003D; 1</sub> and {SP}<sup>C</sup><sub>i &#x003D; 1</sub> with RDD.groupByKey();</p>
<p>4. Use KCB-SVM to select subclusters from {LC}</p>
<p>5. Cluster similar classes using WTA.</p>
<p>6. Decluster other classes.</p>
<p>7. Repeat steps 5 and 6 to cover all data points.</p>
<p>8. For i &#x003D; 1&#x2026;C parallel do</p>
<p>9. f<sub>i</sub>: X &#x02192; R.</p>
<p>10. Y<sub>i</sub> &#x003D; argmax<sub>{1&#x2026;C}</sub></p>
<p>11. f<sub>i</sub>(X) as the final class</p>
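Steps 1&#x2013;3 of Algorithm 1 can be sketched with pure-Python stand-ins for the Spark RDD operations (in Spark itself these would be rdd.map(...) followed by rdd.groupByKey()); the record fields below are illustrative.

```python
from collections import defaultdict

def map_to_pairs(records):
    # RDD.map(): emit (id, stage) key-value pairs (steps 1-2)
    return [(rec["id"], rec["stage"]) for rec in records]

def group_by_key(pairs):
    # RDD.groupByKey(): merge vectors by id into local subsets (step 3)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)
```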
</sec>
<sec id="s2_2"><label>2.2</label><title>Modelling of Parallel Support Vector Machine</title>
<p><xref ref-type="fig" rid="fig-2">Fig. 2</xref> depicts the layout of P-SVM. The support vectors that have already been classified are given as input to P-SVM. Subvectors are calculated and optimized using P-SVM, and the calculated support vectors of each sub-SVM are given as input to the next sub-SVM. In this way, the outputs of two or more preceding sub-SVMs form the input to the current sub-SVM. The process continues until a single set of support vectors is derived as the result. Furthermore, P-SVM can be implemented in Spark using the LIBSVM library.</p>
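The pairwise merging just described can be sketched as a reduction tree. The train argument is a stand-in for a real SVM trainer that returns the support vectors of its input; the identity trainer used in the example is purely illustrative.

```python
def cascade(subsets, train):
    # Train each subset, then repeatedly merge the surviving support
    # vectors two at a time and retrain, until one set remains.
    level = [train(s) for s in subsets]
    rounds = 0
    while len(level) > 1:
        merged = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                  for i in range(0, len(level), 2)]
        level = [train(s) for s in merged]
        rounds += 1
    return level[0], rounds
```

With four input subsets, the cascade needs two merge rounds to reach a single set of support vectors.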
<fig id="fig-2"><label>Figure 2</label><caption><title>Design of developed approach</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-2.png"/></fig>
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the support vectors of the sputum, lung cancer, and thoracic surgery datasets with size n. The size of the support vectors may vary for every iteration.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>Parallel SVM architecture</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-3.png"/></fig>
<p><bold>Algorithm 2:</bold></p>
<p>Input: n training instances, m number of machines, s global support vectors, h hypothesis, v iterations</p>
<p>Output: global support vectors and (id, stages) pairs</p>
<p>1. Load support vectors of (x<sub>i</sub>, y<sub>i</sub>) for i &#x003D; 0, 1</p>
<p>2. Use RDD.map() to generate (id, stages) pairs with RDD</p>
<p>3. For i &#x003D; 0 to 1 do</p>
<p>4. Load xi into hdfs</p>
<p>5. End for</p>
<p>6. Initially h &#x02192; 0, v &#x02192; 0 in master node</p>
<p>7. Repeat</p>
<p>8. v &#x02192; v &#x002B; 1</p>
<p>9. For each node in the cluster C, C &#x003D; c1, c2&#x2026; c<sub>m</sub></p>
<p>10. s &#x02192; s &#x002B; 1;</p>
<p>11. s &#x02192; s &#x002B; n; add the global support vectors to the subsets of training data</p>
<p>12. Train support vector machine with new merged dataset.</p>
<p>13. Find out all the support vectors with each data subset.</p>
<p>14. Merge all local SVs and calculate the global SVs</p>
<p>15. If h<sub>v</sub> &#x003D; h<sub>v&#x2212;1</sub>, stop; else go to step 8</p>
<p>16. f<sub>i</sub>(x) as final class</p>
<p>17. Map reduce ();</p>
<p>18. Generate (id, stages) with MapReduce()</p>
<p>19. End</p>
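The iterative loop of Algorithm 2 (steps 6&#x2013;15) can be sketched as follows. The toy train function in the example, which keeps only the extreme points of its input, is a stand-in for a real sub-SVM.

```python
def parallel_svm(subsets, train, max_iter=20):
    # Each node trains on (global SVs + its own subset); the local SVs are
    # merged into new global SVs; iteration stops when the hypothesis is
    # unchanged (step 15: h_v == h_{v-1}).
    global_svs = frozenset()
    for _ in range(max_iter):
        local = [frozenset(train(global_svs | frozenset(s))) for s in subsets]
        merged = frozenset().union(*local)
        if merged == global_svs:
            break
        global_svs = merged
    return set(global_svs)
```

With two nodes holding {1, 2, 3} and {4, 5, 6} and the min/max trainer, the global support vectors converge to {1, 6} in a few rounds.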
</sec>
<sec id="s2_3"><label>2.3</label><title>Modelling of MapReduce</title>
<p>MapReduce is a programming model suitable for processing huge volumes of data. The developed MapReduce framework is shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. Hadoop can run MapReduce programs written in various languages, and a MapReduce job consists of two phases: the Map phase and the Reduce phase. The input to each phase is key-value pairs, and the programmer needs to specify two functions: map and reduce.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>MapReduce framework of proposed methodology</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-4.png"/></fig>
<p>Let MT be a set of all tasks with map function and MD be the results of data after being split. Let splitting (MS) be as follows,
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:math></disp-formula>
MS depicts the splitting of the input data in the map phase across the different tasks. The data derived from MS is a partial function of the given input data, as required.
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mi>M</mml:mi><mml:mi>T</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>M</mml:mi><mml:mi>T</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>M</mml:mi><mml:mi>T</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mi>R</mml:mi><mml:mi>D</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Herein, the map phase partitions the input data and returns intermediate data, which is then fed as input to the reduce tasks. Therefore, the results of the map tasks are (id, stages) pairs in an unstructured format. The reduce task is formatted as,
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>M</mml:mi><mml:mi>T</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mi>R</mml:mi><mml:mi>D</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>
</p>
<p>In function <xref ref-type="disp-formula" rid="eqn-14">(14)</xref>, the reduce splits (RS) process the intermediate results by formatting them and generating a partition of the reduced data. Then, the reduce task (RT) is given by
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mi>R</mml:mi><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mi>D</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>
RT takes RS as input and partitions the reduced data into the required format (id, stages).</p>
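<p>The map, shuffle, and reduce steps of Eqs. (11)&#x2013;(15) can be sketched as a single pure-Python function. The (id, stages) example and the mapper/reducer names are illustrative, not the authors' implementation.</p>

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

def map_reduce(records: Iterable,
               mapper: Callable[[object], List[Tuple[str, object]]],
               reducer: Callable[[str, List[object]], object]) -> Dict[str, object]:
    # Map phase (MS, MT): each input record yields key-value pairs
    intermediate: List[Tuple[str, object]] = []
    for rec in records:
        intermediate.extend(mapper(rec))
    # Shuffle (RS): partition the intermediate pairs by key
    groups: Dict[str, List[object]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase (RT): one reduce call per partition of the reduced data
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical example: (patient id, stage) pairs reduced to the highest stage
records = [("p1", 1), ("p2", 2), ("p1", 3)]
stages = map_reduce(records, mapper=lambda r: [r],
                    reducer=lambda k, vs: max(vs))
# stages == {"p1": 3, "p2": 2}
```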
</sec>
</sec>
<sec id="s3"><label>3</label><title>Experimental Setup</title>
<p>We computed the classification accuracy in the data center with five nodes and three executors per node. Hence, five nodes with 64 GB of RAM each, 19 GB of executor memory, and a total big-data size of 500 GB were used. Furthermore, we increased the data size in large steps starting from 5 GB and thereby reduced the running time. Hence, we need 15 tasks per node, i.e., 75 tasks running in parallel across the five nodes. Moreover, PySpark, LIBSVM, and MapReduce were used for the P-SVM binary classification, and the MMSM-SVM environment was used for the parallel execution of the multiclass case. The datasets used in the experiment are listed in <?A3B2 "tbl1",5,"anchor"?><xref ref-type="table" rid="table-1">Tab. 1</xref>.</p>
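<p>The resource figures quoted above can be sanity-checked with a short sketch; the helper name is illustrative, and the numbers are the ones stated in this section.</p>

```python
def cluster_plan(nodes: int, executors_per_node: int, executor_mem_gb: int,
                 node_ram_gb: int, tasks_per_node: int) -> int:
    """Check that per-node executor memory fits in RAM and return the
    total number of parallel tasks across the cluster."""
    mem_used = executors_per_node * executor_mem_gb  # 3 executors x 19 GB = 57 GB
    assert node_ram_gb - mem_used >= 0, "executors exceed node RAM"
    return nodes * tasks_per_node                    # total parallel tasks

total_tasks = cluster_plan(nodes=5, executors_per_node=3, executor_mem_gb=19,
                           node_ram_gb=64, tasks_per_node=15)
# 5 nodes x 15 tasks/node = 75 parallel tasks for the 5 GB run
```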
<table-wrap id="table-7">
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">System</th>
<th align="left">Details</th>
<th align="left">No. of nodes</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">System specific</td>
<td align="left">4 cores/CPU, 128 GB memory, 10 Gbps network</td>
<td align="left">3</td>
</tr>
<tr>
<td align="left">Another system</td>
<td align="left">4 cores/CPU, 64 GB memory, 10 Gbps network</td>
<td align="left">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Comparison results for binary classification</title></caption>
<table frame="hsides">
<colgroup>
<col charoff="5"></col>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="3">Methods</th>
<th align="left" colspan="2">Sputum datasets</th>
<th align="left" colspan="2">Sputum datasets</th>
<th align="left" colspan="2">Thoracic surgery</th>
<th align="left" colspan="2">Thoracic surgery</th>
</tr>
<tr>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 0.09</th>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 2</th>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 0.09</th>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 2</th>
</tr>
<tr>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">P-SVM&#x2217;</td>
<td align="left">28</td>
<td align="left">0.9032</td>
<td align="left">15</td>
<td align="left">0.904</td>
<td align="left">31</td>
<td align="left">0.921</td>
<td align="left">19</td>
<td align="left">0.9204</td>
</tr>
<tr>
<td align="left">MapReduce-based adjoint (2018)</td>
<td align="left">72</td>
<td align="left">0.8404</td>
<td align="left">115</td>
<td align="left">0.8696</td>
<td align="left">70</td>
<td align="left">0.832</td>
<td align="left">100</td>
<td align="left">0.852</td>
</tr>
<tr>
<td align="left">Cascade SVM(2018)</td>
<td align="left">70</td>
<td align="left">0.85</td>
<td align="left">62</td>
<td align="left">0.812</td>
<td align="left">55</td>
<td align="left">0.89</td>
<td align="left">60</td>
<td align="left">0.90</td>
</tr>
<tr>
<td align="left">Overall accuracy&#x2217;</td>
<td align="left" colspan="2">90.2&#x0025;</td>
<td align="left" colspan="2">90.4&#x0025;</td>
<td align="left" colspan="2">92.2&#x0025;</td>
<td align="left" colspan="2">92.02&#x0025;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The datasets used in our work are listed in <xref ref-type="table" rid="table-1">Tab. 1</xref>. Furthermore, 8:2 was used as the training&#x2013;test split for the sputum datasets, 7:3 for the thoracic surgery datasets, and 5:5 for the lung cancer datasets. The sample size for MMSM-SVM was 0.5. The number of iterations of the experiment increases by n times, where n depends on the size of the datasets. We set the iteration count to 200 for binary classification because stability was achieved at the 200<sup><roman>th</roman></sup> iteration.</p>
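<p>The training&#x2013;test ratios above (8:2, 7:3, 5:5) can be produced with a minimal split helper. This is a sketch with a fixed shuffle seed for reproducibility, not the authors' code.</p>

```python
import random
from typing import List, Tuple

def train_test_split(data: List, train_ratio: float,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the data and cut it at the requested ratio,
    e.g. train_ratio=0.8 gives the 8:2 split used for the sputum datasets."""
    shuffled = data[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # deterministic for a fixed seed
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)), train_ratio=0.8)
# len(train) == 80, len(test) == 20
```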
</sec>
<sec id="s4"><label>4</label><title>Results and Discussion</title>
<p>MMSM-SVM is also a submodel of P-SVM; the difference is that P-SVM classifies well in binary classification. To obtain accurate results, we used MMSM-SVM for multiclass and P-SVM for binary classification. The obtained experimental results are shown in <xref ref-type="table" rid="table-1">Tabs. 1</xref> and <?A3B2 "tbl2",5,"anchor"?><xref ref-type="table" rid="table-2">2</xref> for binary and multiclass classification, respectively, and were compared with previous literature [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>]. The C and &#x03B3; values were varied, and the time in seconds and the accuracy were measured. The analysis was carried out on the basis of C &#x003D; 2 with &#x03B3; &#x003D; 0.09 and C &#x003D; 2 with &#x03B3; &#x003D; 2 for the sputum datasets, and the same two settings for the thoracic surgery datasets.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Comparison results for multiclass classification</title></caption>
<table frame="hsides">
<colgroup>
<col charoff="5"></col>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="3">Methods</th>
<th align="left" colspan="2">Sputum datasets</th>
<th align="left" colspan="2">Sputum datasets</th>
<th align="left" colspan="2">Lung cancer</th>
<th align="left" colspan="2">Lung cancer</th>
</tr>
<tr>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 0.09</th>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 2</th>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 0.09</th>
<th align="left" colspan="2">C &#x003D; 2, &#x03B3; &#x003D; 2</th>
</tr>
<tr>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
<th align="left">Time (s)</th>
<th align="left">Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">MMSM-SVM&#x2217;</td>
<td align="left">10</td>
<td align="left">0.912</td>
<td align="left">80</td>
<td align="left">0.914</td>
<td align="left">43</td>
<td align="left">0.9222</td>
<td align="left">47</td>
<td align="left">0.924</td>
</tr>
<tr>
<td align="left">MapReduce-based adjoint (2018)</td>
<td align="left">72</td>
<td align="left">0.8404</td>
<td align="left">115</td>
<td align="left">0.845</td>
<td align="left">70</td>
<td align="left">0.852</td>
<td align="left">100</td>
<td align="left">0.881</td>
</tr>
<tr>
<td align="left">Cascade SVM (2018)</td>
<td align="left">350</td>
<td align="left">0.85</td>
<td align="left">380</td>
<td align="left">0.87</td>
<td align="left">420</td>
<td align="left">0.86</td>
<td align="left">408</td>
<td align="left">0.87</td>
</tr>
<tr>
<td align="left">Overall accuracy&#x2217;</td>
<td align="left" colspan="2">91.2&#x0025;</td>
<td align="left" colspan="2">91.4&#x0025;</td>
<td align="left" colspan="2">92.2&#x0025;</td>
<td align="left" colspan="2">92.02&#x0025;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="table-1">Tab. 1</xref>, the proposed methodology takes 28&#x2005;s with 90.3&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 0.09 for the sputum datasets, and 15&#x2005;s with 90.4&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 2. For the thoracic surgery datasets, the computation time is 31&#x2005;s with 92.1&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 0.09, and 19&#x2005;s with 92&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 2. This analysis indicates that the proposed methodology requires less computation time and attains higher accuracy than the methods in [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>]. These measures were observed at a dataset size of 5 GB.</p>
<p>The results obtained for multiclass classification are listed in <xref ref-type="table" rid="table-2">Tab. 2</xref>. This analysis was carried out on the basis of C &#x003D; 2 with &#x03B3; &#x003D; 0.09 and C &#x003D; 2 with &#x03B3; &#x003D; 2 for both the sputum and the lung cancer datasets. As shown in <xref ref-type="table" rid="table-2">Tab. 2</xref>, the proposed methodology takes 10&#x2005;s with 91.2&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 0.09 for the sputum datasets, and 80&#x2005;s with 91.4&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 2. For the lung cancer datasets, the computation time is 43&#x2005;s with 92.2&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 0.09, and 47&#x2005;s with 92.4&#x0025; accuracy at C &#x003D; 2 and &#x03B3; &#x003D; 2. The average time of every model was compared against the accuracy metrics to show that our proposed method performs better. As shown in <?A3B2 "fig6",5,"anchor"?><xref ref-type="fig" rid="fig-6">Fig. 6</xref>, at the specified time of 120&#x2005;s, the accuracy of P-SVM is higher than those of the other existing models. Meanwhile, the accuracy of MMSM-SVM is higher than those of other existing works, as shown in <xref ref-type="fig" rid="fig-6">Fig. 6b</xref>. This analysis indicates that the proposed methodology requires less computation time with increasing accuracy when compared with the methods in [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>]. In addition, our dataset contains replicas of data to increase the dataset size.</p>
<p>The execution time and accuracy of our model analysis for 100&#x2013;1000 mb samples are listed in <xref ref-type="table" rid="table-1">Tabs. 1</xref> and <xref ref-type="table" rid="table-2">2</xref>, and graphs for the corresponding plots are shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>.</p>
<p>For the five nodes and above parameter settings in <xref ref-type="table" rid="table-2">Tab. 2</xref>, the average time computation and accuracy for the corresponding time were measured and compared with existing models.</p>
<p><xref ref-type="fig" rid="fig-5">Figs. 5a</xref>&#x2013;<xref ref-type="fig" rid="fig-5">5c</xref> depict the performance analysis of the sputum, lung cancer, and thoracic surgery datasets obtained for the proposed methodology. From the results in <xref ref-type="fig" rid="fig-4">Figs. 4</xref>&#x2013;<xref ref-type="fig" rid="fig-6">6</xref>, the accuracy improved to 92.2&#x0025; and stabilized over varying iterations. Then, we increased the number of nodes and analyzed the performance. <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the accuracy analysis for the binary and multiclass classification. As shown in <xref ref-type="fig" rid="fig-6">Fig. 6a</xref>, the accuracy of P-SVM is higher than those of MapReduce and Cascade SVM: the accuracy measures are 3&#x0025; higher than those of MapReduce and 8&#x0025; higher than those of Cascade SVM. As illustrated in <xref ref-type="fig" rid="fig-6">Fig. 6b</xref>, the accuracy of MMSM-SVM is likewise higher than those of MapReduce and Cascade SVM. The running time for each node is 120&#x2005;s on average and increases with the dataset size; for five nodes, it reaches 300&#x2013;380&#x2005;s. Similarly, the task analysis was obtained for a dataset of about 5 GB. Hence, we increased the dataset size from 2 to 5 GB, and the metric outcomes deviate for each dataset size, as discussed so far.</p>
<fig id="fig-5"><label>Figure 5</label><caption><title>(a) Performance analysis for sputum datasets (b) Performance analysis of lung cancer datasets (c) Performance analysis of thoracic surgery datasets</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-5.png"/></fig>
<fig id="fig-6"><label>Figure 6</label><caption><title>Accuracy analysis (a) Binary classification (b) Multiclass classification</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-6.png"/></fig>
<p>The number of tasks analyzed for 5 GB of data shows that the framework performs the optimal number of tasks for the corresponding dataset size; that is, it yields only 75 tasks for 5 GB of data.</p>
<p>The tasks are shared among the five nodes (the number of nodes required and allotted is discussed in Section 3): each node has 4 cores and runs 15 tasks. In accordance with MapReduce and the other tasks, the optimized performance comprises 20 tasks for 2 GB of data and grows with the dataset size, as in <?A3B2 "tbl3",5,"anchor"?><xref ref-type="table" rid="table-3">Tab. 3</xref>. We achieved this optimization with respect to all jobs, specifically the MapReduce jobs. The graph plots are illustrated in <?A3B2 "fig7",5,"anchor"?><xref ref-type="fig" rid="fig-7">Fig. 7</xref>. Furthermore, AUC values were computed by measuring the specificity and sensitivity of the various algorithms, as shown in <?A3B2 "fig9",5,"anchor"?><xref ref-type="fig" rid="fig-9">Fig. 9</xref>. The corresponding values are listed in <?A3B2 "tbl4",5,"anchor"?><xref ref-type="table" rid="table-4">Tab. 4</xref>.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Task analysis</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">No. of tasks</th>
<th align="left">Dataset size (all datasets in GB)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">20</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">40</td>
<td align="left">3</td>
</tr>
<tr>
<td align="left">60</td>
<td align="left">4</td>
</tr>
<tr>
<td align="left">75</td>
<td align="left">5</td>
</tr>
</tbody>
</table>
</table-wrap>
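<p>The task allotment in Tab. 3 amounts to a lookup from dataset size to the optimized task count; the helper below is illustrative.</p>

```python
# Optimized parallel-task counts per dataset size, as reported in Tab. 3
TASKS_BY_SIZE_GB = {2: 20, 3: 40, 4: 60, 5: 75}

def tasks_for(size_gb: int) -> int:
    """Return the optimized number of parallel tasks for a dataset size (GB)."""
    return TASKS_BY_SIZE_GB[size_gb]
```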
<fig id="fig-7"><label>Figure 7</label><caption><title>Task analysis</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-7.png"/></fig>
<table-wrap id="table-4"><label>Table 4</label><caption><title>AUC values</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">T-BMSVM</th>
<th align="left">P-SVM</th>
<th align="left">MMSM-SVM</th>
<th align="left">Parallel RMC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0.88</td>
<td align="left">0.9</td>
<td align="left">0.92</td>
<td align="left">0.91</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In terms of resource utilization, our algorithms and datasets achieve good CPU utilization: we obtained about 70&#x0025;&#x2013;75&#x0025; CPU utilization on average over all algorithms. <xref ref-type="fig" rid="fig-9">Fig. 9</xref> illustrates the varying measures of the balanced datasets for all our proposed algorithms. Specifically, the datasets reach 74&#x0025; utilization in existing works compared with our proposed method. Even though all mechanisms perform well on all metrics, we show that our datasets work dynamically with respect to every algorithm. The CPU utilization plots are illustrated in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>.</p>
<fig id="fig-8"><label>Figure 8</label><caption><title>CPU utilization</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-8.png"/></fig>
<fig id="fig-9"><label>Figure 9</label><caption><title>AUC scores for various methods</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_19323-fig-9.png"/></fig>
<p>AUC values for the proposed models are specified in <xref ref-type="table" rid="table-4">Tab. 4</xref>.</p>
<p>Sensitivity and specificity of all the classes in every model are discussed in <?A3B2 "tbl5",5,"anchor"?><xref ref-type="table" rid="table-5">Tab. 5</xref>, from which AUC values were calculated.</p>
<table-wrap id="table-5"><label>Table 5</label><caption><title>Specificity and sensitivity values</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="2">Methods</th>
<th align="left" colspan="3">Specificity (&#x0025;)</th>
<th align="left" colspan="3">Sensitivity (&#x0025;)</th>
</tr>
<tr>
<th align="left">Class 1</th>
<th align="left">Class 2</th>
<th align="left">Class 3</th>
<th align="left">Class 1</th>
<th align="left">Class 2</th>
<th align="left">Class 3</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Parallel RMC</td>
<td align="left">58</td>
<td align="left">63</td>
<td align="left">60</td>
<td align="left">90</td>
<td align="left">91</td>
<td align="left">85</td>
</tr>
<tr>
<td align="left">P-SVM&#x2217;</td>
<td align="left">75.2</td>
<td align="left">80</td>
<td align="left">89</td>
<td align="left">89</td>
<td align="left">88</td>
<td align="left">92</td>
</tr>
<tr>
<td align="left">MMSM-SVM&#x2217;</td>
<td align="left">89.8</td>
<td align="left">77</td>
<td align="left">90</td>
<td align="left">90.48</td>
<td align="left">92.63</td>
<td align="left">89</td>
</tr>
<tr>
<td align="left">MapReduce based</td>
<td align="left">82</td>
<td align="left">90</td>
<td align="left">89</td>
<td align="left">86.8</td>
<td align="left">92.16</td>
<td align="left">95</td>
</tr>
</tbody>
</table>
</table-wrap>
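<p>When only one operating point per class is available, as in Tab. 5, a common single-point approximation is AUC &#x02248; (sensitivity &#x002B; specificity)/2, i.e., the area under the two-segment ROC curve through that point. The sketch below applies it to the MMSM-SVM Class 3 values as an example; this approximation need not coincide with the ROC-derived AUC values reported in Tab. 4.</p>

```python
def auc_single_point(sensitivity_pct: float, specificity_pct: float) -> float:
    """Balanced-accuracy approximation of AUC from a single operating point,
    with sensitivity and specificity given in percent."""
    return (sensitivity_pct + specificity_pct) / 2.0 / 100.0

# MMSM-SVM, Class 3 (Tab. 5): specificity 90%, sensitivity 89%
auc = auc_single_point(sensitivity_pct=89.0, specificity_pct=90.0)
# auc == 0.895
```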
<p>The accuracy of the proposed method is 2.3&#x0025; higher than that of MapReduce and 7.2&#x0025; higher than that of Cascade SVM. The results prove that the prediction efficiency of the proposed algorithm is greater than that of the MapReduce-based adjoint [<xref ref-type="bibr" rid="ref-19">19</xref>] and Cascade SVM [<xref ref-type="bibr" rid="ref-27">27</xref>]. In some plots, the parallel RMC proposed in [<xref ref-type="bibr" rid="ref-26">26</xref>] is also compared, which confirms the efficiency of the proposed models for several parameters.</p>
</sec>
<sec id="s5"><label>5</label><title>Conclusion</title>
<p>P-SVM and MMSM-SVM were proposed to analyze the optimal classification of diseases such as lung cancer. The proposed models for binary and multiclass classification outperform other methodologies. For binary classification, P-SVM was deployed and retrieved the stages by using the MapReduce phase; for multiclass classification, MMSM-SVM retrieved the results with improved accuracy. Using KCB-SVM, the datasets are split into clusters of similar samples so that the training phase becomes easier and works well in nonlinear dimensions. In addition, the proposed solution approximates better accuracy without repeated training and testing, which enables the model to exploit the classification and storage capacity. For load balancing, the model uses the HDFS balancer. The approach handles the multiclass case with the winner-takes-all strategy. The results show that the support vectors and training time on large datasets support binary and multiclass classification with optimized parameter settings. In addition, the proposed method shows an accuracy above 90&#x0025; in classification when compared with competitive methodologies. Our work can diagnose the stages at the earliest point. Thus, the proposed method can be applied to predict other healthcare-related issues, such as COVID-19, by collecting symptoms of patients from electronic health records. Our study can help prevent COVID-19 by collecting the health conditions of in-patients who were treated for other diseases and predicting the possibility of COVID-19.</p>
</sec>
</body>
<back>
<ack>
<p>This study is supported by the Tamil Nadu State Council of Science and Technology. The authors thank the government for their financial assistance and valuable support.</p>
</ack>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> This study is supported by the Tamil Nadu State Council of Science and Technology.</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R. </given-names> <surname>Lin</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Hao</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Improving high-tech enterprise innovation in big data environment: A combinative view of internal and external governance</article-title>,&#x201D; <source>International Journal of Information Management</source>, vol. <volume>50</volume>, pp. <fpage>575</fpage>&#x2013;<lpage>585</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Fanelli</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Piazza</surname></string-name></person-group>, &#x201C;<article-title>Analysis and forecast of COVID-19 spreading in China, Italy and France</article-title>,&#x201D; <source>Chaos, Solitons &#x0026; Fractals</source>, vol. <volume>134</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Arora</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Ritu</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Ajeet</surname></string-name></person-group>, &#x201C;<article-title>System biology approach to identify potential receptor for targeting cancer and biomolecular interaction studies of indole [2, 1-a] isoquinoline derivative as anticancerous drug candidate against it</article-title>,&#x201D; <source>Interdisciplinary Sciences: Computational Life Sciences</source>, vol. <volume>11</volume>, no. <issue>1</issue>, pp. <fpage>125</fpage>&#x2013;<lpage>134</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. M.</given-names> <surname>Rice</surname></string-name>, <string-name><given-names>S. A.</given-names> <surname>Renowden</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Urankar</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Love</surname></string-name> and <string-name><given-names>N. J.</given-names> <surname>Scolding</surname></string-name></person-group>, &#x201C;<article-title>Brain biopsy before or after treatment with corticosteroids?</article-title>,&#x201D; <source>Neuroradiology</source>, vol. <volume>62</volume>, no. <issue>5</issue>, pp. <fpage>545</fpage>&#x2013;<lpage>546</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Bandagar</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Sowkuntla</surname></string-name>, <string-name><given-names>S. A.</given-names> <surname>Moiz</surname></string-name> and <string-name><given-names>P. S.</given-names> <surname>Prasad</surname></string-name></person-group>, &#x201C;<article-title>MR_IMQRA: An efficient mapreduce based approach for fuzzy decision reduct computation</article-title>,&#x201D; in <conf-name>Proc. PReMI</conf-name>, <conf-loc>Tezpur, India</conf-loc>, pp. <fpage>306</fpage>&#x2013;<lpage>316</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. H.</given-names> <surname>Ebenuwa</surname></string-name>, <string-name><given-names>M. S.</given-names> <surname>Sharif</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Alazab</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Al-Nemrat</surname></string-name></person-group>, &#x201C;<article-title>Variance ranking attributes selection techniques for binary classification problem in imbalance data</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>7</volume>, pp. <fpage>24649</fpage>&#x2013;<lpage>24666</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xu</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Fujita</surname></string-name></person-group>, &#x201C;<article-title>An improved non-parallel universum support vector machine and its safe sample screening rule</article-title>,&#x201D; <source>Knowledge-Based Systems</source>, vol. <volume>170</volume>, pp. <fpage>79</fpage>&#x2013;<lpage>88</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Abeykoon</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Fox</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Kim</surname></string-name></person-group>, &#x201C;<article-title>Performance optimization on model synchronization in parallel stochastic gradient descent based SVM</article-title>,&#x201D; in <conf-name>Proc. CCGRID</conf-name>, <conf-loc>Larnaca, Cyprus</conf-loc>, pp. <fpage>508</fpage>&#x2013;<lpage>517</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Yadav</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Prakash</surname></string-name></person-group>, &#x201C;<article-title>A survey on implementation of word-count with map reduce programming oriented model using hadoop framework</article-title>,&#x201D; in <conf-name>Proc. ICACSE</conf-name>, <conf-loc>Sultanpur, UP, India</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Guo</surname></string-name></person-group>, &#x201C;<article-title>Multiple submodels parallel support vector machine on spark</article-title>,&#x201D; in <conf-name>Proc. IEEE Big Data</conf-name>, <conf-loc>Washington, DC</conf-loc>, pp. <fpage>945</fpage>&#x2013;<lpage>950</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Neelakandan</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Paulraj</surname></string-name></person-group>, &#x201C;<article-title>An automated exploring and learning model for data prediction using balanced CA-sVM</article-title>,&#x201D; <source>Ambient Intelligence and Humanized Computing</source>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Javid</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Hamidzadeh</surname></string-name></person-group>, &#x201C;<article-title>An active multi-class classification using privileged information and belief function</article-title>,&#x201D; <source>International Journal of Machine Learning and Cybernetics</source>, vol. <volume>11</volume>, no. <issue>3</issue>, pp. <fpage>511</fpage>&#x2013;<lpage>524</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Grover</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kalani</surname></string-name> and <string-name><given-names>S. K.</given-names> <surname>Dubey</surname></string-name></person-group>, &#x201C;<article-title>Analytical approach towards prediction of diseases using machine learning algorithms</article-title>,&#x201D; in <conf-name>Proc. CONFLUENCE</conf-name>, <conf-loc>Noida, UP, India</conf-loc>, pp. <fpage>793</fpage>&#x2013;<lpage>797</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Yahaya</surname></string-name>, <string-name><given-names>N. D.</given-names> <surname>Oye</surname></string-name> and <string-name><given-names>E. J.</given-names> <surname>Garba</surname></string-name></person-group>, &#x201C;<article-title>A comprehensive review on heart disease prediction using data mining and machine learning techniques</article-title>,&#x201D; <source>American Journal of Artificial Intelligence</source>, vol. <volume>4</volume>, no. <issue>1</issue>, pp. <fpage>20</fpage>&#x2013;<lpage>29</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. B.</given-names> <surname>Storlie</surname></string-name>, <string-name><given-names>T. M.</given-names> <surname>Therneau</surname></string-name>, <string-name><given-names>R. E.</given-names> <surname>Carter</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Chia</surname></string-name>, <string-name><given-names>J. R.</given-names> <surname>Bergquist</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Prediction and inference with missing data in patient alert systems</article-title>,&#x201D; <source>Journal of the American Statistical Association</source>, vol. <volume>115</volume>, no. <issue>529</issue>, pp. <fpage>32</fpage>&#x2013;<lpage>46</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Lv</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Jin</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Liang</surname></string-name></person-group>, &#x201C;<article-title>Revealing the mechanism of EGCG, genistein, rutin, quercetin, and silibinin against hIAPP aggregation via computational simulations</article-title>,&#x201D; <source>Interdisciplinary Sciences: Computational Life Sciences</source>, vol. <volume>12</volume>, no. <issue>1</issue>, pp. <fpage>59</fpage>&#x2013;<lpage>68</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Ozkan</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Ozhan</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Karadana</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Gulcu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Macit</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>A portable wearable tele-eCG monitoring system</article-title>,&#x201D; <source>IEEE Transactions on Instrumentation and Measurement</source>, vol. <volume>69</volume>, no. <issue>1</issue>, pp. <fpage>173</fpage>&#x2013;<lpage>182</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J. P.</given-names> <surname>Verma</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Patel</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Patel</surname></string-name></person-group>, &#x201C;<article-title>Big data analysis: Recommendation system with Hadoop framework</article-title>,&#x201D; in <conf-name>Proc. CICT</conf-name>, <conf-loc>Ghaziabad, India</conf-loc>, pp. <fpage>92</fpage>&#x2013;<lpage>97</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Zettam</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Laassiri</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Enneya</surname></string-name></person-group>, &#x201C;<article-title>A mapreduce-based adjoint method for preventing brain disease</article-title>,&#x201D; <source>Journal of Big Data</source>, vol. <volume>5</volume>, no. <issue>27</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O.</given-names> <surname>Kramer</surname></string-name></person-group>, &#x201C;<article-title>Cascade support vector machines with dimensionality reduction</article-title>,&#x201D; <source>Applied Computational Intelligence and Soft Computing</source>, vol. <volume>2015</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>9</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Purwar</surname></string-name> and <string-name><given-names>S. K.</given-names> <surname>Singh</surname></string-name></person-group>, &#x201C;<article-title>Hybrid prediction model with missing value imputation for medical data</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>42</volume>, no. <issue>13</issue>, pp. <fpage>5621</fpage>&#x2013;<lpage>5631</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M. M.</given-names> <surname>Mishu</surname></string-name></person-group>, &#x201C;<article-title>A patient oriented framework using big data &#x0026; C-means clustering for biomedical engineering applications</article-title>,&#x201D; in <conf-name>Proc. ICREST</conf-name>, <conf-loc>Dhaka, Bangladesh</conf-loc>, pp. <fpage>113</fpage>&#x2013;<lpage>115</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T. J.</given-names> <surname>Mathew</surname></string-name> and <string-name><given-names>E.</given-names> <surname>Sherly</surname></string-name></person-group>, &#x201C;<article-title>Analysis of supervised learning techniques for cost effective disease prediction using non-clinical parameters</article-title>,&#x201D; in <conf-name>Proc. IC4</conf-name>, <conf-loc>Thiruvananthapuram, India</conf-loc>, pp. <fpage>356</fpage>&#x2013;<lpage>360</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Sitharthan</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Parthasarathy</surname></string-name>, <string-name><given-names>S. S.</given-names> <surname>Rani</surname></string-name> and <string-name><given-names>K. C.</given-names> <surname>Ramya</surname></string-name></person-group>, &#x201C;<article-title>An improved radial basis function neural network control strategy-based maximum power point tracking controller for wind power generation system</article-title>,&#x201D; <source>Transactions of the Institute of Measurement and Control</source>, vol. <volume>41</volume>, no. <issue>11</issue>, pp. <fpage>3158</fpage>&#x2013;<lpage>3170</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T. A.</given-names> <surname>Naqishbandi</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Ayyanathan</surname></string-name></person-group>, &#x201C;<article-title>Clinical big data predictive analytics transforming healthcare: An integrated framework for promise towards value-based healthcare</article-title>,&#x201D; in <conf-name>Proc. ICETE</conf-name>, <conf-loc>Hyderabad, India</conf-loc>, pp. <fpage>545</fpage>&#x2013;<lpage>561</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Qi</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Shi</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Alexandrov</surname></string-name></person-group>, &#x201C;<article-title>Parallel RMCLP classification algorithm and its application on the medical data</article-title>,&#x201D; <source>IEEE Transactions on Cloud Computing</source>, vol. <volume>8</volume>, no. <issue>2</issue>, pp. <fpage>532</fpage>&#x2013;<lpage>538</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Jaya Brindha</surname></string-name> and <string-name><given-names>E. S. G.</given-names> <surname>Subbu</surname></string-name></person-group>, &#x201C;<article-title>Ant colony technique for optimizing the order of cascaded SVM classifier for sunflower seed classification</article-title>,&#x201D; <source>IEEE Transactions on Emerging Topics in Computational Intelligence</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>78</fpage>&#x2013;<lpage>88</lpage>, <year>2018</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>