<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">IASC</journal-id>
<journal-id journal-id-type="nlm-ta">IASC</journal-id>
<journal-id journal-id-type="publisher-id">IASC</journal-id>
<journal-title-group>
<journal-title>Intelligent Automation &#x0026; Soft Computing</journal-title>
</journal-title-group>
<issn pub-type="epub">2326-005X</issn><issn pub-type="ppub">1079-8587</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">17154</article-id>
<article-id pub-id-type="doi">10.32604/iasc.2021.017154</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Blockchain-Based Decision Tree Classification in Distributed Networks</article-title><alt-title alt-title-type="left-running-head">Blockchain-Based Decision Tree Classification in Distributed Networks</alt-title><alt-title alt-title-type="right-running-head">Blockchain-Based Decision Tree Classification in Distributed Networks</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author">
<name name-style="western">
<surname>Yu</surname>
<given-names>Jianping</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref>
<xref ref-type="aff" rid="aff-3">3</xref>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western">
<surname>Qiao</surname>
<given-names>Zhuqing</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western">
<surname>Tang</surname>
<given-names>Wensheng</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref>
<xref ref-type="aff" rid="aff-3">3</xref>
<email>tangws@hunnu.edu.cn</email>
</contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western">
<surname>Wang</surname>
<given-names>Danni</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western">
<surname>Cao</surname>
<given-names>Xiaojun</given-names>
</name>
<xref ref-type="aff" rid="aff-4">4</xref>
</contrib>
<aff id="aff-1">
<label>1</label><institution>College of Information Science and Engineering, Hunan Normal University</institution>, <addr-line>Changsha, 410081</addr-line>, <country>P.R. China</country></aff>
<aff id="aff-2">
<label>2</label><institution>Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University</institution>, <addr-line>Changsha, 410081</addr-line>, <country>P.R. China</country></aff>
<aff id="aff-3">
<label>3</label><institution>Hunan Xiangjiang Artificial Intelligence Academy</institution>, <addr-line>Changsha, 410000</addr-line>, <country>P.R. China</country></aff>
<aff id="aff-4">
<label>4</label><institution>Department of Computer Science, Georgia State University</institution>, <addr-line>Atlanta, 30303</addr-line>, <country>USA</country></aff>
</contrib-group><author-notes><corresp id="cor1">&#x002A;Corresponding Author: Wensheng Tang. Email: <email>tangws@hunnu.edu.cn</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-06-17">
<day>17</day>
<month>6</month>
<year>2021</year>
</pub-date>
<volume>29</volume>
<issue>3</issue>
<fpage>713</fpage>
<lpage>728</lpage>
<history>
<date date-type="received">
<day>22</day>
<month>1</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>3</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2021 Yu et al.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Yu et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_IASC_17154.pdf"></self-uri>
<abstract>
<p>In a distributed system such as the Internet of Things, the data volume at each node may be limited, which can constrain the performance of a machine learning classification model. Effectively improving classification performance in a distributed system remains a challenging problem in the field of data mining. Sharing data in the distributed network can enlarge the training data volume and improve the machine learning classification model&#x2019;s accuracy. In this work, we take data sharing and the quality of shared data into consideration and propose an efficient Blockchain-based ID3 Decision Tree Classification (BIDTC) framework for distributed networks. The proposed BIDTC takes advantage of three techniques: a blockchain-based ID3 decision tree, enhanced homomorphic encryption, and a stimulation smart contract, to conduct classification while effectively considering the data privacy and the value of user data. BIDTC employs a data federation scheme based on homomorphic encryption and blockchain to share more training data without sacrificing data privacy. Meanwhile, smart contracts are integrated into BIDTC to incentivize users to share more high-quality data. Our extensive experiments have demonstrated that the proposed BIDTC significantly outperforms existing schemes in constructed consortium blockchain networks.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Blockchain</kwd>
<kwd>classification algorithm</kwd>
<kwd>decision tree</kwd>
<kwd>homomorphic encryption</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Large volumes of data are produced by social networks, engineering sciences, biomolecular research, commerce, and security logs [<xref ref-type="bibr" rid="ref-1">1</xref>]. To extract the information hidden in such big data, machine learning techniques such as statistical model estimation and predictive learning have emerged [<xref ref-type="bibr" rid="ref-2">2</xref>]. Classification is a critical supervised machine learning technique that learns from the training data and assigns test data to predefined classes [<xref ref-type="bibr" rid="ref-3">3</xref>]. Many classification algorithms such as Iterative Dichotomiser 3 (ID3), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) have been intensively studied [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. Most of the existing classification schemes are based on centralized settings where a large training dataset is available in a single host. However, in a distributed computing system such as the Internet of Things (IoT), the data is likely scattered around the system, which makes it difficult to have a large centralized dataset for training and classification [<xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;<xref ref-type="bibr" rid="ref-8">8</xref>]. For example, the work in Ang et al. [<xref ref-type="bibr" rid="ref-6">6</xref>] proposed the ensemble approach PINE to classify concepts of interest in a distributed computing system. PINE combines reactive adaptation, proactive handling of upcoming changes, and adaptation across peers to achieve better accuracy. A distributed classification algorithm (P2P-RVM) for peer-to-peer networks, based on relevance vector machines, was proposed in Khan et al. [<xref ref-type="bibr" rid="ref-7">7</xref>]. To solve the distributed multi-label classification problem, the work in Xu et al.
[<xref ref-type="bibr" rid="ref-8">8</xref>] proposed a quantized distributed semi-supervised multi-label learning algorithm, where the kernel logistic regression function is used, and the common low-dimensional subspace shared by multiple labels is learned. The work in Vu et al. [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>] considers data privacy through the use of encrypted traffic. Similarly, the flow-based relation network classification model RBRN was proposed in Zheng et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] to overcome the imbalance issues of encrypted traffic. However, in these existing approaches, either the data privacy or the value of the user data was not taken into consideration.</p>
<p>It is challenging to optimize the classification accuracy while effectively taking the data privacy and data value into consideration in a distributed system. As each user node in a distributed network system has a limited amount of data for model training, the classification accuracy may be limited by the insufficient training data at the node. Data sharing among nodes can be employed to enlarge the training dataset and improve classification accuracy. However, such data sharing risks data privacy leakage, which is a serious concern for many security-sensitive IoT applications. In this work, we propose an efficient Blockchain-based ID3 Decision Tree Classification (BIDTC) framework to take data sharing and the quality of shared data into consideration during the classification process. The proposed BIDTC employs blockchain-based distributed storage and a fully homomorphic encryption scheme for data sharing among the distributed nodes. By adopting the blockchain-based data federation classification and the smart contract-based stimulation scheme, the proposed BIDTC allows an individual node to have an enlarged training dataset in the distributed environment. As decision tree-based classification is widely adopted and requires a short training time for knowledge acquisition in various applications [<xref ref-type="bibr" rid="ref-12">12</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>], the proposed BIDTC integrates decision tree-based classification with the blockchain-based scheme.</p>
<p>The organization of the rest of the paper is as follows. The related literature is summarized in Section 2. Section 3 proposes a blockchain-based data sharing architecture for training the classification model. A blockchain-based ID3 decision tree classification algorithm for the distributed environment is presented in Section 4. Experimental evaluations and the analysis of the results are presented in Section 5. Finally, Section 6 concludes the paper.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>This section summarizes the related work on decision tree-based classification, fully homomorphic encryption, and blockchain technologies.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Decision Tree-based Classification</title>
<p>The decision tree technique is widely used in data analysis and prediction [<xref ref-type="bibr" rid="ref-14">14</xref>&#x2013;<xref ref-type="bibr" rid="ref-21">21</xref>]. For example, in [<xref ref-type="bibr" rid="ref-16">16</xref>], the C4.5 decision tree algorithm is applied to achieve precision marketing prediction. The C5.0 decision tree classifier is proposed in [<xref ref-type="bibr" rid="ref-17">17</xref>] for general and medical datasets, in which the gain calculation function is modified by adopting the Tsallis entropy function. A service decision tree-based post-pruning prediction approach is proposed to classify services into the corresponding reliability levels after discretizing the continuous attributes of services in service-oriented computing [<xref ref-type="bibr" rid="ref-18">18</xref>]. ID3 is one of the standard algorithms for the decision tree learning process; it calculates the entropy to select the condition attributes [<xref ref-type="bibr" rid="ref-19">19</xref>&#x2013;<xref ref-type="bibr" rid="ref-21">21</xref>].</p>
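<p>For reference, the entropy-based attribute selection that standard ID3 performs can be sketched as follows. This is a minimal illustration of the textbook information gain criterion, not the improved algorithm of this paper; the function names and the toy dataset are assumptions introduced only for the example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one condition attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Standard ID3 splits on the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
]
labels = ["play", "play", "stay", "stay"]
print(best_attribute(rows, labels, ["outlook", "windy"]))
```
]]></preformat>
<p>In this toy dataset, splitting on "outlook" removes all class uncertainty, while splitting on "windy" removes none, so the former is selected.</p>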
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Fully Homomorphic Encryption</title>
<p>Several privacy-preserving machine learning classification schemes have been proposed recently [<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-23">23</xref>]. For example, fully homomorphic encryption (FHE) is proposed for classification without leaking user privacy, especially in the outsourcing scenarios of the distributed environment [<xref ref-type="bibr" rid="ref-24">24</xref>]. An ElGamal Elliptic Curve (EGEC) homomorphic encryption scheme for safeguarding the confidentiality of data stored in a cloud is proposed in Vedara et al. [<xref ref-type="bibr" rid="ref-25">25</xref>]. In Ren et al. [<xref ref-type="bibr" rid="ref-26">26</xref>], a practical homomorphic encryption scheme is proposed to allow IoT systems to operate on encrypted data. A privacy-preserving distributed analytics framework is presented for big data in the cloud by using the FHE cryptosystem [<xref ref-type="bibr" rid="ref-27">27</xref>]. In order to reduce excessive interactions and ciphertext transformations, the work in Smart et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] proposed the SIMD technique, which improves the efficiency of homomorphic operations by encrypting multiple small plaintexts into a single ciphertext. In [<xref ref-type="bibr" rid="ref-29">29</xref>], a private decision tree classification algorithm with SIMD-based fully homomorphic encryption is proposed.</p>
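<p>To illustrate what computing on encrypted data means, the sketch below implements a toy version of the Paillier cryptosystem. Note the substitution: Paillier is only additively homomorphic, not fully homomorphic, and it is not one of the schemes cited above. The tiny primes, the constant names, and the helper functions are assumptions chosen for readability and offer no security.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import secrets

# Toy Paillier parameters (insecure, for illustration only).
P, Q = 17, 19
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)


def l_func(u):
    """The Paillier L function: L(u) = (u - 1) / N."""
    return (u - 1) // N


MU = pow(l_func(pow(G, LAM, N_SQ)), -1, N)  # modular inverse, Python 3.8+


def encrypt(m, r=None):
    """Encrypt an integer m (0 <= m < N) with randomness r coprime to N."""
    while r is None or math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    return (l_func(pow(c, LAM, N_SQ)) * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


total = decrypt(add_encrypted(encrypt(5), encrypt(7)))
print(total)  # 12
```
]]></preformat>
<p>The party that multiplies the two ciphertexts never sees 5 or 7, yet the key holder decrypts their sum. A fully homomorphic scheme additionally supports multiplication of plaintexts under encryption.</p>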
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Blockchain</title>
<p>The blockchain is a distributed ledger database and has attracted much recent attention in the academic community [<xref ref-type="bibr" rid="ref-30">30</xref>]. The blockchain paradigm takes advantage of key technologies such as peer-to-peer networking, the distributed ledger, the consensus mechanism, and smart contracts, and has many applications in fields such as the Internet of Things (IoT), finance, and manufacturing [<xref ref-type="bibr" rid="ref-31">31</xref>]. In Wang et al. [<xref ref-type="bibr" rid="ref-32">32</xref>], a blockchain-powered parallel healthcare system (PHS) framework is proposed to support comprehensive healthcare data sharing and care auditability. A blockchain-based framework for supply chain provenance is proposed in Cui et al. [<xref ref-type="bibr" rid="ref-33">33</xref>], and the framework is analyzed to ensure its security and reliability. A theoretical framework for trust in IoT scenarios and a blockchain-based trust provision system are investigated in Bordel et al. [<xref ref-type="bibr" rid="ref-34">34</xref>]. The blockchain technique is deployed to create a secure and reliable data exchange platform across multiple data providers in Nguyen et al. [<xref ref-type="bibr" rid="ref-35">35</xref>]. In Wang et al. [<xref ref-type="bibr" rid="ref-36">36</xref>], a blockchain-based secure data storage mechanism for sensor networks is proposed. Blockchain-based privacy-aware content caching in the cognitive Internet of vehicles is presented in Qian et al. [<xref ref-type="bibr" rid="ref-37">37</xref>], in which privacy protection and secure content transactions are examined.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Data Sharing for Classification</title>
<p>The dataset owned by a single node in a distributed system is usually limited and insufficient for training a classification model with high accuracy. To improve the classification accuracy, data sharing among nodes is needed. In addition, both the value and the privacy of the shared data are of great importance in applications such as healthcare and finance. To jointly take data sharing, data privacy, and the value of data into consideration, we propose a blockchain-based data sharing architecture for classification, as shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>The architecture of the blockchain-based data sharing</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_17154-fig-1.png"/>
</fig>
<p>The proposed data sharing architecture contains two chains and several types of nodes. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, a node in the blockchain network can be a data provider, data requestor, storage server, or ledgering node. The data providers can share valuable data, in encrypted form, throughout the whole network. The sharing procedure is recorded by the ledgering node and finally written into the corresponding blockchain. If a data requestor needs more training data to improve the classification accuracy, it can send a request message to a storage server in the blockchain network. As a result, data requestors achieve better classification performance, and data providers obtain financial profits when the predefined blockchain-based smart contracts are executed, as shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<sec id="s3_1">
<label>3.1</label>
<title>Double Blockchains</title>
<p>In the proposed blockchain-based data sharing architecture, a consortium chain is employed to store and share the training datasets among multiple nodes in the blockchain network. The data in the consortium chain comes mainly from several related nodes such as institutions or companies [<xref ref-type="bibr" rid="ref-38">38</xref>]. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, we use two blockchains, one for each type of transaction in the system. One chain, for Transaction I, stores the block data for the encrypted data shared by data providers. The other chain, for Transaction II, stores the block data for improving the classification performance by enlarging the volume of the related training dataset. The Transaction II chain enables some nodes to make financial profits through the pre-negotiated blockchain-based smart contracts between the data providers and the data requestors.</p>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Roles of Node</title>
<p>As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, every node in the consortium blockchain network has one or multiple roles: data provider, data requestor, storage server, or ledgering node. The data provider encrypts the plaintext data <italic>M</italic> to generate ciphertext data <italic>C</italic>, then uploads the ciphertext file and the corresponding encryption algorithm to a data storage server. At the same time, the data provider obtains the download address of the file and calculates the hash value of the ciphertext data so that the data integrity can be verified. The access policies for the uploaded data can be defined by data providers. The data owned by data providers can be packed as a transaction and added to a blockchain (after confirmation by the ledgering nodes in the consortium blockchain network). Note that the storage server is not a physical centralized storage node/device. It can be a virtual/logical node, such as cloud-based storage, in the consortium blockchain network.</p>
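<p>The integrity check described above can be sketched as follows. The paper does not name the hash function, so SHA-256 is an assumption here, as are the function names and the sample ciphertext bytes.</p>
<preformat preformat-type="code"><![CDATA[
```python
import hashlib
import hmac


def ciphertext_digest(ciphertext: bytes) -> str:
    """Digest the data provider computes and records alongside the download address."""
    return hashlib.sha256(ciphertext).hexdigest()


def verify_download(downloaded: bytes, recorded_digest: str) -> bool:
    """Check that the downloaded ciphertext matches the recorded digest."""
    return hmac.compare_digest(ciphertext_digest(downloaded), recorded_digest)


# Hypothetical ciphertext bytes standing in for the encrypted data C.
ciphertext = b"\x8f\x12example-ciphertext"
recorded = ciphertext_digest(ciphertext)

print(verify_download(ciphertext, recorded))            # unmodified copy
print(verify_download(ciphertext + b"\x00", recorded))  # tampered copy
```
]]></preformat>
<p>A data requestor that recomputes the digest after downloading can detect any modification of the ciphertext without being able to read the plaintext.</p>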
<p>The data requestors can issue a request to the ledgering nodes for some shared data. The ledgering nodes verify the requestor&#x2019;s identity against the access policies defined for the requested data. Once approved, the data requestors can download the requested encrypted data from the storage servers and train the classification models on the federated training datasets. In the meantime, the smart contracts for the transactions associated with data sharing between the data requestors and the data providers can be executed automatically.</p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Data Storage and Sharing</title>
<p>Each node that has valuable data can obtain some rewards from data sharing. The implementation process requires two phases associated with the two chains of the blockchain network. In phase I, the data providers share their valuable encrypted data to the storage servers. Such sharing is recorded and validated by the ledgering nodes running the consensus algorithm. The data requestors can then issue requests for specific shared data and receive the shared data along with the encryption algorithm after authentication. In phase II, the data requestors encrypt their local data using the obtained encryption algorithm and federate the obtained encrypted training data with their local encrypted data, then train the classification models on the newly federated training data. Correspondingly, the data requestors will pay the predetermined electronic currency to the data providers according to the blockchain-based smart contracts.</p>
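<p>The two phases can be pictured with the toy model below. It is only a sketch of the bookkeeping: consensus, authentication, encryption, and real smart contract execution are all omitted, and every class, field, and node name is a hypothetical stand-in rather than part of the proposed system.</p>
<preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Consortium:
    """Toy stand-in for the double-chain network (all names are hypothetical)."""
    chain_one: list = field(default_factory=list)  # Transaction I: sharing records
    chain_two: list = field(default_factory=list)  # Transaction II: usage and payment records
    storage: dict = field(default_factory=dict)    # download address -> ciphertext
    balances: dict = field(default_factory=dict)   # node name -> currency units

    def share(self, provider, address, ciphertext, price):
        """Phase I: store the ciphertext and record the sharing transaction."""
        self.storage[address] = ciphertext
        self.chain_one.append({"provider": provider, "address": address, "price": price})

    def request(self, requestor, address):
        """Phase II: deliver the ciphertext and settle the pre-agreed price."""
        record = next(r for r in self.chain_one if r["address"] == address)
        price, provider = record["price"], record["provider"]
        if self.balances.get(requestor, 0) < price:
            raise ValueError("insufficient balance")
        self.balances[requestor] = self.balances.get(requestor, 0) - price
        self.balances[provider] = self.balances.get(provider, 0) + price
        self.chain_two.append({"requestor": requestor, "address": address, "paid": price})
        return self.storage[address]


net = Consortium(balances={"requestor": 10})
net.share("provider", "addr-1", b"encrypted-training-data", price=3)
data = net.request("requestor", "addr-1")
print(net.balances)
```
]]></preformat>
<p>After the request, the requestor holds the ciphertext and the agreed amount has moved from the requestor to the provider, with one record on each chain.</p>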
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Blockchain-based Improved ID3 Decision Tree Classification</title>
<p>In this section, we present a new Blockchain-based ID3 Decision Tree Classification (BIDTC) framework for the blockchain-based data sharing architecture. The proposed BIDTC takes into account the relation between the current condition attributes, the other condition attributes in the learning process, and the stimulation mechanism in smart contracts.</p>
<sec id="s4_1">
<label>4.1</label>
<title>An Improved ID3 Decision Tree Classification</title>
<p>The original ID3 classification algorithm only takes the current condition attribute and the decision attribute into consideration when calculating the gain. Here, we present an improved ID3 algorithm that uses all the attributes in the system, including the relationship between the current condition attribute and the other condition attributes. Specifically, we denote <inline-formula id="ieqn-1">
<!--<alternatives><inline-graphic xlink:href="ieqn-1.tif"/><tex-math id="tex-ieqn-1"><![CDATA[A]]></tex-math>--><mml:math id="mml-ieqn-1"><mml:mi>A</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> &#x003D;<inline-formula id="ieqn-2">
<!--<alternatives><inline-graphic xlink:href="ieqn-2.tif"/><tex-math id="tex-ieqn-2"><![CDATA[\; ({A_1},{A_2}, \ldots ,{A_N})]]></tex-math>--><mml:math id="mml-ieqn-2"><mml:mspace width="thickmathspace"></mml:mspace><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
<!--</alternatives>--></inline-formula> as a set of <italic>N</italic> condition attributes with values of <inline-formula id="ieqn-3">
<!--<alternatives><inline-graphic xlink:href="ieqn-3.tif"/><tex-math id="tex-ieqn-3"><![CDATA[({R_1},{R_2}, \ldots ,{R_N})]]></tex-math>--><mml:math id="mml-ieqn-3"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
<!--</alternatives>--></inline-formula>, respectively. Assuming that the occurrence of attribute <inline-formula id="ieqn-4">
<!--<alternatives><inline-graphic xlink:href="ieqn-4.tif"/><tex-math id="tex-ieqn-4"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-4"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>(<italic>i</italic> &#x003D; 1, 2,<inline-formula id="ieqn-5">
<!--<alternatives><inline-graphic xlink:href="ieqn-5.tif"/><tex-math id="tex-ieqn-5"><![CDATA[\; \ldots ,]]></tex-math>--><mml:math id="mml-ieqn-5"><mml:mspace width="thickmathspace"></mml:mspace><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo></mml:math>
<!--</alternatives>--></inline-formula> <italic>N</italic>) is <inline-formula id="ieqn-6">
<!--<alternatives><inline-graphic xlink:href="ieqn-6.tif"/><tex-math id="tex-ieqn-6"><![CDATA[{N_i}]]></tex-math>--><mml:math id="mml-ieqn-6"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, the frequency of <inline-formula id="ieqn-7">
<!--<alternatives><inline-graphic xlink:href="ieqn-7.tif"/><tex-math id="tex-ieqn-7"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-7"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> can be defined as below.</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-1.png"/><tex-math id="tex-eqn-1"><![CDATA[F({A_i}) = \displaystyle{{{N_i}} \over N}]]></tex-math>--><mml:math id="mml-eqn-1" display="block"><mml:mi>F</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x003D;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Then the weight of the attribute <inline-formula id="ieqn-8">
<!--<alternatives><inline-graphic xlink:href="ieqn-8.tif"/><tex-math id="tex-ieqn-8"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-8"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> can be calculated as <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>.</p>
<p><disp-formula id="eqn-2">
<label>(2)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-2.png"/><tex-math id="tex-eqn-2"><![CDATA[W{A_i} = \displaystyle{{F({A_i})} \over {\mathop \sum \nolimits_{i = 1}^N F({A_i})}}]]></tex-math>--><mml:math id="mml-eqn-2" display="block"><mml:mi>W</mml:mi><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>F</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
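<p>As a small worked example of Eqs. (1) and (2), the sketch below computes the frequencies and weights from a list of occurrence counts. It reads <italic>N</italic> in Eq. (1) literally as the number of condition attributes, which is an assumption; the weights of Eq. (2) do not depend on that choice because the common denominator cancels. The function name and the counts are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def attribute_weights(occurrences):
    """Return (F, W) for Eqs. (1) and (2), given the occurrence counts N_i."""
    n = len(occurrences)                     # N: number of condition attributes (assumed)
    freq = [n_i / n for n_i in occurrences]  # Eq. (1): F(A_i) = N_i / N
    total = sum(freq)
    weights = [f / total for f in freq]      # Eq. (2): F(A_i) divided by the sum of all F
    return freq, weights


# Hypothetical occurrence counts for three condition attributes.
freq, weights = attribute_weights([6, 3, 3])
print(freq)     # [2.0, 1.0, 1.0]
print(weights)  # [0.5, 0.25, 0.25]
```
]]></preformat>
<p>The weights always sum to one, so Eq. (2) acts as a normalization of the raw occurrence counts.</p>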
<p>Assume that &#x1EF6; is a decision attribute with <italic>M</italic> possible values <inline-formula id="ieqn-9">
<!--<alternatives><inline-graphic xlink:href="ieqn-9.tif"/><tex-math id="tex-ieqn-9"><![CDATA[{R_{\rm Y} } = \rm{\left( {{{&#x1EF6;} _1},{&#x1EF6;} _2}, \ldots ,{&#x1EF6;} _M}} \right)}]]></tex-math>--><mml:math id="mml-ieqn-9"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi mathvariant="normal">Y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mo>&#x1EF6;</mml:mo><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mo>&#x1EF6;</mml:mo><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mo>&#x1EF6;</mml:mo><mml:mi mathvariant="normal">M</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, <inline-formula id="ieqn-10">
<!--<alternatives><inline-graphic xlink:href="ieqn-10.tif"/><tex-math id="tex-ieqn-10"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-10"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>(<italic>i</italic>&#x003D;1, 2,<inline-formula id="ieqn-11">
<!--<alternatives><inline-graphic xlink:href="ieqn-11.tif"/><tex-math id="tex-ieqn-11"><![CDATA[\; \ldots ,]]></tex-math>--><mml:math id="mml-ieqn-11"><mml:mspace width="thickmathspace"></mml:mspace><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo></mml:math>
<!--</alternatives>--></inline-formula> <italic>N</italic>) has <inline-formula id="ieqn-12">
<!--<alternatives><inline-graphic xlink:href="ieqn-12.tif"/><tex-math id="tex-ieqn-12"><![CDATA[{U_i}]]></tex-math>--><mml:math id="mml-ieqn-12"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> possible values, and <inline-formula id="ieqn-13">
<!--<alternatives><inline-graphic xlink:href="ieqn-13.tif"/><tex-math id="tex-ieqn-13"><![CDATA[{R_i}]]></tex-math>--><mml:math id="mml-ieqn-13"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is set as <inline-formula id="ieqn-14">
<!--<alternatives><inline-graphic xlink:href="ieqn-14.tif"/><tex-math id="tex-ieqn-14"><![CDATA[{R_{i \in \left\{ {1,2, \ldots ,N} \right\}}} = \left( {{a_1},{a_2}, \ldots ,{a_{{U_i}}}} \right)]]></tex-math>--><mml:math id="mml-ieqn-14"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>N</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>. Then, the relationship degree between the condition attribute <inline-formula id="ieqn-15">
<!--<alternatives><inline-graphic xlink:href="ieqn-15.tif"/><tex-math id="tex-ieqn-15"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-15"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> and the decision attribute &#x1EF6; can be defined as <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>.</p>
<p><disp-formula id="eqn-3">
<label>(3)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-3.png"/><tex-math id="tex-eqn-3"><![CDATA[RD\left( {{A_i},&#x1EF6; } \right) = \displaystyle{{\mathop \sum \nolimits_{k = 1}^{{U_i}} \left| {\left| {{A_{k1}}} \right| - \mathop \sum \nolimits_{j = 2}^M \left| {{A_{kj}}} \right|} \right|} \over {{U_i}}}]]></tex-math>--><mml:math id="mml-eqn-3" display="block"><mml:mi>R</mml:mi><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x1EF6;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>The <inline-formula id="ieqn-16">
<!--<alternatives><inline-graphic xlink:href="ieqn-16.tif"/><tex-math id="tex-ieqn-16"><![CDATA[\left| {{A_{kj}}} \right|]]></tex-math>--><mml:math id="mml-ieqn-16"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> is the number of instances that take the <italic>k</italic>-th value of <inline-formula id="ieqn-17">
<!--<alternatives><inline-graphic xlink:href="ieqn-17.tif"/><tex-math id="tex-ieqn-17"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-17"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> and belong to the <italic>j</italic>-th class of the decision attribute &#x1EF6;. Based on <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>, the weighted relationship degree is calculated as <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.</p>
<p><disp-formula id="eqn-4">
<label>(4)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-4.png"/><tex-math id="tex-eqn-4"><![CDATA[WRD\left( {{A_i},&#x1EF6; } \right) = \displaystyle{{RD\left( {{A_i},&#x1EF6; } \right)} \over {\mathop \sum \nolimits_{i = 1}^N RD\left( {{A_i},&#x1EF6;} \right)}}]]></tex-math>--><mml:math id="mml-eqn-4" display="block"><mml:mi>W</mml:mi><mml:mi>R</mml:mi><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x1EF6;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>R</mml:mi><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x1EF6;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mi>R</mml:mi><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x1EF6;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
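<p>As an illustration, the following minimal Python sketch evaluates the relationship degree and its weighted form for categorical attributes. It rests on assumptions that are not stated in the text: the first term of Eq. (3) is read as the count for the first class, the number of distinct attribute values is used as the divisor, and the function names and toy data layout are hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
from collections import Counter, defaultdict


def relationship_degree(values, labels, classes):
    """RD of one attribute, reading Eq. (3) with the first class as the leading term."""
    counts = defaultdict(Counter)  # counts[a_k][class_j] plays the role of |A_kj|
    for value, label in zip(values, labels):
        counts[value][label] += 1
    first, rest = classes[0], classes[1:]
    total = sum(abs(c[first] - sum(c[j] for j in rest)) for c in counts.values())
    return total / len(counts)  # assumption: divide by the number of distinct values


def weighted_relationship_degrees(columns, labels, classes):
    """WRD of every attribute: each RD divided by the sum of all RDs, as in Eq. (4)."""
    rd = {name: relationship_degree(col, labels, classes) for name, col in columns.items()}
    norm = sum(rd.values())
    # Assumption: when every RD is zero, return zero weights instead of dividing by zero.
    return {name: (value / norm if norm else 0.0) for name, value in rd.items()}
```
]]></code>
<p>By construction, the weighted relationship degrees of all attributes sum to one whenever at least one relationship degree is non-zero.</p>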
<p>Suppose that the training samples form the set <inline-formula id="ieqn-18">
<!--<alternatives><inline-graphic xlink:href="ieqn-18.tif"/><tex-math id="tex-ieqn-18"><![CDATA[S = \left\{ {\left( {{x_i},&#x1EF6; _i}} \right){\rm |}{x_i} \in {R_1}*{R_2}* \ldots *{R_N},\; {&#x1EF6;} _i} \in {R_{\rm \acute{y}} }} \right\}]]></tex-math>--><mml:math id="mml-ieqn-18"><mml:mi>S</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mo>&#x1EF6;</mml:mo><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:msub><mml:mo>&#x1EF6;</mml:mo><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mrow><mml:mover><mml:mi mathvariant="normal">y</mml:mi><mml:mo>&#x00B4;</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, where <inline-formula id="ieqn-19">
<!--<alternatives><inline-graphic xlink:href="ieqn-19.tif"/><tex-math id="tex-ieqn-19"><![CDATA[{x_i}]]></tex-math>--><mml:math id="mml-ieqn-19"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> has a corresponding output class label <inline-formula id="ieqn-20">
<!--<alternatives><inline-graphic xlink:href="ieqn-20.tif"/><tex-math id="tex-ieqn-20"><![CDATA[{&#x1EF6;} _i}]]></tex-math>--><mml:math id="mml-ieqn-20"><mml:mrow><mml:msub><mml:mo>&#x1EF6;</mml:mo><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>. Let <inline-formula id="ieqn-21">
<!--<alternatives><inline-graphic xlink:href="ieqn-21.tif"/><tex-math id="tex-ieqn-21"><![CDATA[{P_j}]]></tex-math>--><mml:math id="mml-ieqn-21"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> be the proportion of training samples belonging to class <italic>j</italic> of the decision attribute <inline-formula id="ieqn-22">
<!--<alternatives><inline-graphic xlink:href="ieqn-22.tif"/><tex-math id="tex-ieqn-22"><![CDATA[&#x1EF6;]]></tex-math>--><mml:math id="mml-ieqn-22"><mml:mrow><mml:mo>&#x1EF6;</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>. Then, the class entropy <inline-formula id="ieqn-23">
<!--<alternatives><inline-graphic xlink:href="ieqn-23.tif"/><tex-math id="tex-ieqn-23"><![CDATA[E\left( &#x1EF6; \right)]]></tex-math>--><mml:math id="mml-ieqn-23"><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> of the decision attribute <inline-formula id="ieqn-24">
<!--<alternatives><inline-graphic xlink:href="ieqn-24.tif"/><tex-math id="tex-ieqn-24"><![CDATA[&#x1EF6;]]></tex-math>--><mml:math id="mml-ieqn-24"><mml:mrow><mml:mo>&#x1EF6;</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is defined as follows.</p>
<p><disp-formula id="eqn-5">
<label>(5)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-5.png"/><tex-math id="tex-eqn-5"><![CDATA[E\left( &#x1EF6; \right) = - \mathop \sum \nolimits_{j = 1}^M {P_j}*{\log _2}{P_j}]]></tex-math>--><mml:math id="mml-eqn-5" display="block"><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:msub><mml:mi>log</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Similarly, the conditional entropy <inline-formula id="ieqn-25">
<!--<alternatives><inline-graphic xlink:href="ieqn-25.tif"/><tex-math id="tex-ieqn-25"><![CDATA[E\left( &#x1EF6;{\rm |}{A_i}} \right)]]></tex-math>--><mml:math id="mml-ieqn-25"><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> for each attribute <inline-formula id="ieqn-26">
<!--<alternatives><inline-graphic xlink:href="ieqn-26.tif"/><tex-math id="tex-ieqn-26"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-26"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> can be defined in <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>.</p>
<p><disp-formula id="eqn-6">
<label>(6)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-6.png"/><tex-math id="tex-eqn-6"><![CDATA[E\left( &#x1EF6; {\rm |}{A_i}} \right) = \mathop \sum \nolimits_{k = 1}^{{U_i}} E\left( &#x1EF6; {\rm |}{a_k}} \right) = - \mathop \sum \nolimits_{k = 1}^{{U_i}} \left( {\mathop \sum \limits_{j = 1}^M {P_{kj}}*{{\log }_2}{P_{kj}}} \right)]]></tex-math>--><mml:math id="mml-eqn-6" display="block"><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:munderover><mml:mrow><mml:mo 
movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>log</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Therefore, the information gain of the condition attribute <inline-formula id="ieqn-27">
<!--<alternatives><inline-graphic xlink:href="ieqn-27.tif"/><tex-math id="tex-ieqn-27"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-27"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> can be defined as follows.</p>
<p><disp-formula id="eqn-7">
<label>(7)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-7.png"/><tex-math id="tex-eqn-7"><![CDATA[Gain\left( &#x1EF6; {\rm |}{A_i}} \right) = E\left( &#x1EF6; \right) - E\left( &#x1EF6; {\rm |}{A_i}} \right)]]></tex-math>--><mml:math id="mml-eqn-7" display="block"><mml:mi>G</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
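<p>The entropy and information gain above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the conditional entropy is weighted by the share of samples taking each attribute value, as in standard ID3, which is an assumption because Eq. (6) does not state how the probabilities are normalized.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Class entropy of a list of labels, following Eq. (5)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """Entropy of the labels given one attribute (standard ID3 weighting, an assumption)."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    return sum(len(group) / n * entropy(group) for group in groups.values())


def information_gain(values, labels):
    """Information gain of one attribute, following Eq. (7)."""
    return entropy(labels) - conditional_entropy(values, labels)
```
]]></code>
<p>With two equally frequent classes, an attribute that separates the classes perfectly yields a gain of one bit, whereas an attribute that is independent of the class yields a gain of zero.</p>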
<p>The ID3 decision tree algorithm starts with the whole dataset at the root node and recursively partitions the data into lower-level nodes according to the split criterion. Only nodes that contain samples of more than one class need to be split further. The algorithm stops growing the tree when a stopping criterion is met. We set two stopping criteria. Criterion I is whether all samples in the training dataset are labeled with a single class. Criterion II is whether the attribute set <inline-formula id="ieqn-28">
<!--<alternatives><inline-graphic xlink:href="ieqn-28.tif"/><tex-math id="tex-ieqn-28"><![CDATA[A]]></tex-math>--><mml:math id="mml-ieqn-28"><mml:mi>A</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is empty (or all samples in <italic>S</italic> share the same attribute values). Accordingly, we propose an improved blockchain-based ID3 decision tree algorithm consisting of the following steps.</p>
<p><italic>Step 1</italic>. Check the stopping Criteria I and II. If Criterion I is true, mark the current node as a class <inline-formula id="ieqn-29">
<!--<alternatives><inline-graphic xlink:href="ieqn-29.tif"/><tex-math id="tex-ieqn-29"><![CDATA[&#x1EF6;]]></tex-math>--><mml:math id="mml-ieqn-29"><mml:mrow><mml:mo>&#x1EF6;</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> leaf node; if Criterion II is true, mark the current node as a leaf and set the most common value of <inline-formula id="ieqn-30">
<!--<alternatives><inline-graphic xlink:href="ieqn-30.tif"/><tex-math id="tex-ieqn-30"><![CDATA[{\rm \acute{Y}}]]></tex-math>--><mml:math id="mml-ieqn-30"><mml:mrow><mml:mrow><mml:mover><mml:mi mathvariant="normal">Y</mml:mi><mml:mo>&#x00B4;</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> in <inline-formula id="ieqn-31">
<!--<alternatives><inline-graphic xlink:href="ieqn-31.tif"/><tex-math id="tex-ieqn-31"><![CDATA[S]]></tex-math>--><mml:math id="mml-ieqn-31"><mml:mi>S</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> as its label. Otherwise, go to Step 2.</p>
<p><italic>Step 2</italic>. Calculate the information gain <inline-formula id="ieqn-32">
<!--<alternatives><inline-graphic xlink:href="ieqn-32.tif"/><tex-math id="tex-ieqn-32"><![CDATA[Gain\left( &#x1EF6; {\rm |}{A_i}} \right)]]></tex-math>--><mml:math id="mml-ieqn-32"><mml:mi>G</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> of each condition attribute <inline-formula id="ieqn-33">
<!--<alternatives><inline-graphic xlink:href="ieqn-33.tif"/><tex-math id="tex-ieqn-33"><![CDATA[{A_i}]]></tex-math>--><mml:math id="mml-ieqn-33"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> according to <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>, and set the parameters <inline-formula id="ieqn-34">
<!--<alternatives><inline-graphic xlink:href="ieqn-34.tif"/><tex-math id="tex-ieqn-34"><![CDATA[sW]]></tex-math>--><mml:math id="mml-ieqn-34"><mml:mi>s</mml:mi><mml:mi>W</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> &#x003D; 0 and <inline-formula id="ieqn-35">
<!--<alternatives><inline-graphic xlink:href="ieqn-35.tif"/><tex-math id="tex-ieqn-35"><![CDATA[pW]]></tex-math>--><mml:math id="mml-ieqn-35"><mml:mi>p</mml:mi><mml:mi>W</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> &#x003D; 0. For each attribute value <inline-formula id="ieqn-36">
<!--<alternatives><inline-graphic xlink:href="ieqn-36.tif"/><tex-math id="tex-ieqn-36"><![CDATA[{a_i} \in {R_i}]]></tex-math>--><mml:math id="mml-ieqn-36"><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, calculate the weight of each attribute using the training subset <inline-formula id="ieqn-37">
<!--<alternatives><inline-graphic xlink:href="ieqn-37.tif"/><tex-math id="tex-ieqn-37"><![CDATA[{S_i}]]></tex-math>--><mml:math id="mml-ieqn-37"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> associated with the value <inline-formula id="ieqn-38">
<!--<alternatives><inline-graphic xlink:href="ieqn-38.tif"/><tex-math id="tex-ieqn-38"><![CDATA[{a_i}]]></tex-math>--><mml:math id="mml-ieqn-38"><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>.</p>
<p><italic>Step 3</italic>. For each attribute <inline-formula id="ieqn-39">
<!--<alternatives><inline-graphic xlink:href="ieqn-39.tif"/><tex-math id="tex-ieqn-39"><![CDATA[{A_j} \in A\backslash \left\{ {{A_i}} \right\}]]></tex-math>--><mml:math id="mml-ieqn-39"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mi>A</mml:mi><mml:mi mathvariant="normal">&#x2216;</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, calculate the relationship degree using <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> and the weighted relationship degree using <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>. Then the new value of <inline-formula id="ieqn-40">
<!--<alternatives><inline-graphic xlink:href="ieqn-40.tif"/><tex-math id="tex-ieqn-40"><![CDATA[pW]]></tex-math>--><mml:math id="mml-ieqn-40"><mml:mi>p</mml:mi><mml:mi>W</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is obtained as: <inline-formula id="ieqn-41">
<!--<alternatives><inline-graphic xlink:href="ieqn-41.tif"/><tex-math id="tex-ieqn-41"><![CDATA[pW \leftarrow pW*WRD\left( {{A_j},&#x1EF6; } \right)]]></tex-math>--><mml:math id="mml-ieqn-41"><mml:mi>p</mml:mi><mml:mi>W</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mi>p</mml:mi><mml:mi>W</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mi>W</mml:mi><mml:mi>R</mml:mi><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> and the new value of <inline-formula id="ieqn-42">
<!--<alternatives><inline-graphic xlink:href="ieqn-42.tif"/><tex-math id="tex-ieqn-42"><![CDATA[sW]]></tex-math>--><mml:math id="mml-ieqn-42"><mml:mi>s</mml:mi><mml:mi>W</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is set as: <inline-formula id="ieqn-43">
<!--<alternatives><inline-graphic xlink:href="ieqn-43.tif"/><tex-math id="tex-ieqn-43"><![CDATA[sW \leftarrow sW + \displaystyle{{\left| {{S_i}} \right|} \over {\left| S \right|}}*pW]]></tex-math>--><mml:math id="mml-ieqn-43"><mml:mi>s</mml:mi><mml:mi>W</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mi>s</mml:mi><mml:mi>W</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>S</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mi>p</mml:mi><mml:mi>W</mml:mi></mml:mstyle></mml:math>
<!--</alternatives>--></inline-formula>. The comprehensive information gain is then obtained as: <inline-formula id="ieqn-44">
<!--<alternatives><inline-graphic xlink:href="ieqn-44.tif"/><tex-math id="tex-ieqn-44"><![CDATA[Gain\left( &#x1EF6; {\rm |}{A_i}} \right) \leftarrow Gain\left( &#x1EF6; {\rm |}{A_i}} \right)*sW.]]></tex-math>--><mml:math id="mml-ieqn-44"><mml:mi>G</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mi>G</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mi>s</mml:mi><mml:mi>W</mml:mi><mml:mo>.</mml:mo></mml:math>
<!--</alternatives>--></inline-formula></p>
<p><italic>Step 4</italic>. Determine the best splitting attribute <inline-formula id="ieqn-45">
<!--<alternatives><inline-graphic xlink:href="ieqn-45.tif"/><tex-math id="tex-ieqn-45"><![CDATA[{A_{best}}]]></tex-math>--><mml:math id="mml-ieqn-45"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> that has the maximum comprehensive information gain: <inline-formula id="ieqn-46">
<!--<alternatives><inline-graphic xlink:href="ieqn-46.tif"/><tex-math id="tex-ieqn-46"><![CDATA[{A_{best}} \leftarrow \arg {\max _A}Gain\left( {&#x1EF6; {\rm |}A} \right)]]></tex-math>--><mml:math id="mml-ieqn-46"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mi>arg</mml:mi><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>max</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>G</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x1EF6;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>A</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, split the current node on this attribute, and return to Step 1 for each resulting child node.</p>
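<p>Steps 1 to 4 can be sketched on plaintext data as follows. The sketch is illustrative only and makes several assumptions beyond the text: samples are dictionaries of categorical values, the relationship degrees are computed on the subset for each attribute value, the product weight is started at one for each value because a product started at zero would remain zero, and all function names are hypothetical. Encryption and the blockchain layer are omitted.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group rows and labels by the value each row takes on attr."""
    parts = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        parts[row[attr]][0].append(row)
        parts[row[attr]][1].append(label)
    return parts


def gain(rows, labels, attr):
    """Information gain with the standard ID3 weighting (an assumption)."""
    n = len(labels)
    cond = sum(len(ys) / n * entropy(ys) for _, ys in partition(rows, labels, attr).values())
    return entropy(labels) - cond


def rd(rows, labels, attr, classes):
    """Relationship degree, reading Eq. (3) with the first class as the leading term."""
    parts = partition(rows, labels, attr)
    total = 0
    for _, ys in parts.values():
        c = Counter(ys)
        total += abs(c[classes[0]] - sum(c[j] for j in classes[1:]))
    return total / len(parts)


def weighted_gain(rows, labels, attr, attrs, classes):
    """Steps 2 and 3: scale the gain of attr by the accumulated weight sW."""
    others = [a for a in attrs if a != attr]
    s_w = 0.0
    for sub_rows, sub_labels in partition(rows, labels, attr).values():
        rds = {a: rd(sub_rows, sub_labels, a, classes) for a in others}
        norm = sum(rds.values())
        p_w = 1.0  # assumption: start at one so that the product is not always zero
        for a in others:
            p_w *= rds[a] / norm if norm else 1.0  # WRD; falls back to 1 if all RDs are 0
        s_w += len(sub_labels) / len(labels) * p_w
    return gain(rows, labels, attr) * s_w


def build_tree(rows, labels, attrs, classes):
    if len(set(labels)) == 1:  # Criterion I: a single class remains
        return labels[0]
    same = all(all(r[a] == rows[0][a] for a in attrs) for r in rows)
    if not attrs or same:  # Criterion II: nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: weighted_gain(rows, labels, a, attrs, classes))  # Step 4
    rest = [a for a in attrs if a != best]
    return {best: {value: build_tree(rs, ys, rest, classes)
                   for value, (rs, ys) in partition(rows, labels, best).items()}}
```
]]></code>
<p>The returned tree is a nested dictionary keyed by the splitting attribute and then by attribute value, with class labels at the leaves.</p>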
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Enhanced Homomorphic Encryption</title>
<p>To balance privacy and efficiency, we adopt the vector homomorphic encryption (VHE) method [<xref ref-type="bibr" rid="ref-39">39</xref>] for the proposed BIDTC framework. With the data requestor and the data provider denoted as <inline-formula id="ieqn-47">
<!--<alternatives><inline-graphic xlink:href="ieqn-47.tif"/><tex-math id="tex-ieqn-47"><![CDATA[R]]></tex-math>--><mml:math id="mml-ieqn-47"><mml:mi>R</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> and <inline-formula id="ieqn-48">
<!--<alternatives><inline-graphic xlink:href="ieqn-48.tif"/><tex-math id="tex-ieqn-48"><![CDATA[P]]></tex-math>--><mml:math id="mml-ieqn-48"><mml:mi>P</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, respectively, we present the setup, training, and classification processes of BIDTC as follows.</p>
<p><italic>Phase 1</italic>. <inline-formula id="ieqn-49">
<!--<alternatives><inline-graphic xlink:href="ieqn-49.tif"/><tex-math id="tex-ieqn-49"><![CDATA[P.Setup\left( {\lambda ,{D^t}} \right)]]></tex-math>--><mml:math id="mml-ieqn-49"><mml:mi>P</mml:mi><mml:mo>.</mml:mo><mml:mi>S</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mi>u</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mi>t</mml:mi></mml:msup></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>: The data providers identify the security parameter <inline-formula id="ieqn-50">
<!--<alternatives><inline-graphic xlink:href="ieqn-50.tif"/><tex-math id="tex-ieqn-50"><![CDATA[\lambda]]></tex-math>--><mml:math id="mml-ieqn-50"><mml:mi>&#x03BB;</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> and the training data <inline-formula id="ieqn-51">
<!--<alternatives><inline-graphic xlink:href="ieqn-51.tif"/><tex-math id="tex-ieqn-51"><![CDATA[{D^t},\left( {t = 1,2\cdot\cdot\cdot} \right)]]></tex-math>--><mml:math id="mml-ieqn-51"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mi>t</mml:mi></mml:msup></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, where <inline-formula id="ieqn-52">
<!--<alternatives><inline-graphic xlink:href="ieqn-52.tif"/><tex-math id="tex-ieqn-52"><![CDATA[t]]></tex-math>--><mml:math id="mml-ieqn-52"><mml:mi>t</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> represents the sequence number of the transferred data. Using the key generation algorithm <inline-formula id="ieqn-53">
<!--<alternatives><inline-graphic xlink:href="ieqn-53.tif"/><tex-math id="tex-ieqn-53"><![CDATA[KeyGen\left( \lambda \right)]]></tex-math>--><mml:math id="mml-ieqn-53"><mml:mi>K</mml:mi><mml:mi>e</mml:mi><mml:mi>y</mml:mi><mml:mi>G</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, the data providers obtain the VHE public and private keys and the <inline-formula id="ieqn-54">
<!--<alternatives><inline-graphic xlink:href="ieqn-54.tif"/><tex-math id="tex-ieqn-54"><![CDATA[H]]></tex-math>--><mml:math id="mml-ieqn-54"><mml:mi>H</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> matrix. The data providers then encrypt <inline-formula id="ieqn-55">
<!--<alternatives><inline-graphic xlink:href="ieqn-55.tif"/><tex-math id="tex-ieqn-55"><![CDATA[{D^t}]]></tex-math>--><mml:math id="mml-ieqn-55"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mi>t</mml:mi></mml:msup></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> <inline-formula id="ieqn-56">
<!--<alternatives><inline-graphic xlink:href="ieqn-56.tif"/><tex-math id="tex-ieqn-56"><![CDATA[\left( {{D^t} = \left\{ {x_1^t,x_2^t,\cdot\cdot\cdot,x_n^t} \right\}} \right)]]></tex-math>--><mml:math id="mml-ieqn-56"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mi>t</mml:mi></mml:msup></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mi>t</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mi>t</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> to <inline-formula id="ieqn-57">
<!--<alternatives><inline-graphic xlink:href="ieqn-57.tif"/><tex-math id="tex-ieqn-57"><![CDATA[{D^{{t}^{\prime}}}\left( {{D^{{t}^{\prime}}} = \left\{ {c_1^t,c_2^t,\cdot\cdot\cdot,c_n^t} \right\}} \right)]]></tex-math>--><mml:math id="mml-ieqn-57"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msup></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mi>c</mml:mi><mml:mn>1</mml:mn><mml:mi>t</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mn>2</mml:mn><mml:mi>t</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> by using the encryption algorithm <inline-formula id="ieqn-58">
<!--<alternatives><inline-graphic xlink:href="ieqn-58.tif"/><tex-math id="tex-ieqn-58"><![CDATA[Encrypt\left( {pk,{x_i}} \right)]]></tex-math>--><mml:math id="mml-ieqn-58"><mml:mi>E</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>r</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>. Then, the data providers send the <inline-formula id="ieqn-59">
<!--<alternatives><inline-graphic xlink:href="ieqn-59.tif"/><tex-math id="tex-ieqn-59"><![CDATA[Encrypt\left( {pk,{x_i}} \right)]]></tex-math>--><mml:math id="mml-ieqn-59"><mml:mi>E</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>r</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, <inline-formula id="ieqn-60">
<!--<alternatives><inline-graphic xlink:href="ieqn-60.tif"/><tex-math id="tex-ieqn-60"><![CDATA[{\rm \; }{D^{{t}^{\prime}}}]]></tex-math>--><mml:math id="mml-ieqn-60"><mml:mrow><mml:mspace width="thickmathspace"></mml:mspace></mml:mrow><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> and the matrix <inline-formula id="ieqn-61">
<!--<alternatives><inline-graphic xlink:href="ieqn-61.tif"/><tex-math id="tex-ieqn-61"><![CDATA[H]]></tex-math>--><mml:math id="mml-ieqn-61"><mml:mi>H</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> to the corresponding storage servers.</p>
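<p>To illustrate why encrypted vectors can still be combined, the following toy Python sketch encodes integer vectors so that adding two ciphertexts adds the underlying plaintexts. It is not the VHE construction of [<xref ref-type="bibr" rid="ref-39">39</xref>]: it is symmetric (there is no public key and no matrix <italic>H</italic>), it offers no real security, and every name and parameter in it is hypothetical. It only assumes a decryption relation of the form S c = w x + e with a secret matrix S = [I | T], a large scaling factor w, and small noise e.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import random

W = 2 ** 20  # assumed scaling factor, much larger than the accumulated noise


def keygen(m, extra, rng, bound=10):
    """Return the secret block T (m x extra); the secret key is S = [I | T]."""
    return [[rng.randint(-bound, bound) for _ in range(extra)] for _ in range(m)]


def encrypt(T, x, rng, noise=2):
    """Build c = [w*x + e - T*a ; a] so that S*c = w*x + e."""
    a = [rng.randint(-1000, 1000) for _ in range(len(T[0]))]
    e = [rng.randint(-noise, noise) for _ in x]
    head = [W * xi + ei - sum(t * aj for t, aj in zip(row, a))
            for xi, ei, row in zip(x, e, T)]
    return head + a


def decrypt(T, c):
    """Recover x by computing S*c and rounding to the nearest multiple of w."""
    m = len(T)
    head, a = c[:m], c[m:]
    sc = [h + sum(t * aj for t, aj in zip(row, a)) for h, row in zip(head, T)]
    return [(v + W // 2) // W for v in sc]


def add(c1, c2):
    """Homomorphic addition: the component-wise sum of two ciphertexts."""
    return [u + v for u, v in zip(c1, c2)]
```
]]></code>
<p>Decryption stays correct as long as the accumulated noise remains below half of the scaling factor, which is why the factor is chosen much larger than the noise bound.</p>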
<p><italic>Phase 2</italic>. <inline-formula id="ieqn-62">
<!--<alternatives><inline-graphic xlink:href="ieqn-62.tif"/><tex-math id="tex-ieqn-62"><![CDATA[R.Training\_Classifier\_ID3\left( {{D^{ \cup &#x0027;}}} \right)]]></tex-math>--><mml:math id="mml-ieqn-62"><mml:mi>R</mml:mi><mml:mo>.</mml:mo><mml:mi>T</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi mathvariant="normal">_</mml:mi><mml:mi>C</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi mathvariant="normal">_</mml:mi><mml:mi>I</mml:mi><mml:mi>D</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mo>&#x222A;</mml:mo><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>: The data requestors encrypt the local dataset <inline-formula id="ieqn-63">
<!--<alternatives><inline-graphic xlink:href="ieqn-63.tif"/><tex-math id="tex-ieqn-63"><![CDATA[D]]></tex-math>--><mml:math id="mml-ieqn-63"><mml:mi>D</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> to <inline-formula id="ieqn-64">
<!--<alternatives><inline-graphic xlink:href="ieqn-64.tif"/><tex-math id="tex-ieqn-64"><![CDATA[{D}^{\prime}]]></tex-math>--><mml:math id="mml-ieqn-64"><mml:msup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math>
<!--</alternatives>--></inline-formula> by using the encryption algorithm <inline-formula id="ieqn-65">
<!--<alternatives><inline-graphic xlink:href="ieqn-65.tif"/><tex-math id="tex-ieqn-65"><![CDATA[Encrypt\left( {pk,{x_i}} \right)]]></tex-math>--><mml:math id="mml-ieqn-65"><mml:mi>E</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>r</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, which will be combined with the received dataset <inline-formula id="ieqn-66">
<!--<alternatives><inline-graphic xlink:href="ieqn-66.tif"/><tex-math id="tex-ieqn-66"><![CDATA[{D^{{t}^{\prime}}}]]></tex-math>--><mml:math id="mml-ieqn-66"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> to generate a new dataset <inline-formula id="ieqn-67">
<!--<alternatives><inline-graphic xlink:href="ieqn-67.tif"/><tex-math id="tex-ieqn-67"><![CDATA[{D^{ \cup &#x0027;}}]]></tex-math>--><mml:math id="mml-ieqn-67"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mo>&#x222A;</mml:mo><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>. Then the classification model will be trained by performing the improved ID3 algorithm on the federated training dataset <inline-formula id="ieqn-68">
<!--<alternatives><inline-graphic xlink:href="ieqn-68.tif"/><tex-math id="tex-ieqn-68"><![CDATA[{D^{ \cup &#x0027;}}]]></tex-math>--><mml:math id="mml-ieqn-68"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mo>&#x222A;</mml:mo><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>.</p>
<p><italic>Phase 3</italic>. <inline-formula id="ieqn-69">
<!--<alternatives><inline-graphic xlink:href="ieqn-69.tif"/><tex-math id="tex-ieqn-69"><![CDATA[R.Testing\_ID3\left( {V{D^{&#x0027;}}} \right)]]></tex-math>--><mml:math id="mml-ieqn-69"><mml:mi>R</mml:mi><mml:mo>.</mml:mo><mml:mi>T</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi mathvariant="normal">_</mml:mi><mml:mi>I</mml:mi><mml:mi>D</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mi></mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>: The data requestors encrypt the local testing dataset <inline-formula id="ieqn-70">
<!--<alternatives><inline-graphic xlink:href="ieqn-70.tif"/><tex-math id="tex-ieqn-70"><![CDATA[VD = \left\{ {{x_1},{x_2},\cdot\cdot\cdot,{x_m}} \right\}]]></tex-math>--><mml:math id="mml-ieqn-70"><mml:mi>V</mml:mi><mml:mi>D</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> to obtain the encrypted testing dataset <inline-formula id="ieqn-71">
<!--<alternatives><inline-graphic xlink:href="ieqn-71.tif"/><tex-math id="tex-ieqn-71"><![CDATA[V{D^{&#x0027;}} = \left\{ {{c_1},{c_2},\cdot\cdot\cdot,{c_m}} \right\}]]></tex-math>--><mml:math id="mml-ieqn-71"><mml:mi>V</mml:mi><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mi></mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> by using the same encryption operations as mentioned above. The classification accuracy will be calculated by the data requestors when completing the classification task on the testing dataset <inline-formula id="ieqn-72">
<!--<alternatives><inline-graphic xlink:href="ieqn-72.tif"/><tex-math id="tex-ieqn-72"><![CDATA[V{D^{&#x0027;}}]]></tex-math>--><mml:math id="mml-ieqn-72"><mml:mi>V</mml:mi><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:msup><mml:mi></mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>.</p>
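<p>The training step in Phase 2 can be illustrated with a minimal sketch of the attribute-selection rule of the standard ID3 algorithm, applied to a federated dataset. This is only an illustration under stated assumptions: it operates on plaintext toy data rather than on ciphertexts, it shows the classical information-gain criterion rather than the improved ID3 variant, and the names <monospace>entropy</monospace>, <monospace>information_gain</monospace>, and <monospace>best_attribute</monospace> as well as the toy rows are hypothetical.</p>
<code language="python">
```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute index."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Index of the attribute with the largest information gain."""
    return max(range(len(rows[0])),
               key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: local rows plus rows received from a provider.
local_rows, local_labels = [(0, 1), (1, 1)], [0, 1]
shared_rows, shared_labels = [(0, 0), (1, 0)], [0, 1]

# Federation is modelled here as a plain concatenation of the two datasets.
rows = local_rows + shared_rows
labels = local_labels + shared_labels
split = best_attribute(rows, labels)
```
</code>
<p>In this toy example the first attribute separates the two classes perfectly, so it is the one selected for the root split.</p>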
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Stimulation Scheme with Smart Contract</title>
<p>In this section, we develop a stimulation scheme with smart contracts for the proposed BIDTC framework.</p>
<p>In the blockchain network, the transactions in a smart contract can be executed automatically, and the corresponding inputs, outputs, and states affected by executing the smart contracts are negotiated and agreed on by all participating nodes [<xref ref-type="bibr" rid="ref-40">40</xref>,<xref ref-type="bibr" rid="ref-41">41</xref>]. Here, we propose a stimulation scheme to incentivize the providers to share more valuable data. Each data-sharing transaction involves two types of fees: a basic transaction fee and an additional transaction fee. We assume that the basic transaction fee the data providers receive from the data requestors is <inline-formula id="ieqn-73">
<!--<alternatives><inline-graphic xlink:href="ieqn-73.tif"/><tex-math id="tex-ieqn-73"><![CDATA[\Delta]]></tex-math>--><mml:math id="mml-ieqn-73"><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> ethers. The additional transaction fee depends on the percentage increase of the classification accuracy due to the data sharing. Let <inline-formula id="ieqn-74">
<!--<alternatives><inline-graphic xlink:href="ieqn-74.tif"/><tex-math id="tex-ieqn-74"><![CDATA[\Delta acc]]></tex-math>--><mml:math id="mml-ieqn-74"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> denote the percentage increase in classification accuracy from the original classification model to the newly constructed one (i.e., after the data sharing). If <inline-formula id="ieqn-75">
<!--<alternatives><inline-graphic xlink:href="ieqn-75.tif"/><tex-math id="tex-ieqn-75"><![CDATA[\Delta acc > 0]]></tex-math>--><mml:math id="mml-ieqn-75"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mn>0</mml:mn></mml:math>
<!--</alternatives>--></inline-formula>, then the data requestors pay an additional transaction fee to the data providers according to <xref ref-type="table" rid="table-1">Tab. 1</xref>. If the classification accuracy does not increase relative to the original model, the data requestors pay no additional transaction fee to the data providers for the data sharing.</p>

<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Stimulation Mechanism for Data Providers</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Increment of accuracy</th>
<th>Basic transaction fee</th>
<th>Additional transaction fee</th>
</tr>
</thead>
<tbody>
<tr>
<td><inline-formula id="ieqn-76">
<!--<alternatives><inline-graphic xlink:href="ieqn-76.tif"/><tex-math id="tex-ieqn-76"><![CDATA[\Delta acc \le threshold]]></tex-math>--><mml:math id="mml-ieqn-76"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>h</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mi>d</mml:mi></mml:math>
<!--</alternatives>--></inline-formula></td>
<td><inline-formula id="ieqn-77">
<!--<alternatives><inline-graphic xlink:href="ieqn-77.tif"/><tex-math id="tex-ieqn-77"><![CDATA[\Delta]]></tex-math>--><mml:math id="mml-ieqn-77"><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> ether</td>
<td>0 ether</td>
</tr>
<tr>
<td><inline-formula id="ieqn-78">
<!--<alternatives><inline-graphic xlink:href="ieqn-78.tif"/><tex-math id="tex-ieqn-78"><![CDATA[\Delta acc > threshold]]></tex-math>--><mml:math id="mml-ieqn-78"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>h</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mi>d</mml:mi></mml:math>
<!--</alternatives>--></inline-formula></td>
<td><inline-formula id="ieqn-79">
<!--<alternatives><inline-graphic xlink:href="ieqn-79.tif"/><tex-math id="tex-ieqn-79"><![CDATA[\Delta]]></tex-math>--><mml:math id="mml-ieqn-79"><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> ether</td>
<td>[<inline-formula id="ieqn-80">
<!--<alternatives><inline-graphic xlink:href="ieqn-80.tif"/><tex-math id="tex-ieqn-80"><![CDATA[\Delta acc]]></tex-math>--><mml:math id="mml-ieqn-80"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>] ether</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The higher the quality of the data shared by the providers, the better the classification accuracy and the greater the financial profit the data providers can obtain from sharing the training data. Therefore, the data providers in various blockchain networks have incentives to share more valuable datasets.</p>
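<p>The fee rule of Tab. 1 can be sketched as a small function. This is a minimal illustration, not the deployed smart contract: the function name <monospace>transaction_fees</monospace> is hypothetical, and because the exact mapping from the accuracy gain to the additional fee is not spelled out beyond the table entry, it is passed in as a caller-supplied function (<monospace>additional_fee_of</monospace>, an assumption).</p>
<code language="python">
```python
def transaction_fees(delta_acc, threshold, basic_fee, additional_fee_of):
    """Return (basic, additional) fees in ether following Tab. 1.

    delta_acc         -- accuracy gain after data sharing
    threshold         -- minimum gain that triggers the additional fee
    basic_fee         -- basic transaction fee, always paid
    additional_fee_of -- hypothetical mapping from the gain to a fee
    """
    if delta_acc > threshold:
        return basic_fee, additional_fee_of(delta_acc)
    return basic_fee, 0.0


# Assumed example mapping: one ether per percentage point of gain.
def per_point(gain):
    return round(100 * gain)


no_bonus = transaction_fees(0.0, 0.01, 1.0, per_point)
with_bonus = transaction_fees(0.05, 0.01, 1.0, per_point)
```
</code>
<p>Passing the mapping as a parameter keeps the sketch neutral about how the additional fee is scaled.</p>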
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>The Proposed BIDTC Framework</title>
<p>The proposed Blockchain-based ID3 Decision Tree Classification (BIDTC) framework combines three techniques (a blockchain-based ID3 decision tree, enhanced homomorphic encryption, and a stimulation smart contract) to perform classification in a distributed environment while taking into account both data privacy and the value of the user data. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows the overall process of the proposed BIDTC framework, whose primary operations are listed below.</p>
<list list-type="roman-lower">
<list-item><p>The distributed blockchain network is set up, and the Ethereum-based consortium chains are constructed. The distributed blockchain network consists of a large number of data providers, the ledgering nodes, and the data requestors.</p></list-item>
<list-item><p>The data providers encrypt their local training data using vector homomorphic encryption and then upload the encrypted data to a storage server in the blockchain network. The ledgering nodes validate the data-sharing transactions through the consensus algorithm. All the transactions are stored in the consortium chain.</p></list-item>
<list-item><p>The data requestors train a classification model on the local training dataset with the ID3-based algorithm. This model is then validated on the testing dataset, and the accuracy (say <inline-formula id="ieqn-81">
<!--<alternatives><inline-graphic xlink:href="ieqn-81.tif"/><tex-math id="tex-ieqn-81"><![CDATA[ac{c_0}]]></tex-math>--><mml:math id="mml-ieqn-81"><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>) is obtained. The data requestors can then issue requests to the blockchain network for more shared training data. After authentication by the ledgering nodes, the data requestors receive the encrypted training data shared by the providers. At the same time, a smart contract is established between the data providers and the corresponding data requestors. Upon receiving the encrypted training data, the data requestors encrypt their local training data with the same encryption scheme used by the data providers, federate it with the received encrypted training dataset, and run the improved ID3 algorithm to obtain a new classification model and its accuracy (say <inline-formula id="ieqn-82">
<!--<alternatives><inline-graphic xlink:href="ieqn-82.tif"/><tex-math id="tex-ieqn-82"><![CDATA[ac{c_1}]]></tex-math>--><mml:math id="mml-ieqn-82"><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>).</p></list-item>
<list-item><p>The smart contracts and the stimulation scheme are triggered when the accuracy difference <inline-formula id="ieqn-83">
<!--<alternatives><inline-graphic xlink:href="ieqn-83.tif"/><tex-math id="tex-ieqn-83"><![CDATA[\Delta acc = ac{c_1} - ac{c_0}]]></tex-math>--><mml:math id="mml-ieqn-83"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is above a certain threshold.</p></list-item></list>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The flow diagram of the proposed blockchain-based scheme</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_17154-fig-2.png"/>
</fig>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Performance Evaluation</title>
<p>In this section, we conduct simulations to validate the proposed BIDTC framework and analyze its performance.</p>
<sec id="s5_1">
<label>5.1</label>
<title>Experiment Settings</title>
<p>We simulated the BIDTC network with Python 3.7. The simulation platform runs on a machine with Ubuntu 16.04 LTS, an Intel Core i5-8250U CPU (3.40 GHz), and 8.0 GB of RAM. In the consortium blockchain network, each node is deployed with Geth 1.7.2 (Go Ethereum). The configuration file <italic>genesis.json</italic> includes the identifier of the chain <inline-formula id="ieqn-84">
<!--<alternatives><inline-graphic xlink:href="ieqn-84.tif"/><tex-math id="tex-ieqn-84"><![CDATA[id]]></tex-math>--><mml:math id="mml-ieqn-84"><mml:mi>i</mml:mi><mml:mi>d</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, the random number <inline-formula id="ieqn-85">
<!--<alternatives><inline-graphic xlink:href="ieqn-85.tif"/><tex-math id="tex-ieqn-85"><![CDATA[nonce]]></tex-math>--><mml:math id="mml-ieqn-85"><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, and the <inline-formula id="ieqn-86">
<!--<alternatives><inline-graphic xlink:href="ieqn-86.tif"/><tex-math id="tex-ieqn-86"><![CDATA[timestamp]]></tex-math>--><mml:math id="mml-ieqn-86"><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>p</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>. The smart contracts are coded and tested in Remix, a browser-based IDE. The account address, the balance, and the dataset indexes are defined in structs, as shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>The illustration of the deployment for blockchain-based smart contracts</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_17154-fig-3.png"/>
</fig>
<p>We carry out the experiments using the MNIST dataset [<xref ref-type="bibr" rid="ref-42">42</xref>]. We set 60000 samples as the training dataset and 10000 samples as the testing dataset. The training dataset is further divided into four equal parts and stored in four random nodes, namely, Node <inline-formula id="ieqn-87">
<!--<alternatives><inline-graphic xlink:href="ieqn-87.tif"/><tex-math id="tex-ieqn-87"><![CDATA[A]]></tex-math>--><mml:math id="mml-ieqn-87"><mml:mi>A</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, Node <inline-formula id="ieqn-88">
<!--<alternatives><inline-graphic xlink:href="ieqn-88.tif"/><tex-math id="tex-ieqn-88"><![CDATA[B]]></tex-math>--><mml:math id="mml-ieqn-88"><mml:mi>B</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, Node <inline-formula id="ieqn-89">
<!--<alternatives><inline-graphic xlink:href="ieqn-89.tif"/><tex-math id="tex-ieqn-89"><![CDATA[C]]></tex-math>--><mml:math id="mml-ieqn-89"><mml:mi>C</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, and Node <inline-formula id="ieqn-90">
<!--<alternatives><inline-graphic xlink:href="ieqn-90.tif"/><tex-math id="tex-ieqn-90"><![CDATA[D]]></tex-math>--><mml:math id="mml-ieqn-90"><mml:mi>D</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>.</p>
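<p>The partition of the training set across the four nodes can be sketched as follows. This is a minimal illustration under assumptions: the helper name <monospace>partition_indices</monospace> is hypothetical, only sample indices are handled (no MNIST loading), and the shuffle with a fixed seed is an assumed way of assigning samples to nodes.</p>
<code language="python">
```python
import random


def partition_indices(num_samples, node_names, seed=0):
    """Shuffle sample indices and split them evenly across the nodes."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    share = num_samples // len(node_names)
    return {
        name: indices[k * share:(k + 1) * share]
        for k, name in enumerate(node_names)
    }


# 60000 training samples divided among Nodes A, B, C, and D.
parts = partition_indices(60000, ["A", "B", "C", "D"])
sizes = {name: len(chunk) for name, chunk in parts.items()}
```
</code>
<p>With 60000 samples and four nodes, each node receives 15000 disjoint sample indices.</p>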
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Experimental Results</title>
<p>As data privacy is built into the encrypted data sharing, we focus here on evaluating the accuracy and speed of the proposed BIDTC. The confusion matrix comprises True Positives (<inline-formula id="ieqn-91">
<!--<alternatives><inline-graphic xlink:href="ieqn-91.tif"/><tex-math id="tex-ieqn-91"><![CDATA[TP]]></tex-math>--><mml:math id="mml-ieqn-91"><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>), True Negatives (<inline-formula id="ieqn-92">
<!--<alternatives><inline-graphic xlink:href="ieqn-92.tif"/><tex-math id="tex-ieqn-92"><![CDATA[TN]]></tex-math>--><mml:math id="mml-ieqn-92"><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>), False Positives (<inline-formula id="ieqn-93">
<!--<alternatives><inline-graphic xlink:href="ieqn-93.tif"/><tex-math id="tex-ieqn-93"><![CDATA[FP]]></tex-math>--><mml:math id="mml-ieqn-93"><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>), and False Negatives (<inline-formula id="ieqn-94">
<!--<alternatives><inline-graphic xlink:href="ieqn-94.tif"/><tex-math id="tex-ieqn-94"><![CDATA[FN]]></tex-math>--><mml:math id="mml-ieqn-94"><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>). The <inline-formula id="ieqn-95">
<!--<alternatives><inline-graphic xlink:href="ieqn-95.tif"/><tex-math id="tex-ieqn-95"><![CDATA[TP]]></tex-math>--><mml:math id="mml-ieqn-95"><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the number of samples that are actually positive and predicted to be positive; the <inline-formula id="ieqn-96">
<!--<alternatives><inline-graphic xlink:href="ieqn-96.tif"/><tex-math id="tex-ieqn-96"><![CDATA[TN]]></tex-math>--><mml:math id="mml-ieqn-96"><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the number of samples that are actually negative and predicted to be negative; the <inline-formula id="ieqn-97">
<!--<alternatives><inline-graphic xlink:href="ieqn-97.tif"/><tex-math id="tex-ieqn-97"><![CDATA[FP]]></tex-math>--><mml:math id="mml-ieqn-97"><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the number of samples that are actually negative but predicted to be positive; and the <inline-formula id="ieqn-98">
<!--<alternatives><inline-graphic xlink:href="ieqn-98.tif"/><tex-math id="tex-ieqn-98"><![CDATA[FN]]></tex-math>--><mml:math id="mml-ieqn-98"><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the number of samples that are actually positive but predicted to be negative. If there are <inline-formula id="ieqn-99">
<!--<alternatives><inline-graphic xlink:href="ieqn-99.tif"/><tex-math id="tex-ieqn-99"><![CDATA[M]]></tex-math>--><mml:math id="mml-ieqn-99"><mml:mi>M</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> classes, we can calculate the classification accuracy according to the following formula.</p>
<p><disp-formula id="eqn-8">
<label>(8)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-8.png"/><tex-math id="tex-eqn-8"><![CDATA[AC = \displaystyle{{\mathop \sum \nolimits_{i = 1}^M \displaystyle{{T{P_i} + T{N_i}} \over {T{P_i} + F{N_i} + T{N_i} + F{P_i}}}} \over M}]]></tex-math>--><mml:math id="mml-eqn-8" display="block"><mml:mi>A</mml:mi><mml:mi>C</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:msubsup><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mi>T</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mi>T</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mi>M</mml:mi></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>As <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref> shows, the classification accuracy <inline-formula id="ieqn-100">
<!--<alternatives><inline-graphic xlink:href="ieqn-100.tif"/><tex-math id="tex-ieqn-100"><![CDATA[AC]]></tex-math>--><mml:math id="mml-ieqn-100"><mml:mi>A</mml:mi><mml:mi>C</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the ratio of correctly classified samples to all classified samples in the corresponding testing dataset, computed per class and averaged over the <italic>M</italic> classes. The classification speed is measured by the time consumed in training the model and classifying the testing samples.</p>
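<p>As an illustration of <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>, the following sketch computes the accuracy from a multi-class confusion matrix by deriving the one-vs-rest counts for each class. The function name <monospace>macro_accuracy</monospace>, the matrix layout (rows are true classes, columns are predicted classes), and the example matrix are assumptions made for this sketch.</p>
<code language="python">
```python
def macro_accuracy(confusion):
    """Eq. (8): mean over classes of (TP + TN) / (TP + FN + TN + FP).

    confusion[t][p] counts samples of true class t predicted as class p.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for i in range(num_classes):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp  # class i samples predicted otherwise
        fp = sum(confusion[t][i] for t in range(num_classes)) - tp
        tn = total - tp - fn - fp    # everything not involving class i
        per_class.append((tp + tn) / (tp + fn + tn + fp))
    return sum(per_class) / num_classes


# Hypothetical two-class confusion matrix with 20 test samples.
example = [[8, 2],
           [1, 9]]
ac = macro_accuracy(example)
```
</code>
<p>For the example matrix, both classes have 17 correct one-vs-rest decisions out of 20, so the averaged accuracy is 17/20.</p>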
<sec id="s5_2_1">
<label>5.2.1</label>
<title>Classification Accuracy versus Data Volume</title>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows the classification accuracy for the four random nodes. From <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, we can see that the classification accuracy of all four nodes improves significantly as their training data volume increases. The initial classification accuracy differs among the four nodes in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. Specifically, <xref ref-type="fig" rid="fig-4">Fig. 4a</xref> has the maximum initial accuracy of 0.84, and <xref ref-type="fig" rid="fig-4">Fig. 4b</xref> has the minimum initial accuracy of 0.8. This is because the quality of the training dataset in Node <inline-formula id="ieqn-101">
<!--<alternatives><inline-graphic xlink:href="ieqn-101.tif"/><tex-math id="tex-ieqn-101"><![CDATA[A]]></tex-math>--><mml:math id="mml-ieqn-101"><mml:mi>A</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the highest among the four nodes, while Node <inline-formula id="ieqn-102">
<!--<alternatives><inline-graphic xlink:href="ieqn-102.tif"/><tex-math id="tex-ieqn-102"><![CDATA[B]]></tex-math>--><mml:math id="mml-ieqn-102"><mml:mi>B</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> has the worst data quality. We use <inline-formula id="ieqn-103">
<!--<alternatives><inline-graphic xlink:href="ieqn-103.tif"/><tex-math id="tex-ieqn-103"><![CDATA[{Q_i}]]></tex-math>--><mml:math id="mml-ieqn-103"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> to denote the dataset quality of Node <inline-formula id="ieqn-104">
<!--<alternatives><inline-graphic xlink:href="ieqn-104.tif"/><tex-math id="tex-ieqn-104"><![CDATA[i]]></tex-math>--><mml:math id="mml-ieqn-104"><mml:mi>i</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>. The quality relationship among the four nodes, <inline-formula id="ieqn-105">
<!--<alternatives><inline-graphic xlink:href="ieqn-105.tif"/><tex-math id="tex-ieqn-105"><![CDATA[{Q_A} > {Q_D} > {Q_C} > {Q_B}]]></tex-math>--><mml:math id="mml-ieqn-105"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003E;</mml:mo><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>D</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003E;</mml:mo><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>C</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003E;</mml:mo><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>B</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, is further verified in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, where each node works as a data requestor and federates more training data from the other three data providers. From <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, we can see that the classification accuracy improves as the amount of federated data increases, and that the nodes with high-quality datasets achieve a greater gain in classification accuracy.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The classification accuracy of BIDTC when varying the data volume</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_17154-fig-4.png"/>
</fig>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>The trends of classification accuracy by BIDTC with multiple nodes</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_17154-fig-5.png"/>
</fig>
</sec>
<sec id="s5_2_2">
<label>5.2.2</label>
<title>Classification Accuracy versus Data Quality</title>
<p><xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref> is defined to measure the quality of the training dataset, where <inline-formula id="ieqn-106">
<!--<alternatives><inline-graphic xlink:href="ieqn-106.tif"/><tex-math id="tex-ieqn-106"><![CDATA[{N_S}\left( i \right)]]></tex-math>--><mml:math id="mml-ieqn-106"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is the total number of samples in the training dataset of Node <inline-formula id="ieqn-107">
<!--<alternatives><inline-graphic xlink:href="ieqn-107.tif"/><tex-math id="tex-ieqn-107"><![CDATA[i]]></tex-math>--><mml:math id="mml-ieqn-107"><mml:mi>i</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, and <inline-formula id="ieqn-108">
<!--<alternatives><inline-graphic xlink:href="ieqn-108.tif"/><tex-math id="tex-ieqn-108"><![CDATA[{N_{{S_e}}}\left( i \right)]]></tex-math>--><mml:math id="mml-ieqn-108"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is the number of low-quality samples in the training dataset of Node <inline-formula id="ieqn-109">
<!--<alternatives><inline-graphic xlink:href="ieqn-109.tif"/><tex-math id="tex-ieqn-109"><![CDATA[i]]></tex-math>--><mml:math id="mml-ieqn-109"><mml:mi>i</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>. A sample with a blurry image or an incorrect class label in the training dataset is marked as a low-quality sample.</p>
<p><disp-formula id="eqn-9">
<label>(9)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-9.png"/><tex-math id="tex-eqn-9"><![CDATA[{Q_i} = 1 - \displaystyle{{{N_{{S_e}}}\left( i \right)} \over {{N_S}\left( i \right)}}]]></tex-math>--><mml:math id="mml-eqn-9" display="block"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
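<p>A direct transcription of <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref> is given below as a minimal sketch. The function name <monospace>dataset_quality</monospace> and the sample counts in the example are hypothetical.</p>
<code language="python">
```python
def dataset_quality(num_samples, num_low_quality):
    """Eq. (9): Q_i = 1 - N_Se(i) / N_S(i) for the dataset of one node."""
    if num_samples == 0:
        raise ValueError("the training dataset must not be empty")
    return 1.0 - num_low_quality / num_samples


# Hypothetical node holding 10000 samples, 1000 of them low-quality.
q = dataset_quality(10000, 1000)
```
</code>
<p>A dataset without low-quality samples has quality 1, and the quality decreases linearly with the share of low-quality samples.</p>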
<p>In this experiment, we uniformly select 10% of the samples of each class in the original MNIST training dataset and replace their class labels with random integers in the range 0&#x2013;9. As a result, we obtain 6000 low-quality training samples, denoted by <inline-formula id="ieqn-110">
<!--<alternatives><inline-graphic xlink:href="ieqn-110.tif"/><tex-math id="tex-ieqn-110"><![CDATA[LQ]]></tex-math>--><mml:math id="mml-ieqn-110"><mml:mi>L</mml:mi><mml:mi>Q</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>. For each network node, when the volume of training data reaches a threshold &#x0398;, we add low-quality training samples to that node. For example, when the volume of the federated training data in Node <inline-formula id="ieqn-111">
<!--<alternatives><inline-graphic xlink:href="ieqn-111.tif"/><tex-math id="tex-ieqn-111"><![CDATA[A]]></tex-math>--><mml:math id="mml-ieqn-111"><mml:mi>A</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> reaches 20000, we gradually add 0%&#x2013;20% of low-quality training samples from <inline-formula id="ieqn-112">
<!--<alternatives><inline-graphic xlink:href="ieqn-112.tif"/><tex-math id="tex-ieqn-112"><![CDATA[LQ]]></tex-math>--><mml:math id="mml-ieqn-112"><mml:mi>L</mml:mi><mml:mi>Q</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> into its training dataset. <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the experimental results when &#x0398; is set to 10<sup>4</sup>, 2&#x00D7;10<sup>4</sup>, and 3&#x00D7;10<sup>4</sup>. We can see that <xref ref-type="fig" rid="fig-6">Fig. 6a</xref> has the maximum initial accuracy of 0.79 when the data volume is 30000. <xref ref-type="fig" rid="fig-6">Fig. 6b</xref> has the minimum initial accuracy of 0.66 when the data volume is 10000. This is because the initial data quality of the training dataset in Node <inline-formula id="ieqn-113">
<!--<alternatives><inline-graphic xlink:href="ieqn-113.tif"/><tex-math id="tex-ieqn-113"><![CDATA[A]]></tex-math>--><mml:math id="mml-ieqn-113"><mml:mi>A</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is highest while the one in Node <inline-formula id="ieqn-114">
<!--<alternatives><inline-graphic xlink:href="ieqn-114.tif"/><tex-math id="tex-ieqn-114"><![CDATA[B]]></tex-math>--><mml:math id="mml-ieqn-114"><mml:mi>B</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the lowest. Again, for a given node, the classification accuracy improves significantly as the volume of the training dataset increases. In addition, better training data quality results in higher classification accuracy for BIDTC.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>The classification accuracy versus training data quality</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_17154-fig-6.png"/>
</fig>
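<p>The label-noise procedure and the quality measure of Eq. (9) can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function names <monospace>make_low_quality</monospace> and <monospace>quality</monospace>, the fixed seed, and the balanced stand-in label list (6000 samples per class rather than the real MNIST labels) are assumptions introduced here, and the numerator of Eq. (9) is taken to be the number of low-quality samples held by a node.</p>
<preformat><![CDATA[
```python
import random
from collections import defaultdict


def make_low_quality(labels, percent=10, num_classes=10, seed=0):
    """Select `percent`% of the samples of every class and relabel them at random.

    Returns a dict {sample_index: new_label} describing the low-quality pool LQ.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    lq = {}
    for indices in by_class.values():
        k = len(indices) * percent // 100
        for idx in rng.sample(indices, k):
            # The random label may coincide with the true one.
            lq[idx] = rng.randrange(num_classes)
    return lq


def quality(num_low_quality, num_total):
    """Eq. (9): Q_i = 1 - N_Se(i) / N_S(i) for one node."""
    if num_total <= 0:
        raise ValueError("a node must hold at least one sample")
    return 1.0 - num_low_quality / num_total


# Stand-in for the MNIST training labels (assumption: 6000 samples per class).
labels = [c for c in range(10) for _ in range(6000)]
lq = make_low_quality(labels)
print(len(lq))  # size of the low-quality pool

# Hypothetical node holding 20000 samples, 2000 of them low quality.
print(quality(2000, 20000))
```
]]></preformat>
<p>Because the replacement label is drawn from all ten classes, some of the selected samples keep their true label by chance; the sketch counts every selected sample as low quality regardless.</p>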
</sec>
<sec id="s5_2_3">
<label>5.2.3</label>
<title>Comparing BIDTC with Traditional Classification Algorithms</title>
<p>In this experiment, we compare the proposed BIDTC algorithm with three existing algorithms: the original ID3 algorithm (OIDA), the Neural Networks algorithm (NNA) [<xref ref-type="bibr" rid="ref-43">43</xref>], and the Random Forest algorithm (RFA) [<xref ref-type="bibr" rid="ref-44">44</xref>]. Without loss of generality, we generate a dataset based on MNIST and augment it with low-quality samples from <inline-formula id="ieqn-115">
<!--<alternatives><inline-graphic xlink:href="ieqn-115.tif"/><tex-math id="tex-ieqn-115"><![CDATA[LQ]]></tex-math>--><mml:math id="mml-ieqn-115"><mml:mi>L</mml:mi><mml:mi>Q</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> such that the average quality level is 0.9. The volume of the initial training dataset is 10000 in each node, while the volume of the testing dataset is 2000. <xref ref-type="table" rid="table-2">Tab. 2</xref> shows the running time and accuracy of all algorithms in the same distributed network environment.</p>

<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>The running time and accuracy from different algorithms</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Algorithm</th>
<th>Classification time (s)</th>
<th>Classification accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td>NNA</td>
<td>38.5</td>
<td>0.92</td>
</tr>
<tr>
<td>RFA</td>
<td>30.0</td>
<td>0.86</td>
</tr>
<tr>
<td>OIDA</td>
<td>25.2</td>
<td>0.76</td>
</tr>
<tr>
<td>BIDTC</td>
<td>27.0</td>
<td>0.85</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>From <xref ref-type="table" rid="table-2">Tab. 2</xref>, we can see that the running times of OIDA (25.2 s) and BIDTC (27.0 s) are shorter than those of NNA (38.5 s) and RFA (30.0 s), at the cost of some accuracy loss (0.76 for OIDA and 0.85 for BIDTC, versus 0.92 for NNA and 0.86 for RFA). Here we define the average classification efficiency for <inline-formula id="ieqn-116">
<!--<alternatives><inline-graphic xlink:href="ieqn-116.tif"/><tex-math id="tex-ieqn-116"><![CDATA[K]]></tex-math>--><mml:math id="mml-ieqn-116"><mml:mi>K</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> nodes in <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>, where the <inline-formula id="ieqn-117">
<!--<alternatives><inline-graphic xlink:href="ieqn-117.tif"/><tex-math id="tex-ieqn-117"><![CDATA[CE]]></tex-math>--><mml:math id="mml-ieqn-117"><mml:mi>C</mml:mi><mml:mi>E</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> is the average value of classification efficiency; the <inline-formula id="ieqn-118">
<!--<alternatives><inline-graphic xlink:href="ieqn-118.tif"/><tex-math id="tex-ieqn-118"><![CDATA[A{C_i}]]></tex-math>--><mml:math id="mml-ieqn-118"><mml:mi>A</mml:mi><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is the classification accuracy of the node <inline-formula id="ieqn-119">
<!--<alternatives><inline-graphic xlink:href="ieqn-119.tif"/><tex-math id="tex-ieqn-119"><![CDATA[i]]></tex-math>--><mml:math id="mml-ieqn-119"><mml:mi>i</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> and the <inline-formula id="ieqn-120">
<!--<alternatives><inline-graphic xlink:href="ieqn-120.tif"/><tex-math id="tex-ieqn-120"><![CDATA[C{T_i}]]></tex-math>--><mml:math id="mml-ieqn-120"><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is the corresponding classification running time of node<inline-formula id="ieqn-121">
<!--<alternatives><inline-graphic xlink:href="ieqn-121.tif"/><tex-math id="tex-ieqn-121"><![CDATA[\; i]]></tex-math>--><mml:math id="mml-ieqn-121"><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>i</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>.</p>
<p><disp-formula id="eqn-10">
<label>(10)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-10.png"/><tex-math id="tex-eqn-10"><![CDATA[CE = 100 \times \displaystyle{{\mathop \sum \nolimits_{i = 1}^K \displaystyle{{A{C_i}} \over {C{T_i}}}} \over K}]]></tex-math>--><mml:math id="mml-eqn-10" display="block"><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>100</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>A</mml:mi><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mi>K</mml:mi></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
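<p>As a rough illustration of Eq. (10), the sketch below computes the classification efficiency from per-node pairs of accuracy and running time. Treating the values of <xref ref-type="table" rid="table-2">Tab. 2</xref> as measurements from a single node (K = 1) is an assumption made here only to obtain concrete numbers, and the function name <monospace>classification_efficiency</monospace> is hypothetical.</p>
<preformat><![CDATA[
```python
def classification_efficiency(accuracies, times):
    """Eq. (10): CE = 100 * (1/K) * sum over nodes of AC_i / CT_i."""
    if not accuracies or len(accuracies) != len(times):
        raise ValueError("need one (accuracy, time) pair per node")
    k = len(accuracies)
    return 100.0 * sum(ac / ct for ac, ct in zip(accuracies, times)) / k


# (accuracy, running time in seconds) from Table 2,
# read as single-node measurements (assumption, K = 1).
table2 = {
    "NNA": (0.92, 38.5),
    "RFA": (0.86, 30.0),
    "OIDA": (0.76, 25.2),
    "BIDTC": (0.85, 27.0),
}

ce = {name: classification_efficiency([ac], [ct])
      for name, (ac, ct) in table2.items()}
ranking = sorted(ce, key=ce.get, reverse=True)

for name in ranking:
    print(f"{name}: {ce[name]:.2f}")
```
]]></preformat>
<p>Under this single-node reading, BIDTC ranks first (CE of about 3.15), followed by OIDA, RFA, and NNA (about 2.39), since its accuracy is close to that of RFA while its running time is close to that of OIDA.</p>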
<p><xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows how the classification efficiency <inline-formula id="ieqn-122">
<!--<alternatives><inline-graphic xlink:href="ieqn-122.tif"/><tex-math id="tex-ieqn-122"><![CDATA[CE]]></tex-math>--><mml:math id="mml-ieqn-122"><mml:mi>C</mml:mi><mml:mi>E</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> varies as the volume of the training datasets increases from 10<sup>4</sup> to 3&#x00D7;10<sup>4</sup>. From <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, we can see that the average classification efficiency of BIDTC is significantly higher than that of the other three algorithms. This is because BIDTC takes advantage of three techniques, namely the blockchain-based ID3 decision tree, enhanced homomorphic encryption, and the stimulation smart contract, to conduct classification effectively in the distributed environment.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>The classification efficiency from different algorithms</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_17154-fig-7.png"/>
</fig>
</sec>
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusion and Future Direction</title>
<p>In this work, we have proposed a Blockchain-based improved ID3 Decision Tree Classification (BIDTC) algorithm for the distributed environment. BIDTC takes advantage of three techniques, namely the blockchain-based ID3 decision tree, enhanced homomorphic encryption, and the stimulation smart contract, to conduct classification while taking both data privacy and the value of user data into account. It employs a blockchain-based data sharing architecture to enlarge the volume of the training datasets, coupled with a smart contract-based stimulation scheme to enhance the quality of the training data. Our experiments show that the algorithm significantly outperforms the existing techniques in terms of classification efficiency. In the future, we will explore how to improve the performance of the proposed algorithm for high-dimensional online data.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank the anonymous reviewers of the manuscript for their valuable feedback and suggestions.</p>
</ack><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> This work was in part supported by the National Natural Science Foundation of China under Grant 11471110, the Scientific Research Fund of Hunan Provincial Education Department of China under Grant 20C1143, Hunan Province&#x2019;s Strategic and Emerging Industrial Projects under Grant 2018GK4035, Hunan Provincial Science and Technology Project Foundation under Grant 2018TP-1018 and Hunan Province&#x2019;s Changsha Zhuzhou Xiangtan National Independent Innovation Demonstration Zone projects under Grant 2017XK2058.</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1">
<label>[1]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>A.</given-names> 
<surname>Sandryhaila</surname></string-name> and <string-name>
<given-names>J. M.</given-names> 
<surname>Moura</surname></string-name>
</person-group>, &#x201C;
<article-title>Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure</article-title>,&#x201D; 
<source>IEEE Signal Processing Magazine</source>, vol. 
<volume>31</volume>, no. 
<issue>5</issue>, pp. 
<fpage>80</fpage>&#x2013;
<lpage>90</lpage>, 
<year>2014</year>.</mixed-citation>
</ref>
<ref id="ref-2">
<label>[2]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>V.</given-names> 
<surname>Cherkassky</surname></string-name> and <string-name>
<given-names>F. M.</given-names> 
<surname>Mulier</surname></string-name>
</person-group>, &#x201C;<chapter-title>Learning from data: Concepts, theory, and methods</chapter-title>,&#x201D; in 
<person-group person-group-type="author">
<collab>John Wiley &#x0026; Sons</collab></person-group>, 
<edition>2nd</edition> ed., 
<publisher-loc>New Jersey, USA</publisher-loc>, pp. 
<fpage>15</fpage>&#x2013;
<lpage>18</lpage>, 
<year>2007</year>. [Online]. Available: <uri>https://media.wiley.com</uri></mixed-citation>
</ref>
<ref id="ref-3">
<label>[3]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>C. C.</given-names> 
<surname>Aggarwal</surname></string-name>
</person-group>, &#x201C;
<chapter-title>Data mining</chapter-title>,&#x201D; <publisher-name>IBM Watson Research Center</publisher-name>, 
<publisher-loc>New York, USA</publisher-loc>: 
<publisher-name>Yorktown Heights</publisher-name>, pp. 
<fpage>285</fpage>&#x2013;
<lpage>292</lpage>, 
<year>2015</year>. [Online]. Available: <uri>https://link.springer.com</uri></mixed-citation>
</ref>
<ref id="ref-4">
<label>[4]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>A.</given-names> 
<surname>Jain</surname></string-name>, <string-name>
<given-names>R.</given-names> 
<surname>Duin</surname></string-name> and <string-name>
<given-names>J.</given-names> 
<surname>Mao</surname></string-name>
</person-group>, &#x201C;
<article-title>Statistical pattern recognition: A review</article-title>,&#x201D; 
<source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. 
<volume>22</volume>, no. 
<issue>1</issue>, pp. 
<fpage>4</fpage>&#x2013;
<lpage>37</lpage>, 
<year>2000</year>.</mixed-citation>
</ref>
<ref id="ref-5">
<label>[5]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>C.</given-names> 
<surname>Aggarwal</surname></string-name>
</person-group>, 
<source>Data classification: Algorithms and applications</source>. 
<publisher-name>CRC Press</publisher-name>, 
<year>2014</year>.</mixed-citation>
</ref>
<ref id="ref-6">
<label>[6]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>H. H.</given-names> 
<surname>Ang</surname></string-name>, <string-name>
<given-names>V.</given-names> 
<surname>Gopalkrishnan</surname></string-name>, <string-name>
<given-names>I.</given-names> 
<surname>&#x017D;liobait&#x0117;</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Pechenizkiy</surname></string-name> and <string-name>
<given-names>S. C.</given-names> 
<surname>Hoi</surname></string-name>
</person-group>, &#x201C;
<article-title>Predictive handling of asynchronous concept drifts in distributed environments</article-title>,&#x201D; 
<source>IEEE Transactions on Knowledge and Data Engineering</source>, vol. 
<volume>25</volume>, no. 
<issue>10</issue>, pp. 
<fpage>2343</fpage>&#x2013;
<lpage>2355</lpage>, 
<year>2013</year>.</mixed-citation>
</ref>
<ref id="ref-7">
<label>[7]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>M. U.</given-names> 
<surname>Khan</surname></string-name>, <string-name>
<given-names>A.</given-names> 
<surname>Nanopoulos</surname></string-name> and <string-name>
<given-names>L.</given-names> 
<surname>Schmidt-Thieme</surname></string-name>
</person-group>, &#x201C;<chapter-title>P2P RVM for distributed classification</chapter-title>,&#x201D; 
<source>Data Science, Learning by Latent Structures, and Knowledge Discovery</source>. 
<publisher-loc>Berlin Heidelberg</publisher-loc>: 
<publisher-name>Springer</publisher-name>, 
<year>2015</year>.</mixed-citation>
</ref>
<ref id="ref-8">
<label>[8]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>Z.</given-names> 
<surname>Xu</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Zhai</surname></string-name> and <string-name>
<given-names>Y.</given-names> 
<surname>Liu</surname></string-name>
</person-group>, &#x201C;
<article-title>Distributed semi-supervised multi-label classification with quantized communication</article-title>,&#x201D; in <conf-name>Proc. of the 12th Int&#x2019;l Conf. on Machine Learning and Computing</conf-name>, 
<conf-loc>Shenzhen, China</conf-loc>, pp. 
<fpage>57</fpage>&#x2013;
<lpage>62</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-9">
<label>[9]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>L.</given-names> 
<surname>Vu</surname></string-name>, <string-name>
<given-names>H. V.</given-names> 
<surname>Thuy</surname></string-name>, <string-name>
<given-names>Q. U.</given-names> 
<surname>Nguyen</surname></string-name>, <string-name>
<given-names>T. N.</given-names> 
<surname>Ngoc</surname></string-name> and <string-name>
<given-names>E.</given-names> 
<surname>Dutkiewicz</surname></string-name>
</person-group>, &#x201C;
<article-title>Time series analysis for encrypted traffic classification: A deep learning approach</article-title>,&#x201D; in <conf-name>Proc. of the 18th Int&#x2019;l Sym. on Communications and Information Technologies (ISCIT)</conf-name>, 
<conf-loc>Bangkok</conf-loc>, pp. 
<fpage>121</fpage>&#x2013;
<lpage>126</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-10">
<label>[10]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Rezaei</surname></string-name> and <string-name>
<given-names>X.</given-names> 
<surname>Liu</surname></string-name>
</person-group>, &#x201C;
<article-title>Deep learning for encrypted traffic classification: An overview</article-title>,&#x201D; 
<source>IEEE Communications Magazine</source>, vol. 
<volume>57</volume>, no. 
<issue>5</issue>, pp. 
<fpage>76</fpage>&#x2013;
<lpage>81</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-11">
<label>[11]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>W.</given-names> 
<surname>Zheng</surname></string-name>, <string-name>
<given-names>C.</given-names> 
<surname>Gou</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Yan</surname></string-name> and <string-name>
<given-names>S.</given-names> 
<surname>Mo</surname></string-name>
</person-group>, &#x201C;
<article-title>Learning to classify: A flow-based relation network for encrypted traffic classification</article-title>,&#x201D; in <conf-name>Proc. of the Web Conf. 2020</conf-name>, 
<conf-loc>Taipei, China</conf-loc>, pp. 
<fpage>13</fpage>&#x2013;
<lpage>22</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-12">
<label>[12]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Q.</given-names> 
<surname>Hu</surname></string-name>, <string-name>
<given-names>X.</given-names> 
<surname>Che</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Zhang</surname></string-name>, <string-name>
<given-names>D.</given-names> 
<surname>Zhang</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Guo</surname></string-name>
</person-group>, &#x201C;
<article-title>Rank entropy-based decision trees for monotonic classification</article-title>,&#x201D; 
<source>IEEE Transactions on Knowledge and Data Engineering</source>, vol. 
<volume>24</volume>, no. 
<issue>11</issue>, pp. 
<fpage>2052</fpage>&#x2013;
<lpage>2064</lpage>, 
<year>2012</year>.</mixed-citation>
</ref>
<ref id="ref-13">
<label>[13]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Patil</surname></string-name> and <string-name>
<given-names>U.</given-names> 
<surname>Kulkarni</surname></string-name>
</person-group>, &#x201C;
<article-title>Accuracy prediction for distributed decision tree using machine learning approach</article-title>,&#x201D; in <conf-name>Proc. of the 2019 3rd Int&#x2019;l Conf. on Trends in Electronics and Informatics (ICOEI)</conf-name>, 
<conf-loc>Tirunelveli, India</conf-loc>, pp. 
<fpage>1365</fpage>&#x2013;
<lpage>1371</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-14">
<label>[14]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>F.</given-names> 
<surname>Es-Sabery</surname></string-name> and <string-name>
<given-names>A.</given-names> 
<surname>Hair</surname></string-name>
</person-group>, &#x201C;
<article-title>An improved ID3 classification algorithm based on correlation function and weighted attribute&#x002A;</article-title>,&#x201D; in <conf-name>Proc. of the 2019 Int&#x2019;l Conf. on Intelligent Systems and Advanced Computing Sciences (ISACS)</conf-name>, 
<conf-loc>Taza, Morocco</conf-loc>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>8</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-15">
<label>[15]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>R.</given-names> 
<surname>Choudhary</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Kapoor</surname></string-name>
</person-group>, &#x201C;
<article-title>Optimal tree led approach for effective decision making to mitigate mortality rates in a varied demographic dataset</article-title>,&#x201D; in <conf-name>Proc. of the 3rd Int&#x2019;l Conf. on Internet of Things: Smart Innovation and Usages (IoT-SIU), Bhimtal</conf-name>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>5</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-16">
<label>[16]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Zheng</surname></string-name>
</person-group>, &#x201C;
<article-title>Decision tree algorithm for precision marketing via network channel</article-title>,&#x201D; 
<source>Computer Systems Science and Engineering</source>, vol. 
<volume>35</volume>, no. 
<issue>4</issue>, pp. 
<fpage>293</fpage>&#x2013;
<lpage>298</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-17">
<label>[17]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>K. V.</given-names> 
<surname>Uma</surname></string-name> and <string-name>
<given-names>A.</given-names> 
<surname>Alias</surname></string-name>
</person-group>, &#x201C;
<article-title>C5.0 decision tree model using tsallis entropy and association function for general and medical dataset</article-title>,&#x201D; 
<source>Intelligent Automation &#x0026; Soft Computing</source>, vol. 
<volume>26</volume>, no. 
<issue>1</issue>, pp. 
<fpage>61</fpage>&#x2013;
<lpage>70</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-18">
<label>[18]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Z.</given-names> 
<surname>Jia</surname></string-name>, <string-name>
<given-names>Q.</given-names> 
<surname>Han</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Li</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Yang</surname></string-name> and <string-name>
<given-names>X.</given-names> 
<surname>Xing</surname></string-name>
</person-group>, &#x201C;
<article-title>Prediction of web services reliability based on decision tree classification method</article-title>,&#x201D; 
<source>Computers, Materials &#x0026; Continua</source>, vol. 
<volume>63</volume>, no. 
<issue>3</issue>, pp. 
<fpage>1221</fpage>&#x2013;
<lpage>1235</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-19">
<label>[19]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>H.</given-names> 
<surname>Xiao</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Wei</surname></string-name>
</person-group>, &#x201C;
<article-title>An early warning method for sea typhoon detection based on remote sensing imagery</article-title>,&#x201D; 
<source>Journal of Coastal Research</source>, vol. 
<volume>82</volume>, no. 
<issue>sp1</issue>, pp. 
<fpage>200</fpage>&#x2013;
<lpage>205</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-20">
<label>[20]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Yang</surname></string-name>, <string-name>
<given-names>J. Z.</given-names> 
<surname>Guo</surname></string-name> and <string-name>
<given-names>J. W.</given-names> 
<surname>Jin</surname></string-name>
</person-group>, &#x201C;
<article-title>An improved Id3 algorithm for medical data classification</article-title>,&#x201D; 
<source>Computers &#x0026; Electrical Engineering</source>, vol. 
<volume>65</volume>, no. 
<issue>4</issue>, pp. 
<fpage>474</fpage>&#x2013;
<lpage>487</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-21">
<label>[21]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Kraidech</surname></string-name> and <string-name>
<given-names>K.</given-names> 
<surname>Jearanaitanakij</surname></string-name>
</person-group>, &#x201C;
<article-title>Reducing the depth of ID3 algorithm by combining values from neighboring important attributes</article-title>,&#x201D; in <conf-name>Proc. of the 22nd Int&#x2019;l Computer Science and Engineering Conf. (ICSEC)</conf-name>, 
<conf-loc>Chiang Mai, Thailand</conf-loc>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>5</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-22">
<label>[22]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>R.</given-names> 
<surname>Bost</surname></string-name>, <string-name>
<given-names>R. A.</given-names> 
<surname>Popa</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Tu</surname></string-name> and <string-name>
<given-names>S.</given-names> 
<surname>Goldwasser</surname></string-name>
</person-group>, &#x201C;
<article-title>Machine learning classification over encrypted data</article-title>,&#x201D; 
<comment>Cryptology ePrint Archive, Report 2014/331</comment>, 
<year>2014</year>, <uri>http://eprint.iacr.org</uri>.</mixed-citation>
</ref>
<ref id="ref-23">
<label>[23]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>X.</given-names> 
<surname>Liu</surname></string-name>, <string-name>
<given-names>R.</given-names> 
<surname>Lu</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Ma</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Chen</surname></string-name> and <string-name>
<given-names>B.</given-names> 
<surname>Qin</surname></string-name>
</person-group>, &#x201C;
<article-title>Privacy-preserving patient-centric clinical decision support system on Naive Bayesian classification</article-title>,&#x201D; 
<source>IEEE Journal of Biomedical and Health Informatics</source>, vol. 
<volume>20</volume>, no. 
<issue>2</issue>, pp. 
<fpage>655</fpage>&#x2013;
<lpage>668</lpage>, 
<year>2016</year>. DOI 
<pub-id pub-id-type="doi">10.1109/JBHI.2015.2407157</pub-id>.</mixed-citation>
</ref>
<ref id="ref-24">
<label>[24]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>J.</given-names> 
<surname>Bajard</surname></string-name>, <string-name>
<given-names>P.</given-names> 
<surname>Martins</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Sousa</surname></string-name> and <string-name>
<given-names>V.</given-names> 
<surname>Zucca</surname></string-name>
</person-group>, &#x201C;
<article-title>Improving the efficiency of SVM classification with FHE</article-title>,&#x201D; 
<source>IEEE Transactions on Information Forensics and Security</source>, vol. 
<volume>15</volume>, pp. 
<fpage>1709</fpage>&#x2013;
<lpage>1722</lpage>, 
<year>2020</year>. DOI 
<pub-id pub-id-type="doi">10.1109/TIFS.2019.2946097</pub-id>.</mixed-citation>
</ref>
<ref id="ref-25">
<label>[25]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>M.</given-names> 
<surname>Vedara</surname></string-name> and <string-name>
<given-names>P.</given-names> 
<surname>Ezhumalai</surname></string-name>
</person-group>, &#x201C;
<article-title>Enhanced privacy preservation of cloud data by using ElGamal Elliptic Curve (EGEC) homomorphic encryption scheme</article-title>,&#x201D; 
<source>KSII Transactions on Internet and Information Systems</source>, vol. 
<volume>14</volume>, no. 
<issue>11</issue>, pp. 
<fpage>4522</fpage>&#x2013;
<lpage>4536</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-26">
<label>[26]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>W.</given-names> 
<surname>Ren</surname></string-name>, <string-name>
<given-names>X.</given-names> 
<surname>Tong</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Du</surname></string-name>, <string-name>
<given-names>N.</given-names> 
<surname>Wang</surname></string-name> and <string-name>
<given-names>A. K.</given-names> 
<surname>Bashir</surname></string-name>
</person-group>, &#x201C;
<article-title>Privacy-preserving using homomorphic encryption in Mobile IoT systems</article-title>,&#x201D; 
<source>Computer Communications</source>, vol. 
<volume>165</volume>, no. 
<issue>1</issue>, pp. 
<fpage>105</fpage>&#x2013;
<lpage>111</lpage>, 
<year>2021</year>. DOI 
<pub-id pub-id-type="doi">10.1016/j.comcom.2020.10.022</pub-id>.</mixed-citation>
</ref>
<ref id="ref-27">
<label>[27]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>A.</given-names> 
<surname>Alabdulatif</surname></string-name>, <string-name>
<given-names>I.</given-names> 
<surname>Khalil</surname></string-name> and <string-name>
<given-names>X.</given-names> 
<surname>Yi</surname></string-name>
</person-group>, &#x201C;
<article-title>Towards secure big data analytic for cloud-enabled applications with fully homomorphic encryption</article-title>,&#x201D; 
<source>Journal of Parallel and Distributed Computing</source>, vol. 
<volume>137</volume>, no. 
<issue>3</issue>, pp. 
<fpage>192</fpage>&#x2013;
<lpage>204</lpage>, 
<year>2020</year>. DOI 
<pub-id pub-id-type="doi">10.1016/j.jpdc.2019.10.008</pub-id>.</mixed-citation>
</ref>
<ref id="ref-28">
<label>[28]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>N. P.</given-names> 
<surname>Smart</surname></string-name> and <string-name>
<given-names>F.</given-names> 
<surname>Vercauteren</surname></string-name>
</person-group>, &#x201C;
<article-title>Fully homomorphic SIMD operations</article-title>,&#x201D; 
<source>Designs, Codes and Cryptography</source>, vol. 
<volume>71</volume>, no. 
<issue>1</issue>, pp. 
<fpage>57</fpage>&#x2013;
<lpage>81</lpage>, 
<year>2014</year>. DOI 
<pub-id pub-id-type="doi">10.1007/s10623-012-9720-4</pub-id>.</mixed-citation>
</ref>
<ref id="ref-29">
<label>[29]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>X.</given-names> 
<surname>Sun</surname></string-name>, <string-name>
<given-names>P.</given-names> 
<surname>Zhang</surname></string-name>, <string-name>
<given-names>J. K.</given-names> 
<surname>Liu</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Yu</surname></string-name> and <string-name>
<given-names>W.</given-names> 
<surname>Xie</surname></string-name>
</person-group>, &#x201C;
<article-title>Private machine learning classification based on fully homomorphic encryption</article-title>,&#x201D; 
<source>IEEE Transactions on Emerging Topics in Computing</source>, vol. 
<volume>8</volume>, no. 
<issue>2</issue>, pp. 
<fpage>352</fpage>&#x2013;
<lpage>364</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-30">
<label>[30]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Yuan</surname></string-name> and <string-name>
<given-names>F.</given-names> 
<surname>Wang</surname></string-name>
</person-group>, &#x201C;
<article-title>Blockchain and cryptocurrencies: Model, techniques, and applications</article-title>,&#x201D; 
<source>IEEE Transactions on Systems, Man, and Cybernetics: Systems</source>, vol. 
<volume>48</volume>, no. 
<issue>9</issue>, pp. 
<fpage>1421</fpage>&#x2013;
<lpage>1428</lpage>, 
<year>2018</year>. DOI 
<pub-id pub-id-type="doi">10.1109/TSMC.2018.2854904</pub-id>.</mixed-citation>
</ref>
<ref id="ref-31">
<label>[31]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Yu</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Li</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Tian</surname></string-name> and <string-name>
<given-names>J.</given-names> 
<surname>Liu</surname></string-name>
</person-group>, &#x201C;
<article-title>Blockchain-based solutions to security and privacy issues in the Internet of Things</article-title>,&#x201D; 
<source>IEEE Wireless Communications</source>, vol. 
<volume>25</volume>, no. 
<issue>6</issue>, pp. 
<fpage>12</fpage>&#x2013;
<lpage>18</lpage>, 
<year>2018</year>. DOI 
<pub-id pub-id-type="doi">10.1109/MWC.2017.1800116</pub-id>.</mixed-citation>
</ref>
<ref id="ref-32">
<label>[32]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>X.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>T.</given-names> 
<surname>Qiu</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Yuan</surname></string-name> <etal>et al.</etal>
</person-group>, &#x201C;
<article-title>Blockchain-powered parallel healthcare systems based on the ACP approach</article-title>,&#x201D; 
<source>IEEE Transactions on Computational Social Systems</source>, vol. 
<volume>5</volume>, no. 
<issue>4</issue>, pp. 
<fpage>942</fpage>&#x2013;
<lpage>950</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-33">
<label>[33]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>P.</given-names> 
<surname>Cui</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Dixon</surname></string-name>, <string-name>
<given-names>U.</given-names> 
<surname>Guin</surname></string-name> and <string-name>
<given-names>D.</given-names> 
<surname>Dimase</surname></string-name>
</person-group>, &#x201C;
<article-title>A blockchain-based framework for supply chain provenance</article-title>,&#x201D; 
<source>IEEE Access</source>, vol. 
<volume>7</volume>, pp. 
<fpage>157113</fpage>&#x2013;
<lpage>157125</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-34">
<label>[34]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>B.</given-names> 
<surname>Bordel</surname></string-name>, <string-name>
<given-names>R.</given-names> 
<surname>Alcarria</surname></string-name>, <string-name>
<given-names>D.</given-names> 
<surname>Martin</surname></string-name> and <string-name>
<given-names>A.</given-names> 
<surname>Sanchez-Picot</surname></string-name>
</person-group>, &#x201C;
<article-title>Trust provision in the Internet of Things using transversal blockchain networks</article-title>,&#x201D; 
<source>Intelligent Automation &#x0026; Soft Computing</source>, vol. 
<volume>25</volume>, no. 
<issue>1</issue>, pp. 
<fpage>155</fpage>&#x2013;
<lpage>170</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-35">
<label>[35]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>B. L.</given-names> 
<surname>Nguyen</surname></string-name>, <string-name>
<given-names>E. L.</given-names> 
<surname>Lydia</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Elhoseny</surname></string-name>, <string-name>
<given-names>I. V.</given-names> 
<surname>Pustokhina</surname></string-name>, <string-name>
<given-names>D. A.</given-names> 
<surname>Pustokhin</surname></string-name> <etal>et al.</etal>
</person-group>, &#x201C;
<article-title>Privacy preserving blockchain technique to achieve secure and reliable sharing of IoT data</article-title>,&#x201D; 
<source>Computers, Materials &#x0026; Continua</source>, vol. 
<volume>65</volume>, no. 
<issue>1</issue>, pp. 
<fpage>87</fpage>&#x2013;
<lpage>107</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-36">
<label>[36]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>J.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>W.</given-names> 
<surname>Chen</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>R. S.</given-names> 
<surname>Sherratt</surname></string-name> and <string-name>
<given-names>A.</given-names> 
<surname>Tolba</surname></string-name>
</person-group>, &#x201C;
<article-title>Data secure storage mechanism of sensor networks based on blockchain</article-title>,&#x201D; 
<source>Computers, Materials &#x0026; Continua</source>, vol. 
<volume>65</volume>, no. 
<issue>3</issue>, pp. 
<fpage>2365</fpage>&#x2013;
<lpage>2384</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-37">
<label>[37]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Qian</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Jiang</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Hu</surname></string-name>, <string-name>
<given-names>M. S.</given-names> 
<surname>Hossain</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Alrashoud</surname></string-name> <etal>et al.</etal>
</person-group>, &#x201C;
<article-title>Blockchain-based privacy-aware content caching in cognitive Internet of Vehicles</article-title>,&#x201D; 
<source>IEEE Network</source>, vol. 
<volume>34</volume>, no. 
<issue>2</issue>, pp. 
<fpage>46</fpage>&#x2013;
<lpage>51</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-38">
<label>[38]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>L.</given-names> 
<surname>Zhu</surname></string-name>, <string-name>
<given-names>H.</given-names> 
<surname>Yu</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Zhan</surname></string-name>, <string-name>
<given-names>W.</given-names> 
<surname>Qiu</surname></string-name> and <string-name>
<given-names>Q.</given-names> 
<surname>Li</surname></string-name>
</person-group>, &#x201C;
<article-title>Research on high-performance consortium blockchain technology</article-title>,&#x201D; 
<source>Journal of Software</source>, vol. 
<volume>30</volume>, no. 
<issue>6</issue>, pp. 
<fpage>1577</fpage>&#x2013;
<lpage>1593</lpage>, 
<comment>(in Chinese)</comment>, 
<year>2019</year>, <uri>http://www.jos.org.cn/1000-9825/5737.htm</uri>.</mixed-citation>
</ref>
<ref id="ref-39">
<label>[39]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>W.</given-names> 
<surname>He</surname></string-name>
</person-group>, &#x201C;
<chapter-title>Research on key technologies of privacy-preserving machine learning based on homomorphic encryption</chapter-title>,&#x201D; 
<comment>M.S. thesis</comment>, <publisher-name>Dept. of Comp. Science &#x0026; Eng., Univ. of Electronic Science and Technology of China</publisher-name>, 
<publisher-loc>Chengdu, China</publisher-loc>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-40">
<label>[40]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>X.</given-names> 
<surname>Huang</surname></string-name>, <string-name>
<given-names>D.</given-names> 
<surname>Ye</surname></string-name>, <string-name>
<given-names>R.</given-names> 
<surname>Yu</surname></string-name> and <string-name>
<given-names>L.</given-names> 
<surname>Shu</surname></string-name>
</person-group>, &#x201C;
<article-title>Securing parked vehicle assisted fog computing with blockchain and optimal smart contract design</article-title>,&#x201D; 
<source>IEEE/CAA Journal of Automatica Sinica</source>, vol. 
<volume>7</volume>, no. 
<issue>2</issue>, pp. 
<fpage>426</fpage>&#x2013;
<lpage>441</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-41">
<label>[41]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Z.</given-names> 
<surname>Zheng</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Xie</surname></string-name>, <string-name>
<given-names>H. N.</given-names> 
<surname>Dai</surname></string-name>, <string-name>
<given-names>W.</given-names> 
<surname>Chen</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Imran</surname></string-name>
</person-group>, &#x201C;
<article-title>An overview on smart contracts: Challenges, advances and platforms</article-title>,&#x201D; 
<source>Future Generation Computer Systems</source>, vol. 
<volume>105</volume>, no. 
<issue>5</issue>, pp. 
<fpage>475</fpage>&#x2013;
<lpage>491</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-42">
<label>[42]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>LeCun</surname></string-name> and <string-name>
<given-names>C.</given-names> 
<surname>Cortes</surname></string-name>
</person-group>, 
<article-title>MNIST Handwritten Digit Database</article-title>. 
<year>2010</year>. [Online]. Available at: <uri>http://yann.lecun.com/exdb/mnist</uri>.</mixed-citation>
</ref>
<ref id="ref-43">
<label>[43]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>M. M. A.</given-names> 
<surname>Ghosh</surname></string-name> and <string-name>
<given-names>A. Y.</given-names> 
<surname>Maghari</surname></string-name>
</person-group>, &#x201C;
<article-title>A comparative study on hand-writing digit recognition using neural networks</article-title>,&#x201D; in <conf-name>Proc. of the 2017 Int&#x2019;l Conf. on Promising Electronic Technologies (ICPET), Deir El-Balah</conf-name>, pp. 
<fpage>77</fpage>&#x2013;
<lpage>81</lpage>, 
<year>2017</year>.</mixed-citation>
</ref>
<ref id="ref-44">
<label>[44]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>A.</given-names> 
<surname>More</surname></string-name> and <string-name>
<given-names>D.</given-names> 
<surname>Rana</surname></string-name>
</person-group>, &#x201C;
<article-title>Review of random forest classification techniques to resolve data imbalance</article-title>,&#x201D; in <conf-name>Proc. of the 2017 Int&#x2019;l Conf. on Intelligent Systems and Information Management (ICISIM), Aurangabad</conf-name>, pp. 
<fpage>72</fpage>&#x2013;
<lpage>78</lpage>, 
<year>2017</year>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>