<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">Oncologie</journal-id>
<journal-id journal-id-type="nlm-ta">Oncologie</journal-id>
<journal-id journal-id-type="publisher-id">Oncologie</journal-id>
<journal-title-group>
<journal-title>Oncologie</journal-title>
</journal-title-group>
<issn pub-type="epub">1765-2839</issn>
<issn pub-type="ppub">1292-3818</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">21256</article-id>
<article-id pub-id-type="doi">10.32604/oncologie.2022.021256</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An ISSA-RF Algorithm for Prediction Model of Drug Compound Molecules Antagonizing ER&#x03B1; Gene Activity</article-title><alt-title alt-title-type="left-running-head">An ISSA-RF Algorithm for Prediction Model of Drug Compound Molecules Antagonizing ER&#x03B1; Gene Activity</alt-title><alt-title alt-title-type="right-running-head">An ISSA-RF Algorithm for Prediction Model of Drug Compound Molecules Antagonizing ER&#x03B1; Gene Activity</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Rong</surname><given-names>Minxi</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Li</surname><given-names>Yong</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref><email>liyong3880@163.com</email>
</contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Guo</surname><given-names>Xiaoli</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref><email>xlguo@zzuli.edu.cn</email>
</contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Zong</surname><given-names>Tao</given-names></name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Ma</surname><given-names>Zhiyuan</given-names></name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib id="author-6" contrib-type="author">
<name name-style="western"><surname>Li</surname><given-names>Penglei</given-names></name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<aff id="aff-1"><label>1</label><institution>College of Mathematics and Information Science, Zhengzhou University of Light Industry</institution>, <addr-line>Zhengzhou, 450002</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>College of Electrical and Information Engineering, Zhengzhou University of Light Industry</institution>, <addr-line>Zhengzhou, 450002</addr-line>, <country>China</country></aff>
</contrib-group><author-notes><corresp id="cor1"><label>&#x002A;</label>Corresponding Authors: Xiaoli Guo. Email: <email>xlguo@zzuli.edu.cn</email>; Yong Li. Email: <email>liyong3880@163.com</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2022-06-28"><day>28</day>
<month>06</month>
<year>2022</year></pub-date>
<volume>24</volume>
<issue>2</issue>
<fpage>309</fpage>
<lpage>327</lpage>
<history>
<date date-type="received"><day>05</day><month>1</month><year>2022</year></date>
<date date-type="accepted"><day>25</day><month>4</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Rong et al.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Rong et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_Oncologie_21256.pdf"></self-uri>
<abstract>
<p><bold>Objectives:</bold> An ER&#x03B1; biological activity prediction model is constructed from the compound molecular data of the anti-breast cancer therapeutic target ER&#x03B1; and the associated biological activity data. The model improves the screening efficiency of anti-breast cancer drug candidates and saves time and cost in drug development. <bold>Methods:</bold> In this paper, a Ridge model is used to screen out the molecular descriptors that most strongly influence the biological activity of ER&#x03B1;, and datasets containing different numbers of molecular descriptors are built from the screening results. A Random Forest (RF) is trained and evaluated with the Root Mean Square Error (RMSE) and the coefficient of determination (<inline-formula id="ieqn-1">
<mml:math id="mml-ieqn-1"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula>) to determine the parameter ranges of an RF optimized by an Improved Sparrow Search Algorithm (ISSA-RF), which adds adaptive weights to the ordinary Sparrow Search Algorithm (SSA). The divided datasets are then fed into ISSA-RF with the defined parameter ranges to construct a regression model that predicts the biological activity of compounds on ER&#x03B1;, and this model is compared with a Genetic Algorithm optimized Support Vector Machine (GA-SVM), a Back Propagation Neural Network (BP), and Extreme Gradient Boosting (XGBoost). <bold>Results:</bold> Among the various combinations of molecular descriptors tried, all four models achieve their best accuracy on the dataset constructed from 100 molecular descriptors. The ER&#x03B1; biological activity values predicted by the proposed ISSA-RF model agree closely with the actual values, with a prediction RMSE of 0.6876389. <bold>Conclusions:</bold> ISSA-RF is proposed, and it is shown that adding adaptive weights can greatly improve the fitness accuracy of the sparrow search algorithm. In the experiments, several different numbers of molecular descriptors are used for training, which reduces the chance that the training accuracy is an artifact of one particular number of descriptors, and the search range of ISSA-RF is limited to help the model avoid local optima. In addition, the parameter optimization time is greatly reduced. In conclusion, the proposed prediction model of drug compound molecules antagonizing ER&#x03B1; gene activity (ISSA-RF) improves the accuracy and efficiency of screening anti-breast cancer drug candidates and provides a new idea for building quantitative structure-activity relationship models.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Anti-breast cancer drug candidates</kwd>
<kwd>machine learning</kwd>
<kwd>ridge regression</kwd>
<kwd>random forest</kwd>
<kwd>sparrow search algorithm</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>In recent years, with the rapid development of human society, the global environment has suffered irreversible damage, bringing various diseases that have never been seen before [<xref ref-type="bibr" rid="ref-1">1</xref>]. Drug therapy is an important means of controlling and treating diseases, but traditional drug research and development has a long cycle and low efficiency [<xref ref-type="bibr" rid="ref-2">2</xref>]. To reduce the cost and time of drug development, Quantitative Structure-Activity Relationship (QSAR) models are often used in drug discovery to model the relationship between drug compounds and target cell activity. The model is then used to predict the target cell activity of new or structurally altered drug compounds, and candidate drug compound molecules are screened according to the predicted biological activity values, thereby achieving computer-aided selection of candidate drug compounds.</p>
<p>Breast cancer [<xref ref-type="bibr" rid="ref-3">3</xref>] is a common disease in women and one of the cancers with a high mortality rate. In the early 1970s, Pietras discovered that estrogen can rapidly up-regulate the cAMP level of endometrial cells through cell membrane binding sites and therefore speculated that a membrane Estrogen Receptor (ER) exists, which was the first elaboration of the ER concept. Estrogen was later confirmed to be directly related to the malignant proliferation of breast cancer cells, and the view that breast cancer cells depend on estrogen receptors for growth became widely accepted. ER&#x03B1; is an important target for the treatment of breast cancer, so finding suitable drug candidates based on ER&#x03B1; activity values and the molecular properties of candidate compounds would be an effective approach. In recent years, the use of machine learning [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>] in the medical field has provided an effective route for drug research and development. Machine learning can uncover the potential relationship between ER&#x03B1; activity and drug compound molecules and build a QSAR model of anti-breast cancer drug candidates for selecting suitable drug compound molecules, which not only saves time but also provides a variety of options for the development of anti-breast cancer drugs.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Research</title>
<p>Machine learning is a method in which a computer automatically learns rules from input data and uses these rules to predict unknown data. With its powerful capacity for big data computation, it plays an important role in data processing and data mining. In previous studies, machine learning has been widely applied to classification problems in the medical field. Jiang et al. [<xref ref-type="bibr" rid="ref-6">6</xref>] used a simulated annealing algorithm and Random Forest (RF) to determine the optimal features of BCRP inhibitors, applied four machine learning methods, deep learning methods, and ensemble learning methods to predict BCRP inhibitors, and then evaluated the drug&#x2019;s effectiveness. The Support Vector Machine (SVM) classifier showed the best classification performance, with a Matthews Correlation Coefficient (MCC) of 0.812 and an Area Under the Curve (AUC) of 0.958 on the test set. Che et al. [<xref ref-type="bibr" rid="ref-7">7</xref>] used three non-ensemble machine learning algorithms, a Back Propagation Neural Network (BP), and three boosting algorithms to predict prostate cancer. Among the non-ensemble algorithms, the Decision Tree model performed best, with an accuracy of 0.933, and among the boosting algorithms, Extreme Gradient Boosting (XGBoost) performed best, with an accuracy of 0.957. Wang et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] used word vector representations to characterize the main feature data and then used an XGBoost model to learn the correlations between features in order to identify the pathogens of food-borne diseases; both precision and recall reached 68%. Lu et al. [<xref ref-type="bibr" rid="ref-9">9</xref>] used Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Decision Trees, and other methods to construct classification models distinguishing neuraminidase (NA) inhibitors from non-inhibitors. The SVM algorithm gave the best prediction accuracy, 92.6%.</p>
<p>Machine learning has been applied less widely to regression prediction in the medical field. The main reason is that medical datasets have high feature dimensionality, which affects the prediction results in complex ways and makes accurate prediction difficult. This indirectly shows that prediction in the medical field mainly serves to assist medical experiments. Sheridan et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] discussed the applicability of XGBoost to QSAR modeling and used grid search to optimize the model parameters. The average coefficient of determination (<inline-formula id="ieqn-2">
<mml:math id="mml-ieqn-2"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula>) reached 0.42, similar to that of a neural network. However, the parameter ranges for the grid search were defined by the authors rather than determined through dedicated experiments. Mansouri et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] provided a variety of open-source QSAR models to predict the strongest acidic and strongest basic pKa values of chemicals, applying SVM, K-NN, XGBoost, and Deep Neural Network (DNN) models to different open-source datasets. The deep neural network gave the highest prediction accuracy, with an RMSE of 1.5 and an <inline-formula id="ieqn-3">
<mml:math id="mml-ieqn-3"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> of 0.8 for the best predictions.</p>
<p>The widespread use of machine learning [<xref ref-type="bibr" rid="ref-12">12</xref>&#x2013;<xref ref-type="bibr" rid="ref-14">14</xref>] provides a new direction for drug research and development, and the idea of ensemble learning has gradually developed within machine learning. Ensemble learning combines multiple models to overcome the shortcomings and limitations of any single model. Random Forest (RF) is one such ensemble learning method.</p>
<p>Because the ensemble RF model has many training parameters, most researchers tune only those with the greatest influence on the model. Zheng et al. [<xref ref-type="bibr" rid="ref-15">15</xref>] chose to tune n_estimators and max_depth in a coal spontaneous combustion temperature prediction model, but this did not take full advantage of the random feature selection in RF, and the influence of the parameter max_features on the model was not considered. The choice of max_features is generally believed to affect both the diversity of the individual decision trees and the accuracy of the RF model, so both aspects should be considered together when tuning max_features.</p>
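<p>The trade-off described above can be explored empirically. The following is a minimal sketch, assuming a scikit-learn implementation (its <monospace>RandomForestRegressor</monospace> uses the same parameter names <monospace>n_estimators</monospace>, <monospace>max_depth</monospace> and <monospace>max_features</monospace>) and synthetic data in place of the ER&#x03B1; descriptors. It compares the out-of-bag R<sup>2</sup> for a few hypothetical <monospace>max_features</monospace> fractions.</p>
<code language="python"><![CDATA[
```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a descriptor matrix (hypothetical data, not the ER-alpha set).
X, y = make_regression(n_samples=300, n_features=20, n_informative=8,
                       noise=10.0, random_state=0)

oob_r2 = {}
for max_features in (0.2, 0.5, 1.0):
    # As a fraction, max_features is the share of descriptors considered at each split:
    # smaller values decorrelate the trees, larger values make each tree stronger.
    rf = RandomForestRegressor(n_estimators=200, max_depth=10,
                               max_features=max_features, oob_score=True,
                               random_state=0)
    rf.fit(X, y)
    oob_r2[max_features] = rf.oob_score_  # out-of-bag R^2 for a regressor

for k, v in oob_r2.items():
    print(f"max_features={k}: OOB R^2 = {v:.3f}")
```
]]></code>
<p>The best fraction depends on the data, which is why treating <monospace>max_features</monospace> as a searchable parameter, rather than fixing it, is worthwhile.</p>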
<p>In this paper, the Sparrow Search Algorithm (SSA) is combined with the RF model, adaptive weights are added to the position update formula of the SSA discoverers, and an ISSA-RF model is proposed. Before the training data are input into the model, a Ridge model is used to screen out the molecular descriptors that have a greater impact on ER&#x03B1; gene activity. Since the appropriate number of input molecular descriptors is not known in advance, datasets are built from the screened features using several different numbers of descriptors. An RF model alone is then trained on these datasets, and RMSE and <inline-formula id="ieqn-4">
<mml:math id="mml-ieqn-4"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> are used to determine the sparrow search ranges of max_depth, n_estimators, and max_features. Finally, the divided datasets are put into the ISSA-RF model to construct the QSAR model.</p>
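<p>Because SSA searches a continuous space while <monospace>n_estimators</monospace> and <monospace>max_depth</monospace> are integers, each sparrow position has to be mapped to valid RF hyperparameters before its fitness is evaluated. The sketch below shows one way to do this, assuming scikit-learn. The bounds in <monospace>BOUNDS</monospace>, the helper names <monospace>decode</monospace> and <monospace>fitness</monospace>, and the use of validation RMSE as the fitness value are illustrative assumptions, not the ranges or code used in this paper.</p>
<code language="python"><![CDATA[
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical search ranges; in the paper they are determined experimentally.
BOUNDS = {
    "n_estimators": (50, 300),
    "max_depth": (5, 30),
    "max_features": (0.1, 1.0),  # fraction of descriptors tried per split
}
LOWER = np.array([b[0] for b in BOUNDS.values()], dtype=float)
UPPER = np.array([b[1] for b in BOUNDS.values()], dtype=float)


def decode(position):
    """Map a continuous sparrow position to valid RF hyperparameters."""
    p = np.clip(position, LOWER, UPPER)
    return {
        "n_estimators": int(round(p[0])),
        "max_depth": int(round(p[1])),
        "max_features": float(p[2]),
    }


def fitness(position, X_tr, y_tr, X_va, y_va):
    """Validation RMSE (lower is better); R^2 is also returned for reporting."""
    rf = RandomForestRegressor(random_state=0, **decode(position))
    rf.fit(X_tr, y_tr)
    pred = rf.predict(X_va)
    rmse = float(np.sqrt(mean_squared_error(y_va, pred)))
    return rmse, float(r2_score(y_va, pred))


X, y = make_regression(n_samples=200, n_features=15, noise=5.0, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=1)
rmse, r2 = fitness(np.array([120.4, 12.7, 0.45]), X_tr, y_tr, X_va, y_va)
print(decode(np.array([500.0, 2.0, 0.45])), rmse, r2)
```
]]></code>
<p>Clipping keeps every sparrow inside the agreed search range, so narrowing <monospace>BOUNDS</monospace> directly shrinks the space the optimizer has to explore.</p>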
<p>To verify the accuracy of the model, a Genetic Algorithm optimized Support Vector Machine (GA-SVM), a Back Propagation Neural Network (BP), and Extreme Gradient Boosting (XGBoost) were also used in this paper to construct quantitative prediction models for the biological activity of drug compound molecules on ER&#x03B1;. Experimental comparison shows that the proposed ISSA-RF model outperforms the other three models: it can improve the efficiency of screening candidate drug molecules while maintaining prediction accuracy, and it provides a new idea for model optimization in QSAR construction.</p>
<p>The rest of this paper is organized as follows. Section 3 introduces the <bold>Principles and Methods</bold> (<italic>Data Source</italic>, <italic>Data Preprocessing</italic>, <italic>Basic Models</italic>, <italic>Model Construction</italic> and <italic>Model Evaluation Index</italic>), Section 4, <bold>Analysis of Results</bold>, presents the experimental results, and Section 5 gives the <bold>Conclusion</bold>.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Principles and Methods</title>
<sec id="s3_1">
<label>3.1</label>
<title>Data Source</title>
<p>The data in this article come from the DrugBank drug molecule database at the University of Alberta [<xref ref-type="bibr" rid="ref-16">16</xref>]. To make the data available to all readers, we have placed them on GitHub (<uri xlink:href="https://github.com/Li519445444/candidate-drug-data-source/tree/master">https://github.com/Li519445444/candidate-drug-data-source/tree/master</uri>). This dataset provides:<list list-type="alpha-lower"><list-item>
<p>The biological activity data of 1974 drug compounds on ER&#x03B1;, expressed as IC50 and pIC50 values. IC50 is given in nM; the smaller the value, the greater the biological activity and the more effective the compound is at inhibiting ER&#x03B1; activity. pIC50 is the negative logarithm of IC50 and is usually positively correlated with biological activity: the larger the pIC50 value, the higher the biological activity. pIC50 is generally used to indicate biological activity.</p></list-item><list-item>
<p>Information on 729 molecular descriptors for each of the 1974 drug compounds. The molecular descriptors of a compound are a series of parameters describing its structure and properties, including physicochemical properties (such as molecular weight and LogP) and topological structure characteristics (such as the numbers of hydrogen bond donors and hydrogen bond acceptors).</p></list-item></list></p>
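<p>As a worked example of the IC50/pIC50 relationship, the sketch below assumes the standard convention pIC50 = &#x2212;log<sub>10</sub>(IC50 in mol/L), so an IC50 given in nM is first multiplied by 10<sup>&#x2212;9</sup>; equivalently, pIC50 = 9 &#x2212; log<sub>10</sub>(IC50 in nM). The function name <monospace>pic50_from_ic50_nm</monospace> is hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); IC50 is supplied in nM, hence the 1e-9 factor."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


for ic50 in (1.0, 10.0, 1000.0):
    print(ic50, "nM ->", round(pic50_from_ic50_nm(ic50), 3))
```
]]></code>
<p>For example, 1 nM corresponds to a pIC50 of 9 and 1000 nM to a pIC50 of 6, which is why a lower IC50 means a higher pIC50 and higher biological activity.</p>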
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Data Preprocessing</title>
<p>To address the non-standardized data, the following measures are adopted:<list list-type="alpha-lower"><list-item>
<p>Inspection of the data shows that some columns contain only 0 values. Therefore, those of the 729 molecular descriptors whose values are all 0 are eliminated, because these descriptors play no role in feature screening or prediction and have no practical significance in drug development.</p></list-item><list-item>
<p>The descriptor values of the drug compound molecules are highly dispersed and contain outliers. To improve prediction accuracy, this paper uses the RobustScaler function, which scales the features using statistics that are robust to outliers (the median and the interquartile range). The data before and after processing are shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p></list-item></list></p>
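<p>Steps (a) and (b) can be sketched with pandas and scikit-learn as follows. This is a minimal illustration on a small hypothetical table (the column names and values are invented for the example), not the processing script used for the DrugBank data.</p>
<code language="python"><![CDATA[
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Toy descriptor table (hypothetical values); "nAcid" is all zeros and carries no information.
descriptors = pd.DataFrame({
    "MW": [250.3, 310.8, 1200.0, 289.1],  # 1200.0 acts as an outlier
    "nHBDon": [1, 2, 3, 1],
    "nAcid": [0, 0, 0, 0],
})

# (a) Drop descriptors whose values are all zero.
nonzero = descriptors.loc[:, (descriptors != 0).any(axis=0)]

# (b) RobustScaler subtracts each column's median and divides by its interquartile
#     range, so a single outlier does not dominate the scaling.
scaler = RobustScaler()
scaled = pd.DataFrame(scaler.fit_transform(nonzero),
                      columns=nonzero.columns, index=nonzero.index)
print(list(nonzero.columns))
print(scaled.round(3))
```
]]></code>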
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Before and after data standardization</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-1.png"/>
</fig>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Basic Models</title>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Ridge Model</title>
<p>When high-dimensional features are input into a machine learning model, some features may be unrelated to the training target or redundant [<xref ref-type="bibr" rid="ref-17">17</xref>]. Such features not only make the predictions of the algorithm less accurate but also consume computing time and memory. Many algorithms are available for feature selection and dimensionality reduction, such as Lasso [<xref ref-type="bibr" rid="ref-18">18</xref>], Ridge, and Principal Component Analysis [<xref ref-type="bibr" rid="ref-19">19</xref>]. This paper adopts the Ridge algorithm, which is fast to compute and effective for this task.</p>
<p>The ridge regression estimator is given in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>, where <inline-formula id="ieqn-5">
<mml:math id="mml-ieqn-5"><mml:mi>&#x03BC;</mml:mi></mml:math>
</inline-formula> is the ridge parameter and <italic>X</italic>&#x2032; denotes the transpose of <italic>X</italic>.</p>
<p><disp-formula id="eqn-1"><label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:msup><mml:mrow><mml:mover><mml:mi>&#x03B2;</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mi>y</mml:mi></mml:math>
</disp-formula></p>
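<p>Eq. (1) can be checked numerically. The sketch below uses NumPy on synthetic data and assumes that <italic>X</italic> and <italic>y</italic> have been centered so that no intercept is needed; the function name <monospace>ridge_closed_form</monospace> is hypothetical. Setting &#x03BC; = 0 recovers ordinary least squares, and a positive &#x03BC; shrinks the coefficients.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def ridge_closed_form(X, y, mu):
    """Eq. (1): beta = (X'X + mu*I)^(-1) X'y, assuming X and y are already centered."""
    p = X.shape[1]
    # Solving the linear system is numerically safer than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + mu * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X -= X.mean(axis=0)  # center the columns
beta_true = np.array([1.5, 0.0, -2.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.1, size=50)
y -= y.mean()

beta_ols = ridge_closed_form(X, y, mu=0.0)     # mu = 0 reduces to least squares
beta_ridge = ridge_closed_form(X, y, mu=10.0)  # a larger mu shrinks the coefficients
print(np.round(beta_ols, 3))
print(np.round(beta_ridge, 3))
```
]]></code>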
<p>The regression coefficients are estimated by minimizing the penalized likelihood (penalized least-squares) function in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>, where the penalty is <inline-formula id="ieqn-6">
<mml:math id="mml-ieqn-6"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>&#x03BB;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:munderover><mml:msup><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mi>m</mml:mi></mml:msup></mml:math>
</inline-formula> with <inline-formula id="ieqn-7">
<mml:math id="mml-ieqn-7"><mml:mi>m</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mn>0</mml:mn></mml:math>
</inline-formula>, and <inline-formula id="ieqn-8">
<mml:math id="mml-ieqn-8"><mml:mi>&#x03BB;</mml:mi></mml:math>
</inline-formula> is the tuning parameter.</p>
<p><disp-formula id="eqn-2"><label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:mover><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow><mml:mo>&#x2227;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>arg</mml:mi><mml:mo>&#x2061;</mml:mo><mml:munder><mml:mrow><mml:mo form="prefix">min</mml:mo></mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo symmetric="true">&#x2016;</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mi>X</mml:mi></mml:mrow><mml:mo symmetric="true">&#x2016;</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>&#x03BB;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>When <italic>m</italic> = 2, the penalty is the Ridge penalty term (<italic>m</italic> = 1 gives the Lasso penalty), and Ridge regression takes the form given in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>.</p>
<p><disp-formula id="eqn-3"><label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:msup><mml:mrow><mml:mover><mml:mi>&#x03B2;</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mi>arg</mml:mi><mml:mo>&#x2061;</mml:mo><mml:munder><mml:mrow><mml:mo form="prefix">min</mml:mo></mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mn>0</mml:mn><mml:mo>&#x2212;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
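<p>The sketch below evaluates the objective of Eq. (3) directly and checks that the coefficients returned by scikit-learn&#x2019;s <monospace>Ridge</monospace> estimator, whose documented objective is ||y &#x2212; Xw||<sup>2</sup> + &#x03B1;||w||<sup>2</sup> with an unpenalized intercept, are not improved by small random perturbations. It assumes scikit-learn is available and uses synthetic data; <monospace>ridge_objective</monospace> is a hypothetical helper.</p>
<code language="python"><![CDATA[
```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_objective(beta0, beta, X, y, lam):
    """Eq. (3): residual sum of squares plus lam * sum_j beta_j^2 (beta0 is not penalized)."""
    residuals = y - beta0 - X @ beta
    return float(residuals @ residuals + lam * (beta @ beta))


rng = np.random.default_rng(1)
X = rng.normal(size=(80, 6))
y = 3.0 + X @ rng.normal(size=6) + rng.normal(scale=0.2, size=80)
lam = 5.0

# Ridge(alpha=lam) minimizes Eq. (3) with alpha playing the role of lambda.
model = Ridge(alpha=lam).fit(X, y)
best = ridge_objective(model.intercept_, model.coef_, X, y, lam)

# Perturbing the fitted coefficients should never lower the (strictly convex) objective.
perturbed = [
    ridge_objective(model.intercept_, model.coef_ + 0.05 * rng.normal(size=6), X, y, lam)
    for _ in range(20)
]
print(best, min(perturbed))
```
]]></code>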
<p>In Ridge regression, the high-dimensional data <inline-formula id="ieqn-9">
<mml:math id="mml-ieqn-9"><mml:mi>X</mml:mi></mml:math>
</inline-formula> are centered and standardized, so the magnitudes of the standardized ridge regression coefficients can be compared directly to judge the importance of the high-dimensional features: the larger the magnitude of a coefficient, the more important the corresponding feature is to <inline-formula id="ieqn-10">
<mml:math id="mml-ieqn-10"><mml:mi>Y</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math>
</inline-formula> in the model. In this paper, low-importance features are eliminated and only high-importance features are input into the model for training.</p>
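<p>The coefficient-based screening described above can be sketched as follows, assuming NumPy, a hypothetical helper <monospace>ridge_rank_features</monospace>, and synthetic data in which only columns 3 and 6 carry signal. The number of retained descriptors <monospace>k</monospace> is set to 2 purely for illustration; in this paper several values were tried.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def ridge_rank_features(X, y, lam=1.0):
    """Rank columns by |standardized ridge coefficient|, largest first."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # center and standardize each descriptor
    yc = y - y.mean()
    p = Xs.shape[1]
    # Eq. (1) applied to the standardized data.
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)
    order = np.argsort(-np.abs(beta))
    return order, beta


rng = np.random.default_rng(2)
# Descriptors on very different scales, as in raw molecular descriptor tables.
X = rng.normal(size=(200, 8)) * rng.uniform(0.5, 50.0, size=8)
y = (4.0 * X[:, 3] / X[:, 3].std() - 2.0 * X[:, 6] / X[:, 6].std()
     + rng.normal(scale=0.5, size=200))

order, beta = ridge_rank_features(X, y, lam=1.0)
top_k = order[:2]  # keep the k most important descriptors (k = 2 here)
X_reduced = X[:, top_k]
print(order, np.round(beta, 3))
```
]]></code>
<p>Standardizing first matters: without it, a descriptor measured on a large scale would receive a small coefficient regardless of its importance.</p>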
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Sparrow Search Algorithm</title>
<p>The Sparrow Search Algorithm (SSA) [<xref ref-type="bibr" rid="ref-20">20</xref>] is a swarm intelligence algorithm inspired by the foraging and anti-predation behavior of sparrows. Individuals in the population are divided by role into discoverers, followers, and alerters. The discoverers provide foraging directions and areas for the entire population, the followers follow the discoverers to forage, and the alerters monitor the foraging area. Model parameters are optimized by iteratively updating the positions of these three kinds of individuals.</p>
<p>Suppose the population contains <inline-formula id="ieqn-11">
<mml:math id="mml-ieqn-11"><mml:mi>n</mml:mi></mml:math>
</inline-formula> sparrows and the variable to be optimized has <inline-formula id="ieqn-12">
<mml:math id="mml-ieqn-12"><mml:mi>d</mml:mi></mml:math>
</inline-formula> dimensions. The positions of the population can then be expressed as:</p>
<p><disp-formula id="eqn-4"><label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22F1;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:
mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
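<p>A minimal sketch of building the population matrix <italic>X</italic> of Eq. (4), assuming uniform random initialization within per-dimension bounds (a common choice that the text does not specify). The three columns correspond to the RF hyperparameters searched in this paper, but the bound values and the function name <monospace>init_population</monospace> are hypothetical.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def init_population(n, lower, upper, rng):
    """Return an n-by-d matrix X as in Eq. (4); row i is sparrow i, column j is dimension j."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Uniform random initialization inside the search box (an assumed, common choice).
    return lower + rng.random((n, lower.size)) * (upper - lower)


rng = np.random.default_rng(3)
# Hypothetical bounds for (n_estimators, max_depth, max_features).
X = init_population(n=20, lower=[50, 5, 0.1], upper=[300, 30, 1.0], rng=rng)
print(X.shape)  # (20, 3): n = 20 sparrows, d = 3 hyperparameters
```
]]></code>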
<p>The discoverers guide the population toward food and lead it to a safe location. Their positions are updated as follows:</p>
<p><disp-formula id="eqn-5"><label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:mi>D</mml:mi><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>&#x22C5;</mml:mo><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>i</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x003C;</mml:mo><mml:mi>S</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:mi>Q</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>L</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x2265;</mml:mo><mml:mi>S</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math>
</disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, <inline-formula id="ieqn-13">
<mml:math id="mml-ieqn-13"><mml:mi>t</mml:mi></mml:math>
</inline-formula> represents the current iteration number, <inline-formula id="ieqn-14">
<mml:math id="mml-ieqn-14"><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup></mml:math>
</inline-formula> represents the position of the <inline-formula id="ieqn-15">
<mml:math id="mml-ieqn-15"><mml:mi>i</mml:mi></mml:math>
</inline-formula>th sparrow in the <inline-formula id="ieqn-16">
<mml:math id="mml-ieqn-16"><mml:mi>j</mml:mi></mml:math>
</inline-formula>th dimension at iteration <inline-formula id="ieqn-17">
<mml:math id="mml-ieqn-17"><mml:mi>t</mml:mi></mml:math>
</inline-formula>, where <inline-formula id="ieqn-18">
<mml:math id="mml-ieqn-18"><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:math>
</inline-formula> and <inline-formula id="ieqn-19">
<mml:math id="mml-ieqn-19"><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:math>
</inline-formula>. <inline-formula id="ieqn-20">
<mml:math id="mml-ieqn-20"><mml:mi>&#x03B1;</mml:mi></mml:math>
</inline-formula> is a random number between 0 and 1. <inline-formula id="ieqn-21">
<mml:math id="mml-ieqn-21"><mml:mi>T</mml:mi></mml:math>
</inline-formula> is the maximum number of iterations. <inline-formula id="ieqn-22">
<mml:math id="mml-ieqn-22"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:math>
</inline-formula> is the warning (alarm) value, which lies between 0 and 1. <inline-formula id="ieqn-23">
<mml:math id="mml-ieqn-23"><mml:mi>S</mml:mi><mml:mi>T</mml:mi></mml:math>
</inline-formula> is the preset safety threshold, which lies between 0.5 and 1. <inline-formula id="ieqn-24">
<mml:math id="mml-ieqn-24"><mml:mi>Q</mml:mi></mml:math>
</inline-formula> is a normally distributed random number. <inline-formula id="ieqn-25">
<mml:math id="mml-ieqn-25"><mml:mi>L</mml:mi></mml:math>
</inline-formula> is a matrix of shape <inline-formula id="ieqn-26">
<mml:math id="mml-ieqn-26"><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>d</mml:mi></mml:math>
</inline-formula> whose elements are all 1, so adding <italic>Q</italic>&#x22C5;<italic>L</italic> shifts every dimension by the same random amount. When the warning value is below the safety threshold, no predator has been detected and the discoverers search extensively; when it reaches or exceeds the threshold, some sparrows have detected a predator and the whole population moves to a safe area.</p>
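<p>The discoverer update in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref> can be sketched in Python as follows. This is a minimal illustration, not the authors&#x2019; implementation: it assumes the exponential branch applies when the warning value is below the safety threshold (the convention used in <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref>), draws a single normal value for <italic>Q</italic>, and stores each position as a plain list of <italic>d</italic> floats. The function name discoverer_update is hypothetical.</p>
<preformat preformat-type="code">
```python
import math
import random


def discoverer_update(x, i, alpha, T, R2, ST, rng=random):
    """Sketch of Eq. (5) for one discoverer.

    x     -- current position (list of d floats)
    i     -- 1-based rank of the sparrow in the sorted population
    alpha -- random number in (0, 1], e.g. 1.0 - rng.random()
    T     -- maximum number of iterations
    R2/ST -- warning value and safety threshold
    """
    if ST > R2:
        # Warning value below the threshold: no predator, wide search
        scale = math.exp(-i / (alpha * T))
        return [xj * scale for xj in x]
    # Warning value at or above the threshold: move by a Gaussian step
    q = rng.gauss(0.0, 1.0)  # Q, a single normal draw
    # L is a 1 x d row of ones, so Q * L adds q to every dimension
    return [xj + q for xj in x]
```
</preformat>
<p>In the second branch every coordinate moves by the same amount, which mirrors the effect of multiplying the scalar <italic>Q</italic> by the all-ones row <italic>L</italic>.</p>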
<p>The followers constantly monitor the discoverers and compete with them for food in order to obtain more food resources. Followers with poor fitness (the worse half of the sorted population, <italic>i</italic> &#x003E; <italic>n</italic>/2) are starving and fly elsewhere to forage. The followers&#x2019; position update formula is as follows:</p>
<p><disp-formula id="eqn-6"><label>(6)</label>
<mml:math id="mml-eqn-6" display="block"><mml:mi>F</mml:mi><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mi>Q</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup></mml:mrow><mml:mrow><mml:mrow><mml:msup><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mi>i</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:mrow><mml:mfrac><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:msup><mml:mi>A</mml:mi><mml:mo>+</mml:mo></mml:msup></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>L</mml:mi><mml:mo>,</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mi>i</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>, <inline-formula id="ieqn-27">
<mml:math id="mml-ieqn-27"><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup></mml:math>
</inline-formula> represents the position of the individual with the worst fitness at iteration <inline-formula id="ieqn-28">
<mml:math id="mml-ieqn-28"><mml:mi>t</mml:mi></mml:math>
</inline-formula>. <inline-formula id="ieqn-29">
<mml:math id="mml-ieqn-29"><mml:msubsup><mml:mi>x</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math>
</inline-formula> represents the best position occupied by the discoverers at iteration <inline-formula id="ieqn-30">
<mml:math id="mml-ieqn-30"><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math>
</inline-formula>. <inline-formula id="ieqn-31">
<mml:math id="mml-ieqn-31"><mml:mrow><mml:msup><mml:mi>A</mml:mi><mml:mo>+</mml:mo></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>A</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mrow><mml:msup><mml:mi>A</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math>
</inline-formula>, where <inline-formula id="ieqn-32">
<mml:math id="mml-ieqn-32"><mml:mi>A</mml:mi></mml:math>
</inline-formula> is a matrix of shape <inline-formula id="ieqn-33">
<mml:math id="mml-ieqn-33"><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>d</mml:mi></mml:math>
</inline-formula> whose elements are each randomly set to &#x2212;1 or 1.</p>
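<p>Because <italic>A</italic> is a single row whose entries are &#x00B1;1, <italic>AA</italic><sup>T</sup> equals <italic>d</italic>, so <italic>A</italic><sup>+</sup> reduces to <italic>A</italic><sup>T</sup>/<italic>d</italic>, and |<italic>x</italic> &#x2212; <italic>x</italic><sub><italic>P</italic></sub>|&#x22C5;<italic>A</italic><sup>+</sup> is a scalar that <italic>L</italic> spreads across all <italic>d</italic> dimensions. The Python sketch below illustrates <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref> using these observations. It is a hedged illustration rather than the authors&#x2019; code: the function names are hypothetical, <italic>i</italic> is assumed to be the follower&#x2019;s 1-based rank after sorting by fitness, and a single normal draw is used for <italic>Q</italic>.</p>
<preformat preformat-type="code">
```python
import math
import random


def pseudo_inverse_row(A):
    """A+ = A^T (A A^T)^-1 for a 1 x d row A, returned as a flat list.

    For a single row, A A^T is the scalar sum of squares (d when every
    entry is -1 or 1), so A+ is just A divided by that scalar.
    """
    norm_sq = sum(a * a for a in A)
    return [a / norm_sq for a in A]


def follower_update(x, i, n, x_worst, x_p, rng=random):
    """Sketch of Eq. (6); i is the follower's 1-based rank among n sparrows."""
    d = len(x)
    if i > n / 2:
        # Poorly ranked (starving) follower flies elsewhere to forage
        q = rng.gauss(0.0, 1.0)
        return [q * math.exp((xw - xj) / i ** 2) for xw, xj in zip(x_worst, x)]
    # Otherwise forage around x_P, the best discoverer position
    A = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    A_plus = pseudo_inverse_row(A)
    # |x - x_P| (1 x d) times A+ (d x 1) is a scalar; times L (ones) gives 1 x d
    step = sum(abs(xj - xp) * ap for xj, xp, ap in zip(x, x_p, A_plus))
    return [xp + step for xp in x_p]
```
</preformat>
<p>Computing <italic>A</italic><sup>+</sup> in closed form avoids a general matrix inversion, which is unnecessary for a 1 &#x00D7; <italic>d</italic> row.</p>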
<p>When the population becomes aware of danger, the alerters (scouts) immediately perform anti-predation behavior. Their position update formula is as follows:</p>
<p><disp-formula id="eqn-7"><label>(7)</label>
<mml:math id="mml-eqn-7" display="block"><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>|</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2260;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mo>|</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03B5;</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math>
</disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>, <inline-formula id="ieqn-34">
<mml:math id="mml-ieqn-34"><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup></mml:math>
</inline-formula> represents the global best position at iteration <inline-formula id="ieqn-35">
<mml:math id="mml-ieqn-35"><mml:mi>t</mml:mi></mml:math>
</inline-formula>. <inline-formula id="ieqn-36">
<mml:math id="mml-ieqn-36"><mml:mi>&#x03B2;</mml:mi></mml:math>
</inline-formula> is the step-size control parameter, a normally distributed random number with mean 0 and variance 1. <inline-formula id="ieqn-37">
<mml:math id="mml-ieqn-37"><mml:mi>k</mml:mi></mml:math>
</inline-formula> is a random number between &#x2212;1 and 1 that sets the direction of the sparrow&#x2019;s movement and also scales its step. <inline-formula id="ieqn-38">
<mml:math id="mml-ieqn-38"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> represents the fitness of the current individual. <inline-formula id="ieqn-39">
<mml:math id="mml-ieqn-39"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> and <inline-formula id="ieqn-40">
<mml:math id="mml-ieqn-40"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> represent the fitness of the current global best and worst individuals, respectively. <inline-formula id="ieqn-41">
<mml:math id="mml-ieqn-41"><mml:mi>&#x03B5;</mml:mi></mml:math>
</inline-formula> is a very small constant that prevents division by zero. In this formula, <inline-formula id="ieqn-42">
<mml:math id="mml-ieqn-42"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2260;</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> means that the individual is at the periphery of the population, where it is exposed to predators, so it keeps moving toward the global best position to obtain a better fitness. <inline-formula id="ieqn-43">
<mml:math id="mml-ieqn-43"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> means that the individual is at the center of the population; being aware of the danger, it moves closer to other nearby sparrows to stay away from the danger area.</p>
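<p>A Python sketch of the alerter update in <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref> follows, written for minimization so that the global best individual is the one with <italic>f</italic><sub><italic>i</italic></sub> = <italic>f</italic><sub><italic>g</italic></sub>. It is illustrative only: the second branch is written in the form used in the original SSA, <italic>x</italic><sub><italic>i,j</italic></sub> + <italic>k</italic>&#x22C5;|<italic>x</italic><sub><italic>i,j</italic></sub> &#x2212; <italic>x</italic><sub>worst</sub>|/(|<italic>f</italic><sub><italic>i</italic></sub> &#x2212; <italic>f</italic><sub><italic>w</italic></sub>| + <italic>&#x03B5;</italic>), which is an assumption about the intended formula, and alerter_update is a hypothetical name.</p>
<preformat preformat-type="code">
```python
import random


def alerter_update(x, f_i, f_g, f_w, x_best, x_worst, eps=1e-50, rng=random):
    """Sketch of Eq. (7) for one alerter (minimization: f_g is the smallest).

    The second branch uses the original SSA form with x_worst (an assumption).
    """
    if f_i != f_g:
        # Peripheral sparrow: jump toward the global best position
        beta = rng.gauss(0.0, 1.0)  # step-size control, N(0, 1)
        return [xb + beta * abs(xj - xb) for xj, xb in zip(x, x_best)]
    # Sparrow already at the best fitness: move relative to the worst position
    k = rng.uniform(-1.0, 1.0)  # direction and step-size coefficient
    denom = abs(f_i - f_w) + eps  # eps avoids division by zero
    return [xj + k * abs(xj - xw) / denom for xj, xw in zip(x, x_worst)]
```
</preformat>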
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Random Forest Prediction Model</title>
<p>Random Forest (RF) [<xref ref-type="bibr" rid="ref-21">21</xref>] is a flexible and easy-to-use machine learning algorithm. It is an ensemble of regression trees trained with bagging. During tree training, random feature selection reduces the correlation between the individual trees, which mitigates the overfitting of a single decision tree and gives the model better predictive performance. The basic process is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Random Forest algorithm flowchart</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-2.png"/>
</fig>
<p>The Random Forest generation process shown in the flowchart proceeds as follows:<list list-type="alpha-lower"><list-item>
<p>Draw a bootstrap sample from the original training set by random sampling with replacement, and repeat this S times to obtain S training sets;</p></list-item><list-item>
<p>Train one CART tree on each of these S training sets;</p></list-item><list-item>
<p>If the feature dimension is M, specify a constant m (m &#x003C; M); each time a tree node is split, randomly select a subset of m features from the M features and choose the best split among these m features;</p></list-item><list-item>
<p>Grow each tree to its maximum extent, and combine the S resulting trees into a random forest;</p></list-item><list-item>
<p>For classification problems, the S CART trees vote to produce the final class; for regression problems, the mean of the S trees&#x2019; predictions is used as the final prediction.</p></list-item></list></p>
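<p>Because each tree is trained on a bootstrap sample drawn with replacement, about one third of the original samples (a fraction that approaches 1/<italic>e</italic> &#x2248; 0.368 as the training set grows) are never drawn for a given tree. These out-of-bag (OOB) samples can be used to test the generalization ability of the model without a separate validation set.</p>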
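<p>For a bootstrap sample of size <italic>n</italic> drawn with replacement, the chance that a given sample is never drawn is (1 &#x2212; 1/<italic>n</italic>)<sup><italic>n</italic></sup>, which approaches 1/<italic>e</italic> for large <italic>n</italic>. The standard-library Python sketch below checks this with a small simulation; the training-set size and number of trees are hypothetical values chosen only for illustration.</p>
<preformat preformat-type="code">
```python
import random


def bootstrap_indices(n, rng):
    """One bootstrap sample: n indices drawn uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, in_bag):
    """Indices never drawn into the bootstrap sample (the OOB set)."""
    drawn = set(in_bag)
    return [i for i in range(n) if i not in drawn]


rng = random.Random(42)
n, S = 1000, 50  # hypothetical training-set size and number of trees
oob_fractions = []
for _ in range(S):
    bag = bootstrap_indices(n, rng)
    oob_fractions.append(len(out_of_bag(n, bag)) / n)

mean_oob = sum(oob_fractions) / S
print(f"mean OOB fraction over {S} trees: {mean_oob:.3f}")
```
</preformat>
<p>Each tree&#x2019;s OOB indices identify the samples on which that tree can be evaluated without having seen them during training.</p>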
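<p>This paper uses a Random Forest regression model. When each regression tree is grown, the optimal splitting feature <inline-formula id="ieqn-44">
<mml:math id="mml-ieqn-44"><mml:mi>j</mml:mi></mml:math>
</inline-formula> and split point <inline-formula id="ieqn-45">
<mml:math id="mml-ieqn-45"><mml:mi>s</mml:mi></mml:math>
</inline-formula> are selected by solving <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>, where the two candidate regions are defined in <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>:</p>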
<p><disp-formula id="eqn-8"><label>(8)</label>
<mml:math id="mml-eqn-8" display="block"><mml:munder><mml:mo movablelimits="false" form="prefix">min</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:munder><mml:mo stretchy="false">[</mml:mo><mml:munder><mml:mo movablelimits="false" form="prefix">min</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:munder><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:munder><mml:mo movablelimits="false" form="prefix">min</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:munder><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy="false">]</mml:mo></mml:math>
</disp-formula></p>
<p><disp-formula id="eqn-9"><label>(9)</label>
<mml:math id="mml-eqn-9" display="block"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo>&#x2264;</mml:mo><mml:mi>s</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo>&#x003E;</mml:mo><mml:mi>s</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>, <inline-formula id="ieqn-46">
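<mml:math id="mml-ieqn-46"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is the output value of region <italic>R</italic><sub><italic>m</italic></sub>(<italic>j</italic>, <italic>s</italic>) into which the sample set is divided, <inline-formula id="ieqn-47">
<mml:math id="mml-ieqn-47"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:math>
</inline-formula>. Because a sum of squared errors about a constant is minimized when the constant equals the mean of the targets, each inner minimization is attained at the average of the <italic>y</italic><sub><italic>i</italic></sub> in its region. After the optimal pair <inline-formula id="ieqn-48">
<mml:math id="mml-ieqn-48"><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math>
</inline-formula> has been selected, it divides the region, and the corresponding output values are given by <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>, where <italic>N</italic><sub><italic>m</italic></sub> is the number of samples in <italic>R</italic><sub><italic>m</italic></sub>(<italic>j</italic>, <italic>s</italic>).</p>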
<p><disp-formula id="eqn-10"><label>(10)</label>
<mml:math id="mml-eqn-10" display="block"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder><mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>With each <inline-formula id="ieqn-49">
<mml:math id="mml-ieqn-49"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> set to its region mean, the smaller the total squared error in <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>, the better the split given by the selected feature and split point. The above steps are then repeated on each sub-region until a stopping condition is met, which grows each regression tree of the RF model.</p>
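<p>The split search of Eqs. (8) to (10) can be illustrated with an exhaustive scan over every feature <italic>j</italic> and every observed value <italic>s</italic>, using region means as the outputs <italic>c</italic><sub><italic>m</italic></sub>. The plain-Python function below is a sketch with a hypothetical name (best_split) on toy data; practical tree implementations typically sort each feature once and use running sums instead of recomputing every region.</p>
<preformat preformat-type="code">
```python
def best_split(X, y):
    """Exhaustive search for the (j, s) pair that minimizes Eq. (8).

    X is a list of samples (each a list of feature values) and y the targets.
    Returns (j, s, loss); the region outputs are the means of Eq. (10).
    """
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if s >= row[j]]  # R1(j, s)
            right = [yi for row, yi in zip(X, y) if row[j] > s]  # R2(j, s)
            if not left or not right:
                continue  # the split must leave samples on both sides
            c1 = sum(left) / len(left)  # Eq. (10), m = 1
            c2 = sum(right) / len(right)  # Eq. (10), m = 2
            loss = sum((v - c1) ** 2 for v in left) + sum((v - c2) ** 2 for v in right)
            if best[2] > loss:
                best = (j, s, loss)
    return best


X = [[1, 7], [2, 3], [3, 9], [10, 1], [11, 8], [12, 2]]  # toy data
y = [1, 1, 1, 5, 5, 5]
print(best_split(X, y))
```
</preformat>
<p>On this toy data the search selects the first feature with split point 3, which separates the two target levels with zero squared error; the second feature offers no such clean split.</p>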
</sec>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Model Construction</title>
<sec id="s3_4_1">
<label>3.4.1</label>
<title>Molecular Descriptor Screening</title>
<p>A RidgeCV model is trained with 5-fold cross-validation, and the fitted regression coefficients are sorted by magnitude. The larger the magnitude of a regression coefficient, the greater the influence of the corresponding molecular descriptor on the change in biological activity. The top 20 molecular descriptors are shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. Based on the sorted coefficients, the top 20 to 100 most influential molecular descriptors are selected in steps of 10, giving features_num &#x003D; [20, 30, 40, 50, 60, 70, 80, 90, 100]. For each value of features_num, the dataset is restricted to the corresponding number of top-ranked descriptors, and the resulting dataset is used to train the model.</p>
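<p>The descriptor ranking and the features_num subsets can be sketched as follows. The coefficient values are hypothetical stand-ins for a fitted RidgeCV model (for example, the coef_ attribute of scikit-learn&#x2019;s RidgeCV(cv=5), assuming that library is used). The sketch ranks descriptors by absolute coefficient, which assumes the descriptors were standardized beforehand so that coefficient magnitudes are comparable.</p>
<preformat preformat-type="code">
```python
def rank_descriptors(names, coefs):
    """Sort descriptor names by absolute regression coefficient, largest first."""
    order = sorted(range(len(names)), key=lambda k: abs(coefs[k]), reverse=True)
    return [names[k] for k in order]


def feature_subsets(ranked, features_num):
    """Map each subset size m to the m top-ranked descriptor names."""
    return {m: ranked[:m] for m in features_num}


features_num = list(range(20, 101, 10))  # 20, 30, ..., 100

# Hypothetical coefficients for 120 descriptors, for illustration only
names = [f"desc_{k}" for k in range(120)]
coefs = [(-1) ** k * (k % 37) / 10 for k in range(120)]

subsets = feature_subsets(rank_descriptors(names, coefs), features_num)
print(len(subsets[20]), subsets[20][:3])
```
</preformat>
<p>Because every subset is a prefix of the same ranked list, each larger subset contains all descriptors of the smaller ones.</p>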
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Top 20 molecular descriptors ranked by their influence on biological activity</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-3.png"/>
</fig>
</sec>
<sec id="s3_4_2">
<label>3.4.2</label>
<title>ISSA-RF Model</title>
<p>Like other swarm intelligence optimization algorithms, SSA easily falls into local optima. In the later iterations of the standard SSA, the positions of all three types of sparrows are updated only within a small range near the optimal point, so the position updates tend to stagnate. To solve this problem, this paper proposes an Improved Sparrow Search Algorithm (ISSA), which adds a dynamic adaptive weight to the discoverer position update formula to improve the local exploration of the model. The modified formula is as follows:</p>
<p><disp-formula id="eqn-11"><label>(11)</label>
<mml:math id="mml-eqn-11" display="block"><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mi>w</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>&#x22C5;</mml:mo><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mi>i</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:mi>S</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:mi>Q</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>L</mml:mi><mml:mo>,</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mtable rowspacing="4pt" 
columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mi>S</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p><disp-formula id="eqn-12"><label>(12)</label>
<mml:math id="mml-eqn-12" display="block"><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mi>sin</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x03C0;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x22C5;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03C0;</mml:mi></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:math>
</disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref>, <inline-formula id="ieqn-50">
<mml:math id="mml-ieqn-50"><mml:mi>T</mml:mi></mml:math>
</inline-formula> represents the maximum number of iterations. <inline-formula id="ieqn-51">
<mml:math id="mml-ieqn-51"><mml:mi>t</mml:mi></mml:math>
</inline-formula> represents the current iteration number. <inline-formula id="ieqn-52">
<mml:math id="mml-ieqn-52"><mml:mi>b</mml:mi></mml:math>
</inline-formula> represents the bias term. The curve of <xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref> is shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>

<p>As <xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows, <inline-formula id="ieqn-53">
<mml:math id="mml-ieqn-53"><mml:mi>w</mml:mi></mml:math>
</inline-formula> changes continuously as the iterations proceed. For 0 &#x2264; <italic>t</italic> &#x2264; <italic>T</italic>, the sine term in <xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref> lies within [&#x2212;1,0], so <inline-formula id="ieqn-54">
<mml:math id="mml-ieqn-54"><mml:mi>w</mml:mi></mml:math>
</inline-formula> lies within [<italic>b</italic> &#x2212; 1, <italic>b</italic>], and the range of <inline-formula id="ieqn-55">
<mml:math id="mml-ieqn-55"><mml:mi>w</mml:mi></mml:math>
</inline-formula> can be shifted by modifying the bias term <inline-formula id="ieqn-56">
<mml:math id="mml-ieqn-56"><mml:mi>b</mml:mi></mml:math>
</inline-formula>. This paper sets <inline-formula id="ieqn-57">
<mml:math id="mml-ieqn-57"><mml:mi>b</mml:mi></mml:math>
</inline-formula> to 1, so the weight decreases from 1 at <italic>t</italic> &#x003D; 0 to 0 at <italic>t</italic> &#x003D; <italic>T</italic>. Giving the discoverers a larger weight in the early iterations favors global search. In the later stage of the search, <inline-formula id="ieqn-58">
<mml:math id="mml-ieqn-58"><mml:mi>w</mml:mi></mml:math>
</inline-formula> decreases slowly because the slope of the sine curve flattens as <italic>t</italic> approaches <italic>T</italic>, which leaves sufficient time for local exploration. Because <inline-formula id="ieqn-59">
<mml:math id="mml-ieqn-59"><mml:mi>w</mml:mi></mml:math>
</inline-formula> decreases only gradually, the weight does not drop abruptly in the later iterations, which helps speed up local exploration. This weight also accelerates the overall convergence of the algorithm to a certain extent.</p>
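<p><xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref> and the weighted first branch of <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref> can be sketched in Python as follows. The function names are hypothetical, the default <italic>b</italic> &#x003D; 1 follows the setting used in this paper, and <italic>T</italic> &#x003D; 500 is only an example value. The loop prints <italic>w</italic> at a few iterations so that its decrease can be compared with <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<preformat preformat-type="code">
```python
import math


def adaptive_weight(t, T, b=1.0):
    """Eq. (12): w = sin(pi * t / (2 * T) + pi) + b."""
    return math.sin(math.pi * t / (2 * T) + math.pi) + b


def issa_discoverer_scaled(x, i, alpha, T, t, b=1.0):
    """First branch of Eq. (11): the exponential step of Eq. (5) scaled by w."""
    w = adaptive_weight(t, T, b)
    scale = math.exp(-i / (alpha * T))
    return [w * xj * scale for xj in x]


T = 500  # example maximum number of iterations
for t in (0, 125, 250, 375, 500):
    print(t, round(adaptive_weight(t, T), 4))
```
</preformat>
<p>Since sin(&#x03B8; + &#x03C0;) &#x003D; &#x2212;sin &#x03B8;, the weight equals <italic>b</italic> &#x2212; sin(&#x03C0;<italic>t</italic>/(2<italic>T</italic>)); its decrease per iteration is largest at the start and shrinks toward zero as <italic>t</italic> approaches <italic>T</italic>.</p>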
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title><xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref> curve</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-4.png"/>
</fig>
<p>To illustrate the convergence behavior of the proposed ISSA algorithm, this paper conducts simulation experiments on the Rosenbrock function, whose two-variable form is shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. The Rosenbrock function is defined as follows:</p>
<p><disp-formula id="eqn-13"><label>(13)</label>
<mml:math id="mml-eqn-13" display="block"><mml:mtext>Rosenbrock</mml:mtext><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:mo>[</mml:mo><mml:mn>100</mml:mn><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Two-variable Rosenbrock function</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-5.png"/>
</fig>
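<p>The Rosenbrock benchmark can be written in a few lines of Python. This sketch uses the standard form with the <italic>x</italic><sub><italic>i</italic>+1</sub> term, takes <italic>N</italic> to be the length of the input vector, and has its global minimum value 0 at <italic>x</italic> &#x003D; (1, 1, ..., 1), which is the target the convergence curves approach.</p>
<preformat preformat-type="code">
```python
def rosenbrock(x):
    """Eq. (13): sum of 100 (x[i+1] - x[i]**2)**2 + (1 - x[i])**2 over i."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


print(rosenbrock([1.0, 1.0]), rosenbrock([0.0, 0.0]))
```
</preformat>
<p>Used as a fitness function, smaller values are better, so an optimizer that drives the value toward 0 is converging on the global minimum.</p>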
<p>SSA and ISSA were each run independently on the Rosenbrock function to compare their fitness convergence, and the results are shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. Within 500 iterations, SSA converged after 27 iterations with a final fitness precision of about 10<sup>&#x2212;7</sup>, while ISSA converged after 41 iterations with a final fitness precision of about 10<sup>&#x2212;23</sup>. Although ISSA needs more iterations to converge, its converged fitness is many orders of magnitude more accurate than that of the standard SSA.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Convergence curves of SSA and ISSA on the Rosenbrock function</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-6.png"/>
</fig>
<p>ISSA parameter optimization involves a trade-off between running time and model accuracy: too wide a search range makes model training slow, whereas too narrow a range may exclude good parameter values and cost accuracy. The search range of each parameter therefore has to be limited. Among the parameters of the RandomForestRegressor function, n_estimators, max_depth and max_features have the largest influence on the accuracy of the RF model. The ranges of n_estimators and max_depth were narrowed by first training the RF model on its own, with MSE and <inline-formula id="ieqn-60">
<mml:math id="mml-ieqn-60"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> as the criteria: a smaller MSE means a more accurate prediction of biological activity, and a higher <inline-formula id="ieqn-61">
<mml:math id="mml-ieqn-61"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> means that the selected molecular descriptors explain more of the variation in biological activity. As shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, MSE keeps decreasing as n_estimators increases, and beyond about 175 trees MSE and <inline-formula id="ieqn-62">
<mml:math id="mml-ieqn-62"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> are almost stable, with only small fluctuations. The model is therefore stable and close to its highest accuracy at n_estimators &#x2248; 175, so the range of n_estimators is set to [160,180]. Similarly, the curves in <xref ref-type="fig" rid="fig-7">Figs. 7c</xref> and <xref ref-type="fig" rid="fig-7">7d</xref> fluctuate little around max_depth &#x003D; 16, so the range of max_depth is set to [10,30]. The parameter max_features, which sets how many descriptors RF considers at each split, is also optimized; because the number of molecular descriptors input differs between training runs, its range is not narrowed in advance but spans [1, features_num], and ISSA searches it directly. The parameter settings and optimization ranges are listed in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
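<p>The range-narrowing step can be sketched with scikit-learn as below. This is a hedged illustration, not the authors' script: synthetic data from make_regression stand in for the descriptor matrix and pIC50 values, the 85/15 split with random_state &#x003D; 0 mirrors Section 4.1, and the helper name sweep_parameter is hypothetical.</p>

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def sweep_parameter(name, values, X_train, X_test, y_train, y_test, **fixed):
    """Train one RF per candidate value of `name` and record test-set MSE and R^2."""
    results = []
    for value in values:
        params = dict(fixed, **{name: value})  # fixed settings plus the swept parameter
        model = RandomForestRegressor(random_state=0, **params)
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results.append((value, mean_squared_error(y_test, pred), r2_score(y_test, pred)))
    return results


# Synthetic stand-in for a descriptor matrix (hypothetical data, not the ER-alpha set)
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

# Look for the point where MSE and R^2 flatten out, then search a narrow interval around it
for value, mse, r2 in sweep_parameter("n_estimators", [25, 50, 100], X_tr, X_te, y_tr, y_te):
    print(f"n_estimators={value}: MSE={mse:.2f}, R2={r2:.3f}")
```

<p>Sweeping one parameter at a time while the others stay fixed is a cheap way to locate a plateau before handing a narrow interval to the optimizer; the same helper can be called with "max_depth".</p>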
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Relationship of features_num with n_estimators and max_depth</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-7.png"/>
</fig>
<table-wrap id="table-1"><label>Table 1</label>
<caption>
<title>ISSA settings and parameter optimization ranges</title></caption>
<table><colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Parameters</th>
<th>Value or optimization range</th>
</tr>
</thead>
<tbody>
<tr>
<td>Population size</td>
<td>10</td>
</tr>
<tr>
<td>Maximum number of iterations</td>
<td>30</td>
</tr>
<tr>
<td>n_estimators</td>
<td>[160,180]</td>
</tr>
<tr>
<td>max_depth</td>
<td>[10,30]</td>
</tr>
<tr>
<td>max_features</td>
<td>[1, features_num]</td>
</tr>
</tbody>
</table>
</table-wrap>
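<p>The settings in Table 1 can be turned into a search space for the optimizer roughly as follows. This sketch rests on our own assumptions: each sparrow is a real-valued vector (n_estimators, max_depth, max_features) drawn uniformly within the bounds and rounded to integers before it is passed to RandomForestRegressor. The helper names make_bounds, init_population and decode are hypothetical.</p>

```python
import numpy as np

POP_SIZE = 10   # number of sparrows (Table 1)
MAX_ITER = 30   # maximum number of ISSA iterations (Table 1)


def make_bounds(features_num):
    """Lower and upper bounds for (n_estimators, max_depth, max_features) from Table 1."""
    lower = np.array([160.0, 10.0, 1.0])
    upper = np.array([180.0, 30.0, float(features_num)])
    return lower, upper


def init_population(features_num, pop_size=POP_SIZE, seed=0):
    """Draw each sparrow uniformly at random inside the Table 1 bounds."""
    lower, upper = make_bounds(features_num)
    rng = np.random.default_rng(seed)
    return lower + rng.random((pop_size, lower.size)) * (upper - lower)


def decode(position, features_num):
    """Clip a sparrow position to the bounds and round it to integer RF hyperparameters."""
    lower, upper = make_bounds(features_num)
    n_est, depth, max_feat = np.rint(np.clip(position, lower, upper)).astype(int)
    return {"n_estimators": int(n_est), "max_depth": int(depth), "max_features": int(max_feat)}


population = init_population(features_num=100)
print(population.shape)  # (10, 3)
print(decode(population[0], features_num=100))
```

<p>Clipping before rounding keeps every decoded sparrow inside the ranges of Table 1 even if a position update pushes it outside.</p>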
<p>In this paper, ISSA is combined with the RF algorithm to optimize the parameters of the RF model. First, the sparrow population is initialized. Then an RF model is trained for each sparrow and its fitness value is calculated. The fitness function is the RMSE of RF model training, so each iteration searches towards positions with a lower RMSE. The positions of the discoverers, followers and alerters are updated according to the changes in fitness until the termination condition is met. Finally, ER&#x03B1; gene activity is predicted by the RF model with the optimal parameters, and the model is evaluated on the test set. The workflow of the ISSA-RF model for predicting drug compound molecules that antagonize ER&#x03B1; gene activity is shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>.</p>
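<p>The fitness evaluation described above can be sketched as follows: a candidate hyperparameter set trains a RandomForestRegressor, and its RMSE on held-out data is returned, so lower is better. This is a minimal sketch under our own assumptions (a single 85/15 hold-out split for scoring and synthetic data from make_regression); the ISSA position updates themselves are not reproduced, and rf_fitness is a hypothetical name.</p>

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rf_fitness(params, X_train, y_train, X_val, y_val):
    """Fitness of one sparrow: RMSE of an RF trained with `params` (lower is better)."""
    model = RandomForestRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    return float(np.sqrt(mean_squared_error(y_val, pred)))


# Synthetic stand-in data; in the paper X would hold the selected molecular descriptors
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.15, random_state=0)

candidates = [
    {"n_estimators": 160, "max_depth": 10, "max_features": 5},
    {"n_estimators": 175, "max_depth": 16, "max_features": 12},
]
scores = [rf_fitness(p, X_tr, y_tr, X_val, y_val) for p in candidates]
# The candidate with the lowest RMSE would become the current best sparrow position
best = candidates[int(np.argmin(scores))]
print(scores, best)
```

<p>Fixing random_state inside the fitness function makes repeated evaluations of the same position return the same RMSE, which keeps the comparison between sparrows fair.</p>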
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>ISSA-RF model flowchart</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-8.png"/>
</fig>
</sec>
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Model Evaluation Index</title>
<p>To evaluate the established quantitative prediction model of biological activity objectively, the following metrics are used: Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE) and the coefficient of determination (<inline-formula id="ieqn-63">
<mml:math id="mml-ieqn-63"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula>).</p>
<p><disp-formula id="eqn-14"><label>(14)</label>
<mml:math id="mml-eqn-14" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p><disp-formula id="eqn-15"><label>(15)</label>
<mml:math id="mml-eqn-15" display="block"><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac></mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p><disp-formula id="eqn-16"><label>(16)</label>
<mml:math id="mml-eqn-16" display="block"><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:mi>P</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mn>100</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:mrow><mml:mi>m</mml:mi></mml:mfrac></mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p><disp-formula id="eqn-17"><label>(17)</label>
<mml:math id="mml-eqn-17" display="block"><mml:mi>R</mml:mi><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:msqrt></mml:math>
</disp-formula></p>
<p><disp-formula id="eqn-18"><label>(18)</label>
<mml:math id="mml-eqn-18" display="block"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>y</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</disp-formula></p>
<p>In these formulas, <inline-formula id="ieqn-64">
<mml:math id="mml-ieqn-64"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is the observed pIC50 of a test-set sample, <inline-formula id="ieqn-65">
<mml:math id="mml-ieqn-65"><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is the pIC50 predicted by the model for that sample, <inline-formula id="ieqn-66">
<mml:math id="mml-ieqn-66"><mml:mi>i</mml:mi></mml:math>
</inline-formula> is the sample index, and <inline-formula id="ieqn-67">
<mml:math id="mml-ieqn-67"><mml:mi>m</mml:mi></mml:math>
</inline-formula> is the number of test samples. Lower values of MSE, MAE, MAPE and RMSE indicate smaller prediction errors and hence a more accurate model. In <xref ref-type="disp-formula" rid="eqn-18">Eq. (18)</xref>, <inline-formula id="ieqn-68">
<mml:math id="mml-ieqn-68"><mml:mover><mml:mi>y</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover></mml:math>
</inline-formula> is the mean pIC50 of the test samples. The coefficient of determination <inline-formula id="ieqn-69">
<mml:math id="mml-ieqn-69"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> is at most 1 and can become negative when the model predicts worse than simply using the mean value. The closer <inline-formula id="ieqn-70">
<mml:math id="mml-ieqn-70"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> is to 1, the better the model performs.</p>
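<p>Eqs. (14) to (18) translate directly into NumPy, as in the sketch below. The function name evaluation_metrics is our own, MAPE is returned in percent as written in Eq. (16), and the code assumes that no observed pIC50 equals zero. Note that the MAPE values in Tables 3 and 4 (around 0.08) appear to be reported as fractions, i.e., without the 100% factor, so the output would need to be divided by 100 for a direct comparison.</p>

```python
import numpy as np


def evaluation_metrics(y_true, y_pred):
    """MSE, MAE, MAPE (%), RMSE and R^2 as defined in Eqs. (14) to (18)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)                          # Eq. (14)
    mae = np.mean(np.abs(err))                       # Eq. (15)
    mape = 100.0 * np.mean(np.abs(err / y_true))     # Eq. (16), requires y_true != 0
    rmse = np.sqrt(mse)                              # Eq. (17)
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # Eq. (18)
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Toy pIC50 values (hypothetical numbers, for illustration only)
print(evaluation_metrics([6.0, 7.0, 8.0], [6.5, 7.0, 7.5]))
```

<p>For this toy input the residual sum of squares is 0.5 and the total sum of squares is 2, so R<sup>2</sup> &#x003D; 0.75.</p>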
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Analysis of Results</title>
<sec id="s4_1">
<label>4.1</label>
<title>Results of ISSA-RF Predictions</title>
<p>In this paper, 85% of the data is used as the training set and 15% as the test set, with the random seed random_state &#x003D; 0. Within the training set, 5-fold cross-validation is used with <inline-formula id="ieqn-71">
<mml:math id="mml-ieqn-71"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> as the target value; each validation fold therefore holds one fifth of the training data, i.e., about 17% of all samples. Datasets built from the top 20 to 100 most influential molecular descriptors (in steps of 10) are each fed into the ISSA-RF model for training, and the test set is finally predicted by the RF model with the optimal parameters to obtain the final model performance. The optimal parameters are listed in <xref ref-type="table" rid="table-2">Table 2</xref>, and the evaluation results in <xref ref-type="table" rid="table-3">Table 3</xref>.</p>
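<p>The data split and cross-validation described above can be sketched with scikit-learn as follows. This sketch relies on several assumptions: synthetic data replace the descriptor table, the RF hyperparameters are taken from the features_num &#x003D; 100 row of Table 2 purely as an example, and the 5-fold split uses scikit-learn's default (unshuffled) KFold, since the shuffle setting is not stated in the text.</p>

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for 100 selected descriptors and pIC50 values (hypothetical data)
X, y = make_regression(n_samples=300, n_features=100, n_informative=20, noise=5.0, random_state=0)

# 85% training / 15% test split with random_state=0, as described in the text
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)

# Example hyperparameters taken from the features_num = 100 row of Table 2
model = RandomForestRegressor(n_estimators=160, max_depth=13, max_features=65, random_state=0)

# 5-fold CV on the training set: each validation fold holds about 0.85 / 5 = 17% of all samples
val_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()

# Refit on the full training set and score the held-out test set
model.fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"Val-R2 = {val_r2:.3f}, test R2 = {test_r2:.3f}")
```

<p>With 300 samples this gives 255 training and 45 test samples; the mean cross-validated score corresponds to the Val-R<sup>2</sup> column and the test score to the R<sup>2</sup> column of Table 3.</p>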
<table-wrap id="table-2"><label>Table 2</label>
<caption>
<title>Optimal RF parameters found by ISSA for each features_num</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>features_num</th>
<th>n_estimators</th>
<th>max_features</th>
<th>max_depth</th>
</tr>
</thead>
<tbody>
<tr>
<td>20</td>
<td>163</td>
<td>9</td>
<td>13</td>
</tr>
<tr>
<td>30</td>
<td>180</td>
<td>13</td>
<td>20</td>
</tr>
<tr>
<td>40</td>
<td>161</td>
<td>13</td>
<td>14</td>
</tr>
<tr>
<td>50</td>
<td>163</td>
<td>22</td>
<td>19</td>
</tr>
<tr>
<td>60</td>
<td>160</td>
<td>14</td>
<td>17</td>
</tr>
<tr>
<td>70</td>
<td>160</td>
<td>15</td>
<td>13</td>
</tr>
<tr>
<td>80</td>
<td>170</td>
<td>23</td>
<td>18</td>
</tr>
<tr>
<td>90</td>
<td>163</td>
<td>26</td>
<td>16</td>
</tr>
<tr>
<td>100</td>
<td>160</td>
<td>65</td>
<td>13</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-3"><label>Table 3</label>
<caption>
<title>Evaluation metrics of ISSA-RF for each features_num (MAE to R<sup>2</sup>: test set; Val-R<sup>2</sup>: 5-fold cross-validation)</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>features_num</th>
<th>MAE</th>
<th>RMSE</th>
<th>MSE</th>
<th>MAPE</th>
<th><inline-formula id="ieqn-72">
<mml:math id="mml-ieqn-72"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula></th>
<th><inline-formula id="ieqn-73">
<mml:math id="mml-ieqn-73"><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula></th>
</tr>
</thead>
<tbody>
<tr>
<td>20</td>
<td>0.5951410</td>
<td>0.7875591</td>
<td>0.6202494</td>
<td>0.0953065</td>
<td>0.6728614</td>
<td>0.6698001</td>
</tr>
<tr>
<td>30</td>
<td>0.5267269</td>
<td>0.7084049</td>
<td>0.5018375</td>
<td>0.0821251</td>
<td>0.7353155</td>
<td>0.7317090</td>
</tr>
<tr>
<td>40</td>
<td>0.5252527</td>
<td>0.7052240</td>
<td>0.4973409</td>
<td>0.0819899</td>
<td>0.7376871</td>
<td>0.7352390</td>
</tr>
<tr>
<td>50</td>
<td>0.5226297</td>
<td>0.7084105</td>
<td>0.5018454</td>
<td>0.0819970</td>
<td>0.7353113</td>
<td>0.7428036</td>
</tr>
<tr>
<td>60</td>
<td>0.5238194</td>
<td>0.7033877</td>
<td>0.4947543</td>
<td>0.0822189</td>
<td>0.7390513</td>
<td>0.7406314</td>
</tr>
<tr>
<td>70</td>
<td>0.5149397</td>
<td>0.6959719</td>
<td>0.4843770</td>
<td>0.0809725</td>
<td>0.7445247</td>
<td>0.7437330</td>
</tr>
<tr>
<td>80</td>
<td>0.5125422</td>
<td>0.6972933</td>
<td>0.4862180</td>
<td>0.0805919</td>
<td>0.7435537</td>
<td>0.7450803</td>
</tr>
<tr>
<td>90</td>
<td>0.5119813</td>
<td>0.6947606</td>
<td>0.4826924</td>
<td>0.0802745</td>
<td>0.7454132</td>
<td>0.7482465</td>
</tr>
<tr>
<td>100</td>
<td>0.4979396</td>
<td>0.6876389</td>
<td>0.4728473</td>
<td>0.0776072</td>
<td>0.7506058</td>
<td>0.7463062</td>
</tr>
</tbody>
</table>
</table-wrap>

<p>As <xref ref-type="fig" rid="fig-9">Fig. 9</xref> shows, the overall prediction performance improves markedly when features_num increases from 20 to 30. Beyond 30, performance continues to improve only slowly as features_num grows, which indicates that adding descriptors raises prediction accuracy only within a certain range. The best test-set results are obtained at features_num &#x003D; 100, where RMSE is 0.6876389, <inline-formula id="ieqn-74">
<mml:math id="mml-ieqn-74"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> is 0.751 and <inline-formula id="ieqn-75">
<mml:math id="mml-ieqn-75"><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> is 0.746. <xref ref-type="fig" rid="fig-9">Fig. 9(d)</xref> shows that the test-set and cross-validation results follow the same trend, and according to <xref ref-type="table" rid="table-3">Table 3</xref> the two differ by less than 0.01 at every features_num. This suggests that the ISSA-RF model generalizes well when predicting the ER&#x03B1; antagonistic activity of drug compounds.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Evaluation metrics visualization</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-9.png"/>
</fig>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Model Comparison</title>
<p>To verify the accuracy of the ISSA-RF quantitative prediction model for ER&#x03B1; biological activity, three further models, GA_SVM, BP and XGBoost, were also used to predict the biological activity of ER&#x03B1;. Under the same experimental conditions, each model was trained on the top 20, 50 and 100 most influential molecular descriptors. The evaluation results are listed in <xref ref-type="table" rid="table-4">Table 4</xref>, and the predicted and actual values are compared in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>.</p>

<table-wrap id="table-4"><label>Table 4</label>
<caption>
<title>Evaluation metrics of the compared models for different features_num</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Algorithm</th>
<th>features_num</th>
<th>MAE</th>
<th>RMSE</th>
<th>MSE</th>
<th>MAPE</th>
<th><inline-formula id="ieqn-76">
<mml:math id="mml-ieqn-76"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">GA_SVM</td>
<td>20</td>
<td>0.6547276</td>
<td>0.8719524</td>
<td>0.7603010</td>
<td>0.1048008</td>
<td>0.5989939</td>
</tr>
<tr>
<td>50</td>
<td>0.5245315</td>
<td>0.7275946</td>
<td>0.5293939</td>
<td>0.0819348</td>
<td>0.7207813</td>
</tr>
<tr>
<td style="background:#BDD6EE;">100</td>
<td style="background:#BDD6EE;">0.5165143</td>
<td style="background:#BDD6EE;">0.7256745</td>
<td style="background:#BDD6EE;">0.5266035</td>
<td style="background:#BDD6EE;">0.0813428</td>
<td style="background:#BDD6EE;">0.7222531</td>
</tr>
<tr>
<td rowspan="3">BP</td>
<td>20</td>
<td>0.7134478</td>
<td>0.9126131</td>
<td>0.8328628</td>
<td>0.1132963</td>
<td>0.5607226</td>
</tr>
<tr>
<td>50</td>
<td>0.5876721</td>
<td>0.7753266</td>
<td>0.6011314</td>
<td>0.0937422</td>
<td>0.6829448</td>
</tr>
<tr>
<td style="background:#BDD6EE;">100</td>
<td style="background:#BDD6EE;">0.5635194</td>
<td style="background:#BDD6EE;">0.7444502</td>
<td style="background:#BDD6EE;">0.5542061</td>
<td style="background:#BDD6EE;">0.0884355</td>
<td style="background:#BDD6EE;">0.7076947</td>
</tr>
<tr>
<td rowspan="3">XGBoost</td>
<td>20</td>
<td>0.7546805</td>
<td>0.9489264</td>
<td>0.9004613</td>
<td>0.1116271</td>
<td>0.5250690</td>
</tr>
<tr>
<td>50</td>
<td>0.7423109</td>
<td>0.9507362</td>
<td>0.9038994</td>
<td>0.1083279</td>
<td>0.5232557</td>
</tr>
<tr>
<td style="background:#BDD6EE;">100</td>
<td style="background:#BDD6EE;">0.7111056</td>
<td style="background:#BDD6EE;">0.9167204</td>
<td style="background:#BDD6EE;">0.8403764</td>
<td style="background:#BDD6EE;">0.1037813</td>
<td style="background:#BDD6EE;">0.5567597</td>
</tr>
<tr>
<td rowspan="3">ISSA_RF</td>
<td>20</td>
<td>0.5951410</td>
<td>0.7875591</td>
<td>0.6202494</td>
<td>0.0953065</td>
<td>0.6728614</td>
</tr>
<tr>
<td>50</td>
<td>0.5226297</td>
<td>0.7084105</td>
<td>0.5018454</td>
<td>0.0819970</td>
<td>0.7353113</td>
</tr>
<tr>
<td style="background:#BDD6EE;">100</td>
<td style="background:#BDD6EE;">0.4979396</td>
<td style="background:#BDD6EE;">0.6876389</td>
<td style="background:#BDD6EE;">0.4728473</td>
<td style="background:#BDD6EE;">0.0776072</td>
<td style="background:#BDD6EE;">0.7506058</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Predicted versus actual pIC50 values of the compared models</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-10.png"/>
</fig>
<p>In each plot of <xref ref-type="fig" rid="fig-10">Fig. 10</xref>, the upper part compares the predicted and actual pIC50 values of a model for a given number of features, and the lower part shows the error between them. The comparison shows that the pIC50 values predicted by the ISSA-RF model agree well with the actual values. According to the evaluation results, every model performs best at features_num &#x003D; 100; the detailed results for this setting are shown in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>. Under the same experimental conditions and at features_num &#x003D; 100, the <inline-formula id="ieqn-77">
<mml:math id="mml-ieqn-77"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</inline-formula> of the proposed ISSA-RF model is higher than those of GA_SVM, BP and XGBoost by 0.028, 0.043 and 0.194, respectively, and its RMSE is lower by 5.2%, 7.6% and 25.0%. This shows that the ISSA-RF model predicts the biological activity of drug compounds on ER&#x03B1; satisfactorily.</p>
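<p>The comparison figures can be recomputed from the features_num &#x003D; 100 rows of Table 4 with the short Python check below. The R<sup>2</sup> gains are taken as absolute differences and the RMSE gains as relative reductions, (RMSE<sub>other</sub> &#x2212; RMSE<sub>ISSA-RF</sub>) / RMSE<sub>other</sub>; the dictionary names are our own.</p>

```python
# Test-set metrics at features_num = 100, copied from Table 4
R2 = {"GA_SVM": 0.7222531, "BP": 0.7076947, "XGBoost": 0.5567597, "ISSA_RF": 0.7506058}
RMSE = {"GA_SVM": 0.7256745, "BP": 0.7444502, "XGBoost": 0.9167204, "ISSA_RF": 0.6876389}

for name in ("GA_SVM", "BP", "XGBoost"):
    r2_gain = R2["ISSA_RF"] - R2[name]                            # absolute R2 difference
    rmse_cut = 100 * (RMSE[name] - RMSE["ISSA_RF"]) / RMSE[name]  # relative RMSE reduction in %
    print(f"{name}: R2 +{r2_gain:.3f}, RMSE -{rmse_cut:.1f}%")
```

<p>It prints R<sup>2</sup> differences of 0.028, 0.043 and 0.194 and RMSE reductions of 5.2%, 7.6% and 25.0% for GA_SVM, BP and XGBoost, respectively.</p>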

<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Actual versus predicted pIC50 values of the compared models at features_num &#x003D; 100</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="ONCOLOGIE_21256-fig-11.png"/>
</fig>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>In the practice of selecting anti-breast cancer drug candidates, it is usually necessary to analyze the structure-activity relationship between compound activity data and molecular descriptors, and to select compounds with suitable biological activity as drug candidates. This paper proposes an RF model optimized by an Improved Sparrow Search Algorithm (ISSA-RF). An adaptive weight is added to the position update formula of the sparrow discoverers to adjust their search range and speed at different stages, and experiments on the Rosenbrock function show that SSA with adaptive weights reaches a much higher convergence accuracy than the standard SSA. The model is trained on descriptor sets of several sizes (20 to 100 molecular descriptors) to reduce the dependence of training accuracy on any particular number of descriptors. In addition, the search ranges of ISSA-RF are narrowed beforehand by training RF on its own. This is intended mainly to keep the ISSA algorithm from falling into local optima, and it also greatly reduces the search time of the sparrows and improves the efficiency of model optimization. Finally, the predictions are compared with those of several common models to verify the accuracy of ISSA-RF. The experimental results show that, compared with the other models, the proposed ISSA-RF model achieves a lower RMSE in predicting the biological activity of drug compounds on ER&#x03B1; and can accurately predict biological activity from the molecular descriptors of compounds, which improves the accuracy and efficiency of anti-breast cancer drug candidate screening. Beyond screening anti-breast cancer drug candidates, the model also offers new ideas for constructing quantitative structure-activity relationship models of compounds.</p>
<p><bold>Author Contributions:</bold> Minxi Rong and Xiaoli Guo contributed to the conception of the study; Yong Li performed the experiment and contributed significantly to manuscript preparation; Tao Zong, Zhiyuan Ma and Penglei Li helped perform the analysis with constructive discussions.</p>
<p><bold>Ethics Approval and Informed Consent Statement:</bold> The datasets used in this article are public data from the DrugBank drug molecule database of the University of Alberta and were used as competition data in the China 2021 &#x201C;Huawei Cup&#x201D; Mathematical Modeling Competition; therefore, ethical approval and informed consent were not required.</p>
<p><bold>Availability of Data and Materials:</bold> The datasets used or analyzed during the current study have been posted on GitHub (<uri xlink:href="https://github.com/Li519445444/candidate-drug-data-source/tree/master">https://github.com/Li519445444/candidate-drug-data-source/tree/master</uri>).</p>
</sec>
</body>
<back>
<ack>
<p>The authors thank the National Natural Science Foundation of China (11601491). Thanks to China&#x2019;s 2021 &#x201C;Huawei Cup&#x201D; Mathematical Modeling Competition for offering the data.</p>
</ack><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> This research was supported by the National Natural Science Foundation of China (11601491).</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>1.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sohn</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Environment: Hothouse of disease</article-title>. <source>Nature</source><italic>,</italic> <volume>543</volume><italic>(</italic><issue>7647</issue><italic>),</italic> <fpage>S44</fpage>&#x2013;<lpage>S46</lpage>. DOI <pub-id pub-id-type="doi">10.1038/543S44a</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>2.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tan</surname>, <given-names>L. L.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>X. X.</given-names></string-name>, <string-name><surname>Zhou</surname>, <given-names>Y. Z.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Prediction of molecular biological activity based on graph convolution method of multi-characteristic fusion</article-title>. <source>Journal of University of Electronic Science and Technology of China</source><italic>,</italic> <volume>50</volume><italic>(</italic><issue>6</issue><italic>),</italic> <fpage>921</fpage>&#x2013;<lpage>929</lpage>. DOI <pub-id pub-id-type="doi">10.12178/1001-0548.2021158</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>3.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Loibl</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Poortmans</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Morrow</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Denkert</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Curigliano</surname>, <given-names>G.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Breast cancer</article-title>. <source>The Lancet</source><italic>,</italic> <volume>397</volume><italic>,</italic> <fpage>1750</fpage>&#x2013;<lpage>1769</lpage>. DOI <pub-id pub-id-type="doi">10.1016/S0140-6736(20)32381-3</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>4.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cong</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Xue</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2013</year>). <article-title>Quantitative structure-activity relationship study of the non-nucleoside inhibitors of HCV NS5B polymerase by machine learning methods</article-title>. <source>Acta Physico-Chimica Sinica</source><italic>,</italic> <volume>29</volume><italic>(</italic><issue>8</issue><italic>),</italic> <fpage>1639</fpage>&#x2013;<lpage>1647</lpage>. DOI <pub-id pub-id-type="doi">10.3866/PKU.WHXB201305171</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>5.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Deo</surname>, <given-names>R. C.</given-names></string-name></person-group> (<year>2015</year>). <article-title>Machine learning in medicine</article-title>. <source>Circulation</source><italic>,</italic> <volume>132</volume><italic>(</italic><issue>20</issue><italic>),</italic> <fpage>1920</fpage>&#x2013;<lpage>1930</lpage>. DOI <pub-id pub-id-type="doi">10.1161/CIRCULATIONAHA.115.001593</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>6.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jiang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Lei</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Shen</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Cao</surname>, <given-names>D.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2020</year>). <article-title>ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance protein inhibition through machine learning</article-title>. <source>Journal of Cheminformatics</source><italic>,</italic> <volume>12</volume><italic>(</italic><issue>1</issue><italic>),</italic> <fpage>16</fpage>. DOI <pub-id pub-id-type="doi">10.1186/s13321-020-00421-y</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>7.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Che</surname>, <given-names>H. X.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Comparing prediction models for prostate cancer</article-title>. <source>Data Analysis and Knowledge Discovery</source><italic>,</italic> <volume>5</volume><italic>(</italic><issue>9</issue><italic>),</italic> <fpage>107</fpage>&#x2013;<lpage>114</lpage>. DOI <pub-id pub-id-type="doi">10.11925/infotech.2096-3467.2020.1185</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>8.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Cui</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Guo</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Du</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhou</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Identifying pathogens of foodborne diseases with machine learning</article-title>. <source>Data Analysis and Knowledge Discovery</source><italic>,</italic> <volume>5</volume><italic>(</italic><issue>9</issue><italic>),</italic> <fpage>54</fpage>&#x2013;<lpage>62</lpage>. DOI <pub-id pub-id-type="doi">10.11925/infotech.2096-3467.2020.1105</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>9.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lu</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Xue</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Meng</surname>, <given-names>Q. W.</given-names></string-name></person-group> (<year>2013</year>). <article-title>Classification prediction of inhibitors of H1N1 neuraminidase by machine learning methods</article-title>. <source>Acta Physico-Chimica Sinica</source><italic>,</italic> <volume>29</volume><italic>(</italic><issue>1</issue>). DOI <pub-id pub-id-type="doi">10.3866/PKU.WHXB201211122</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>10.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sheridan</surname>, <given-names>R. P.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>W. M.</given-names></string-name>, <string-name><surname>Liaw</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Gifford</surname>, <given-names>E. M.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Extreme gradient boosting as a method for quantitative structure-activity relationships</article-title>. <source>Journal of Chemical Information and Modeling</source><italic>,</italic> <volume>56</volume><italic>(</italic><issue>12</issue><italic>),</italic> <fpage>2353</fpage>&#x2013;<lpage>2360</lpage>. DOI <pub-id pub-id-type="doi">10.1021/acs.jcim.6b00591</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>11.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mansouri</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Cariello</surname>, <given-names>N. F.</given-names></string-name>, <string-name><surname>Korotcov</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Tkachenko</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Grulke</surname>, <given-names>C. M.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2019</year>). <article-title>Open-source QSAR models for pKa prediction using multiple machine learning approaches</article-title>. <source>Journal of Cheminformatics</source><italic>,</italic> <volume>11</volume><italic>(</italic><issue>1</issue><italic>),</italic> <fpage>294</fpage>. DOI <pub-id pub-id-type="doi">10.1186/s13321-019-0384-1</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>12.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ding</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>X. Y.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>D. Y.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>M. L.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Application of an extreme learning machine network with particle swarm optimization in syndrome classification of primary liver cancer</article-title>. <source>Journal of Integrative Medicine</source><italic>,</italic> <volume>19</volume><italic>(</italic><issue>5</issue><italic>),</italic> <fpage>395</fpage>&#x2013;<lpage>407</lpage>. DOI <pub-id pub-id-type="doi">10.1016/j.joim.2021.08.001</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>13.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname>, <given-names>C. M.</given-names></string-name>, <string-name><surname>Xue</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>P. M.</given-names></string-name>, <string-name><surname>Duan</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>Y.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2021</year>). <article-title>Construction of a predictive model of post-intubation hypotension in critically ill patients using multiple machine learning classifiers</article-title>. <source>Journal of Clinical Anesthesia</source><italic>,</italic> <volume>72</volume><italic>,</italic> <fpage>110279</fpage>. DOI <pub-id pub-id-type="doi">10.1016/j.jclinane.2021.110279</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>14.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Luo</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Song</surname>, <given-names>Y. L.</given-names></string-name>, <string-name><surname>Shang</surname>, <given-names>J. L.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Prediction of PI3K inhibitors based on naive Bayesian machine learning</article-title>. <source>Chinese Journal of New Drugs</source><italic>,</italic> <volume>28</volume><italic>(</italic><issue>1</issue><italic>),</italic> <fpage>73</fpage>&#x2013;<lpage>80</lpage>.</mixed-citation></ref>
<ref id="ref-15"><label>15.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zheng</surname>, <given-names>X. Z.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>M. H.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Y. N.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>B. Y.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Research on the prediction model of coal spontaneous combustion temperature based on random forest algorithm</article-title>. <source>Industry and Mine Automation</source><italic>,</italic> <volume>47</volume><italic>(</italic><issue>5</issue><italic>),</italic> <fpage>58</fpage>&#x2013;<lpage>64</lpage>. DOI <pub-id pub-id-type="doi">10.13272/j.issn.1671-251x.17700</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>16.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xu</surname>, <given-names>M. X.</given-names></string-name>, <string-name><surname>Zheng</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>Y. J.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>W. H.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Prediction of properties of anti-breast cancer drugs based on PSO-BP neural network and PSO-SVM</article-title>. <source>Journal of Nanjing University of Information Science &#x0026; Technology</source><italic>,</italic> <fpage>1</fpage>&#x2013;<lpage>20</lpage>.</mixed-citation></ref>
<ref id="ref-17"><label>17.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Vilma</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Interpretability of selected variables and performance comparison of variable selection methods in a polyethylene and polypropylene NIR classification task</article-title>. <source>Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy</source><italic>,</italic> <volume>258</volume><italic>(</italic><issue>8</issue><italic>),</italic> <fpage>119850</fpage>. DOI <pub-id pub-id-type="doi">10.1016/j.saa.2021.119850</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>18.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tibshirani</surname>, <given-names>R.</given-names></string-name></person-group> (<year>1996</year>). <article-title>Regression shrinkage and selection via the lasso</article-title>. <source>Journal of the Royal Statistical Society: Series B (Methodological)</source><italic>,</italic> <volume>58</volume><italic>(</italic><issue>1</issue><italic>),</italic> <fpage>267</fpage>&#x2013;<lpage>288</lpage>. DOI <pub-id pub-id-type="doi">10.1111/j.2517-6161.1996.tb02080.x</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>19.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bonsignore</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Trusso</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>de Pasquale</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Ferlazzo</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Allegra</surname>, <given-names>A.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2021</year>). <article-title>A multivariate analysis of Multiple Myeloma subtype plasma cells</article-title>. <source>Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy</source><italic>,</italic> <volume>258</volume><italic>(</italic><issue>9686</issue><italic>),</italic> <fpage>119813</fpage>. DOI <pub-id pub-id-type="doi">10.1016/j.saa.2021.119813</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>20.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xue</surname>, <given-names>J. K.</given-names></string-name>, <string-name><surname>Shen</surname>, <given-names>B.</given-names></string-name></person-group> (<year>2020</year>). <article-title>A novel swarm intelligence optimization approach: Sparrow search algorithm</article-title>. <source>Systems Science &#x0026; Control Engineering</source><italic>,</italic> <volume>8</volume><italic>(</italic><issue>1</issue><italic>),</italic> <fpage>22</fpage>&#x2013;<lpage>34</lpage>. DOI <pub-id pub-id-type="doi">10.1080/21642583.2019.1708830</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>21.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Breiman</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Machine Learning</source><italic>,</italic> <volume>45</volume><italic>(</italic><issue>1</issue><italic>),</italic> <fpage>5</fpage>&#x2013;<lpage>32</lpage>. DOI <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id>.</mixed-citation></ref>
</ref-list>
</back>
</article>