<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">34933</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2023.034933</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Novel Smart Beta Optimization Based on Probabilistic Forecast</article-title>
<alt-title alt-title-type="left-running-head">A Novel Smart Beta Optimization Based on Probabilistic Forecast</alt-title>
<alt-title alt-title-type="right-running-head">A Novel Smart Beta Optimization Based on Probabilistic Forecast</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Zhao</surname><given-names>Cheng</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Yang</surname><given-names>Shuyi</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Qin</surname><given-names>Chu</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Zhou</surname><given-names>Jie</given-names></name><xref ref-type="aff" rid="aff-4">4</xref></contrib>
<contrib id="author-5" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Chen</surname><given-names>Longxiang</given-names></name><xref ref-type="aff" rid="aff-5">5</xref><email>chenlx@zjut.edu.cn</email></contrib>
<aff id="aff-1"><label>1</label><institution>School of Economics, Zhejiang University of Technology</institution>, <addr-line>Hangzhou, 310023</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>School of Computer Science, Zhejiang University of Technology</institution>, <addr-line>Hangzhou, 310023</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>School of Management, Zhejiang University of Technology</institution>, <addr-line>Hangzhou, 310023</addr-line>, <country>China</country></aff>
<aff id="aff-4"><label>4</label><institution>School of Marxism, Zhejiang Chinese Medical University</institution>, <addr-line>Hangzhou, 310053</addr-line>, <country>China</country></aff>
<aff id="aff-5"><label>5</label><institution>Informatization Office, Zhejiang University of Technology</institution>, <addr-line>Hangzhou, 310023</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Longxiang Chen. Email: <email>chenlx@zjut.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic"><year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>24</day><month>1</month><year>2023</year></pub-date>
<volume>75</volume>
<issue>1</issue>
<fpage>477</fpage>
<lpage>491</lpage>
<history>
<date date-type="received"><day>01</day><month>8</month><year>2022</year></date>
<date date-type="accepted"><day>09</day><month>11</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Zhao et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Zhao et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_34933.pdf"></self-uri>
<abstract><p>Rule-based portfolio construction strategies are rising as investment demand grows, and smart beta strategies are becoming a trend among institutional investors. Smart beta strategies offer high transparency, low management costs, and better long-term performance, but they risk severe short-term declines due to a lack of risk control tools. Although some methods use historical volatility for risk control, they still struggle to adapt to rapid switches in market style. How to strengthen the risk management of a portfolio while maintaining the original advantages of smart beta has become a new concern in the industry. This paper demonstrates, through optimization theory, the scientific validity of using probability forecasts for position optimization, and proposes a novel natural gradient boosting (NGBoost)-based portfolio optimization method, which predicts stock prices and their probability distributions with a non-Bayesian method and optimizes positions by maximizing the expected Sharpe ratio. The effectiveness and practicality of the model are validated on the Chinese stock market, and the experimental results show that, compared with mainstream methods in the industry, the proposed method reduces volatility by 0.08 and increases the expected cumulative portfolio return (reaching a maximum of 67.1&#x0025;).</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>NGBoost</kwd>
<kwd>portfolio optimization</kwd>
<kwd>probabilistic prediction</kwd>
<kwd>financial trading</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Over the last few years, the term &#x201C;Smart Beta&#x201D; has become ubiquitous in asset management. Factor investing, the financial theory underlying smart beta, has been around since the 1960s, when factors were first identified as drivers of equity returns. The returns on these factors can be a source of risk or of improved return, and it is critical to understand whether higher returns adequately compensate for any additional risk.</p>
<p>Active managers can build portfolios with specific factor exposures by selecting stocks according to those exposures, using factor investing to improve portfolio returns and reduce risk depending on their specific objectives. Smart beta aims to achieve these objectives through a transparent, systematic, rules-based approach, at a significantly lower cost than active management.</p>
<p>Although there is no consensus among investors on the definition of a smart beta strategy, the one characteristic that all smart beta strategies or indices share is that they are rules-based. The smart beta strategy integrates the ideas of active and passive portfolio management, weighting assets by methods other than market-value weighting.</p>
<p>While smart beta strategies have shown good long-term performance, they often suffer severe short-term declines. Early smart beta strategies tended to manage style-factor exposure and paid too little attention to weight optimization. Moreover, traditional portfolio management strategies rely only on historical data and do not fully exploit the forward-looking characteristics that can be extracted from it. They often use the mean historical return as the expected return, which acts as a low-pass filter on the stock market&#x2019;s behavior and thus yields inaccurate estimates of future short-term returns [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-3">3</xref>]. We therefore believe it is more scientific to manage portfolios based on forecast results.</p>
<p>The construction of a smart beta strategy involves two major issues: stock selection optimization and weight optimization. Huang [<xref ref-type="bibr" rid="ref-3">3</xref>] created a stock selection model based on support vector regression (SVR) and genetic algorithms (GAs), in which SVR predicts each stock&#x2019;s future return and GAs optimize the model parameters and input features; the top-ranked stocks are then weighted equally to form the portfolio. Wang&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-4">4</xref>] proposed an optimal portfolio construction method based on long short-term memory (LSTM) networks and mean-variance (MV) models. Chen&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-5">5</xref>] proposed a portfolio construction method that uses a hybrid model combining extreme gradient boosting (XGBoost) and an improved firefly algorithm (IFA) to forecast future stock prices, together with the MV model for portfolio selection. Tripathy&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-6">6</xref>] summarized the use of the Harris Hawk Optimizer (HHO) for parameter optimization of regression techniques such as SVR. Braik&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-7">7</xref>] proposed a hybrid crow search algorithm for solving numerical and constrained global optimization problems. Alzubi&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-8">8</xref>] put forward an efficient malware detection approach based on feature weighting. Alzubi&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-9">9</xref>] proposed an optimal pruning algorithm based on dynamic programming to improve accuracy. Many researchers [<xref ref-type="bibr" rid="ref-9">9</xref>&#x2013;<xref ref-type="bibr" rid="ref-12">12</xref>] employed forecasting models such as random forest (RF), SVR, LSTM, deep multi-layer perceptron (DMLP), and convolutional neural networks (CNN) for stock pre-selection before portfolio creation, and then integrated the return predictions into advanced MV and omega portfolio optimization models, respectively. However, all of these machine learning methods for portfolio formation depend purely on point predictions (maximum likelihood estimates) and historical volatility, neglecting the uncertainty of the forecast.</p>
<p>The uncertainty of forecasts is critical for investors [<xref ref-type="bibr" rid="ref-13">13</xref>]. If investors believe that a stock is likely to rise in value, they will buy more of it, which is known as the &#x201C;certainty effect&#x201D; in behavioral finance. To make optimal decisions, investors can quantify risk by estimating the uncertainty of the outcome. Forecast uncertainty indicates the model&#x2019;s confidence in its predictions and has some power to explain them. Because it is derived from machine learning analysis and is therefore relatively objective, we regard forecast uncertainty as an important factor to account for in a smart beta strategy. Probabilistic prediction is a form of uncertainty forecasting that, compared with traditional point forecasting, yields not only maximum likelihood estimates but also their probability distributions, providing more comprehensive guidance for portfolio management; investors can thereby manage their uncertainty exposure. Probabilistic prediction methods can be broadly classified into Bayesian and non-Bayesian methods [<xref ref-type="bibr" rid="ref-14">14</xref>]. Bayesian models offer a mathematically grounded framework for reasoning about model uncertainty: by integrating predictions across the posterior, they naturally provide predictive uncertainty estimates. They have practical drawbacks, however, when predictive uncertainty is the primary concern. Exact solutions of Bayesian models are limited to simple models, and calculating the posterior distribution for more powerful models such as neural networks (NN) and Bayesian additive regression trees (BART) is difficult. Furthermore, sampling-based inference requires some statistical expertise, which limits the use of Bayesian approaches.</p>
<p>The non-Bayesian approach is simple to implement and parallelize, and it produces high-quality estimates of prediction uncertainty that are better suited to large financial data sets. To improve the predictive uncertainty estimation of a single deterministic deep neural network (DNN), Liu&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-15">15</xref>] presented the Spectral Normalized Neural Gaussian Process (SNGP). However, a DNN&#x2019;s ability to capture features of structured input data is limited, and its prediction accuracy is low. Deep ensembles fit an ensemble of neural networks to the dataset and obtain predictive uncertainty by approximating the Gaussian mixture arising from the ensemble [<xref ref-type="bibr" rid="ref-16">16</xref>]; moreover, deep ensembles can give overconfident uncertainty estimates in practice [<xref ref-type="bibr" rid="ref-17">17</xref>]. Natural gradient boosting (NGBoost) combines a multi-parameter boosting algorithm with the natural gradient to efficiently estimate how the parameters of the presumed outcome distribution vary with the observed features. Compared with existing probabilistic prediction approaches, NGBoost is well suited to structured input data, and it is simple to use, flexible, and capable of producing good uncertainty estimates, allowing it to effectively measure various real-time risks. NGBoost has also been used for temperature prediction [<xref ref-type="bibr" rid="ref-18">18</xref>] and in short-term solar irradiance and short-term wind power prediction models&#x00A0;[<xref ref-type="bibr" rid="ref-19">19</xref>]. We summarize the existing research in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Comparison of existing work</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Reference</th>
<th align="left">Prediction model</th>
<th align="left">Robustness</th>
<th align="left">Prediction accuracy</th>
<th align="left">Portfolio management</th>
<th align="left">Investment risk</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-4">4</xref>]</td>
<td align="left">LSTM</td>
<td align="left">Average</td>
<td align="left">Good</td>
<td align="left">MV</td>
<td align="left">Low</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-5">5</xref>]</td>
<td align="left">XGBoost</td>
<td align="left">Average</td>
<td align="left">Average</td>
<td align="left">MV</td>
<td align="left">Average</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td align="left">ARIMA</td>
<td align="left">Good</td>
<td align="left">Bad</td>
<td align="left">Omega</td>
<td align="left">High</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-21">21</xref>]</td>
<td align="left">RF</td>
<td align="left">Average</td>
<td align="left">Average</td>
<td align="left">Hierarchical risk parity</td>
<td align="left">High</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-22">22</xref>]</td>
<td align="left">SVR</td>
<td align="left">Average</td>
<td align="left">Average</td>
<td align="left">1/N</td>
<td align="left">Average</td>
</tr>
</tbody>
</table>
</table-wrap>
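<p>The natural-gradient update at the core of NGBoost can be illustrated in miniature by fitting the two parameters of a Normal distribution, mean and log standard deviation, with natural-gradient descent on the negative log-likelihood. The sketch below uses only the Python standard library and is not the NGBoost library API; the learning rate and iteration count are illustrative choices.</p>

```python
import math

def fit_normal_natural_gradient(ys, lr=0.1, steps=500):
    """Fit N(mu, sigma^2) to data by natural-gradient descent on the
    negative log-likelihood, with parameters theta = (mu, log_sigma).

    The Fisher information for this parameterization is diag(1/sigma^2, 2),
    so the natural gradient rescales the ordinary gradient by its inverse:
    (sigma^2 * g_mu, g_log_sigma / 2).
    """
    mu, log_sigma = 0.0, 0.0
    n = len(ys)
    for _ in range(steps):
        sigma2 = math.exp(2.0 * log_sigma)
        # Ordinary gradients of the average per-sample NLL.
        g_mu = sum(-(y - mu) / sigma2 for y in ys) / n
        g_ls = sum(1.0 - (y - mu) ** 2 / sigma2 for y in ys) / n
        # Natural-gradient step: premultiply by the inverse Fisher matrix.
        mu -= lr * sigma2 * g_mu
        log_sigma -= lr * g_ls / 2.0
    return mu, math.exp(log_sigma)

# Fitted parameters approach the sample mean and (biased) std. deviation.
data = [1.0, 2.0, 3.0, 4.0]
mu, sigma = fit_normal_natural_gradient(data)
```

<p>NGBoost applies the same Fisher-rescaled step inside a gradient-boosting loop, so that the distribution parameters vary with the input features rather than being global constants as in this sketch.</p>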
<p>From the perspective of balancing return and risk, this paper provides market investors with a better portfolio management strategy built on a more aggressive smart index investment concept: probabilistic forecasts drive stock selection, while the forecast uncertainty drives asset allocation, and by optimally adjusting the allocation ratios we seek to improve the overall performance of the portfolio. This paper proposes an NGBoost-based portfolio optimization model (NGB-PF) that first selects stocks with high predicted returns based on probabilistic forecasts, and then uses the uncertainty information derived from those forecasts for portfolio management. The proposed model is more resilient to risk than existing methods.</p>
<p>Our main contributions are as follows. First, this paper demonstrates the feasibility of using uncertainty forecasting methods for portfolio management. Second, we develop a probabilistic prediction-based portfolio optimization model, NGB-PF. Third, we conduct comparative experiments whose results demonstrate that the NGB-PF model can effectively improve the investment return per unit of risk when prediction uncertainty is taken into account.</p>
<p>The rest of the paper is organized as follows. Section 2 describes the proposed model. Section 3 validates the model. Finally, Section 4 concludes.</p>
</sec>
<sec id="s2"><label>2</label><title>Methods</title>
<sec id="s2_1"><label>2.1</label><title>Portfolio Optimization Based on Probabilistic Forecasting</title>
<p>By using probabilistic forecasting, investors can learn about an asset&#x2019;s predicted return <italic>r</italic> and probability distribution. The variance of the predicted return is denoted by <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>(<inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x2265;</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula>).</p>
<p>Based on the probabilistic forecast results, n stocks are chosen from the feasible asset allocation domain to form a portfolio <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> (<inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mtext>&#xA0;denotes stock&#xA0;</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:math></inline-formula>), with corresponding predicted returns <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>; <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> denotes the investor&#x2019;s investment proportion in stock <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>i</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> (where <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2265;</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mstyle></mml:math></inline-formula>). The predicted return of <italic>P</italic> is then <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mstyle></mml:math></inline-formula>, and <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the covariance of the predicted returns of <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The variance of <italic>P</italic> is <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:mstyle></mml:math></inline-formula>.</p>
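<p>The quantities defined above translate directly into code. The following sketch computes the predicted portfolio return and variance for a hypothetical two-stock portfolio; the return forecasts and covariance values are invented for illustration.</p>

```python
def portfolio_return(weights, returns):
    """Predicted portfolio return R_p(X) = sum_i x_i * r_i."""
    return sum(x * r for x, r in zip(weights, returns))

def portfolio_variance(weights, cov):
    """Portfolio variance sigma_P^2 = sum_i sum_j x_i x_j Cov(r_i, r_j)."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))

# Hypothetical two-stock example: equal weights, invented forecasts.
x = [0.5, 0.5]                    # investment proportions, sum to 1
r = [0.10, 0.20]                  # predicted returns
cov = [[0.04, 0.01],
       [0.01, 0.09]]              # covariance matrix of predicted returns
rp = portfolio_return(x, r)       # 0.5*0.10 + 0.5*0.20 = 0.15
var = portfolio_variance(x, cov)  # 0.25*0.04 + 0.25*0.09 + 2*0.25*0.01 = 0.0375
```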
<p>The goal of portfolio management is to balance return and risk. The Sharpe ratio measures the excess return per unit of risk, and a higher Sharpe ratio indicates better portfolio performance, so this paper employs it as the objective function. The Sharpe ratio is calculated by <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>P</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the forecasted return of the portfolio; <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> denotes the risk-free return; <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> denotes the standard deviation of the forecasted return of the portfolio; and <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the risk premium.</p>
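<p>Eq. (1) is a one-line computation; the inputs below (portfolio return, risk-free rate, and volatility) are assumed placeholder values.</p>

```python
def sharpe_ratio(r_p, r_f, sigma_p):
    """Sharpe ratio SR = (r_p - r_f) / sigma_p, as in Eq. (1)."""
    return (r_p - r_f) / sigma_p

# Placeholder inputs: 15% forecast return, 3% risk-free rate, 20% volatility.
sr = sharpe_ratio(r_p=0.15, r_f=0.03, sigma_p=0.20)  # (0.15 - 0.03) / 0.20 = 0.6
```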
<p>The smart beta strategy uses the predicted values obtained from probabilistic forecasting to construct the portfolio and the corresponding probability distributions to allocate assets. The strategy thus translates into solving the optimization problem shown in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>.
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2264;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
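<p>A simple, if crude, way to approximate a solution of Eq. (2) is to sample candidate weight vectors uniformly from the unit simplex and keep the one with the highest Sharpe ratio; in practice a dedicated constrained solver would be used. The sketch below assumes a hypothetical two-stock universe with invented forecasts.</p>

```python
import random

def max_sharpe_random_search(returns, cov, r_f=0.0, trials=20000, seed=0):
    """Approximate Eq. (2): sample weights on the simplex (x_i >= 0,
    sum_i x_i = 1) and keep the vector with the highest Sharpe ratio."""
    rng = random.Random(seed)
    n = len(returns)
    best_sr, best_x = float("-inf"), None
    for _ in range(trials):
        # Normalized iid Exp(1) draws are uniform on the unit simplex.
        raw = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(raw)
        x = [v / total for v in raw]
        rp = sum(xi * ri for xi, ri in zip(x, returns))
        var = sum(x[i] * x[j] * cov[i][j]
                  for i in range(n) for j in range(n))
        sr = (rp - r_f) / var ** 0.5
        if sr > best_sr:
            best_sr, best_x = sr, x
    return best_x, best_sr

# Invented two-stock forecasts; the search returns weights summing to 1.
weights, sr = max_sharpe_random_search([0.10, 0.25],
                                       [[0.04, 0.01], [0.01, 0.09]])
```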
<p>Consider the case where the portfolio contains only two stocks, and define <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi></mml:math></inline-formula> as the asset allocation that invests in <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> with weights <italic>x</italic> and <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, respectively. The goal is to find the allocation with the highest Sharpe ratio over the feasible set, i.e., <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi><mml:mo>:</mml:mo><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2264;</mml:mo><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. 
<inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>&#x2019;s predicted return and uncertainty level (risk level) are denoted by <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mover><mml:mi>r</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, respectively.</p>
<p>The maximum predicted return differs across risk levels: for each given level of risk, the allocations attaining the maximum forecast return are the optimal portfolios. These optimal combinations form a curve in the predicted value-volatility plane, which we call <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mover><mml:mi>r</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. This curve depicts the relationship between expected return and volatility over the feasible asset allocations, where <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mover><mml:mi>r</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are given by <xref ref-type="disp-formula" rid="eqn-3">Eqs. (3)</xref> and <xref ref-type="disp-formula" rid="eqn-4">(4)</xref>:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mover><mml:mi>r</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>x</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></disp-formula></p>
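As a concrete illustration of Eqs. (3) and (4), the following Python sketch evaluates the predicted return and volatility of a two-asset portfolio; the numerical inputs are hypothetical and not taken from the paper's dataset.

```python
import math

def portfolio_stats(x, r1, r2, var1, var2, cov12):
    """Return (sigma(x), r_bar(x)) for weight x in asset 1, per Eqs. (3)-(4)."""
    var_p = x**2 * var1 + (1 - x)**2 * var2 + 2 * x * (1 - x) * cov12
    r_p = x * r1 + (1 - x) * r2
    return math.sqrt(var_p), r_p

# Hypothetical predicted returns, variances, and covariance for two stocks
sigma_p, r_p = portfolio_stats(0.6, r1=0.10, r2=0.06, var1=0.04, var2=0.02, cov12=0.01)
```

Note that the covariance term can lower the portfolio variance below the weighted average of the individual variances, which is the diversification effect the optimization exploits.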
<p>On the predicted value-volatility plane, the Sharpe ratio of an asset allocation is equal to the slope of the line connecting the risk-free asset to this asset allocation. To find the asset allocation with the largest Sharpe ratio, the line must be tangential to the curve and pass through the risk-free asset point.</p>
<p>The slope at portfolio <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is expressed in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>.
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:msub><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:msqrt><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:msqrt></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>The Sharpe ratio of a portfolio is expressed in <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>.
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>:=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:msqrt><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:msqrt></mml:mfrac><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>Setting <xref ref-type="disp-formula" rid="eqn-5">Eqs. (5)</xref> and <xref ref-type="disp-formula" rid="eqn-6">(6)</xref> equal and solving for <italic>x</italic> yields <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>that is, the two selected stocks are assigned their optimal weights based on the predicted values, <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
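To make the tangency-weight formula of Eq. (7) concrete, the following Python sketch evaluates it for hypothetical inputs and checks, by a grid search over candidate weights, that the closed-form weight does attain the largest Sharpe ratio of Eq. (6).

```python
import math

def tangency_weight(r1, r2, rf, var1, var2, cov12):
    """Closed-form weight x of Eq. (7) for the maximum-Sharpe two-asset portfolio."""
    num = (r2 - rf) * cov12 - (r1 - rf) * var2
    den = (r1 - r2) * (cov12 - var2) - (r2 - rf) * (var1 + var2 - 2 * cov12)
    return num / den

def sharpe(x, r1, r2, rf, var1, var2, cov12):
    """Sharpe ratio of Eq. (6) for weight x in asset 1."""
    var_p = x**2 * var1 + (1 - x)**2 * var2 + 2 * x * (1 - x) * cov12
    return (x * r1 + (1 - x) * r2 - rf) / math.sqrt(var_p)

# Hypothetical inputs; the closed-form weight should beat every grid candidate
args = dict(r1=0.10, r2=0.06, rf=0.02, var1=0.04, var2=0.02, cov12=0.01)
x_star = tangency_weight(**args)
best_grid = max(sharpe(i / 1000, **args) for i in range(1, 1000))
```

With these inputs the closed-form weight is x = 0.6, and no grid candidate achieves a higher Sharpe ratio, consistent with the tangency argument above.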
</sec>
<sec id="s2_2"><label>2.2</label><title>Proposed Model: NGB-PF</title>
<p>As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the NGB-PF model developed in this research is based on probabilistic forecasting and is divided into three stages. In the data pre-processing stage, the stock data are standardized. In the probabilistic prediction stage, each stock&#x2019;s future return and the associated uncertainty information are predicted from historical data. In the portfolio optimization stage, the PF model uses the uncertainty information from the preceding probabilistic prediction stage to obtain asset allocation ratios and form trading decisions.</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Flow chart of the proposed model</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_34933-fig-1.tif"/></fig>
<sec id="s2_2_1"><label>2.2.1</label><title>Data Pre-Processing</title>
<p>First, stocks with trading suspensions or small market capitalization are removed from the dataset. Because the obtained stock data may contain disordered or missing values, we sort the series and impute the missing values to obtain a complete and valid stock time-series dataset.</p>
<p>Because the time-series features differ in dimension, the feature values need to be normalized. Each feature component is normalized using the min-max method. The normalized value of a feature value x is expressed in <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>.
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:msup><mml:mi></mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
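The min-max normalization of Eq. (8) can be sketched in a few lines of Python; the sample values below are hypothetical.

```python
def min_max_normalize(values):
    """Min-max normalization of Eq. (8): maps each feature value into [0, 1]."""
    x_min, x_max = min(values), max(values)
    return [(v - x_min) / (x_max - x_min) for v in values]

# Hypothetical closing prices for one feature component
scaled = min_max_normalize([10.0, 12.5, 11.0, 15.0])
```

Each feature component is scaled independently, so features with different dimensions become comparable before being fed to the predictor.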
</sec>
<sec id="s2_2_2"><label>2.2.2</label><title>Probabilistic Prediction</title>
<p>NGBoost is designed to be scalable and modular: it combines base learners (e.g., decision trees), a parametric probability distribution (e.g., the normal or Laplace distribution), and a scoring rule (e.g., maximum likelihood estimation), so it can employ flexible tree-based models for probabilistic prediction. The flowchart of NGBoost is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. The technical indicators <italic>x</italic> are first passed to the base learner (a decision tree) to obtain a probabilistic prediction with probability density <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo><mml:mi>y</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> over the entire outcome space <italic>y</italic>. The model is then optimized under the maximum likelihood estimation function, with the scoring rule <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> yielding calibrated uncertainty and point predictions.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>NGBoost model flow chart</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_34933-fig-2.tif"/></fig>
<p>The generalized natural gradient is the direction of the fastest ascent in Riemannian space, and it has the advantage of remaining invariant to parameters under different distributional changes. As a result, NGBoost learns its parameters with the natural gradient, which makes the optimization problem parameterization-independent. Each base learner fits the natural gradient in the framework of a gradient boosting machine (GBM), and after scaling and additive combination, the parameters of the integrated model are acquired. For probabilistic prediction, the parameters of the final conditional distribution can be learned from the parameters of the integrated model. In standard prediction settings, the object of interest is an estimate of the scalar function <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo><mml:mi>y</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <italic>x</italic> is a vector of observed features and <italic>y</italic> is the prediction target. NGBoost generates a probability prediction with probability density <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo><mml:mi>y</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> by directly predicting the parameter <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula>, and then the probability is determined so that the parameter satisfies the distribution of the observed feature vectors. 
The algorithm is made up of three modules: the base learner (<inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mi>f</mml:mi></mml:math></inline-formula>), the parameter probability distribution (<inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>), and the appropriate scoring rule (<inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>S</mml:mi></mml:math></inline-formula>).</p>
<p>A prediction <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo><mml:mi>y</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mi>x</mml:mi></mml:math></inline-formula> on a new input <italic>x</italic> is made in the form of a conditional distribution <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, whose parameters <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula> are obtained by an additive combination of M base learner outputs (corresponding to the M gradient boosting stages) and an initial <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msup><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>. Note that <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula> can be a vector of parameters (not limited to scalar-valued), and they completely determine the probabilistic prediction <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo><mml:mi>y</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mi>x</mml:mi></mml:math></inline-formula>. <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> refers to a collection of base learners of stage m, one for each parameter. 
Each base learner <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> takes <italic>x</italic> as input to calculate the prediction parameter for <italic>x</italic>. When using the normal distribution in the experiment (<inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mi>&#x03B8;</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mo>,</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>), there are two base learners <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msubsup><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msubsup><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> at each stage, uniformly expressed as: <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The predicted output is scaled by a stage-specific scaling factor <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msup><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> and a common learning rate <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mi>&#x03B7;</mml:mi></mml:math></inline-formula>. It is expressed in <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>.
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi>y</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x223C;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B7;</mml:mi><mml:mstyle displaystyle="true" scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x22C5;</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula></p>
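The staged update of Eq. (9) can be sketched in Python. This is an illustrative toy, not the NGBoost implementation: the base learners here are simple hypothetical functions of x standing in for fitted decision trees, and the scalings and learning rate are arbitrary.

```python
# Staged parameter update of Eq. (9): theta = theta0 - eta * sum(rho_m * f_m(x))
def predict_params(x, theta0, stages, eta):
    """Each stage is (rho_m, f_m), where f_m(x) returns a (d_mu, d_log_sigma) step."""
    mu, log_sigma = theta0
    for rho, f in stages:
        d_mu, d_ls = f(x)
        mu -= eta * rho * d_mu
        log_sigma -= eta * rho * d_ls
    return mu, log_sigma

# Two hypothetical boosting stages (stand-ins for fitted trees)
stages = [(1.0, lambda x: (-0.5 * x, 0.1)),
          (0.5, lambda x: (-0.2 * x, 0.05))]
mu, log_sigma = predict_params(2.0, theta0=(0.0, 0.0), stages=stages, eta=0.1)
```

Both distribution parameters are updated jointly at every stage, which is what lets the ensemble output a full predictive distribution rather than a point estimate.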
<p>The gradient of the scoring rule <italic>S</italic> over the probability distribution <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> with respect to the parameter <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula> is denoted <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula> is the parameter and <italic>y</italic> is the prediction target. This gradient is the direction of the most rapid ascent: moving the parameter an infinitesimal amount in this direction increases the scoring rule the most. The corresponding natural gradient is expressed in <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>,
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mover><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mo>&#x223C;</mml:mo></mml:mover><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x221D;</mml:mo><mml:munder><mml:mo movablelimits="true" form="prefix">lim</mml:mo><mml:mrow><mml:mi>&#x03B5;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:munder><mml:munder><mml:mrow><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>:</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2225;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03B5;</mml:mi></mml:mrow></mml:munder><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where d is the gradient&#x2019;s change in direction and D is the divergence.</p>
<p>By solving the above optimization problem, the natural gradient of the problem can be obtained by <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref>:
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mover><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mo>&#x223C;</mml:mo></mml:mover><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x221D;</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the Riemannian measure of the statistical manifold at <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula>, which is derived from the scoring rule <italic>S</italic>. Let S &#x003D; L, where L denotes the log-likelihood used as the scoring rule; the NGBoost algorithm uses maximum likelihood estimation by default. Solving the above optimization problem yields <xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref>:
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mover><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mo>&#x223C;</mml:mo></mml:mover><mml:mi>L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x221D;</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mi>L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the amount of Fisher information from the observations about <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, defined as <xref ref-type="disp-formula" rid="eqn-13">Eq. (13)</xref>:
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x2207;</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mi>L</mml:mi><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="normal">&#x2207;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mi>L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
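To make Eqs. (11)-(13) concrete, here is a small Python sketch (not the NGBoost implementation) for a single observation under a normal distribution parameterized as theta = (mu, log sigma). In this parameterization the Fisher information matrix is diagonal, diag(1/sigma^2, 2), so the natural gradient of Eq. (12) reduces to an elementwise rescaling of the ordinary gradient.

```python
import math

def grad_nll(mu, log_sigma, y):
    """Ordinary gradient of the negative log-likelihood of N(mu, sigma^2)
    with respect to (mu, log sigma)."""
    sigma2 = math.exp(2 * log_sigma)
    return (-(y - mu) / sigma2, 1 - (y - mu) ** 2 / sigma2)

def natural_grad_nll(mu, log_sigma, y):
    """Natural gradient of Eq. (12): the Fisher information in this
    parameterization is diag(1/sigma^2, 2), so invert it elementwise."""
    sigma2 = math.exp(2 * log_sigma)
    g_mu, g_ls = grad_nll(mu, log_sigma, y)
    return (sigma2 * g_mu, g_ls / 2)

ng = natural_grad_nll(mu=0.0, log_sigma=0.0, y=1.5)
```

Note that the mu-component of the natural gradient is simply -(y - mu), independent of sigma, which illustrates the parameterization invariance that makes natural-gradient updates stable.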
<p>When natural gradients are used to learn the parameters, the optimization problem becomes parameterization independent and has more efficient and stable learning dynamics than when only gradients are used. Additionally, sorting a list of <italic>e</italic> elements takes <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>e</mml:mi><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>e</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> time, so the time complexity of NGBoost is <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>K</mml:mi><mml:mi>d</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>n</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <italic>K</italic> is the total number of base learners, <italic>d</italic> is the maximum depth of the base learners, and <italic>n</italic> is the number of training samples.</p>
</sec>
<sec id="s2_2_3"><label>2.2.3</label><title>Model with Probabilistic Forecasting (PF)</title>
<p>According to the forecasted returns, the stocks are ranked in descending order, and the top <italic>k</italic> stocks are chosen to form the portfolio. First, two of these stocks are selected, and their allocation weights are calculated using the formula <xref ref-type="disp-formula" rid="eqn-11">(11)</xref>. When the third stock is added, the first and second stocks are merged into a new synthetic stock M with forecast return <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>x</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and variance <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mtext>cov</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. From the formula derived above, the weights of the updated portfolio, <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, can be obtained as expressed by <xref ref-type="disp-formula" rid="eqn-14">Eqs. (14)</xref> and <xref ref-type="disp-formula" rid="eqn-15">(15)</xref>.
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mi>x</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>By repeating this merging procedure, the allocation weights for all <italic>k</italic> stocks can be obtained. The time complexity of this algorithm is <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
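The merging step can be sketched directly from Eqs. (14) and (15). In the snippet below, the returns, variances, covariances, the risk-free rate rf, and the within-M weight x (which Eq. (11) would supply in the actual algorithm) are illustrative numbers, not values from the paper:

```python
def merge_stats(x, r1, r2, v1, v2, c12):
    # synthetic asset M = x * asset1 + (1 - x) * asset2:
    # its forecast return and variance follow the formulas in the text
    rM = x * r1 + (1 - x) * r2
    vM = x**2 * v1 + (1 - x)**2 * v2 + 2 * x * (1 - x) * c12
    return rM, vM

def weight_of_M(rM, vM, r3, v3, cM3, rf=0.0):
    # Eq. (15): the share allocated to the merged asset M vs. the new stock
    num = (r3 - rf) * cM3 - (rM - rf) * v3
    den = (rM - r3) * (cM3 - v3) - (r3 - rf) * (vM + v3 - 2 * cM3)
    return num / den

x = 0.6                      # weight of stock 1 inside M (illustrative)
rM, vM = merge_stats(x, 0.10, 0.08, 0.04, 0.03, 0.01)
xM = weight_of_M(rM, vM, 0.06, 0.05, 0.005, rf=0.02)
# Eq. (14): split x_M back into the three individual stock weights
w = (xM * x, xM * (1 - x), 1 - xM)
```

Repeating merge_stats and weight_of_M once per added stock, with covariance updates against all previously merged assets at each step, yields the k-stock allocation consistent with the stated complexity.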
</sec>
</sec>
<sec id="s2_3"><label>2.3</label><title>Benchmark Models and Baseline Strategies</title>
<sec id="s2_3_1"><label>2.3.1</label><title>Prediction Benchmark Models: LSTM and Bayesian-LSTM (BLSTM)</title>
<p>This paper chooses two representative machine learning models for financial time series forecasting, LSTM and Bayesian-LSTM, to benchmark the proposed models. The rationale behind each choice is explained below.</p>
<p>LSTM is a classical point prediction model that outputs a single predicted value for a given moment and is widely used for financial time series forecasting. It is a gated architecture that was proposed to overcome the vanishing-gradient limitations of standard recurrent neural networks and to preserve long-term information [<xref ref-type="bibr" rid="ref-23">23</xref>]. This property rests primarily on the memory cell in the hidden layer. An LSTM network typically has three layers: an input layer, a hidden layer, and an output layer. Compared with traditional neural networks, the control gate structure of LSTM can effectively model long-term dependencies in time series, allowing historical stock information to be transmitted effectively.</p>
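The gate structure described above can be made concrete with a minimal scalar LSTM cell. This is a toy, stdlib-only sketch with illustrative shared weights, not the benchmark implementation used in the experiments:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def lstm_cell(x, h, c, W):
    # one step of a scalar LSTM cell with forget, input, and output gates;
    # each gate sees the current input x and the previous hidden state h
    f = sigmoid(W["f"][0] * x + W["f"][1] * h + W["f"][2])
    i = sigmoid(W["i"][0] * x + W["i"][1] * h + W["i"][2])
    o = sigmoid(W["o"][0] * x + W["o"][1] * h + W["o"][2])
    g = math.tanh(W["g"][0] * x + W["g"][1] * h + W["g"][2])
    c_new = f * c + i * g          # memory cell carries long-term state
    h_new = o * math.tanh(c_new)   # hidden state exposed to the next step
    return h_new, c_new

W = {k: (0.5, 0.5, 0.0) for k in "fiog"}   # illustrative weights
h, c = 0.0, 0.0
for x in [0.1, -0.2, 0.3]:                 # a toy return sequence
    h, c = lstm_cell(x, h, c, W)
```

Because the forget gate f multiplies the previous cell state rather than overwriting it, gradients through c decay much more slowly than in a plain recurrent unit, which is what preserves long-term information.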
<p>The Bayesian-LSTM model combines Bayesian principles with deep neural networks to make probabilistic predictions. Instead of using point estimates for the model parameters, it generates distributions for each parameter, and from the distribution of the parameters, the probability distribution of each value of the model output can be obtained, providing important uncertainty information related to prediction.</p>
</sec>
<sec id="s2_3_2"><label>2.3.2</label><title>Baseline Strategy for the Portfolio</title>
<p>
<list list-type="simple">
<list-item><label>(1)</label><p>Mean-variance model</p></list-item>
</list></p>
<p>Markowitz&#x2019;s [<xref ref-type="bibr" rid="ref-24">24</xref>] MV model serves as the foundation for portfolio optimization. In this model, investment return and risk are quantified by expected return and variance, respectively, and the model seeks to balance maximizing return against minimizing risk. A rational investor will always seek the lowest risk for a given expected return, or the highest return for a given risk, and will select an appropriate portfolio to maximize expected utility. This trade-off is expressed by the typical multi-objective optimization formulation in <xref ref-type="disp-formula" rid="eqn-16">Eq. (16)</xref>:
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mstyle displaystyle="true" scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mstyle></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2264;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the covariance between asset <italic>i</italic> and asset <italic>j</italic>, <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are the proportion of investment in asset <italic>i</italic> and asset <italic>j</italic>, respectively, and <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the expected return on asset <italic>i</italic>.
<list list-type="simple">
<list-item><label>(2)</label><p>Equal-weighted model (1/N)</p></list-item>
</list></p>
<p>Because this basic allocation method is easy to implement, investors continue to use the equally weighted (1/N) portfolio [<xref ref-type="bibr" rid="ref-25">25</xref>], alongside the MV model, as a benchmark for comparing the performance of the many portfolios described in the literature.</p>
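A minimal sketch of how the two objectives in Eq. (16) can be scalarized and compared against the 1/N baseline. The expected returns, covariance matrix, and risk-aversion weight lam below are illustrative, and the simplex is searched on a coarse grid rather than with a proper quadratic-programming solver:

```python
# toy inputs: expected returns and covariance for three assets (illustrative)
mu = [0.10, 0.08, 0.06]
cov = [[0.040, 0.006, 0.004],
       [0.006, 0.030, 0.005],
       [0.004, 0.005, 0.020]]
lam = 2.0   # risk-aversion weight that scalarizes the two objectives

def utility(w):
    # return - lam * variance, the scalarized form of Eq. (16)
    ret = sum(wi * mi for wi, mi in zip(w, mu))
    var = sum(wi * wj * cov[i][j]
              for i, wi in enumerate(w) for j, wj in enumerate(w))
    return ret - lam * var

# coarse 1%-grid search over the simplex {w >= 0, sum(w) = 1}
step = 0.01
best = max(
    ((i * step, j * step, 1 - (i + j) * step)
     for i in range(101) for j in range(101 - i)),
    key=utility,
)
equal = (1 / 3, 1 / 3, 1 / 3)   # the 1/N baseline for comparison
```

With these toy numbers the grid optimum tilts toward the higher-return asset while the equal-weighted portfolio ignores both mu and cov, which is exactly the flexibility difference discussed in the backtest section.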
</sec>
</sec>
</sec>
<sec id="s3"><label>3</label><title>Experiments and Results</title>
<p>This section first presents the variables chosen for the experiment and the data used, as well as the prediction results of the various models in stock forecasting throughout the test period. Following that, trading simulations are run to compare the performance of various models and strategies in daily trading investments with no transaction fees.</p>
<sec id="s3_1"><label>3.1</label><title>Data and Input Indicators</title>
<p>Due to the continuity of financial stock data, the sample data should cover a sufficiently long period. The constituent stocks of the CSI 300 are chosen based on volume and total market capitalization, and they are distinguished by their large size, relative stability, and adequate liquidity. After data pre-processing, 62 stocks are used in the experiment. The data from December 8, 2016, to May 31, 2019, obtained from JoinQuant, were selected for the experiment. The data from December 8, 2016, to November 30, 2017 (243 trading days), were used as the training set; the data from December 1, 2017, to May 31, 2018 (120 trading days), were used as the validation set; and the data from June 1, 2018, to May 31, 2019 (243 trading days), were used as the test set.</p>
<p>In this paper, 10 technical indicators are selected as inputs for stock forecasting: moving average (MA), exponential moving average (EMA), moving average convergence/divergence (MACD), average transaction price (ATP), relative strength index (RSI), true range (TR), average true range (ATR), momentum index (MoM), parabolic SAR, and the amplitude of the price movement (ALT). <xref ref-type="table" rid="table-2">Table 2</xref> summarizes the selected input technical indicators.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Input technical indicators</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Attribute</th>
<th align="left">Details</th>
<th align="left">Attribute</th>
<th align="left">Details</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">Moving average</td>
<td align="left">6</td>
<td align="left">True range</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">Exponential moving average</td>
<td align="left">7</td>
<td align="left">Average true range</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">Moving average convergence/Divergence</td>
<td align="left">8</td>
<td align="left">Momentum index</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">Average transaction price</td>
<td align="left">9</td>
<td align="left">Parabolic SAR</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">Relative strength index</td>
<td align="left">10</td>
<td align="left">The amplitude of the price movement</td>
</tr>
</tbody>
</table>
</table-wrap>
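Two of the indicators in Table 2 can be sketched in a few lines. The window length w and the closing prices are illustrative; a production feature pipeline would typically rely on a technical-analysis library instead:

```python
def moving_average(prices, w):
    # simple moving average (MA) over a window of w closing prices
    return [sum(prices[i - w + 1:i + 1]) / w for i in range(w - 1, len(prices))]

def ema(prices, w):
    # exponential moving average (EMA) with smoothing factor 2 / (w + 1)
    k = 2 / (w + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(k * p + (1 - k) * out[-1])
    return out

closes = [10.0, 10.2, 10.1, 10.4, 10.3]   # illustrative closing prices
ma3 = moving_average(closes, 3)
ema3 = ema(closes, 3)
```

The EMA weights recent prices more heavily than the MA, which is why both appear as separate inputs in Table 2.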
</sec>
<sec id="s3_2"><label>3.2</label><title>Comparison of Predicted Results</title>
<p>Six metrics are used in this paper to comprehensively measure the performance of different models in the stock forecasting process: mean absolute error (MAE), mean square error (MSE), <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and NLL. MSE usually represents the dispersion of the forecast outcome and MAE represents the deviation of the forecast outcome. In addition, <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the total hit rate, <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> means the accuracy of positive prediction and <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> means the accuracy of negative prediction. The negative mean log likelihood (NLL) is a popular metric for analyzing forecast uncertainty and is an effective scoring method for quantifying the quality of probabilistic forecasts [<xref ref-type="bibr" rid="ref-19">19</xref>]. 
Smaller MAE, MSE and NLL values and larger <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> values indicate better performance. These metrics are defined as <xref ref-type="disp-formula" rid="eqn-17">Eqs. (17)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-22">(22)</xref>:
<disp-formula id="eqn-17"><label>(17)</label><mml:math id="mml-eqn-17" display="block"><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle="true" scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula>
<disp-formula id="eqn-18"><label>(18)</label><mml:math id="mml-eqn-18" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle="true" scriptlevel="0"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mstyle></mml:math></disp-formula>
<disp-formula id="eqn-19"><label>(19)</label><mml:math id="mml-eqn-19" display="block"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:msubsup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003E;</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:msubsup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2260;</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-20"><label>(20)</label><mml:math id="mml-eqn-20" display="block"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:msubsup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003E;</mml:mo><mml:mn>0</mml:mn><mml:mtext>&#x00A0;</mml:mtext><mml:mi>A</mml:mi><mml:mi>N</mml:mi><mml:mi>D</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003E;</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:msubsup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003E;</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-21"><label>(21)</label><mml:math id="mml-eqn-21" display="block"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:msubsup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:mn>0</mml:mn><mml:mtext>&#x00A0;</mml:mtext><mml:mi>A</mml:mi><mml:mi>N</mml:mi><mml:mi>D</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:msubsup><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-22"><label>(22)</label><mml:math id="mml-eqn-22" display="block"><mml:mi>N</mml:mi><mml:mi>L</mml:mi><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>P</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mrow><mml:mo>|</mml:mo><mml:mi>x</mml:mi><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:mrow><mml:mover><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> denotes the forecast price, <inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the actual price, and <italic>n</italic> denotes the number of forecast days.</p>
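Eqs. (17)-(22) translate directly into code. The sketch below evaluates the six metrics on a tiny illustrative sample; the per-day forecast standard deviations used for the NLL are assumed values, with the predictive distribution taken to be Gaussian:

```python
import math
from statistics import NormalDist

y_true = [0.02, -0.01, 0.03, -0.02, 0.01]    # actual returns (illustrative)
y_pred = [0.01, -0.02, -0.01, -0.01, 0.02]   # forecast means (illustrative)
sigmas = [0.02] * 5                          # assumed forecast std deviations

n = len(y_true)
mae = sum(abs(a - p) for a, p in zip(y_true, y_pred)) / n       # Eq. (17)
mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n     # Eq. (18)

# Eqs. (19)-(21): sign agreement overall, on up forecasts, on down forecasts
hr = sum(a * p > 0 for a, p in zip(y_true, y_pred)) / \
     sum(a * p != 0 for a, p in zip(y_true, y_pred))
hr_pos = sum(a > 0 and p > 0 for a, p in zip(y_true, y_pred)) / \
         sum(p > 0 for p in y_pred)
hr_neg = sum(a < 0 and p < 0 for a, p in zip(y_true, y_pred)) / \
         sum(p < 0 for p in y_pred)

# Eq. (22): negative mean log likelihood under the Gaussian forecasts
nll = -sum(math.log(NormalDist(p, s).pdf(a))
           for a, p, s in zip(y_true, y_pred, sigmas)) / n
```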
<p>First, the performance of the three prediction models is compared. As shown in <xref ref-type="table" rid="table-3">Table 3</xref>, NGB-PF achieves the lowest MSE and the highest overall hit rate among the probabilistic models, indicating comparatively high prediction accuracy. Its NLL of 2.73 is lower than the 3.04 of BLSTM-PF, showing that NGB-PF quantifies forecast uncertainty with comparatively high quality. In conclusion, NGB-PF delivers the strongest overall performance among the probabilistic models in the stock return prediction process, while additionally providing the uncertainty estimates that the point-prediction LSTM cannot.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>The performance of three prediction models</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Model</th>
<th align="left">MAE</th>
<th align="left">MSE</th>
<th align="left"><inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th align="left"><inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th align="left"><inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x2212;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th align="left">NLL</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">NGB-PF</td>
<td align="left"><inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mn>3.92</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mn>8.36</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td align="left">52.85&#x0025;</td>
<td align="left">52.90&#x0025;</td>
<td align="left">50.96&#x0025;</td>
<td align="left">2.73</td>
</tr>
<tr>
<td align="left">BLSTM-PF</td>
<td align="left"><inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:mn>2.85</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:mn>5.84</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td align="left">51.39&#x0025;</td>
<td align="left">52.14&#x0025;</td>
<td align="left">49.52&#x0025;</td>
<td align="left">3.04</td>
</tr>
<tr>
<td align="left">LSTM-MV</td>
<td align="left"><inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:mn>2.14</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-91"><mml:math id="mml-ieqn-91"><mml:mn>6.91</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td align="left">53.01&#x0025;</td>
<td align="left">53.21&#x0025;</td>
<td align="left">50.54&#x0025;</td>
<td align="left">-</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_3"><label>3.3</label><title>Comparison of Backtest Results</title>
<p>This section will determine the proper cardinality of the portfolio and evaluate the effectiveness and superiority of the proposed NGB-PF.</p>
<p>Most studies related to portfolio formation by individual investors focus on only about 10 assets [<xref ref-type="bibr" rid="ref-26">26</xref>&#x2013;<xref ref-type="bibr" rid="ref-28">28</xref>]. Oreng&#x00A0;et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="ref-29">29</xref>] found that portfolios with 7 assets outperformed those with other numbers of stocks. Therefore, this paper chooses 7 assets to form a portfolio and uses four indicators, the annualized mean return, annualized standard deviation, annualized Sharpe ratio, and annualized Sortino ratio, to assess portfolio performance. The Sharpe ratio is a comprehensive indicator that accounts for both return and risk, while the Sortino ratio distinguishes harmful downside volatility from benign upside volatility. Larger Sharpe and Sortino ratios indicate better performance.</p>
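The four evaluation indicators can be sketched as follows. The daily return series and the zero risk-free rate are illustrative, and 243 periods per year matches the length of the test window used above:

```python
import math

def annualized_metrics(daily_returns, rf_daily=0.0, periods_per_year=243):
    # 243 matches the number of trading days in the paper's test window
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in daily_returns) / n)
    ann_return = mean * periods_per_year
    ann_std = std * math.sqrt(periods_per_year)
    sharpe = (ann_return - rf_daily * periods_per_year) / ann_std
    # the Sortino ratio penalizes only downside deviations from rf_daily
    downside = math.sqrt(
        sum(min(r - rf_daily, 0.0) ** 2 for r in daily_returns) / n)
    sortino = (ann_return - rf_daily * periods_per_year) / (
        downside * math.sqrt(periods_per_year))
    return ann_return, ann_std, sharpe, sortino

rets = [0.004, -0.002, 0.003, -0.001, 0.002] * 20   # toy daily series
ann_ret, ann_std, sharpe, sortino = annualized_metrics(rets)
```

Because the downside deviation is never larger than the full standard deviation, the Sortino ratio is at least as large as the Sharpe ratio for the same series, which is why the two are reported together.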
<p>According to Panel A of <xref ref-type="table" rid="table-4">Table 4</xref>, NGB-PF achieves the highest annualized return (0.2952), while LSTM-MV has the lowest (0.1703). NGB-PF also has the lowest volatility (0.2746), while BLSTM-1/N has the highest. NGB-PF likewise attains the highest annualized Sharpe and Sortino ratios. The proposed approach performs best overall because NGBoost effectively captures the features of the structured input data, while the PF model accounts for forecast uncertainty when adjusting the positions of the smart beta strategy during asset allocation. The LSTM-1/N model, by contrast, performs relatively poorly because of its inflexible allocation.</p>
<table-wrap id="table-4"><label>Table 4</label><caption><title>Performance characteristics</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Model</th>
<th align="left">NGB-PF</th>
<th align="left">NGB-MV</th>
<th align="left">NGB-1/N</th>
<th align="left">BLSTM-PF</th>
<th align="left">BLSTM-MV</th>
<th align="left">BLSTM-1/N</th>
<th align="left">LSTM-MV</th>
<th align="left">LSTM-1/N</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="9">Panel A: Annualized risk-return metrics.</td>
</tr>
<tr>
<td align="left">Mean return</td>
<td align="left">0.2952</td>
<td align="left">0.2262</td>
<td align="left">0.2028</td>
<td align="left">0.2418</td>
<td align="left">0.2279</td>
<td align="left">0.2237</td>
<td align="left">0.1703</td>
<td align="left">0.2152</td>
</tr>
<tr>
<td align="left">Standard deviation</td>
<td align="left">0.2746</td>
<td align="left">0.2895</td>
<td align="left">0.3166</td>
<td align="left">0.3325</td>
<td align="left">0.3325</td>
<td align="left">0.3683</td>
<td align="left">0.3248</td>
<td align="left">0.3582</td>
</tr>
<tr>
<td align="left">Sharpe ratio</td>
<td align="left">2.1957</td>
<td align="left">2.1631</td>
<td align="left">2.0494</td>
<td align="left">1.3889</td>
<td align="left">1.0533</td>
<td align="left">0.7882</td>
<td align="left">1.1369</td>
<td align="left">0.9359</td>
</tr>
<tr>
<td align="left">Sortino ratio</td>
<td align="left">3.41</td>
<td align="left">2.2752</td>
<td align="left">2.4582</td>
<td align="left">1.1942</td>
<td align="left">1.6171</td>
<td align="left">1.3538</td>
<td align="left">1.6764</td>
<td align="left">1.4233</td>
</tr>
<tr>
<td align="center" colspan="9">Panel B: Daily return characteristic.</td>
</tr>
<tr>
<td align="left">Mean return</td>
<td align="left">0.0028</td>
<td align="left">0.0025</td>
<td align="left">0.0023</td>
<td align="left">0.0018</td>
<td align="left">0.0016</td>
<td align="left">0.0013</td>
<td align="left">0.0015</td>
<td align="left">0.0011</td>
</tr>
<tr>
<td align="left">Maximum</td>
<td align="left">0.086</td>
<td align="left">0.077</td>
<td align="left">0.09</td>
<td align="left">0.085</td>
<td align="left">0.1</td>
<td align="left">0.1</td>
<td align="left">0.0795</td>
<td align="left">0.1005</td>
</tr>
<tr>
<td align="left">Minimum</td>
<td align="left">&#x2212;0.0712</td>
<td align="left">&#x2212;0.0846</td>
<td align="left">&#x2212;0.089</td>
<td align="left">&#x2212;0.083</td>
<td align="left">&#x2212;0.0935</td>
<td align="left">&#x2212;0.076</td>
<td align="left">&#x2212;0.0885</td>
<td align="left">&#x2212;0.092</td>
</tr>
<tr>
<td align="left">Range</td>
<td align="left">0.1572</td>
<td align="left">0.1616</td>
<td align="left">0.179</td>
<td align="left">0.168</td>
<td align="left">0.1935</td>
<td align="left">0.176</td>
<td align="left">0.168</td>
<td align="left">0.1925</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in Panel A of <xref ref-type="table" rid="table-4">Table 4</xref>, the average annualized mean return of the NGBoost models is 0.2414, generally better than that of the BLSTM models (0.2311) and the LSTM models (0.1927). This is closely related to the high robustness of NGBoost in dealing with uncertain system data. Also, as shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, the average standard deviation of the MV-based position optimization models is 0.3156, lower than that of the 1/N models (0.3477), so the MV step does reduce risk; the NGB-PF model proposed in this paper achieves the lowest standard deviation of 0.2746.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>Box plot of the daily returns</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_34933-fig-3.tif"/></fig>
<p>Although NGBoost-1/N leads in cumulative return over the year, its margin is not large, and its returns lag BLSTM-1/N from Q1 2019 to Q2 2019. On the other hand, NGBoost-1/N shows lower overall volatility and better consistency. The base learner of NGBoost is a decision tree, which is highly tolerant of missing data. As an ensemble learning method, NGBoost reduces overfitting by returning a probability distribution for each prediction. BLSTM and LSTM also achieve good accuracy, but these neural networks have more learnable parameters, which makes them less robust than NGBoost. This also validates the view of Fischer et al. [<xref ref-type="bibr" rid="ref-26">26</xref>].</p>
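The overfitting point can be made concrete with the proper scoring rule that distribution-returning boosters such as NGBoost optimize: the forecaster is scored on the whole predictive distribution, so an overconfident (too-narrow) distribution is penalized even when its point forecast is unchanged. A minimal NumPy sketch on synthetic daily returns, assuming Normal output distributions:

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Average Gaussian negative log-likelihood: the proper scoring
    rule minimized when fitting Normal predictive distributions."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                   + (y - mu)**2 / (2 * sigma**2))

# Synthetic "true" daily returns with sd = 0.02; identical point
# forecasts, three different claimed uncertainties.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 0.02, size=500)
mu = np.zeros_like(y)
calibrated = gaussian_nll(y, mu, 0.02)       # sd matches the data
overconfident = gaussian_nll(y, mu, 0.002)   # sd claimed 10x too small
underconfident = gaussian_nll(y, mu, 0.2)    # sd claimed 10x too large
```

Both miscalibrated forecasters score worse (higher NLL) than the calibrated one, which is why optimizing this score discourages the overconfident fits that point-forecast losses cannot detect.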
<p>The cumulative return curves of the prediction models integrated with the MV model are shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. The NGB-MV model attains the best cumulative return of 60.42&#x0025;, the BLSTM-MV model reaches 38.83&#x0025;, and the LSTM-MV model has the lowest cumulative return of 35.34&#x0025;. Compared with <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, the returns of all three algorithms improve and their volatility decreases. The MV algorithm estimates future risk from historical volatility and optimizes the portfolio positions accordingly. Because stock styles are persistent, the MV algorithm can reduce risk and improve efficiency, but its performance may degrade when the market style switches rapidly.</p>
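For reference, the core of an MV position step can be sketched as the classical unconstrained mean-variance solution, where weights are proportional to the inverse covariance matrix applied to the expected returns and then normalized to sum to one. This is a simplification: the exact constraints of the paper's MV benchmark (e.g. long-only bounds) are not reproduced here.

```python
import numpy as np

def mv_weights(mu, cov):
    """Unconstrained mean-variance weights: w proportional to
    inv(cov) @ mu, normalized to sum to 1. A simplified sketch of
    the MV benchmark, without the constraints a full implementation
    would add."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    raw = np.linalg.solve(cov, mu)   # inv(cov) @ mu without explicit inverse
    return raw / raw.sum()
```

With equal expected returns, this tilts the portfolio toward the lower-variance asset, which is the risk-reduction effect observed for the MV models above.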
<fig id="fig-4"><label>Figure 4</label><caption><title>Cumulative returns by 1/N models</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_34933-fig-4.tif"/></fig>
<fig id="fig-5"><label>Figure 5</label><caption><title>Cumulative returns by MV models</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_34933-fig-5.tif"/></fig>
<p>As shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, the cumulative return of the NGB-PF model grows steadily, reaching a maximum of 67.1&#x0025;, whereas the NGB-MV and NGB-1/N models reach maxima of only 60.4&#x0025; and 54.08&#x0025;, respectively. The cumulative return of BLSTM-MV is only 3.49&#x0025; higher than that of LSTM-MV, an insignificant difference. The BLSTM-PF model achieves a cumulative return of 44.0&#x0025;, which also exceeds the 38.83&#x0025; of BLSTM-MV. Combined with the PF model, the NGBoost and BLSTM algorithms further improve returns and reduce volatility relative to the MV model. The PF model relies on the probability distribution of the prediction results, so the prediction model itself must support probabilistic prediction; since LSTM lacks this capability, it cannot be paired with the PF model. The analytical position optimization solution derived in Section 2.1, combined with the excellent probabilistic prediction capability of NGBoost, helps the smart beta strategy remain transparent while better controlling risk and improving returns.</p>
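The idea of letting forecast uncertainty shape positions can be illustrated with a simple inverse-variance weighting: assets with a high predicted return and a narrow predictive distribution receive more capital. This is a hypothetical stand-in for intuition only, not the analytical solution derived in Section 2.1.

```python
import numpy as np

def pf_style_weights(pred_mean, pred_std):
    """Illustrative uncertainty-aware weighting (NOT the paper's
    Section 2.1 solution): score = positive predicted return divided
    by forecast variance, normalized to sum to 1."""
    mean = np.asarray(pred_mean, dtype=float)
    std = np.asarray(pred_std, dtype=float)
    score = np.clip(mean, 0.0, None) / std**2   # confidence-scaled signal
    if score.sum() == 0:
        # no asset has a positive forecast: fall back to equal weights
        return np.full_like(mean, 1.0 / mean.size)
    return score / score.sum()
```

Two assets with identical predicted returns but different forecast uncertainty end up with very different weights, which is precisely the behavior a point forecast alone cannot produce.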
<fig id="fig-6"><label>Figure 6</label><caption><title>Cumulative returns by PF models</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_34933-fig-6.tif"/></fig>
</sec>
</sec>
<sec id="s4"><label>4</label><title>Conclusions and Discussion</title>
<p>The main focus of this paper is to propose an improved smart beta strategy for portfolio management. The main contributions of this work are:
<list list-type="order">
<list-item><p>This paper uses the quantified uncertainty of the predictions to determine allocation shares. This allocation has an endogenous logic: it is an objective portfolio method based on confidence in the predicted values, consistent with the investment psychology of modern investors.</p></list-item>
<list-item><p>Probabilistic prediction produces two outputs, a predicted value and a probability distribution, both of which are used to allocate portfolio weights. This paper experimentally compares the predictive ability of NGBoost, BLSTM, and LSTM on stock return prediction, using six indicators to comprehensively measure each model&#x2019;s performance. The results show that NGBoost attains the smallest prediction error and the highest prediction accuracy among these models, indicating that NGBoost is well suited to stock return prediction.</p></list-item>
<list-item><p>The quality of the forecast uncertainty also affects the performance of a portfolio constructed from it. Experiments show that the cumulative return of the BLSTM-PF model, whose prediction uncertainty is of lower quality, is 23.1&#x0025; below that of the NGB-PF model. To further assess the effectiveness of the PF model, we compared it with the benchmark MV and 1/N models and found that, given the same prediction model, the PF model allocates portfolio weights more reasonably and thus promotes steady growth of returns. The cumulative return of the NGB-PF model peaked at 67.1&#x0025;, thanks to its use of prediction uncertainty to guide portfolio management.</p></list-item>
</list></p>
<p>This study also has practical implications for individual investors: the proposed approach can help them make investment decisions more effectively, reduce risk, and safeguard the safety and profitability of their investments. Further research remains possible. For example, the input features could incorporate other external environmental factors such as news, government policies, and interest rates. In addition, improving the training efficiency of the algorithm remains a challenge: NGBoost learns sequentially and does not support parallel computing, so it cannot exploit multi-core machines to shorten training time, a current limitation of the algorithm. Although this matters little when analyzing daily data, it is a problem worth investigating in the future.</p>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p>This work was supported by the <funding-source>National Natural Science Foundation</funding-source> of China [Grant Number <award-id>61902349</award-id>].</p></sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in the paper.</p></sec>
<ref-list content-type="authoryear"><title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Agrawal</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Shukla</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Nair</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Nayyar</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Masud</surname></string-name></person-group>, &#x201C;<article-title>Stock prediction based on technical indicators using deep learning model</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>70</volume>, no. <issue>1</issue>, pp. <fpage>287</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>M</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Sankar</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Nestor</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Soliman</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Stock market trading based on market sentiments and reinforcement learning</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>70</volume>, no. <issue>1</issue>, pp. <fpage>935</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. -F.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>A hybrid stock selection model using genetic algorithms and support vector regression</article-title>,&#x201D; <source>Applied Soft Computing</source>, vol. <volume>12</volume>, no. <issue>2</issue>, pp. <fpage>807</fpage>&#x2013;<lpage>818</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Zhang</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Portfolio formation with preselection using deep learning from long-term financial data</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>143</volume>, pp. <fpage>113042</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>M. K.</given-names> <surname>Mehlawat</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Jia</surname></string-name></person-group>, &#x201C;<article-title>Mean&#x2013;variance portfolio optimization using machine learning-based stock price prediction</article-title>,&#x201D; <source>Applied Soft Computing</source>, vol. <volume>100</volume>, pp. <fpage>106943</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B. K.</given-names> <surname>Tripathy</surname></string-name>, <string-name><given-names>P. K.</given-names> <surname>Reddy Maddikunta</surname></string-name>, <string-name><given-names>Q. -V.</given-names> <surname>Pham</surname></string-name>, <string-name><given-names>T. R.</given-names> <surname>Gadekallu</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Dev</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Harris hawk optimization: A survey onvariants and applications</article-title>,&#x201D; <source>Computational Intelligence and Neuroscience</source>, vol. <volume>2022</volume>, pp. <fpage>2218594</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O. A.</given-names> <surname>Alzubi</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>Alzubi</surname></string-name>, <string-name><given-names>A. M.</given-names> <surname>Al-Zoubi</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Hassonah</surname></string-name> and <string-name><given-names>U.</given-names> <surname>Kose</surname></string-name></person-group>, &#x201C;<article-title>An efficient malware detection approach with feature weighting based on harris hawks optimization</article-title>,&#x201D; <source>Cluster Computing</source>, vol. <volume>25</volume>, no. <issue>4</issue>, pp. <fpage>2369</fpage>&#x2013;<lpage>2387</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O. A.</given-names> <surname>Alzubi</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>Alzubi</surname></string-name>, <string-name><given-names>A. M.</given-names> <surname>Al-Zoubi</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Hassonah</surname></string-name> and <string-name><given-names>U.</given-names> <surname>Kose</surname></string-name></person-group>, &#x201C;<article-title>An efficient malware detection approach with feature weighting based on harris hawks optimization</article-title>,&#x201D; <source>Cluster Computing</source>, vol. <volume>25</volume>, no. <issue>4</issue>, pp. <fpage>2369</fpage>&#x2013;<lpage>2387</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O. A.</given-names> <surname>Alzubi</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>Alzubi</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Alweshah</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Qiqieh</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Al-Shami</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>An optimal pruning algorithm of classifier ensembles: Dynamic programming approach</article-title>,&#x201D; <source>Neural Computing and Applications</source>, vol. <volume>32</volume>, no. <issue>20</issue>, pp. <fpage>16091</fpage>&#x2013;<lpage>16107</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Han</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Portfolio optimization with return prediction using deep learning and machine learning</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>165</volume>, pp. <fpage>113973</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Cen</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Yao</surname></string-name></person-group>, &#x201C;<article-title>Gcn-based stock relations analysis for stock market prediction</article-title>,&#x201D; <source>PeerJ Computer Science</source>, vol. <volume>8</volume>, pp. <fpage>e1057</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>China&#x2019;s commercial bank stock price prediction using a novel k-means-lstm hybrid approach</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>202</volume>, pp. <fpage>117370</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Rossi</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Sekhposyan</surname></string-name></person-group>, &#x201C;<article-title>Macroeconomic uncertainty indices based on nowcast and forecast error distributions</article-title>,&#x201D; <source>American Economic Review</source>, vol. <volume>105</volume>, no. <issue>5</issue>, pp. <fpage>650</fpage>&#x2013;<lpage>655</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Duan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Anand</surname></string-name>, <string-name><given-names>D. Y.</given-names> <surname>Ding</surname></string-name>, <string-name><given-names>K. K.</given-names> <surname>Thai</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Basu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Ngboost: Natural gradient boosting for probabilistic prediction</article-title>,&#x201D; in <conf-name>Proc. PMLR</conf-name>, <conf-loc>Vienna, Austria</conf-loc>, pp. <fpage>2690</fpage>&#x2013;<lpage>2700</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Padhy</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Tran</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Bedrax Weiss</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Simple and principled uncertainty estimation with deterministic deep learning via distance awareness</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>33</volume>, pp. <fpage>7498</fpage>&#x2013;<lpage>7512</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>C. Q.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>An effective convolutional neural network based on smote and Gaussian mixture model for intrusion detection in imbalanced dataset</article-title>,&#x201D; <source>Computer Networks</source>, vol. <volume>177</volume>, pp. <fpage>107315</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Rahaman</surname></string-name></person-group>, &#x201C;<article-title>Uncertainty quantification and deep ensembles</article-title>,&#x201D; <conf-name>Advances in Neural Information Processing Systems</conf-name>, vol. <volume>34</volume>, pp. <fpage>20063</fpage>&#x2013;<lpage>20075</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Peng</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhi</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Ji</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Ji</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Tian</surname></string-name></person-group>, &#x201C;<article-title>Prediction skill of extended range 2-m maximum air temperature probabilistic forecasts using machine learning post-processing methods</article-title>,&#x201D; <source>Atmosphere</source>, vol. <volume>11</volume>, no. <issue>8</issue>, pp. <fpage>823</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>Short-term direct probability prediction model of wind power based on improved natural gradient boosting</article-title>,&#x201D; <source>Energies</source>, vol. <volume>13</volume>, no. <issue>18</issue>, pp. <fpage>4629</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. -R.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>W. -J. P.</given-names> <surname>Chiou</surname></string-name>, <string-name><given-names>W. -Y.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>S. -J.</given-names> <surname>Lin</surname></string-name></person-group>, &#x201C;<article-title>Portfolio models with return forecasting and transaction costs</article-title>,&#x201D; <source>International Review of Economics &#x0026; Finance</source>, vol. <volume>66</volume>, pp. <fpage>118</fpage>&#x2013;<lpage>130</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Kaczmarek</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Perez</surname></string-name></person-group>, &#x201C;<article-title>Building portfolios based on machine learning predictions</article-title>,&#x201D; <source>Economic Research-Ekonomska Istra&#x017E;ivanja</source>, vol. <volume>35</volume>, no. <issue>1</issue>, pp. <fpage>19</fpage>&#x2013;<lpage>37</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>A svm stock selection model within pca</article-title>,&#x201D; <source>Procedia Computer Science</source>, vol. <volume>31</volume>, pp. <fpage>406</fpage>&#x2013;<lpage>412</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Hochreiter</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Schmidhuber</surname></string-name></person-group>, &#x201C;<article-title>Long short-term memory</article-title>,&#x201D; <source>Neural Computation</source>, vol. <volume>9</volume>, pp. <fpage>1735</fpage>&#x2013;<lpage>1780</lpage>, <year>1997</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Markowitz</surname></string-name></person-group>, &#x201C;<article-title>Portfolio selection&#x002A;</article-title>,&#x201D; <source>The Journal of Finance</source>, vol. <volume>7</volume>, no. <issue>1</issue>, pp. <fpage>77</fpage>&#x2013;<lpage>91</lpage>, <year>1952</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Tang</surname></string-name></person-group>, &#x201C;<article-title>A novel hybrid stock selection method with stock prediction</article-title>,&#x201D; <source>Applied Soft Computing</source>, vol. <volume>80</volume>, pp. <fpage>820</fpage>&#x2013;<lpage>831</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. P.</given-names> <surname>Gandhmal</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Kumar</surname></string-name></person-group>, &#x201C;<article-title>Systematic analysis and review of stock market prediction techniques</article-title>,&#x201D; <source>Computer Science Review</source>, vol. <volume>34</volume>, pp. <fpage>100190</fpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Kocuk</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Cornu&#x00E9;jols</surname></string-name></person-group>, &#x201C;<article-title>Incorporating black-litterman views in portfolio construction when stock returns are a mixture of normals</article-title>,&#x201D; <source>Omega</source>, vol. <volume>91</volume>, pp. <fpage>102008</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Almahdi</surname></string-name> and <string-name><given-names>S. Y.</given-names> <surname>Yang</surname></string-name></person-group>, &#x201C;<article-title>An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>87</volume>, pp. <fpage>267</fpage>&#x2013;<lpage>279</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Oreng</surname></string-name>, <string-name><given-names>C. E.</given-names> <surname>Yoshinaga</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Eid Junior</surname></string-name></person-group>, &#x201C;<article-title>Disposition effect, demographics and risk taking</article-title>,&#x201D; <source>RAUSP Management Journal</source>, vol. <volume>56</volume>, no. <issue>2</issue>, pp. <fpage>217</fpage>&#x2013;<lpage>233</lpage>, <year>2021</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>