Predicting Chinese Stocks Using XGBoost-LSTM-Attention Model

Zhiyong Yang, Yuxi Ye and Yu Zhou

Abstract: Forecasting is a popular topic in the stock market. In recent years, many scholars have applied machine- and deep-learning models in this field. However, many stock forecasting models suffer from information overlap in stock trading data and from a relatively simple prediction-model structure. To overcome these issues, we built a stock forecasting model based on extreme gradient boosting (XGBoost), long short-term memory (LSTM), and attention (XGBoost-LSTM-Attention). XGBoost is used to extract important information from stock data, and the LSTM combined with the attention mechanism enhances stock prediction performance. To verify the feasibility and effectiveness of XGBoost-LSTM-Attention, we selected 14 Chinese stocks from different industries for the prediction experiments and compared its performance with that of existing models. The experimental results showed that the XGBoost-LSTM-Attention model achieved the smallest average root-mean-square error (0.012) across the different stock datasets, as well as the highest average R2 value (0.96) and the highest average accuracy (66.1%).

Keywords: LSTM, Attention Mechanism, Extreme Gradient Boosting Tree, Stock Forecasting

1. Introduction

The Internet has expanded the scale of economic activities and enhanced investment awareness [1]. Stock investing appeals to investors because of its convenience, high risk, and high returns. Stock price forecasting, decision-making, and portfolio models provide useful tools for helping investors plan their investment strategies [2]. However, stock price trends are characterized by instability, randomness, and nonlinearity, making it difficult for investors and researchers to predict stock prices. Previous studies have attempted to improve the predictive performance of stock models, which is also the goal of this study.

Methods for predicting stock time series can be classified into statistical, traditional machine learning (ML), and deep learning (DL) models. Statistical methods include the moving average, the autoregressive moving average, and logistic regression. However, as stock market systems are complex and nonlinear, statistical models cannot achieve accurate prediction results [3]. Researchers have therefore attempted to model nonlinear stock time series with ML methods such as support vector machines, artificial neural networks, and decision trees [4-8]. Although these methods have significantly advanced stock market forecasting, they incur long training times and cannot efficiently handle large amounts of financial data. In recent years, DL models have been applied to stock prediction because of their excellent data-processing capabilities. The most common techniques include recurrent neural networks (RNN), long short-term memory (LSTM), and convolutional neural networks (CNN) [9-11]. These techniques can easily handle large amounts of financial time-series data; however, single models can suffer from low generalization and cannot be applied to volatile stock markets. To overcome this issue, this study proposes a novel framework to enhance stock price prediction accuracy.

Compared with single models, multi-model integration approaches exhibit better predictive performance. In 2017, Cheng et al. [12] proposed an attention-based LSTM model for stock price movement prediction.
In 2019, Kim and Kim [13] used LSTM and CNN to extract temporal and image features from SPDR S&P 500 exchange-traded fund (ETF) data and found that these methods can effectively reduce the prediction error. In 2020, Wu et al. [14] converted stock data and leading indicators into simulated images and predicted stock prices using a CNN-LSTM model; the prediction results were more accurate than those of single models. In 2021, Lu et al. [15] proposed a CNN-bidirectional LSTM (BiLSTM) attention mechanism (AM) model to predict stock index prices, showing that the AM can improve prediction performance. In 2022, Lee [16] combined the gated recurrent unit (GRU) and AM to predict the rise and fall of future trading days. Stock prices, however, are affected by numerous factors under changing market conditions and economic environments, which introduces noise into stock price data, and the aforementioned models do not address this noise in the raw data. To address this problem, we used extreme gradient boosting (XGBoost) to process the original data.

In this study, a novel XGBoost-LSTM-Attention model was used to predict Chinese stocks listed on the Shanghai and Shenzhen Stock Exchanges. XGBoost extracts important features of the original stock information from the input data, and in terms of network structure, the combination of LSTM and the AM enhances the accuracy of stock predictions. The purpose of this study was to determine whether the XGBoost-LSTM-Attention model can accurately predict stocks in different industries and whether significant differences exist between different stock forecasting models in developing economies.

The remainder of this paper is organized as follows: Section 2 briefly introduces the techniques employed in this study. Section 3 introduces the training process of the XGBoost-LSTM-Attention model. Section 4 presents the experimental research process and an analysis of the experimental results using different models, followed by the conclusion in Section 5.

2. Methodology

2.1 XGBoost

The XGBoost algorithm is a scalable end-to-end tree boosting system [17] that improves the gradient boosting decision tree (GBDT) objective function by adding a regularization term to the original function and expanding the loss function as a second-order Taylor approximation. In Eq. (1), $L(y_i, \hat{y})$ is the squared-difference loss function between the predicted value $\hat{y}$ and the real value $y_i$. In Eq. (2), $T$ represents the number of leaf nodes; $W_j$ is the value of the j-th leaf node; $G_j$ is the sum of the first partial derivatives; and $H_j$ is the sum of the second partial derivatives.
(1) $$obj=\sum_{i=1}^{n} L\left(y_i, \hat{y}\right)+\sum_{i=1}^{t} \Omega\left(f_i\right),$$
(2) $$obj^{(t)}=\sum_{j=1}^{T}\left[G_j W_j+\frac{1}{2}\left(H_j+\lambda\right) W_j^2\right]+\gamma T.$$

The objective function can be simplified by substituting the optimal leaf weight $W_j^*=-G_j/(H_j+\lambda)$, obtained by setting the first derivative with respect to $W_j$ to zero:
(3) $$obj=-\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j+\lambda}+\gamma T.$$

XGBoost calculates the split gain for each node using a greedy algorithm. Eqs. (4) and (5) are the objective functions before and after splitting, respectively, and "Gain" is the basis for calculating feature-importance scores and performing feature selection, as expressed in Eq. (6).
(4) $$obj_1=-\frac{1}{2}\left[\frac{\left(G_L+G_R\right)^2}{H_L+H_R+\lambda}\right]+\gamma,$$
(5) $$obj_2=-\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}\right]+2 \gamma,$$
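(6) $$Gain=obj_1-obj_2=\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{\left(G_L+G_R\right)^2}{H_L+H_R+\lambda}\right]-\gamma.$$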
2.2 LSTM

LSTM is a variant of the RNN. It adopts a recurrent network architecture and a gradient-based learning algorithm [18], selecting which data to retain or forget through three "gates" in its cell structure, thereby solving the vanishing-gradient problem caused by long input sequences in the RNN model, as shown in Fig. 1. The LSTM states are updated as follows. First, the forget gate determines whether to keep the data based on the output of the previous time step and the current input, as expressed in Eq. (7), where $f_t$ is the forget gate, $h_{t-1}$ is the output of the previous time step, and $x_t$ is the input at the current time step.
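(7) $$f_t=\sigma\left(W_f \cdot\left[h_{t-1}, x_t\right]+b_f\right)$$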
Second, the input gate saves the updated data and stores them in the cell state using Eq. (8), where $i_t$ is the input gate.
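(8) $$i_t=\sigma\left(W_i \cdot\left[h_{t-1}, x_t\right]+b_i\right)$$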
Third, the cell state is updated using Eq. (9), where $C_t$ is the new cell state and $\tilde{C}_t$ is the candidate state.
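(9) $$C_t=f_t \odot C_{t-1}+i_t \odot \tilde{C}_t, \quad \tilde{C}_t=\tanh \left(W_c \cdot\left[h_{t-1}, x_t\right]+b_c\right)$$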
Finally, the output gate determines the data output of the cell state, as expressed in Eqs. (10) and (11).
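(10) $$o_t=\sigma\left(W_o \cdot\left[h_{t-1}, x_t\right]+b_o\right),$$

(11) $$h_t=o_t \odot \tanh \left(C_t\right)$$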
Here, the parameter set $\{W_o, W_f, W_i\}$ corresponds to the weight matrices of the different gates; $\{b_o, b_f, b_i\}$ represents the corresponding bias terms; and $\tanh$ and $\sigma$ are activation functions.

2.3 Attention Mechanism

In 1980, Treisman and Gelade [19] proposed the attention mechanism. The AM is inspired by the ability of human vision to quickly focus on key areas and assign them high attention weights. Similarly, the AM calculates the importance of each element in the input sequence and assigns weights to different features according to their importance. The calculation principle of the AM is illustrated in Fig. 2. First, the weight of each value (V) is obtained by calculating the similarity between the query (Q) and each key (K). Second, the softmax function is used to normalize these weights to obtain the weight coefficients. Third, the weight coefficients and the corresponding values are weighted and summed to obtain the final attention value, as follows:
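(12) $$Attention(Q, K, V)=softmax\left(\frac{Q K^T}{\sqrt{d_k}}\right) V,$$

where $d_k$ is the dimension of the key vectors.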
3. XGBoost-LSTM-Attention

The XGBoost algorithm can extract important features of the input data; consequently, it is widely used for data processing. The LSTM can handle long-term dependence information and is widely used in time-series analysis. The AM can capture the internal correlations of the data and the information at different points in the input sequence. Thus, based on the advantages of XGBoost, LSTM, and the AM, an XGBoost-LSTM-Attention model for stock prediction is constructed. The network structure is illustrated in Fig. 3. The fundamental structures are the XGBoost and LSTM-Attention modules. First, the XGBoost module extracts important feature subsets by computing the importance score of each feature in a stock dataset. Second, the LSTM-Attention module, which includes LSTM, attention, and BiLSTM layers, improves the generalization ability and accuracy of the prediction model.

3.1 Raw Stock Data Processing

The stock dataset is the key to ensuring that the forecasting model has good generalization ability. The XGBoost algorithm is used to extract important information from the original stock data and obtain important feature subsets. The XGBoost algorithm is outlined in Table 1.

Table 1. Data processing steps
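As a concrete illustration of this procedure, the following minimal Python sketch scores the nine candidate fields with XGBoost and keeps the top-k; the DataFrame `df`, the next-day-direction training label, and the hyperparameters are illustrative assumptions rather than the paper's exact setup, and the recursive dimension search described below is omitted for brevity.

```python
import pandas as pd
from xgboost import XGBClassifier

def select_features(df: pd.DataFrame, top_k: int = 6) -> list:
    """Rank candidate fields by XGBoost importance and keep the top_k."""
    features = ["open", "high", "low", "close", "pre_close",
                "volume", "amount", "change", "pct_change"]
    # Hypothetical training label: whether the next day's close rises.
    y = (df["close"].shift(-1) > df["close"]).astype(int).iloc[:-1]
    X = df[features].iloc[:-1]

    model = XGBClassifier(n_estimators=100, max_depth=3)
    model.fit(X, y)

    # Gain-based scores are also available via
    # model.get_booster().get_score(importance_type="gain").
    ranked = sorted(zip(features, model.feature_importances_),
                    key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]
```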
We consider Kweichow Moutai (SH600519) as an example. The original dataset includes the stock data for each trading day from 2012 to 2022. Each record contains nine fields: open, high, low, close, previous close, volume, amount, change, and percentage change. The original data are presented in Table 2.

Table 2. Original stock data
The specific steps of the XGBoost data processing are as follows. First, the XGBoost method is used to build a model; the score of each feature of the original stock data is obtained through model training, and the scores are then arranged in descending order, as shown in Fig. 4. Second, the feature dimension is increased recursively, and the accuracies of the models under different dimensions are compared, as shown in Fig. 5. The results show that the accuracy is highest (87.5%) when six-dimensional features are extracted. Third, the first six feature items are used as the important feature subset, which includes volume, open, change, amount, percentage change, and close. Finally, the important feature subset and technical indicators are used as input data for the LSTM-Attention module. The technical indicators primarily include the 5-day moving average and Bollinger Bands.

3.2 LSTM-Attention Prediction Process

The LSTM-Attention module predicts stock prices as follows (Fig. 6):

· Input processed data: the dataset obtained after XGBoost module processing.

· Data standardization: the discrepancies in scale across the stock dataset necessitate normalization to better train the model. The normalization method is shown in Eq. (13), where Xmin and Xmax are the minimum and maximum values in the stock dataset, respectively.
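(13) $$X^{\prime}=\frac{X-X_{min}}{X_{max}-X_{min}},$$

where $X^{\prime}$ is the normalized value.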
· Network initialization: the weights of each network layer are initialized to a fixed mean or variance.

· LSTM-Attention calculation: the model output value is obtained by computing each network layer of the LSTM-Attention module.

· Error calculation: the output and actual values are compared to determine the corresponding error.

· End condition: training ends when the maximum number of iterations is reached or the prediction error falls below the set threshold.

· Output results: the results are output once the prediction process is completed and the neural network model is trained.

4. Experiments

In this section, we verify the validity of XGBoost-LSTM-Attention on Chinese stocks in different industries and compare the performance of different models in developing stock markets [14-16].

4.1 Datasets

We selected Chinese stocks from different industries as the experimental data, as presented in Table 3. Transaction data for 2,357 trading days from January 2012 to June 2022 were obtained from the Tushare database [20]. To evaluate the generalization error, each stock dataset was split into training and testing sets at a ratio of 8:2.

Table 3. Stock picking list
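For reference, the daily bars for one of the selected stocks can be retrieved from Tushare and split chronologically at 8:2, as in the following minimal sketch; the API token and the exact Tushare Pro query shown here are assumptions, not the paper's code.

```python
import tushare as ts

# Tushare Pro requires a personal API token (assumed to be available).
pro = ts.pro_api("YOUR_TOKEN")

# Daily bars for Kweichow Moutai over the study period.
df = pro.daily(ts_code="600519.SH",
               start_date="20120101", end_date="20220630")
df = df.sort_values("trade_date").reset_index(drop=True)

# Chronological 8:2 split into training and testing sets.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
```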
4.2 Model Implementation

All models were built on the Windows 10 operating system using the GPU version of the TensorFlow framework. The experimental hardware environment was a GeForce RTX 2060 GPU with 16 GB of RAM. The model settings are listed in Table 4.

Table 4. LSTM-Attention setting
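The network described in Section 3 can be sketched in Keras as follows; the layer widths are illustrative assumptions rather than the exact settings of Table 4, while the optimizer, loss, and learning rate follow the training configuration reported below.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(time_steps: int = 5, n_features: int = 8) -> tf.keras.Model:
    """LSTM -> attention -> BiLSTM -> dense head (layer widths assumed)."""
    inputs = layers.Input(shape=(time_steps, n_features))
    x = layers.LSTM(64, return_sequences=True)(inputs)
    # Dot-product self-attention over the LSTM output sequence.
    x = layers.Attention()([x, x])
    x = layers.Bidirectional(layers.LSTM(32))(x)
    outputs = layers.Dense(1)(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mae")
    return model

# Example training call with the reported batch size and epoch count:
# model.fit(X_train, y_train, batch_size=5, epochs=50)
```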
The Adam optimizer was used in the experiments with the following training parameters: a batch size of 5, a time step of 5, 50 epochs, the mean absolute error (MAE) as the loss function, and a learning rate of 0.001. To evaluate the prediction performance of XGBoost-LSTM-Attention, the root-mean-square error (RMSE), R-square (R2), and accuracy were used as the model evaluation criteria. The RMSE equation is:
(14) $$RMSE=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},$$

where $\hat{y}_i$ is the predicted value, and $y_i$ is the actual value. The smaller the RMSE value, the smaller the prediction deviation of the model. The R2 equation is:
(15) $$R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2},$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, and $\bar{y}$ is the mean of the actual values. The closer the R2 value is to 1, the better the fit of the model. The accuracy (ACC) equation is:
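(16) $$ACC=\frac{TP+TN}{TP+TN+FP+FN},$$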
where TP and TN denote the numbers of correctly predicted upward and downward movements, respectively, and FP and FN denote the corresponding incorrect predictions. The higher the accuracy, the more accurate the stock price prediction.

4.3 Results

The stock datasets were provided as input to XGBoost-LSTM-Attention for training, and the test datasets were predicted using the trained model. The actual and predicted values were compared, and the experimental results were divided, based on the movement of the stock price, into three groups: volatile, upward, and downward trends, as shown in Figs. 7–9.

Fig. 7. Stock price forecast results under a volatile trend: (a) China Merchants Bank, (b) Gemdale Group, (c) Hunan Valin Steel, (d) Baoshan Iron & Steel, and (e) Jiangling Motors.

In a volatile trend, many factors influence the price, leading to rises and falls that make it difficult to predict. In Fig. 7(a)–7(e), the prediction results of XGBoost-LSTM-Attention are relatively stable under volatility. In an upward trend, stocks are in short supply, causing stock prices to keep rising. In Fig. 8(a)–8(h), the minimum and maximum stock price increases are 40% and 1,000%, respectively, and the model-predicted prices track the actual prices well. In a downward trend, stocks are sold off and prices continue to fall. Fig. 9 shows that the model-predicted prices are relatively close to the actual prices. Across Figs. 7–9, the prediction curve of XGBoost-LSTM-Attention fits the actual price curve well and reflects the actual price trend. The stock prediction experiments demonstrate that our model can accurately predict stocks in different industries, with relatively stable results.

Fig. 8. Stock price forecast results under an upward trend: (a) Chongqing Yukaifa, (b) Yueyang Xingchang, (c) China Shenhua Energy, (d) Shanghai Energy, (e) Tibet Mining, (f) Zijin Mining, (g) BYD, and (h) PetroChina.

In the comparison experiments, we trained GRU-Attention, CNN-LSTM, CNN-BiLSTM-AM, and XGBoost-LSTM-Attention on the leading-stock datasets. Based on the experimental results of all the models, the generalization error of each model can be calculated. The evaluation results of each model are listed in Tables 5 and 6, where relatively excellent results are observed for three of the four test datasets. Among the four models, the R2 of the XGBoost-LSTM-Attention model is the highest, its RMSE is relatively small, and its ACC is relatively high. The GRU-Attention model also exhibits good performance: on the Baoshan Iron & Steel test dataset, its R2, RMSE, and ACC improved by 0.3%, 5%, and 9%, respectively, compared with the XGBoost-LSTM-Attention model. Comparing CNN-BiLSTM-AM with CNN-LSTM, the RMSE decreased by 0.082, 0.108, 0.101, and 0.086 on the four test datasets. Thus, CNN-BiLSTM-AM, GRU-Attention, and XGBoost-LSTM-Attention perform better than the CNN-LSTM model. This is because attention-based neural network models can effectively learn the correlations within stock information, thereby reducing the forecast error and improving prediction performance.

Table 5. Evaluation metrics of different models on the China Shenhua and Baoshan Iron & Steel datasets
Table 6. Evaluation metrics of different models on the Zijin Mining and PetroChina datasets
As shown in Tables 7 and 8, under the same input conditions, the evaluation results of our model differ significantly from those of the other three models. On the three test datasets, the R2, RMSE, and ACC values of our model were 0.955, 0.008, and 65.21%; 0.974, 0.003, and 67.08%; and 0.984, 0.014, and 73.08%, respectively. This demonstrates that the prediction performance of the model can be improved by processing the original stock data. In general, the XGBoost-LSTM-Attention model had the best generalization ability among the four models, with an average RMSE of 0.012, an average R2 of 0.96, and an average ACC of 66.1%, indicating that forecasting performance can be improved by processing the original data and combining LSTM with the AM to accurately predict stocks in different industries. The results also suggest that the other three models may not be applicable to stock markets in developing countries.

Table 7. Evaluation metrics of different models on the China Merchants Bank and BYD datasets
5. Conclusion

In this study, we focused on two tasks: first, we proposed a new model, XGBoost-LSTM-Attention, for stock prediction in the China A-share market; subsequently, the XGBoost-LSTM-Attention model was used to predict stocks in different industries, and the results were compared with those of existing models. Specifically, in terms of data input, the XGBoost algorithm was used to extract the important features of the original data, thereby reducing market noise and redundant feature information. In terms of network structure, an LSTM-based attention unit was combined with a BiLSTM prediction unit to improve prediction accuracy. In the experiments comparing the CNN-LSTM, GRU-Attention, and CNN-BiLSTM-AM models with our XGBoost-LSTM-Attention model, the proposed model had the highest average R2, the lowest average RMSE, and the highest average ACC, indicating that the model is effective for stock price prediction. XGBoost-LSTM-Attention can serve as an important reference for investors when investing in stocks. In the future, we will introduce sentiment analysis to predict the stock market more comprehensively.

Biography

Zhiyong Yang (https://orcid.org/0000-0003-4399-5557)

He received his M.S. and Ph.D. degrees from Chongqing University in 2009 and 2013, respectively. He is currently a full professor and vice president at the Chongqing Vocational Institute of Engineering, Chongqing, China. His current research interests include artificial intelligence and big data processing and analysis.

References