3. Data description and preprocessing

In this study, we use data sets for four stock indices from three markets: the SSEC (SSE Composite Index) of the Chinese stock market, the HSI (Hang Seng Index) of the Hong Kong stock market, and the SP 500 (Standard & Poor's 500 Index) and DJI (Dow Jones Industrial Average) of the US stock market. These data sets represent markets at different stages of maturity: the Chinese market is an immature market, the Hong Kong market is a semi-mature market, and the American market is a mature market and one of the earliest stock markets. Each data set includes the open price, high price, low price, close price, and volume from January 4, 2005, to March 8, 2021. Part of the four data sets is shown in Table.

Many studies have shown that technical indicators are very effective features for stock forecasting (Basak et al., 2019; Haq et al., 2021; Patel et al., a). We chose 18 indicators in our study, which are shown in Table. Normalization is unnecessary in ensemble learning methods because the scale of the data does not affect the final result.

Stock forecasting is treated as a classification task in this study. The target to be predicted is calculated as follows:

$$
\mathrm{target}_{\mathrm{after}\,i} =
\begin{cases}
1, & (\mathrm{close}_i - \mathrm{close})/\mathrm{close} > 0 \\
-1, & (\mathrm{close}_i - \mathrm{close})/\mathrm{close} < 0
\end{cases}
\tag{1}
$$

where $\mathrm{target}_{\mathrm{after}\,i}$ is the label after $i$ days, $\mathrm{close}$ is the close price, and $\mathrm{close}_i$ is the close price on the $i$-th day.

In the experiment, the data were split in chronological order: the first 80% of the data formed the training set, and the last 20% formed the test set. For example, the SSEC data set contained 3863 data points in total after the technical indicators were calculated and the data were cleaned. Of these, the first 3090 data points were used for training and the remaining 773 data points were used for testing. The other three data sets were split in the same way.

We set up time windows of varying size and extracted the information of each day contained in the time window as a lagged variable. All the information in the time window forms a data point.
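The labelling rule in Eq. (1) and the chronological 80/20 split can be illustrated with a minimal sketch. It assumes that close_i refers to the close price i trading days after the current day, and that days with a return of exactly zero (a case Eq. (1) does not cover) are left unlabeled. The function names `make_labels` and `chronological_split` and the toy prices are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence, Tuple


def make_labels(close: Sequence[float], horizon: int) -> List[Optional[int]]:
    """Label each day by the sign of the return `horizon` days ahead (Eq. 1).

    Returns 1 for a positive return, -1 for a negative return, and None
    when the label is undefined: the last `horizon` days have no future
    price, and Eq. (1) does not cover a return of exactly zero.
    """
    labels: List[Optional[int]] = []
    for t, price in enumerate(close):
        if t + horizon >= len(close):
            labels.append(None)  # no close price `horizon` days ahead
            continue
        ret = (close[t + horizon] - price) / price
        if ret > 0:
            labels.append(1)
        elif ret < 0:
            labels.append(-1)
        else:
            labels.append(None)  # zero return: left unlabeled (assumption)
    return labels


def chronological_split(rows: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Split rows in time order: the first part trains, the rest tests."""
    cut = int(len(rows) * train_fraction)  # floor, so 3863 rows -> 3090
    return list(rows[:cut]), list(rows[cut:])


# Toy close prices, for illustration only.
close = [10.0, 10.5, 10.2, 10.2, 10.8, 10.1]
print(make_labels(close, horizon=1))   # [1, -1, None, 1, -1, None]

train, test = chronological_split(list(range(3863)))
print(len(train), len(test))           # 3090 773
```

Splitting by position rather than at random keeps every test observation later in time than every training observation, which matches the chronological ordering described above.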
The features contained in each data point are given by the following formula:

$$
\mathrm{DayPoint} = \{\mathrm{Indicator}_1, \mathrm{Indicator}_2, \ldots, \mathrm{Indicator}_n\}, \quad n = \mathrm{Indicatornumber}
\tag{2}
$$
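One way to picture how the daily feature sets of Eq. (2) combine into windowed data points is the sketch below. It assumes that a window of size w ending on day t concatenates the indicator values of days t-w+1 through t, oldest first; the exact layout used in the study is not specified here. The helper `build_windows` and the toy values are hypothetical.

```python
from typing import List, Sequence


def build_windows(day_points: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Flatten each run of `window` consecutive days into one data point.

    day_points[t] holds the indicator values for day t (Eq. 2). The data
    point ending on day t concatenates days t-window+1 .. t, oldest first,
    so the earlier days act as lagged variables.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    samples: List[List[float]] = []
    for end in range(window - 1, len(day_points)):
        sample: List[float] = []
        for day in day_points[end - window + 1 : end + 1]:
            sample.extend(day)
        samples.append(sample)
    return samples


# Toy data: 4 days with 2 hypothetical indicators per day.
days = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
print(build_windows(days, window=3))
# [[1.0, 10.0, 2.0, 20.0, 3.0, 30.0], [2.0, 20.0, 3.0, 30.0, 4.0, 40.0]]
```

With n indicators per day and a window of size w, each data point in this layout has n × w features, and the first w − 1 days yield no data point because their windows are incomplete.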