1. Introduction
Over the past decades, business failure prediction (BFP) has been a globally significant topic because of its key role in helping financial institutions and investors evaluate firms' solvency risk. Even though researchers all over the world have worked in this field for decades, BFP remains a major challenge in practice [1]. Moreover, firms' operational environment is more complex in the big data age than ever before, so BFP has become both more urgent and more difficult [2]. Nevertheless, both prior literature and practice have demonstrated that warning signs can be found before a firm encounters failure [3].
Following this idea, many accounting ratios, market-based features, and other indicators have been adopted to predict business failure [1,4]. For example, accounting ratios such as sales to total assets have been selected for bankruptcy forecasting [5,6]. Meanwhile, researchers have proposed various statistical and intelligent methods for BFP [7-12], such as the discriminant regression model (DRM), logistic regression (LR), and the support vector machine (SVM), to name a few. These predicting methods have been verified to be effective tools for discriminating failing firms from normal firms. However, each individual method has its own advantages and limitations.
Thus, considering the significant impact of models on the performance of BFP, various combination predicting models, which focus on combining individual classifiers to improve predicting performance, have been proposed for BFP [13,14]. However, conventional combination models mainly rely on majority voting, equal weighting, or intelligent algorithms to combine the results of the basic classifiers, and these approaches show several deficiencies when predicting business failure in practice. For example, conventional combination models need to determine the weight of each basic classifier and must be trained on large samples [15]. They also face a trade-off between adding components and overfitting, and they ignore qualitative methods.
To overcome or avoid such disadvantages, this paper develops a novel unweighted combination method, denoted UCSS, for BFP. UCSS adopts the uni-int decision making method to combine quantitative analysis and qualitative analysis. Given the features of the individual classifiers, a conventional expert system (ES) is employed as the basic qualitative classifier, while LR and SVM are employed as the basic quantitative classifiers. The uni-int decision making method is a novel parameterized mathematical approach based on soft set (SS) theory [16]; with it, the predicting results of ES, LR, and SVM can be further exploited. Hence UCSS inherits the flexibility and efficiency of ES, LR, SVM, and SS. Compared with traditional BFP methods, UCSS has the following advantages.
UCSS does not need to compute the weight of each classifier.
UCSS is excellent at dealing with small sample sizes and uncertain data for BFP.
UCSS has advantages for BFP in complex environments. Because it is built on both quantitative and qualitative analysis, it can handle numerical information as well as uncertain information.
Overfitting and decreasing generalization do not occur as the number of components increases in UCSS.
UCSS is easy to understand and apply for BFP in practice.
For comparison, ES, LR, SVM, a combination method using the equal weight approach (CMEW), a combination method using a neural network algorithm (CMNN), a combination method using rough sets and D-S evidence theory (CMRD), and a combination method using the receiver operating characteristic curve and SS (CFBSS) are included as benchmarks in this paper. We randomly split the real data into a training dataset and a testing dataset with percentages of (25%, 75%), (50%, 50%), and (75%, 25%). In this way, we can verify how the performance of UCSS changes with different sample sizes.
A novel UCSS is proposed for BFP.
The uni-int decision making method is employed to combine the results of the basic classifiers.
Both quantitative analysis and qualitative analysis are employed to predict business failure in complex environments.
We organize the remainder of this paper as follows. Section 2 reviews combination methods for BFP and the uni-int decision making method. Section 3 elaborates UCSS, and Section 4 describes the empirical experiment. Section 5 presents and discusses the empirical results. Section 6 concludes this study.
2. Literature Review
2.1 Combination Methods for BFP
Researchers have rarely considered quantitative and qualitative analysis together for BFP [1]. More attention has been paid to quantitative analysis with financial data [7,11,14,17]. However, both qualitative and quantitative analysis have advantages and disadvantages [18]. In the era of big data, a huge amount of qualitative information is available for BFP. Therefore, including qualitative analysis as a component of combination predicting methods, via an appropriate combining method, is a significant step forward.
Meanwhile, little attention has been paid to combining multiple classifiers for BFP; studies have mainly focused on individual predicting methods or ensembles of two classifiers [14]. However, better performance may be obtained by combining more classifiers with an appropriate combining technique [13], and various combining methods have thus been adopted. Majority voting, weighting, and intelligent algorithms are the most widely used [14]. As mentioned above, however, these combination methods have their own advantages and disadvantages in practice. This paper contributes to both combination methods and business failure prediction in the following two ways.
First, the uni-int decision making method, which has advantages in BFP practice [15], is adopted as the combining method. Second, both the quantitative analysis and the qualitative analysis are included for predicting business failure.
2.2 The uni-int Decision Making Method
Unlike probability theory, fuzzy set theory, and similar approaches, SS is free from the inadequacy of parameterization tools. It can be treated as a proper method for handling uncertainty and can be presented as formula (1).
where [TeX:] $$U$$ is the universe, [TeX:] $$E$$ is the parameter set, and [TeX:] $$P(U)$$ is the power set of [TeX:] $$U$$. A soft set [TeX:] $$F_{A}$$ on [TeX:] $$U$$ is the set of ordered pairs such that [TeX:] $$A \subseteq E$$, [TeX:] $$f_{A}: E \rightarrow P(U)$$, and [TeX:] $$f_{A}(x)=\emptyset$$ if [TeX:] $$x \notin A$$. Here [TeX:] $$f_{A}$$ is the mapping function of [TeX:] $$F_{A}$$, and [TeX:] $$f_{A}(x)$$ is the subset of [TeX:] $$U$$ that [TeX:] $$F_{A}$$ associates with the parameter [TeX:] $$x$$. The collection of all soft sets on [TeX:] $$U$$ is [TeX:] $$S(U)$$. With the definitions above, the uni-int decision making method has been developed [19]. It is a binary operation on soft sets based on their products. If [TeX:] $$\wedge(U)$$ is the set of [TeX:] $$\wedge$$-products of soft sets and [TeX:] $$F_{A} \wedge F_{B} \in \wedge(U)$$, then the uni-int operations, denoted [TeX:] $$u n i_{x} i n t_{y}$$ and [TeX:] $$u n i_{y} i n t_{x}$$, can be carried out as formulas (2) and (3), respectively.
Then we can obtain the uni-int decision set, shown as formula (4).
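As a concrete illustration of formulas (2)-(4), the [TeX:] $$\wedge$$-product and uni-int operations can be sketched in Python. The two toy soft sets, their parameter names, and the universe below are hypothetical examples of ours, not data from the paper (whose computations were done in MATLAB).

```python
from functools import reduce

def and_product(fa, fb):
    """Wedge-product: each parameter pair (x, y) maps to f_A(x) & f_B(y)."""
    return {(x, y): fa[x] & fb[y] for x in fa for y in fb}

def uni_int(fa, fb):
    """uni-int decision set: uni_x int_y (F_A ^ F_B)  union  uni_y int_x (F_A ^ F_B)."""
    prod = and_product(fa, fb)
    # Union over x of the intersection over y, and vice versa
    uni_x_int_y = set().union(
        *(reduce(set.intersection, (prod[(x, y)] for y in fb)) for x in fa))
    uni_y_int_x = set().union(
        *(reduce(set.intersection, (prod[(x, y)] for x in fa)) for y in fb))
    return uni_x_int_y | uni_y_int_x

# Two toy soft sets over the universe {u1, u2, u3, u4}
F_A = {"x1": {"u1", "u2", "u3"}, "x2": {"u2", "u3"}}
F_B = {"y1": {"u2", "u4"}, "y2": {"u2", "u3"}}
print(uni_int(F_A, F_B))  # the elements favoured by both soft sets: {'u2', 'u3'}
```

Note that the decision set is obtained without computing any weights, which is the property UCSS exploits.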
The uni-int decision making method is an excellent combining method for integrating the opinions of different decision makers in practice, especially when the data are uncertain and the data volume is not very large [20]. However, up to now, researchers have mainly focused on theoretical research [20,21] and have rarely paid attention to applications to practical problems. Our contribution fills this gap: we apply the method to combine multiple classifiers to achieve better BFP performance.
3. Methodology
A novel UCSS is introduced here for BFP. The principle of UCSS is briefly illustrated in Fig. 1.
Assume there are [TeX:] $$n$$ samples and [TeX:] $$m$$ predicting classifiers, where [TeX:] $$Y_{n}$$ is a matrix standing for the financial status of the samples, [TeX:] $$Y_{n m}$$ is a matrix presenting the predicting result of the [TeX:] $$m$$-th method on the [TeX:] $$n$$-th sample, and [TeX:] $$Y_{n u}$$ is a matrix containing the combination results of [TeX:] $$Y_{n m}$$. [TeX:] $$Y_{n}$$, [TeX:] $$Y_{n m}$$, and [TeX:] $$Y_{n u}$$ are shown in formulas (5) and (6).
where [TeX:] $$y_{n}, y_{n m}, y_{n u}$$ equal 1 or 0. If the [TeX:] $$n$$-th sample actually failed, then [TeX:] $$y_{n}=0$$; otherwise [TeX:] $$y_{n}=1$$. If the [TeX:] $$n$$-th sample is predicted to fail by the [TeX:] $$m$$-th classifier, we mark its output as [TeX:] $$y_{n m}=0$$; if not, [TeX:] $$y_{n m}=1$$. Likewise, we mark the output as [TeX:] $$y_{n u}=0$$ if the [TeX:] $$n$$-th sample is predicted to fail by UCSS, and [TeX:] $$y_{n u}=1$$ otherwise. Clearly, two key points require attention in constructing UCSS: the individual predicting classifiers and the combining method. In this paper, the novel uni-int decision making method is adopted as the combining approach, so in the following we mainly discuss the individual predicting classifiers.
3.1 Individual Predicting Classifier
As mentioned above, each classifier has its own advantages and disadvantages. We want to combine several classifiers into the proposed combination predicting method UCSS to exploit their advantages and restrain their disadvantages. However, a combination method generally becomes more complex as the number of components increases; it may suffer from overfitting, and the generalization of the model may become poor.
To resolve this contradiction, the uni-int decision making method is employed to combine the results of the basic classifiers. The uni-int decision making method is free from this contradiction because it is a parameterized tool: increasing the number of individual predicting classifiers actually improves its performance [15].
Three seems a suitable number of components for balancing predicting performance against model complexity [1]. We want to include both quantitative and qualitative analysis, so, based on the prior literature [1,12], we adopt ES, LR, and SVM as the basic classifiers. These three classifiers have been certified as effective methods for predicting business failure [22].
3.2 Algorithm
According to the analysis above, the algorithm of UCSS can be illustrated in Fig. 2.
The algorithm of the proposed UCSS.
Step 1. Samples and data preprocessing: Using the ten-times split method, we divide all data into two groups: the validation group, which includes the training and testing datasets, and the holdout group, which is abandoned. First, we eliminate the differences in variable scales by normalizing the data. The function is defined as follows.
where [TeX:] $$x_{n i}$$ is the initial value of the [TeX:] $$i$$-th variable for the [TeX:] $$n$$-th firm, and [TeX:] $$m i n_{i}$$ and [TeX:] $$m a x_{i}$$ are the minimal and maximal values of the [TeX:] $$i$$-th variable across all samples.
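Assuming the normalization in formula (7) is the standard column-wise min-max scaling that the definitions above describe, it can be sketched as follows (the paper itself used MATLAB; a constant column, with [TeX:] $$m a x_{i} = m i n_{i}$$, would need special handling):

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max scaling: (x_ni - min_i) / (max_i - min_i),
    mapping each variable onto [0, 1] across all firms."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)
```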
Step 2. Predicting results of basic classifiers: For ES, a rule-based expert system with forward chaining is employed as the ES algorithm. More specifically, we employ financial institutions that regularly publish analysis reports on Chinese listed firms as experts. Experts offer the advice "buy", "sell", or "keep" for stocks. Here, the variable [TeX:] $$x_{E S}$$ represents "the advice of experts" for ES. The first rule is that if a financial institution suggests buying or keeping a firm's stock, then in its opinion the firm will not fail. The second rule is that if the institution suggests selling a firm's stock, then in its opinion the firm will fail. To eliminate predicting bias, several experts are included. Supposing the number of experts is odd, the third rule is that if a majority of experts suggest buying or keeping a firm's stock, then ES concludes that the firm will not suffer business failure. The whole identification process can be presented mathematically. First, the predicting results of the k experts, denoted [TeX:] $$Y_{n k}$$, are shown in formula (8).
The comment "buy" is assigned the value 1, "keep" the value 0, and "sell" the value -1. The predicting result for the [TeX:] $$n$$-th sample using ES, [TeX:] $$y_{n}^{E S}$$, is shown in formula (9).
That is, if most experts suggest buying or keeping a firm's stock, ES believes the firm will remain in normal financial status.
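The three ES rules reduce to a simple majority vote. A sketch, using the coding from the text (buy = 1, keep = 0, sell = -1); the function name is our own:

```python
def expert_system_predict(advice):
    """Rule-based ES decision over an odd number of expert comments,
    coded buy = 1, keep = 0, sell = -1.  Returns 1 (normal status) when
    a majority advise buy or keep, and 0 (failure) otherwise."""
    sells = sum(1 for a in advice if a == -1)
    return 0 if sells > len(advice) - sells else 1

# Five experts: three advise buy/keep, two advise sell -> normal status
print(expert_system_predict([1, 0, -1, -1, 1]))  # 1
```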
Please refer to [6] for the details of LR. If a firm's predicted probability of y = 1 is larger than 0.5, we treat it as a normal sample; otherwise we treat it as a specially treated (ST) sample. Refer to [12] for the details of SVM; we adopt the radial basis function as the kernel function and search for its optimal parameters with the grid search technique.
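For the two quantitative classifiers, a scikit-learn sketch of this setup (the 0.5 cut-off for LR and a grid search over the RBF-SVM's C and gamma) is given below; the grid values are our assumption, and the paper's experiments were run in MATLAB:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_basic_classifiers(X_train, y_train):
    # LR: predict() applies the 0.5 probability cut-off described in the text
    lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # RBF-SVM: C and gamma chosen by grid search on the training data
    grid = {"C": [2.0 ** k for k in range(-3, 4)],
            "gamma": [2.0 ** k for k in range(-3, 4)]}
    svm = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X_train, y_train)
    return lr, svm.best_estimator_
```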
Step 3. Construct the SSs [TeX:] $$\left(F_{E S}, F_{L R}, F_{S V M}\right)$$: According to formula (1), we construct three soft sets [TeX:] $$\left(F_{E S}, F_{L R}, F_{S V M}\right)$$ whose approximate functions are given by ES, LR, and SVM, respectively. The universal sample set of [TeX:] $$F_{E S}, F_{L R}, F_{S V M}$$ is U; the parameter set of [TeX:] $$F_{L R}, F_{S V M}$$ is [TeX:] $$E_{1}$$, and the parameter set of [TeX:] $$F_{E S}$$ is [TeX:] $$E_{2}$$. Thus, for UCSS, the parameter set consists of two parts.
Step 4. Find the [TeX:] $$\wedge$$-products of [TeX:] $$F_{E S}, F_{L R}$$, and [TeX:] $$F_{S V M}$$: According to formulas (2) and (3) and the literature [19], we can obtain the [TeX:] $$\wedge$$-products of [TeX:] $$F_{E S}, F_{L R}, F_{S V M}$$. First, we compute the [TeX:] $$\wedge$$-product of [TeX:] $$F_{L R}$$ and [TeX:] $$F_{S V M}$$, denoted [TeX:] $$F^{\prime}$$. Then, we obtain the [TeX:] $$\wedge$$-product of [TeX:] $$F^{\prime}$$ and [TeX:] $$F_{E S}$$.
Step 5. Calculate the uni-int decision sets: Using formula (4), we calculate the uni-int decision set of [TeX:] $$\left(F_{E S}, F_{L R}, F_{S V M}\right)$$ and thereby obtain the combination predicting results of UCSS.
3.3 Performance Measure
Researchers have proposed various methods and metrics to measure BFP performance [3,12], each with its advantages and limitations. Given the imbalanced datasets in BFP, we adopt the area under the curve (AUC) of the receiver operating characteristic (ROC), which is commonly used to assess the comprehensive discriminatory power of classifiers, as the evaluation metric [23]. AUC ranges from 0 to 1; the higher the AUC, the better the classifier's performance.
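Computing AUC from predicted scores is a one-liner with scikit-learn. The toy labels and scores below are illustrative, not the paper's data; labels follow the coding used earlier, with 1 = normal:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]            # 0 = failed (ST), 1 = normal
y_score = [0.1, 0.4, 0.35, 0.8]  # e.g. LR's predicted P(y = 1)
print(roc_auc_score(y_true, y_score))  # 0.75: 3 of 4 (ST, normal) pairs are ranked correctly
```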
4. Experiment Design
4.1 Data
In practice, Chinese listed firms are divided into two groups according to their net profit over the most recent two years: ST firms and normal firms. Specifically, we treat a firm with negative net profit in the most recent two years (an ST firm) as a business failure sample and treat other firms as normal. The data are mainly collected from the Shenzhen and Shanghai Stock Exchanges through the CSMAR database. We randomly selected 520 listed firms, comprising 320 non-ST (NST) firms and 200 ST firms, from the years 2010 to 2017 as samples. The percentage (25%, 75%) means that 130 training samples and 390 testing samples compose the dataset; (50%, 50%) and (75%, 25%) are interpreted analogously. In this way, the impact of the sample size on the performance of each model for BFP can be clearly verified.
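The random splits can be reproduced, for instance, with scikit-learn. Stratifying on the ST/normal label is our assumption, made so that each split preserves the 320:200 class ratio:

```python
from sklearn.model_selection import train_test_split

def split_samples(X, y, train_pct, seed=0):
    """Randomly split the firms into training/testing sets at one of the
    percentages used in the paper (0.25, 0.50, or 0.75 for training),
    keeping the ST/normal class ratio via stratification."""
    return train_test_split(X, y, train_size=train_pct,
                            stratify=y, random_state=seed)
```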
Besides, we downloaded comment data published by financial institutions in China from 2010 to 2017 from the "Invest Today" website. More than 1,000 professional institutions work in this field. We employ five (k=5) financial institutions, randomly selected according to their background and analysis ability, as the experts of ES: "CMS", "CITIC Securities", "SHENWAN & HONGYUAN Securities", "HAITONG Securities", and "INDUSTRIAL Securities". The development of big data makes collecting such information possible and easy.
4.2 Variables
As mentioned in Section 3, [TeX:] $$x_{E S}$$ is the variable of the qualitative analysis (ES). Therefore, we focus here on variable selection for the quantitative analysis (LR and SVM).
For BFP, the 18 variables listed in Table 1 are the most popular variables for quantitative methods in the prior literature [10]. Since some of these variables are highly correlated, we carried out variable reduction to eliminate the multicollinearity problem. Following [1], we employed stepwise logistic regression to select variables. As a result, we selected five variables for LR and SVM: [TeX:] $$x_{1}, x_{4}, x_{5}, x_{14}$$, and [TeX:] $$x_{18}$$. For UCSS, we adopted six variables: [TeX:] $$x_{1}, x_{4}, x_{5}, x_{14}, x_{18}$$, and [TeX:] $$x_{E S}$$.
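Exact stepwise logistic regression is not built into scikit-learn, but forward sequential selection with a logistic-regression base model is a close analogue. The sketch below is illustrative of the procedure, not the paper's exact implementation:

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

def select_variables(X, y, n_keep=5):
    """Forward selection of n_keep of the 18 candidate ratios, scored by
    a logistic-regression model (an approximation of stepwise LR)."""
    sfs = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=n_keep, direction="forward").fit(X, y)
    return sfs.get_support(indices=True)
```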
4.3 Experiment Framework
The empirical experiment verifies whether UCSS can achieve acceptable predicting performance. Considering the percentages (25%, 75%), (50%, 50%), and (75%, 25%), we split the sample into two groups: the training dataset and the testing dataset. According to [12], it is much more challenging to predict business failure over a longer horizon; hence we take up this challenge by adopting the datasets of years [TeX:] $$(t-2)$$ and [TeX:] $$(t-3)$$ to predict business failure in year [TeX:] $$t$$. For comparison, three individual predicting models and four combination predicting methods are included as benchmarks. The experiment framework is illustrated in Fig. 3.
The framework of this empirical experiment.
5. Results and Analysis
5.1 Out-of-Sample Results
Here we adopt the 10-fold cross-validation technique to evaluate the out-of-sample performance of each method. All computations are performed with MATLAB R2016.
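The evaluation protocol, 10-fold cross-validation with AUC as the score, can be sketched as follows. This is a scikit-learn equivalent of ours; the paper's computations were done in MATLAB:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

def cv_auc(model, X, y, folds=10, seed=0):
    """Mean out-of-sample AUC over a stratified 10-fold split."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
```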
According to [12], the radial basis function (RBF) is adopted as the kernel function of SVM; based on the training datasets, we search for the optimal kernel parameters with the grid-search technique. Given the features of BFP, the neural network (NN) is set up as a back-propagation NN algorithm; thirty verification runs were conducted on the training datasets, and the best result was selected as the NN output. Meanwhile, a rule-based expert system with forward chaining is employed as the ES algorithm. The testing datasets are used to verify the out-of-sample performance of each method. The predicting results, with 95% confidence intervals, are illustrated in Figs. 4–9.
5.2 Discussion and Findings
Here the AUC of the ROC curve is employed as the criterion for evaluating the predicting performance of each method for BFP. To observe the changes in performance, we summarize the AUC values from Figs. 4–9 in Table 2. In the following, the performance of each method is compared and analyzed from the horizontal and vertical perspectives.
5.2.1 Horizontal comparisons and discussions
From Figs. 4, 6, and 8 and Table 2, it is quite clear that the proposed UCSS has the highest AUC. No matter how the sample percentage changes from (25%, 75%) to (75%, 25%), the AUC of UCSS stays around 0.84, meaning that UCSS has not only good predicting accuracy but also reliable stability. This superior performance derives from the advantages of the uni-int decision making method in dealing with small sample sizes; the individual classifiers ES and SVM also handle small samples well. Moreover, by avoiding weight determination, UCSS reduces interference in the calculation process and obtains better predicting performance. The AUC of ES (around 0.81) also changes little across sample sizes. This is easy to understand: the experts of ES are actual practitioners for whom risk management is a key task.
AUC of each predicting method using datasets of mainland China
ROC curves. Datasets: the year (t-2), the percentage (25%, 75%).
ROC curves. Datasets: the year (t-3), the percentage (25%, 75%).
ROC curves. Datasets: the year (t-2), the percentage (50%, 50%).
ROC curves. Datasets: the year (t-3), the percentage (50%, 50%).
ROC curves. Datasets: the year (t-2), the percentage (75%, 25%).
ROC curves. Datasets: the year (t-3), the percentage (75%, 25%).
In contrast to UCSS, the other benchmarks suffer from changes in sample size. Most of the remaining methods achieve better performance as the percentage changes from (25%, 75%) to (75%, 25%): the bigger the training sample, the higher the AUC and the better the predicting performance. The most obvious fluctuations appear in LR, CMEW, and CMNN. For SVM, CMRD, and CFBSS, the result is different: as the sample size changes, they still obtain acceptable predicting results. This advantage is inherited from the superiority of SVM, evidence theory, and SS in dealing with small sample sizes.
Furthermore, from Figs. 5, 7, and 9 and Table 2, the AUC values show that the ranking of the predicting methods does not differ much regardless of which year's data are adopted. The proposed UCSS again has the highest AUC, at around 0.77.
5.2.2 Vertical comparisons and discussions
From Figs. 4–9 and Table 2, it is easy to conclude that, for each selected method, predicting with the datasets of year [TeX:] $$(t-2)$$ yields better performance (a higher AUC) than predicting with the datasets of year [TeX:] $$(t-3)$$. This is because unexpected incidents may occur during a longer predicting horizon and affect the firms' development. This conclusion is consistent with [1] and [7].
More specifically, for the proposed UCSS, the difference between the two AUC values is the second smallest; the smallest belongs to CFBSS. This means that both CFBSS and UCSS have excellent fault tolerance for BFP over a long predicting horizon. The biggest difference belongs to LR. This is because LR is a classical statistical method whose performance depends on the quality of the data.
5.3 Robustness Test
To test the adaptability of the proposed UCSS model, we randomly selected 360 paired listed firms from the Taiwan Stock Exchange (TSE) from 2010 to 2017 as samples for a robustness test. Following [24,25], if a stock is classified as a "full delivery stock" by the TSE, we regard the firm as a business failure. "Yuanta Securities", "KGI", and "Fubon Securities", the top three securities institutions in Taiwan, are employed as the experts of ES. The "Taiwan Economic Journal Data Bank" database is the data source. We then ran the empirical experiment again with the Taiwan samples. The results, briefly shown in Table 3, are similar to the empirical results using the mainland China data, although the exact values differ because the operational environment in Taiwan is more complex than that of the mainland. This demonstrates that our proposed predicting model can be widely used for BFP.
AUC of each method using datasets of Taiwan China
6. Conclusion
In this study, the research frontier of combination models for predicting business failure was extended by introducing a novel unweighted combination method (UCSS). In UCSS, the combining method used to integrate the results of the basic classifiers without weighting is the novel uni-int decision making method, which is developed on the basis of soft set theory. In this way, the weighting dilemma of most combination predicting methods is effectively bypassed. Compared with the selected benchmark methods, UCSS demonstrates excellent performance when applied to predicting business failure under all sample sizes in the age of big data. In future work, we will make further efforts on the theoretical and systematic study of the individual predicting classifiers, and we will continue to focus on combining and integrating methods to construct better predicting models for BFP, especially for large data volumes.
Acknowledgement
This study was supported by the National Natural Science Foundation of China (No. 71801113, 71602077), the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (No. 18YJC630212), and Fundamental Research Funds for the Central Universities (No. JUSRP11764).