1. Introduction
As power systems expand, non-linear and impact loads keep increasing, and power quality disturbance (PQD) problems have become increasingly prominent [1,2]. To ensure the safe and stable operation of the power system, power quality must be detected and analyzed, which requires identifying potential PQDs. PQDs are either single or multiple disturbances. Typical single PQDs include harmonics, transient oscillation, flicker, and voltage interruption. Typical multiple PQDs include harmonics combined with voltage sag or voltage swell, and flicker combined with voltage swell or voltage sag.
PQD classification consists of feature extraction and classification. For feature extraction, we use the wavelet transform (WT) to decompose the observed PQD signal. Using the transform coefficients directly as the feature vector would make learning slow and the classifier structure large. Moreover, after the WT the transient energy distributions of different signals are similar and hard to distinguish. We therefore use the normalized difference in transient wavelet energy between the disturbance signal and a standard signal as the feature vector, which makes the features more conducive to classification, and feed this vector into the classifier. For classification, we apply a support vector machine (SVM) to the WT-based feature vectors. The mathematical model of the SVM kernel function directly affects classification performance: different kernel functions have different generalization and learning abilities, and no single kernel function is strong in both. We therefore propose a hybrid kernel function composed of two single kernel functions to improve both generalization and learning ability, and we determine the combination coefficient of the two kernels through repeated experiments.
2. Related Work
PQD identification consists of two stages: feature extraction and classification. Typical feature extraction methods are based on the short-time Fourier transform (STFT), the Fourier transform [2], the wavelet transform [3], and the S-transform (ST) [4,5]. The Fourier transform, which is best suited to stationary signals, reveals which frequency components are present but not when they occur or how long they last. Since most PQDs are non-stationary, both the frequency content and the time of occurrence of a disturbance must be determined. The STFT provides both frequency and time information, but its window width is fixed, so it cannot accurately describe the characteristics of every disturbance. Different disturbances need different window widths, which makes choosing the STFT window difficult, and the STFT also cannot track transient and abrupt changes. The wavelet transform overcomes these drawbacks: it provides high frequency resolution for low-frequency components and high time resolution for high-frequency components [3], which meets the resolution requirements for PQD classification. This adaptability to the signal lets the WT analyze non-stationary signals accurately. However, the WT is sensitive to noise. The ST can be viewed as an extension of the WT and the STFT. It is a form of multiresolution analysis that provides both frequency and phase information [5]. However, the resulting ST matrix is redundant, so the ST is computationally expensive.
Commonly used classification methods include artificial neural networks (ANNs) [6], decision trees [7], and SVMs [8-10]. ANNs are simple in structure and powerful problem solvers, and they are characterized by large-scale distributed parallel processing, nonlinearity, self-organization, and self-learning. However, they suffer from local optima and poor convergence, their training can be long, and they may over-fit. Decision trees are designed to mimic human reasoning, but their decision rules are hard to establish. SVMs can effectively solve nonlinear, small-sample, and high-dimensional pattern recognition problems. The basic principle of the SVM is to map feature vectors that are inseparable in a low-dimensional space into a high-dimensional space in which they become separable. Different kernel functions yield different SVM classifiers, and because the mathematical model of the kernel function directly affects classification performance, designing a suitable kernel function is very important [11].
3. Feature Extraction Using the WT
The WT has a variable time-frequency window that adapts to the frequency content of the signal, and it has good time-frequency localization [12]. Multi-scale decomposition of the disturbance signal yields a series of coefficients at each scale, and these coefficients are the basis of WT-based feature extraction. For an orthogonal wavelet, the energy of the input signal is distributed over the wavelet coefficients:
$$\int |x(t)|^{2}\,dt=\sum_{n}\left|C_{J}(n)\right|^{2}+\sum_{j=1}^{J}\sum_{n}\left|D_{j}(n)\right|^{2} \tag{1}$$
where x(t) is the signal to be decomposed, J is the number of decomposition levels, and $C_{j}(n)$ and $D_{j}(n)$ are the approximation and detail coefficients of the wavelet decomposition, respectively. The approximation coefficients carry the fundamental-frequency energy, and the detail coefficients carry the transient energy. When a PQD occurs, the energy in each frequency band of the signal changes, and different PQDs change the transient energy in different frequency bands.
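The energy relation above can be checked numerically. The following Python sketch performs one level of an orthonormal Haar decomposition and confirms that the signal energy equals the energy of the approximation coefficients plus that of the detail coefficients. Haar is used here instead of db4 only because its filters fit in two lines (an assumption for illustration); the function names `haar_step` and `energy` and the sampling rate are hypothetical.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists. Assumes len(x) is even.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return approx, detail

def energy(c):
    """Sum of squared coefficients."""
    return sum(v * v for v in c)

# A short 50 Hz sine with a superimposed spike standing in for a transient.
fs = 3200.0  # hypothetical sampling rate
x = [math.sin(2 * math.pi * 50 * n / fs) for n in range(64)]
x[20] += 0.5

cA, cD = haar_step(x)
print(energy(x), energy(cA) + energy(cD))  # equal up to floating-point rounding
```

Because each Haar pair satisfies (a+b)²/2 + (a-b)²/2 = a² + b², the identity holds exactly apart from rounding; repeating `haar_step` on the approximation coefficients extends it to several levels.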
Because Daubechies (db) wavelets have compact support, are sensitive to irregular signals, and yield an orthogonal decomposition, we used the db4 wavelet to decompose the disturbance signal into 8 levels. The wavelet energy is obtained from the wavelet coefficients as follows:
$$E_{d_{j}}=\sum_{n}\left|d_{j}(n)\right|^{2} \tag{2}$$
where $E_{d_{j}}$ and $d_{j}(n)$ are the transient energy and the detail coefficients of the jth level, respectively. Decomposing the signal into 8 levels in this way gives the transient energy $E_{i}$ $(i=1,\ldots,8)$ of each level. Because the transient energy distributions of the different disturbances differ only slightly, using them directly would hurt classification. We therefore construct the feature vector X from the difference between the transient energy $E_{i}$ of the disturbance signal and the energy $E_{ref}$ of the standard signal in the same level, and, after normalization, use X as the input vector of the SVM:
$$X=\left[E_{1}^{*}, E_{2}^{*}, \ldots, E_{8}^{*}\right] \tag{3}$$
where $E_{i}^{*}=E_{i}-E_{ref}$.
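The feature construction can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this work, and it rests on three labeled assumptions: a Haar wavelet stands in for db4, odd-length approximation vectors are padded by repeating the last sample, and "normalization" is taken to mean scaling to unit Euclidean length (the text does not specify the normalization). All function names are hypothetical.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT (assumes len(x) is even)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return approx, detail

def detail_energies(x, levels):
    """Per-level detail energy E_j = sum_n d_j(n)^2 of a multi-level decomposition."""
    energies = []
    approx = list(x)
    for _ in range(levels):
        if len(approx) % 2:          # assumption: pad odd lengths by repeating the last sample
            approx.append(approx[-1])
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies

def energy_difference_features(disturbed, standard, levels=8):
    """Feature vector of E_i* = E_i - E_ref per level, scaled to unit length (assumed normalization)."""
    e_dist = detail_energies(disturbed, levels)
    e_ref = detail_energies(standard, levels)
    diff = [ed - er for ed, er in zip(e_dist, e_ref)]
    norm = math.sqrt(sum(v * v for v in diff))
    return [v / norm for v in diff] if norm > 0 else diff

# Usage: a 50% sag lasting three cycles versus an undisturbed standard signal.
spc = 64                      # hypothetical samples per cycle, ten cycles in total
n = 10 * spc
standard = [math.sin(2 * math.pi * k / spc) for k in range(n)]
sag = [(0.5 if 3 * spc <= k < 6 * spc else 1.0) * v for k, v in enumerate(standard)]
X = energy_difference_features(sag, standard)
print(len(X), X)
```

A disturbance-free signal produces an all-zero difference vector, so only the deviation from the standard signal reaches the classifier.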
4. Support Vector Machine
The main principle of SVM classification is to use a kernel function to map data that are not separable in a low-dimensional space into a higher-dimensional space where they become separable. In the new space, the SVM constructs a separating hyperplane as the decision surface. Because the SVM solution is based on structural risk minimization, it generalizes better than nonlinear function-approximation methods.
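A classic illustration of this principle is XOR-style data, which no straight line can separate in two dimensions. The sketch below uses a hypothetical explicit map (x1, x2) → (x1, x2, x1·x2) and shows that a single plane separates the mapped points. The data and the map are illustrative assumptions, not part of this paper's method; in practice the SVM uses a kernel so that such a map never has to be computed explicitly.

```python
# XOR-style data: not linearly separable in the original 2-D input space.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [+1, +1, -1, -1]

def feature_map(x):
    """Hypothetical explicit map to 3-D: (x1, x2, x1*x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

# In the mapped space the hyperplane w.z + b = 0 with w = (0, 0, 1), b = 0 separates the classes.
w, b = (0.0, 0.0, 1.0), 0.0

def classify(x):
    z = feature_map(x)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1

predictions = [classify(p) for p in points]
print(predictions == labels)  # True
```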
Let the training set be $D=\left\{\left(x_{i}, y_{i}\right) \mid x_{i} \in R^{n}, y_{i} \in\{+1,-1\}, i=1,2, \ldots, N\right\}$, where $x_{i}$ are the training vectors and $y_{i}$ the corresponding class labels. The optimal separating hyperplane is
$$w \cdot x+b=0 \tag{4}$$
where w is the normal vector of the hyperplane and b is the bias.
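If every training sample satisfies
$$y_{i}\left(w \cdot x_{i}+b\right) \geq 1, \quad i=1,2,\ldots,N \tag{5}$$
the hyperplane separates all feature vectors correctly, and the classification margin between the two classes is
$$\frac{2}{\|w\|} \tag{6}$$
The optimal hyperplane is the one that maximizes this margin.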
A larger margin generally gives better classification performance. Maximizing the margin $2/\|w\|$ is therefore equivalent to minimizing $\|w\|$, and the cost function is
$$\min_{w,b}\ \frac{1}{2}\|w\|^{2} \quad \text{s.t.}\ \ y_{i}\left(w \cdot x_{i}+b\right)\geq 1,\ i=1,\ldots,N \tag{7}$$
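The trade-off between the constraint y_i(w·x_i + b) ≥ 1 and the margin 2/||w|| can be seen on a hypothetical two-point training set (not data from this paper). The sketch below checks three candidate weight vectors with b = 0: shrinking w enlarges the margin until the constraint is violated, so the optimum is the smallest feasible ||w||.

```python
import math

# Hypothetical two-point training set (not data from this paper).
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def feasible(w, b):
    """Check y_i (w.x_i + b) >= 1 for every sample (small tolerance for rounding)."""
    return all(yi * (dot(w, xi) + b) >= 1 - 1e-12 for xi, yi in zip(X, y))

def margin(w):
    """Geometric margin 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

for w in [(1.0, 1.0), (0.5, 0.5), (0.25, 0.25)]:
    print(w, feasible(w, 0.0), round(margin(w), 4))
```

Here w = (0.5, 0.5) is the smallest feasible choice along this direction, and its margin 2√2 equals the distance between the two points.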
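The SVM solves this problem with the method of Lagrange multipliers. The Lagrange function is
$$L(w,b,\alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{N}\alpha_{i}\left[y_{i}\left(w \cdot x_{i}+b\right)-1\right] \tag{8}$$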
where the Lagrange multipliers $\alpha_{i}$ are non-negative. Setting the partial derivatives of (8) with respect to w and b to zero gives
$$w=\sum_{i=1}^{N}\alpha_{i}y_{i}x_{i}, \qquad \sum_{i=1}^{N}\alpha_{i}y_{i}=0 \tag{9}$$
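Substituting (9) into (8) turns the problem into one in $\alpha$ alone:
$$\max_{\alpha}\ \sum_{i=1}^{N}\alpha_{i}-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}\left(x_{i}\cdot x_{j}\right) \quad \text{s.t.}\ \sum_{i=1}^{N}\alpha_{i}y_{i}=0,\ \alpha_{i}\geq 0 \tag{10}$$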
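As a worked check of the dual problem, consider a hypothetical two-sample training set (not data from this paper) with $x_1=(1,1)$, $y_1=+1$ and $x_2=(-1,-1)$, $y_2=-1$. The equality constraint forces $\alpha_1=\alpha_2=\alpha$, the dual objective becomes a one-variable quadratic, and (9) then recovers w:

$$\begin{aligned}
&x_1\cdot x_1=x_2\cdot x_2=2,\qquad x_1\cdot x_2=-2\\
&W(\alpha)=2\alpha-\tfrac{1}{2}\,\alpha^{2}\,(2+2+2+2)=2\alpha-4\alpha^{2}\\
&W'(\alpha)=2-8\alpha=0\ \Rightarrow\ \alpha^{*}=\tfrac{1}{4}\\
&w^{*}=\tfrac{1}{4}(1,1)-\tfrac{1}{4}(-1,-1)=\left(\tfrac{1}{2},\tfrac{1}{2}\right),\qquad b^{*}=y_1-w^{*}\cdot x_1=1-1=0\\
&\frac{2}{\|w^{*}\|}=\frac{2}{\sqrt{1/2}}=2\sqrt{2}=\|x_1-x_2\|
\end{aligned}$$

The resulting margin equals the distance between the two samples, as expected when both are support vectors.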
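The optimal solution $\alpha^{*}$ can be obtained with the sequential minimal optimization (SMO) method. From $\alpha^{*}$, the optimal w and b are
$$w^{*}=\sum_{i=1}^{N}\alpha_{i}^{*}y_{i}x_{i} \tag{11}$$
$$b^{*}=y_{j}-\sum_{i=1}^{N}\alpha_{i}^{*}y_{i}\left(x_{i}\cdot x_{j}\right) \tag{12}$$
where $x_{j}$ is any training sample with $\alpha_{j}^{*}>0$.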
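The sketch below is a simplified SMO-style solver in plain Python. It is our own illustrative variant, not Platt's full algorithm and not necessarily the solver used in this work: it picks a multiplier α_i that violates the KKT conditions, pairs it with a second multiplier α_j (tried in order of decreasing |E_i - E_j|), and solves the two-variable subproblem analytically while keeping Σ α_i y_i = 0. It adds a soft-margin upper bound C on each α_i, an assumption not present in the hard-margin derivation; a large C approximates the hard-margin case. Function and parameter names are hypothetical.

```python
def smo(X, y, kernel, C=10.0, tol=1e-3, max_passes=5):
    """Simplified SMO for the SVM dual; returns (alpha, b).

    C bounds each alpha_i from above (assumed soft margin; large C ~ hard margin).
    Stops after max_passes consecutive passes with no multiplier update.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0

    def error(k):
        # E_k = f(x_k) - y_k with the current alpha and b
        return sum(alpha[m] * y[m] * K[m][k] for m in range(n)) + b - y[k]

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = error(i)
            # Only update alpha_i if it violates the KKT conditions.
            if not ((y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            candidates = sorted((k for k in range(n) if k != i), key=lambda k: -abs(Ei - error(k)))
            for j in candidates:
                Ej = error(j)
                ai_old, aj_old = alpha[i], alpha[j]
                # Box [L, H] that keeps sum(alpha * y) constant and 0 <= alpha <= C.
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
                if L == H or eta >= 0:
                    continue
                aj = min(H, max(L, aj_old - y[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-8:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
                b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
                alpha[i], alpha[j] = ai, aj
                if 0 < ai < C:
                    b = b1
                elif 0 < aj < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
                break
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def linear_kernel(u, v):
    return sum(p * q for p, q in zip(u, v))

# Hypothetical two-point example: recover w* and b* from alpha*.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
alpha, b = smo(X, y, linear_kernel)
w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(2)]
b_check = y[0] - sum(alpha[i] * y[i] * linear_kernel(X[i], X[0]) for i in range(len(X)))
print(alpha, w, b)  # [0.25, 0.25] [0.5, 0.5] 0.0
```

On this toy set the solver reproduces the analytic solution α* = 1/4, w* = (1/2, 1/2), b* = 0, and recomputing b from a support vector (`b_check`) gives the same value.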
Training samples whose $\alpha_{i}^{*}$ is nonzero are called support vectors. The classification function is therefore
$$f(x)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_{i}^{*}y_{i}\left(x_{i}\cdot x\right)+b^{*}\right) \tag{13}$$
where sgn(·) is the sign function and n is the number of support vectors. In the SVM, a nonlinear function $K\left(x_{i}, x\right)$, called the kernel function, replaces the inner product $\left(x_{i}\cdot x\right)$ in (13) (and, likewise, the inner products in the dual problem used for training), which implicitly maps the feature vectors from the inseparable space to a separable one. The kernel-based classification function is
$$f(x)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_{i}^{*}y_{i}K\left(x_{i}, x\right)+b^{*}\right) \tag{14}$$
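The kernel-based classification function can be sketched as a small Python routine that takes the support vectors, their labels and multipliers, the bias, and any kernel as a callable. The support vectors and multipliers below come from a hypothetical two-point example (α* = 1/4, b* = 0), not from this paper's experiments, and mapping a score of exactly zero to +1 is an arbitrary tie-break we assume.

```python
def decision(x, support_vectors, labels, alphas, b, kernel):
    """sgn(sum_i alpha_i* y_i K(x_i, x) + b*); a score of exactly 0 maps to +1 (assumed tie-break)."""
    s = sum(a * yi * kernel(xi, x) for xi, yi, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1

def linear_kernel(u, v):
    return sum(p * q for p, q in zip(u, v))

# Support vectors and multipliers of a hypothetical two-point example.
sv = [(1.0, 1.0), (-1.0, -1.0)]
sv_labels = [1, -1]
sv_alphas = [0.25, 0.25]
b_star = 0.0

print(decision((2.0, 0.5), sv, sv_labels, sv_alphas, b_star, linear_kernel))    # 1
print(decision((-1.0, -3.0), sv, sv_labels, sv_alphas, b_star, linear_kernel))  # -1
```

Passing a different kernel callable changes the classifier without changing this routine, but the multipliers and bias must then be re-solved with that same kernel.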
5. Improved SVM Based on a Hybrid Kernel Function
The mathematical model of the kernel function directly affects the classification performance of the SVM, since different types of kernel functions have different generalization and learning abilities. The kernel function maps data that are inseparable in the low-dimensional space into a high-dimensional space where they are separable. Because only kernel values are computed and the high-dimensional mapping is never evaluated explicitly, this does not substantially increase the computational complexity or running time. Typical kernel functions are the following:
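The claim that the kernel avoids the cost of the high-dimensional mapping can be illustrated with a homogeneous degree-2 polynomial kernel, used here only as an example: (x·z)² equals the inner product of an explicit 3-D feature map φ, yet needs only one 2-D dot product. The names `poly2_kernel` and `phi` are hypothetical.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel (x.z)^2, computed in the input space."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def phi(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)
implicit = poly2_kernel(x, z)                           # one dot product, then a square
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))   # map first, then a dot product
print(implicit, explicit)  # implicit is exactly 121.0; explicit agrees up to rounding
```

For higher degrees or input dimensions the explicit map grows combinatorially, while the kernel evaluation stays a single dot product followed by a power.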
(1) Gaussian kernel
(2) Polynomial kernel
(3) Sigmoid kernel
where p, d, v, and r are real constants.
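The formulas of these three kernels are not reproduced above. For reference, a common parameterization that is consistent with the constants p, d, v, and r is given below; the exact forms used in this work (for example, whether the Gaussian kernel divides by $2p^{2}$ or by $p^{2}$, or the offset in the polynomial kernel) are our assumption:

$$\begin{aligned}
K_{\mathrm{RBF}}(x,x_i)&=\exp\!\left(-\frac{\|x-x_i\|^{2}}{2p^{2}}\right)\\
K_{\mathrm{poly}}(x,x_i)&=\left(x\cdot x_i+1\right)^{d}\\
K_{\mathrm{sig}}(x,x_i)&=\tanh\!\left(v\,(x\cdot x_i)+r\right)
\end{aligned}$$

Unlike the first two, the sigmoid kernel satisfies Mercer's condition only for some values of v and r.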
Fig. 1 shows the curve of the radial basis function (RBF) kernel for p = 0.1, 0.2, 0.3, 0.4, and 0.5, with 0.2 as the test point. The kernel value changes significantly only for inputs near the test point, which indicates that the Gaussian kernel function belongs to the class of local kernel functions.
Fig. 1. Radial basis kernel function.
Fig. 2 shows the curve of the polynomial kernel for d = 1, 2, 3, 4, and 5, with 0.2 as the test point. The kernel value changes significantly for inputs far from the test point, which indicates that the polynomial kernel is a global kernel function.
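The local and global behavior described for Figs. 1 and 2 can be reproduced qualitatively with one-dimensional inputs. The sketch assumes the kernel forms exp(-(x - t)²/(2p²)) and (x·t + 1)^d, which are common choices rather than forms confirmed by the text, and uses the test point t = 0.2.

```python
import math

def rbf(x, t, p):
    """1-D Gaussian kernel, assumed form exp(-(x - t)^2 / (2 p^2))."""
    return math.exp(-((x - t) ** 2) / (2.0 * p * p))

def poly(x, t, d):
    """1-D polynomial kernel, assumed form (x*t + 1)^d."""
    return (x * t + 1.0) ** d

test_point = 0.2
for x in [-1.0, -0.5, 0.0, 0.2, 0.4, 1.0]:
    print(f"x={x:+.1f}  rbf(p=0.1)={rbf(x, test_point, 0.1):.4f}  poly(d=5)={poly(x, test_point, 5):.4f}")
```

The RBF value is 1 at the test point and practically zero a short distance away, whereas the polynomial value keeps changing across the whole input range, which is the local versus global distinction the figures illustrate.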
Let $K_{1}$ and $K_{2}$ be kernel functions and $\lambda \geq 0$ a constant. New kernel functions can then be constructed as follows:
$$K(x,z)=K_{1}(x,z)+K_{2}(x,z)$$
$$K(x,z)=\lambda K_{1}(x,z)$$
Fig. 2. Polynomial kernel function.
A hybrid kernel function constructed from simple single kernel functions in this way still satisfies Mercer's condition for kernel functions.
Consider $K_{1}$ and $K_{2}$ on a finite set of points $\left\{x_{1}, \ldots, x_{n}\right\}$. A symmetric function is a kernel function if and only if, on every such finite set, its Gram matrix K is positive semidefinite, that is, $\alpha^{T} K \alpha \geq 0$ for every vector $\alpha \in R^{n}$. Since $\alpha^{T}\left(K_{1}+K_{2}\right) \alpha=\alpha^{T} K_{1} \alpha+\alpha^{T} K_{2} \alpha \geq 0$, the sum of $K_{1}$ and $K_{2}$ is positive semidefinite, so $K(x, z)=K_{1}(x, z)+K_{2}(x, z)$ is a kernel function. Likewise, $\alpha^{T}\left(\lambda K_{1}\right) \alpha=\lambda\, \alpha^{T} K_{1} \alpha \geq 0$ for $\lambda \geq 0$. Combining the two properties, $K=\lambda K_{1}+(1-\lambda) K_{2}$ with $\lambda \in(0,1)$ is also a kernel function.
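The argument above can be spot-checked numerically: build the Gram matrices of a polynomial kernel and an RBF kernel on a few random 2-D points, form their convex combination with λ = 0.5, and evaluate αᵀKα for many random vectors α. This is an illustration, not a proof; the kernel forms and parameter values (p = 0.3, d = 2) are assumptions.

```python
import math
import random

def rbf(x, z, p=0.3):
    """Gaussian kernel, assumed form exp(-||x - z||^2 / (2 p^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * p * p))

def poly(x, z, d=2):
    """Polynomial kernel, assumed form (x.z + 1)^d."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** d

def gram(kernel, pts):
    return [[kernel(u, v) for v in pts] for u in pts]

def quad_form(alpha, K):
    """alpha^T K alpha."""
    n = len(alpha)
    return sum(alpha[i] * K[i][j] * alpha[j] for i in range(n) for j in range(n))

rng = random.Random(0)
pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
lam = 0.5
K1, K2 = gram(poly, pts), gram(rbf, pts)
K = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)] for r1, r2 in zip(K1, K2)]

# Spot-check alpha^T K alpha >= 0 on random directions (an illustration, not a proof).
worst = min(quad_form([rng.gauss(0, 1) for _ in pts], K) for _ in range(1000))
print(worst >= -1e-9)  # True
```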
Although the polynomial kernel and the RBF kernel are typical SVM kernels, each has its own limitations, and no single kernel function achieves both strong learning ability and strong generalization ability. To overcome this problem, we combine two kernel functions into a new hybrid kernel suited to PQD classification. By the construction rules for kernel functions, a nonnegative weighted sum of two kernel functions is still a kernel function. We therefore combine the polynomial kernel and the RBF kernel into a new SVM kernel intended to improve the classification of PQDs:
$$K\left(x, x_{i}\right)=\lambda_{1} K_{\mathrm{poly}}\left(x, x_{i}\right)+\lambda_{2} K_{\mathrm{RBF}}\left(x, x_{i}\right)$$
where $\lambda_{1}$ and $\lambda_{2}$ are the weighting coefficients of the two kernel functions, each lies between zero and one, and $\lambda_{1}+\lambda_{2}=1$.
Figs. 3 and 4 show the output of the proposed kernel function for different test points. The new kernel function highlights the local characteristics of data near the test point while also retaining the global characteristics of data far from it. Among the coefficient settings we tried, $\lambda_{1}=\lambda_{2}=0.5$ gave the best performance.
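A minimal sketch of the hybrid kernel with the equal weights λ1 = λ2 = 0.5 is shown below for one-dimensional inputs. The component forms and the defaults p = 0.1 and d = 2 are illustrative assumptions, not values reported for the experiments, and `k_hybrid` is a hypothetical name.

```python
import math

def k_rbf(x, t, p):
    """Gaussian kernel, assumed form exp(-(x - t)^2 / (2 p^2)) (1-D inputs)."""
    return math.exp(-((x - t) ** 2) / (2.0 * p * p))

def k_poly(x, t, d):
    """Polynomial kernel, assumed form (x*t + 1)^d (1-D inputs)."""
    return (x * t + 1.0) ** d

def k_hybrid(x, t, lam1=0.5, lam2=0.5, p=0.1, d=2):
    """Hybrid kernel lam1*K_poly + lam2*K_RBF with lam1, lam2 in [0, 1] and lam1 + lam2 = 1."""
    if not (0 <= lam1 <= 1 and 0 <= lam2 <= 1 and abs(lam1 + lam2 - 1) < 1e-12):
        raise ValueError("lam1 and lam2 must lie in [0, 1] and sum to 1")
    return lam1 * k_poly(x, t, d) + lam2 * k_rbf(x, t, p)

for x in (-1.0, 0.0, 0.2, 0.4, 1.0):
    print(f"x={x:+.1f}  K_hybrid(x, 0.2)={k_hybrid(x, 0.2):.4f}")
```

The output peaks at the test point (the RBF term, local behavior) and stays nonzero and varying far from it (the polynomial term, global behavior), which is the combination the figures describe.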
The procedure of the proposed method is as follows:
1) Using the wavelet transform to decompose the PQD signal
2) Extracting the wavelet energy difference between the disturbance and standard signals
3) Normalizing the wavelet energy difference, and using it as the feature vector
4) Identifying the disturbance signals using the SVM based on the proposed hybrid kernel function
Fig. 3. Hybrid kernel function with p = 0.1.
Fig. 4. Hybrid kernel function with p = 0.3.
6. Simulation and Analysis
In this paper, we simulated seven types of single PQDs (harmonic, voltage sag, flicker, voltage swell, transient oscillation, transient pulse, and voltage interruption) and four types of multiple PQDs (flicker or harmonic combined with voltage swell or voltage sag). For each disturbance type, 200 samples were generated using the mathematical models given in [12]. The fundamental frequency was 50 Hz, each cycle contained 1281 sample points, and each signal was ten cycles long. To approximate real PQD signals, Gaussian white noise with a signal-to-noise ratio (SNR) of 15 dB was added to each disturbance signal. As mentioned above, PQD identification consists of feature extraction and classification. First, each PQD signal was decomposed into 8 levels with db4 as the mother wavelet, and the wavelet coefficients of each level were extracted. The wavelet coefficients of the different signals followed roughly the same trends and were difficult to distinguish directly. Therefore, based on Parseval's theorem, we computed the energy difference between the standard signal and the PQD signal at each level and used the resulting eight energy differences as the characteristic vector. Next, we normalized this vector to form the feature vector. Finally, SVMs with different kernel functions were used to classify the feature vectors.
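The sketch below shows one way to generate a noisy test signal under the stated settings (50 Hz, 1281 points per cycle, ten cycles). It assumes that "15 dB" refers to the signal-to-noise ratio, and it uses a simple hypothetical sag model (amplitude reduced by `depth` between two cycle boundaries) in place of the models of [12]; the depth and timing values are illustrative only.

```python
import math
import random

def voltage_sag(samples_per_cycle, cycles=10, depth=0.5, start_cycle=3, end_cycle=6, f0=50.0):
    """Hypothetical sag model: a unit-amplitude f0 sine whose amplitude drops to (1 - depth)
    between start_cycle and end_cycle. Parameter values are illustrative, not those of [12]."""
    fs = f0 * samples_per_cycle
    n = samples_per_cycle * cycles
    out = []
    for k in range(n):
        in_sag = start_cycle * samples_per_cycle <= k < end_cycle * samples_per_cycle
        gain = 1.0 - depth if in_sag else 1.0
        out.append(gain * math.sin(2 * math.pi * f0 * k / fs))
    return out

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian white noise so that P_signal / P_noise = 10^(snr_db / 10)."""
    p_signal = sum(v * v for v in signal) / len(signal)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10.0))
    return [v + rng.gauss(0.0, sigma) for v in signal]

rng = random.Random(42)
clean = voltage_sag(samples_per_cycle=1281)   # 1281 points per cycle, 10 cycles
noisy = add_white_noise(clean, snr_db=15.0, rng=rng)

noise = [a - b for a, b in zip(noisy, clean)]
measured_snr = 10 * math.log10(sum(v * v for v in clean) / sum(v * v for v in noise))
print(round(measured_snr, 2))  # close to 15
```

The measured SNR fluctuates slightly around the target because the noise power is estimated from a finite number of samples.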
Of the 200 samples of each disturbance type, 80 were used as training samples to obtain the decision function of the SVM, and the remaining 120 were used as test samples to measure the classification accuracy. Table 1 lists the classification accuracies for the eleven PQD types obtained by SVM classifiers with the polynomial kernel, the proposed hybrid kernel, and the RBF kernel, and by the methods in [11,13].
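The per-class split can be sketched as follows. Random selection within each class is an assumption (the text gives only the counts), the dummy all-zero feature vectors are placeholders rather than real data, and `split_per_class` is a hypothetical name.

```python
import random

def split_per_class(samples_by_class, n_train=80, seed=0):
    """Randomly split each class's samples into n_train training samples and the rest as test samples."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        idx = list(range(len(samples)))
        rng.shuffle(idx)
        train += [(samples[i], label) for i in idx[:n_train]]
        test += [(samples[i], label) for i in idx[n_train:]]
    return train, test

# Placeholder data: 11 disturbance classes x 200 feature vectors of length 8 (dummy zeros).
data = {c: [[0.0] * 8 for _ in range(200)] for c in range(11)}
train, test = split_per_class(data)
print(len(train), len(test))  # 880 1320
```

Splitting within each class keeps every disturbance type represented by exactly 80 training and 120 test samples.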
Table 1 shows that the SVM with the proposed hybrid kernel performs well overall, although its relative performance varies with the disturbance type. For voltage swell, the proposed SVM achieves the same accuracy as the two methods in [11,13], which is higher than that of the SVMs with the polynomial and RBF kernels. For voltage interruption and voltage sag, the proposed method is less accurate than the methods in [11,13] but more accurate than the remaining methods. For harmonic and transient oscillation, all methods reach 100% accuracy. For transient pulse, voltage flicker, voltage swell with harmonic, and voltage sag with harmonic, the improved SVM is more accurate than the other SVMs. For voltage sag with harmonic, the proposed SVM is more accurate than all other classification methods except the SVM with the RBF kernel. For voltage swell with flicker, the proposed method achieves the same accuracy as the method in [11], which is higher than that of the other three methods. For voltage sag with flicker, the proposed SVM is less accurate than the other SVMs. Nevertheless, the SVM with the proposed hybrid kernel achieves a higher average accuracy than the other algorithms. The accuracies for voltage sag and voltage interruption are both relatively low. Like a sag, an interruption reduces the voltage amplitude, differing mainly in the depth of the reduction, so the two classes interfere with each other to some degree. This indicates a limitation of the proposed SVM method. Future work should aim to improve the classification threshold, which would in turn improve the overall classification performance of the proposed SVM algorithm.
Because this work focuses on SVM algorithms, we did not compare other intelligent algorithms [14-16] that can be used to optimize the SVM or to classify power quality disturbances.
Table 1. Classification accuracies of the single and multiple power quality disturbances
Fig. 5 shows the average classification accuracies obtained by SVMs with different kernel functions. The SVM with the proposed kernel function achieves a higher average classification accuracy than the SVMs with the other kernel functions.
Fig. 5. Comparison of average classification accuracy for different methods.
7. Conclusion
For feature extraction, this paper used the wavelet energy difference between the standard signal and the PQD signal to construct the feature vector. For classification, it used an improved SVM with the proposed hybrid kernel function to improve the classification accuracy of PQDs. By combining a local (RBF) kernel and a global (polynomial) kernel, the proposed SVM balances learning and generalization ability, and it achieved a higher average classification accuracy than the compared methods.
Acknowledgement
This paper is supported by the Foundation of Jilin Educational Committee (Grant No. 2015235).