1. Introduction
Face recognition has been a focus of both biometric research and commercial activity in the past few years because it is user-friendly and convenient [1]. Under constrained conditions with frontal face images, most previous studies achieve a satisfactory recognition rate [2,3]. In practical applications, however, many unconstrained factors occur, such as occlusion, pose variation, complex backgrounds, and illumination variation, the last of which is common and unavoidable. Differences in the intensity and incidence angle of the ambient lighting can greatly change the grayscale distribution of the candidate face images [4,5].
To reduce the impact of illumination variation on face recognition, two categories of solution are normally studied. One is illumination preprocessing; the other is extracting an effective face representation that is insensitive to illumination change. However, most preprocessing methods remove some useful information while eliminating the effect of illumination change [6]. An effective, illumination-insensitive face representation has therefore remained a challenging issue and has attracted considerable attention. In the last few years, many face feature representation methods have been studied. The local binary pattern (LBP) [7-10] is an effective local texture descriptor that is widely used in face recognition because of its advantages in image texture description. But its dimension is relatively high, and its excessively detailed description makes it sensitive to noise. On the basis of LBP, a modified center symmetric local binary pattern (CSLBP) descriptor was proposed [11-13]. Because of the center symmetric principle it adopts, CSLBP has a much lower dimension than LBP and is relatively robust to noise. However, for face images with severe illumination effects, it still cannot achieve satisfactory results.
Hence, many researchers aim to build deeper and more robust feature representations on the basis of these shallow features. Deep learning seems to be a feasible way [14,15]. Deep learning loosely simulates the organizational structure of the human brain. It can obtain more precise and efficient high-level feature representations by combining low-level features [16-18]. The feature extraction process is automatic, without manual intervention. However, a deep learning method may ignore local features if the input of the multi-layer network is a pixel-level image. Liang, Zhang, and others propose using LBP features as the input of the deep learning network [19-21], which performs better than either LBP or the deep learning algorithm alone. However, under severe illumination effects, its results still cannot meet the requirements of practical applications.
Therefore, this paper presents an effective and feasible way to extract robust deep features of face images under severe illumination conditions. It combines the enhanced CSLBP (ECSLBP) with the Deep Belief Network (DBN) [22-24], which is an effective deep learning network. This paper is organized as follows. Section 2 describes the proposed ECSLBP descriptor and the construction of the DBN combined with ECSLBP. Section 3 presents and discusses extensive experimental results that show the validity of the proposed algorithm. Conclusions are drawn in Section 4.
2. Technical Approach
2.1 Center Symmetric Local Binary Pattern
Fig. 1. Encoding process of the CSLBP descriptor.
The original local binary pattern proposed by Heikkila has been proven effective for local texture feature description. However, the features extracted by LBP are too detailed and high-dimensional, which leads to high computational complexity and poor robustness to noise. To improve its performance, the center symmetric local binary pattern [12] was derived from LBP. It applies the center symmetric encoding principle to describe each pixel of the image: the gray values of the symmetric pixel pairs around the center pixel are compared, and the results of these comparisons encode the local texture.
As shown in Fig. 1, $$p_i$$ denotes the K pixels around the center pixel, and g(x) is calculated by the following equation:
where T is a threshold on the image intensity variation. CSLBP has lower computational complexity due to its calculation rule, and it is to some extent more robust than the original LBP [12,13].
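The encoding rule can be sketched in Python as follows. The sketch assumes K = 8 neighbours at radius 1 and the usual CS-LBP form from the literature, in which g(x) = 1 when x > T and 0 otherwise, and the i-th comparison between $$p_i$$ and its opposite pixel is weighted by $$2^i$$. The default threshold and the function names are hypothetical and are not taken from this paper.

```python
import numpy as np

def cslbp(image, threshold=0.01):
    """Return the CS-LBP code map of a 2-D grayscale image (K = 8, radius 1).

    `threshold` plays the role of T; its default value is an assumption.
    """
    img = np.asarray(image, dtype=np.float64)
    # (row, col) offsets of the 8 neighbours, ordered so that neighbour i and
    # neighbour i + 4 lie opposite each other across the centre pixel.
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1),
               (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(4):
        dr1, dc1 = offsets[i]
        dr2, dc2 = offsets[i + 4]
        p_i = img[1 + dr1:h - 1 + dr1, 1 + dc1:w - 1 + dc1]
        p_opp = img[1 + dr2:h - 1 + dr2, 1 + dc2:w - 1 + dc2]
        # g(p_i - p_opp) is 1 when the difference exceeds T; it sets bit i.
        codes |= ((p_i - p_opp) > threshold).astype(np.uint8) << i
    return codes

def cslbp_histogram(image, threshold=0.01):
    """Normalised 16-bin histogram of the CS-LBP codes of an image."""
    codes = cslbp(image, threshold)
    hist = np.bincount(codes.ravel(), minlength=16).astype(np.float64)
    return hist / hist.sum()
```

With 8 neighbours there are only 4 symmetric pairs, so each pixel takes one of 16 codes instead of the 256 codes of the basic LBP, which is where the lower dimension comes from.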
2.2 Enhanced Center Symmetric Local Binary Pattern
However, the performance of CSLBP on images under severe illumination variation is still unsatisfactory. To compensate for this shortcoming, an improved CSLBP is presented in this section, based on the combination of the wavelet transform and CSLBP. The procedure is detailed below. First, wavelet decomposition is conducted on the face image to acquire the corresponding low-frequency and high-frequency component images [25]. The decomposition results of the face images are shown in Fig. 2.
Fig. 2. Wavelet decomposition of images under normal illumination (a) and severe illumination variation (b).
As shown in Fig. 2(a), wavelet decomposition of the two-dimensional image J* generates four sub-band images: the low-frequency sub-band image, also known as the approximation component image A, which contains the main information of the image, and three high-frequency sub-band images H, V, and D, which reflect the horizontal, vertical, and diagonal information of the image, respectively. As shown in Fig. 2(b), for an image under severe illumination variation, such as one captured in a very dark environment, the wavelet decomposition reflects far less useful information than it does for an image under good illumination.
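A one-level decomposition into the four sub-bands can be sketched as below. The paper does not state which wavelet is used, so the Haar wavelet here is an assumption chosen for brevity, and the function name is illustrative. Sign and naming conventions for the detail sub-bands also differ between libraries.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2-D Haar decomposition; returns the sub-bands (A, H, V, D)."""
    x = np.asarray(image, dtype=np.float64)
    # Crop to an even size so that the pixels group into 2x2 blocks.
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    A = (a + b + c + d) / 2.0  # approximation (low frequency)
    H = (a + b - c - d) / 2.0  # horizontal detail: change between rows
    V = (a - b + c - d) / 2.0  # vertical detail: change between columns
    D = (a - b - c + d) / 2.0  # diagonal detail
    return A, H, V, D
```

Each sub-band has half the height and half the width of the input, and with this normalisation the total energy of the four sub-bands equals that of the (cropped) input image.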
Hence, a nonlinear grayscale enhancement transform is applied before the wavelet decomposition as a preprocessing step. After the nonlinear grayscale enhancement, the wavelet decomposition of the very dark image improves greatly, even when compared with the decomposition of images under normal illumination. The results of the wavelet decomposition are shown in Fig. 3.
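The specific enhancement transform is not given in the text. As an illustration only, the sketch below uses a logarithmic mapping, one common nonlinear grayscale transform that stretches dark intensities; the choice of transform, the `gain` parameter, and the function name are all assumptions rather than the method used in this paper.

```python
import numpy as np

def log_enhance(image, gain=50.0):
    """Stretch dark intensities with a log curve; output lies in [0, 1].

    The log mapping and the default `gain` are assumptions for illustration.
    """
    x = np.asarray(image, dtype=np.float64)
    span = x.max() - x.min()
    if span == 0:
        # A constant image carries no contrast to enhance.
        return np.zeros_like(x)
    x = (x - x.min()) / span  # rescale to [0, 1]
    # log1p(gain * x) rises steeply near 0, so dark pixels are spread apart.
    return np.log1p(gain * x) / np.log1p(gain)
```

A larger `gain` lifts dark regions more aggressively, at the cost of compressing the bright end of the range.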
After the component images are acquired, wavelet fusion and encoding with the center symmetric encoding principle are conducted. The process is illustrated in Fig. 4.
Fig. 3. Wavelet decomposition after nonlinear grayscale enhancement.
As shown in Fig. 4, only the vertical and horizontal component images are retained to construct the improved descriptor. They are fused according to the fusion rules illustrated in Fig. 4, and the center symmetric encoding principle is then applied to the fused data. The improved feature extraction method is named the enhanced center symmetric local binary pattern (ECSLBP). The extracted features describe the images robustly, since the combination of wavelet fusion and center symmetric encoding not only suppresses the influence of severe illumination variation but also enhances the information that is useful for identification.
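The fusion and encoding steps can be sketched as follows. The actual fusion rules are given only in Fig. 4, so this sketch substitutes a simple maximum-magnitude rule; that rule, the zero threshold, the histogram output, and all function names are assumptions for illustration and should not be read as the paper's exact procedure.

```python
import numpy as np

def fuse_max_abs(H, V):
    """Assumed fusion rule: keep, per pixel, the coefficient of larger magnitude."""
    H = np.asarray(H, dtype=np.float64)
    V = np.asarray(V, dtype=np.float64)
    return np.where(np.abs(H) >= np.abs(V), H, V)

def center_symmetric_codes(x, threshold=0.0):
    """4-bit centre-symmetric codes for the interior pixels of a 2-D array."""
    n = [x[1:-1, 2:], x[2:, 2:], x[2:, 1:-1], x[2:, :-2],      # E, SE, S, SW
         x[1:-1, :-2], x[:-2, :-2], x[:-2, 1:-1], x[:-2, 2:]]  # W, NW, N, NE
    codes = np.zeros(n[0].shape, dtype=np.uint8)
    for i in range(4):
        # Compare neighbour i with the neighbour opposite to it.
        codes += ((n[i] - n[i + 4]) > threshold).astype(np.uint8) * (2 ** i)
    return codes

def ecslbp_feature(H, V, threshold=0.0):
    """Fuse the H and V sub-bands, encode them, and return a 16-bin histogram."""
    codes = center_symmetric_codes(fuse_max_abs(H, V), threshold)
    hist = np.bincount(codes.ravel(), minlength=16).astype(np.float64)
    return hist / max(hist.sum(), 1.0)
```

In practice the histogram would probably be computed per image block and concatenated so that spatial layout is preserved, but the block layout is likewise not specified here.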
2.3 Deep Belief Network
The deep learning architecture considered here is an unsupervised neural network consisting of multiple layers. The output of each layer is usually used as the input of the next layer. The aim of learning is to make the final output as similar as possible to the original input by constructing the network architecture and training its parameters. Some typical deep architectures have been proposed, such as the DBN [23,24] and the Convolutional Neural Network (CNN) [26]. The DBN consists of a number of unsupervised Restricted Boltzmann Machines (RBMs), each of which contains a visible layer and a hidden layer. A three-layer DBN model is shown in Fig. 5.
Fig. 5. DBN structure of three-layer RBM model.
As shown in Fig. 5, v represents the visible layer and $$h^i$$ (i = 1, 2, ...) represents the i-th hidden layer. By training and expressing the input data through the multi-layer network, essential features that reveal hidden information and high-order correlations of the data can be extracted.
For a DBN with c layers, the joint probability distribution between the visible units and the hidden units can be represented by the following equation:
Here, $$v = h^0$$ denotes the visible units of the DBN, h denotes the hidden units, and $$h^i$$ (i = 1, 2, ...) denotes the i-th hidden layer.
Two adjacent hidden layers $$h^i$$ and $$h^{i+1}$$ should satisfy the following formulas:
where $$F \left( h ^ { i } , h ^ { i + 1 } \right)$$ represents an RBM model, $$b _ { k } ^ { i }$$ represents the bias of the i-th layer, and $$W _ { k u } ^ { i }$$ denotes the weight between the i-th layer and the (i+1)-th layer.
For each RBM model $$F \left( h ^ { i } , h ^ { i + 1 } \right)$$, the energy function can be expressed as:
where $$\varphi = \left\{ W _ { i j } , c _ { i } ^ { \prime } , c _ { j } ^ { * } \right\}$$ are the parameters of the RBM, $$W _ { i j }$$ is the weight between visible unit i and hidden unit j, $$c _ { i } ^ { \prime }$$ and $$c _ { j } ^ { * }$$ denote the visible unit bias and the hidden unit bias, respectively, and n and m are the numbers of visible units and hidden units.
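For reference, the standard textbook forms of these three relations, written with the symbols defined above, are given below. They are the usual DBN and RBM expressions from the literature and are offered as an assumption about what the display equations state, not as a verbatim reproduction of them; $$\sigma$$ denotes the logistic sigmoid function.

```latex
% Joint distribution of a DBN with c hidden layers (v = h^0)
P\left(v, h^{1}, \ldots, h^{c}\right)
  = P\left(v \mid h^{1}\right) P\left(h^{1} \mid h^{2}\right) \cdots
    P\left(h^{c-2} \mid h^{c-1}\right) P\left(h^{c-1}, h^{c}\right)

% Conditional distribution between two adjacent layers
P\left(h^{i} \mid h^{i+1}\right) = \prod_{k} P\left(h_{k}^{i} \mid h^{i+1}\right),
\qquad
P\left(h_{k}^{i} = 1 \mid h^{i+1}\right)
  = \sigma\left(b_{k}^{i} + \sum_{u} W_{ku}^{i} h_{u}^{i+1}\right)

% Energy function of an RBM with n visible and m hidden units
E\left(v, h ; \varphi\right)
  = -\sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij} v_{i} h_{j}
    - \sum_{i=1}^{n} c_{i}^{\prime} v_{i}
    - \sum_{j=1}^{m} c_{j}^{*} h_{j}
```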
Training the DBN model consists of a pre-training procedure and a fine-tuning procedure. In pre-training, the RBMs are trained layer by layer. An unsupervised greedy algorithm can be adopted to train each RBM [20], during which the features learned by one RBM are fed into the next RBM as its input data. After the pre-training procedure is finished and the DBN is constructed, the back propagation algorithm is used to optimize the whole DBN and obtain the final network.
2.4 Face Recognition Based on ECSLBP and DBN
However, when pixel-level images are fed to the DBN directly, the recognition performance normally declines, since the DBN usually ignores the local features of the images. For face images under complex illumination in particular, information redundancy increases the computational complexity, and noise weakens the robustness. Hence, a novel face recognition method based on the combination of local texture features and the DBN is proposed in this paper. Considering that the original LBP has a relatively high dimension and is sensitive to illumination variation, the presented ECSLBP is adopted to extract the local texture features. The extracted features are then used to construct and train the DBN, which enables the multi-layer network to learn and extract deep features more efficiently and thereby improves the recognition rate.
ECSLBP features describe local texture well and are more robust to severe illumination changes. To mine the deep features of the sample images more effectively, these local features are used to initialize the DBN, whose training in this paper includes pre-training and fine-tuning.
Pre-training: The main purpose of pre-training is to obtain the network parameters of each RBM layer by layer. In this paper, ECSLBP features are fed to the bottom visible layer to initialize the DBN. The weight parameter W1 of the first layer and the hidden layer h1 are trained with the contrastive divergence algorithm. After the first layer's parameters are obtained, the output of the first layer is fed to the second layer as its input data. In this way, the network parameters of every RBM are obtained layer by layer. In this paper, the DBN has three layers.
Fine-tuning: After the multi-layer DBN is pre-trained, labeled training data are fed to the DBN constructed in the pre-training step, and the back propagation algorithm is used to optimize the network parameters.
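The pre-training step can be sketched with a minimal NumPy RBM trained by one-step contrastive divergence (CD-1) and stacked greedily. The sketch assumes binary hidden units, feature vectors scaled to [0, 1], full-batch updates, and a hypothetical learning rate; the class and function names are illustrative, and the supervised fine-tuning by back propagation is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal RBM; inputs are assumed to lie in [0, 1]."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)
        self.rng = rng

    def hidden_prob(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_prob(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch; returns the reconstruction error."""
        h0 = self.hidden_prob(v0)
        # Sample binary hidden states, then reconstruct the visible layer.
        h0_sample = (self.rng.random(h0.shape) < h0).astype(np.float64)
        v1 = self.visible_prob(h0_sample)
        h1 = self.hidden_prob(v1)
        n = v0.shape[0]
        # Positive-phase statistics minus negative-phase statistics.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

def pretrain_dbn(features, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pre-training; returns the RBMs and the top-layer output."""
    rng = np.random.default_rng(seed)
    rbms, data = [], np.asarray(features, dtype=np.float64)
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(data, lr)
        rbms.append(rbm)
        data = rbm.hidden_prob(data)  # output of this RBM feeds the next one
    return rbms, data
```

For example, `pretrain_dbn(features, [130, 100, 50])` would stack three RBMs on a matrix of feature vectors (one row per image), and the returned top-layer activations could then be passed to a classifier during fine-tuning.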
The algorithm framework is shown in Fig. 6.
Fig. 6. Illustration of the DBN framework.
3. Experimental Results
In order to verify the effectiveness of the proposed algorithm under severe illumination conditions, the Extended Yale-B and CMU-PIE face databases are used in the following experiments, both of which contain face images captured under severe illumination variation.
3.1 CMU-PIE
The CMU-PIE database contains face images of 68 people with obvious illumination variation and slight pose and expression variation. In this experiment, a total of 200 sample images with severe illumination variation are selected from the CMU-PIE database to test the effectiveness of the proposed algorithm. These images come from 20 persons, with 10 face images under different illumination for each person. We use the entire images with background for recognition, without face detection or segmentation. Using the entire images with background is more challenging than using pure face images; we do so because we aim to verify the advantage and robustness of the proposed method for face recognition with background. Sample images used in this experiment are shown in Fig. 7.
Fig. 7. Examples of CMU-PIE face images.
As shown in Fig. 7, the face images of one person vary greatly, with uneven grayscale distribution and severe shadows or reflections caused by changes in the ambient light and by different backgrounds. For each person, the first seven images form the training subset and the remaining three form the test subset. The number of iterations is 50, and the DBN parameters are 130-100-50. To further verify the superiority of the proposed algorithm based on the combination of ECSLBP and DBN, other typical algorithms are also run on this database. The experimental results are shown in Fig. 8.
Fig. 8. Recognition rate comparison on CMU-PIE database.
The comparison between the proposed method, DBN, LBP combined with DBN, and CSLBP combined with DBN is shown in Fig. 8. The rank-1 recognition rate is 73.33% for DBN, 71.67% for LBP+DBN, 80% for CSLBP+DBN, and 95% for the proposed ECSLBP combined with DBN, which is at least 15 percentage points higher than the other three methods. This shows that, compared with the three existing algorithms, the proposed method is more robust to illumination variation and can improve the recognition performance effectively.
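As a consistency check on these figures, the reported rates can be converted back to counts of correctly recognized test images. The sketch assumes that the test set is exactly the 20 persons times 3 held-out images described above, that is 60 images, and that every test image is scored; the variable and function names are illustrative.

```python
persons = 20
test_images_per_person = 3
n_test = persons * test_images_per_person  # 60 test images under the stated split

# Rank-1 recognition rates (percent) as reported in the text.
reported = {"DBN": 73.33, "LBP+DBN": 71.67, "CSLBP+DBN": 80.0, "ECSLBP+DBN": 95.0}

def implied_correct(rate_percent, total):
    """Number of correct matches that best explains a reported rate."""
    return round(rate_percent / 100.0 * total)

counts = {name: implied_correct(rate, n_test) for name, rate in reported.items()}
for name, k in counts.items():
    print(f"{name}: {k}/{n_test} = {100.0 * k / n_test:.2f}%")
```

Under this assumption the rates correspond to 44, 43, 48, and 57 correct matches out of 60, so the gap between the proposed method and CSLBP+DBN amounts to 9 test images.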
3.2 Extended Yale-B
The Extended Yale-B face database is the most commonly used database for testing the robustness of algorithms under severe illumination variation. It was established by the Center for Computational Vision and Control at Yale University. 640 face images from Extended Yale-B are used in this experiment. To clarify the effectiveness of the proposed method under different illumination conditions, these face images are divided into 5 subsets according to their lighting conditions (for each person, subset 1 has 7 images, subset 2 has 12 images, subset 3 has 12 images, subset 4 has 14 images, and subset 5 has 19 images). When constructing the deep belief network, the number of iterations is 50 and the network structure is 50-30-20. Two experimental schemes are designed in this part to further demonstrate the effectiveness of the proposed method.
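The subset sizes can be reconciled with the total of 640 images by a short calculation. The sketch assumes that the listed subset sizes are counts per person, which is the usual convention for this database; the variable names are illustrative.

```python
# Subset sizes as listed in the text, assumed to be per person.
subset_sizes_per_person = {1: 7, 2: 12, 3: 12, 4: 14, 5: 19}
total_images = 640

images_per_person = sum(subset_sizes_per_person.values())
persons, remainder = divmod(total_images, images_per_person)

# Total number of images in each illumination subset across all persons.
images_per_subset = {s: n * persons for s, n in subset_sizes_per_person.items()}

print(images_per_person, persons, remainder)
print(images_per_subset)
```

Under this assumption each person contributes 64 images, the 640 images come from 10 persons, and the five subsets contain 70, 120, 120, 140, and 190 images, respectively.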
Experiment One
In the first experiment, one image (the first) is selected from each subset to form the training set, and the remaining images of each subset are used as the corresponding test set, which gives one training set and 5 test sets. Examples of the training set and test sets are shown in Fig. 9. To further illustrate the effectiveness of the proposed method, comparisons with other algorithms are also conducted. The experimental results on the 5 test sets are shown in Fig. 10.
Fig. 9. First experiment scheme on the Extended Yale-B database.
Fig. 10. Results comparison of the first experiment on Extended Yale-B database.
The experimental results show that, for the first and second subsets, the recognition rates of the proposed method and CSLBP+DBN are almost equal, and both are higher than those of LBP+DBN and DBN. As the illumination condition becomes worse, ECSLBP combined with DBN achieves a much higher recognition rate on subsets 3, 4, and 5. On subset 5 in particular, although its recognition rate declines to nearly 70% compared with the ideal illumination condition, the recognition rates of the other three algorithms drop more sharply and are at least 40% lower than that of the proposed method. This means that the proposed method adapts to varying illumination better than the other three methods, and that its advantage is most significant when the face images are under very poor illumination.
Experiment Two
In practical applications, the registered face images are usually captured under relatively ideal lighting, while the face recognition system normally encounters face images under various lighting conditions during the recognition phase. In the second experiment, to verify the effectiveness of the proposed method in practical applications, subset 1, which is under relatively ideal lighting, is selected as the training set, while the remaining subsets 2–4 are used as the test sets. The experimental scheme is shown in Fig. 11 and the experimental results are shown in Fig. 12.
Fig. 11. Second experiment scheme on the Extended Yale-B database.
Fig. 12. Results comparison of the second experiment on Extended Yale-B database.
As can be seen from Fig. 12, for subset 2, the recognition rates of the proposed method, CSLBP+DBN, and LBP+DBN are all 100%, since the illumination condition of subset 2 is very similar to that of subset 1. For subset 3, ECSLBP combined with DBN achieves a 94.17% recognition rate, while the recognition rates of the other three algorithms drop to 88.33%, 70.83%, and 40%, respectively. As the illumination condition becomes much worse, the recognition rate of the proposed method also drops, but it remains much more robust than the other three algorithms, which all drop to 20%–30%. These experimental results show that our proposed method is much more robust to illumination variation and can satisfy the demands of practical face recognition applications.
4. Conclusion
Face recognition performance normally declines significantly under the illumination variation that is inevitable in practice. To address this problem, we propose a novel, effective, and feasible identification method in this paper. First, image fusion based on wavelet decomposition is conducted to eliminate the useless information caused by illumination variation. Then, inspired by CSLBP, an ECSLBP descriptor is obtained by applying the center symmetric encoding principle to the fused component images to extract relatively robust features of the face image. On this basis, ECSLBP and DBN are combined to compensate for the DBN's tendency to ignore local information and to extract discriminative features that are robust to illumination variation. The effectiveness of the proposed method is verified by extensive experiments on both the CMU-PIE and Extended Yale-B databases. Two test schemes are designed to further demonstrate the advantages of the proposed method over the DBN, LBP combined with DBN, and CSLBP combined with DBN algorithms. The experimental results show that the proposed method greatly improves the recognition rate compared with the other three methods, and that its advantage is especially significant for face images under very poor lighting.
Acknowledgement
This work is supported by the National Key R&D Program of China under Grant 2017YFB0802300, the National Natural Science Foundation of China under Grant No. 61503005, and the Research Project of Beijing Municipal Education Commission under Grant No. SQKM201810009005.