2. The Proposed Method
2.1 Gait Silhouette Extraction
The proposed approach extracts gait silhouettes from each gait video through moving-object detection and segmentation. Because external factors during real image acquisition easily introduce problems such as noise and poor contrast, the images must be preprocessed before gait recognition; preprocessing is a precondition for feature extraction and recognition. The paper follows the steps below to preprocess the images (a minimal implementation sketch of these steps follows the list).
- Background reconstruction: Since the scene is approximately stationary and the background information corresponds to the low-frequency part of the whole video sequence, the median of the corresponding pixels across the sequence images is used to estimate the static background.
- Moving object detection: Background subtraction is first used to extract the moving foreground, obtained by differencing each frame of the image sequence with the background image; the moving shadow in the foreground is then removed with an HSV-based color model [17]. Finally, the binary image is obtained by thresholding with the OTSU algorithm [18].
- Binarization image noise removal and normalization: Since small noise remains around the human body and in the background area after threshold segmentation, a median filter is used to remove the noise and redundant information. In addition, to deal with the inconsistency of human silhouette size caused by changes of focal length during shooting, the object silhouettes are normalized and scaled to the same size.
- Gait cycle detection: The gait changes periodically, and the center of gravity of the body changes continuously during walking. Its position is not disturbed by the forward swing of the arms and legs, but its vertical coordinate varies periodically. In this paper, the gait period is determined from the change of the coordinate of the center of gravity.
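A minimal OpenCV-based sketch of these preprocessing steps is given below; the shadow test, threshold, filter size, and output size are illustrative assumptions, and the HSV shadow removal of [17] is reduced to a simple value test.

```python
import cv2
import numpy as np

def extract_silhouettes(frames, size=(64, 64)):
    """frames: list of BGR video frames; returns normalized binary silhouettes."""
    # 1. Background reconstruction: per-pixel temporal median of the sequence.
    stack = np.stack([cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames])
    background = np.median(stack, axis=0).astype(np.uint8)

    silhouettes = []
    for frame, gray in zip(frames, stack):
        # 2. Moving object detection: background subtraction.
        diff = cv2.absdiff(gray, background)

        # Rough HSV-based shadow suppression (simplified stand-in for [17]):
        # shadow pixels are darker than the background but not part of the body.
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        shadow = (hsv[:, :, 2] < 0.9 * background) & (diff > 0)
        diff[shadow] = 0

        # Binarization with the OTSU threshold [18].
        _, binary = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # 3. Noise removal with a median filter, then crop and scale to a fixed size.
        binary = cv2.medianBlur(binary, 5)
        ys, xs = np.nonzero(binary)
        if len(xs) == 0:
            continue  # no foreground detected in this frame
        crop = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        silhouettes.append(cv2.resize(crop, size, interpolation=cv2.INTER_NEAREST))
    return silhouettes
```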
2.2 Improving GEI Processing and Cycle Detection
The gait energy image (GEI) adopts a simple weighted-average method to synthesize the silhouette images of one gait cycle into a single image, which is defined as

[TeX:] $$G(x, y)=\frac{1}{T} \sum_{t=1}^{T} B_{t}(x, y)$$

where T is the gait cycle length and [TeX:] $$B_{t}(x, y)$$ is the brightness value of pixel (x, y) at time t. The brightness value of the background area is 0, and that of the target area is 255.
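A minimal sketch of computing this GEI from one cycle of binary silhouettes (with 0/255 values, as defined above):

```python
import numpy as np

def gait_energy_image(cycle_silhouettes):
    """cycle_silhouettes: array of shape (T, H, W) with values 0 (background) or 255 (body).
    Returns the gait energy image G(x, y) = (1/T) * sum_t B_t(x, y)."""
    cycle = np.asarray(cycle_silhouettes, dtype=np.float32)
    return cycle.mean(axis=0)  # grey-level image in [0, 255]
```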
Since the method of gait video silhouette extraction has been discussed in the section above, this section focuses on the subsequent gait description and identification and proposes a feature extraction algorithm based on the natural-video gait cycle. According to the characteristics of the human legs, the key gaits of normal walking can be classified into three kinds:
State 1: The two legs are kept close together in the same plane as the body; there are three common postures, namely the left foot lifted beside the right foot, the right foot lifted beside the left foot, and both feet standing normally, all together marked as K1;
State 2: The left foot is in front of the right foot, marked as K2;
State 3: The right foot is in front of the left foot, marked as K3.
We define a complete gait cycle as the process K1→K2→K1→K3→K1 or K1→K3→K1→K2→K1. Fig. 2 shows an example of a complete gait cycle K1→K3→K1→K2→K1. After segmenting one complete gait cycle, we continue to process all subsequent frames to find all complete gait cycles, evenly partition each cycle, and extract NF key gait silhouette images in turn. Then a time-domain GEI is extracted around the center of each key gait silhouette image to construct the observed state set. Finally, KFDA is used to reduce the dimensionality of the observed state set, yielding the corresponding low-dimensional eigenvectors, as discussed in the following section.
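A sketch of the key-gait extraction described above, which evenly partitions one detected cycle into NF key silhouettes and builds a time-domain GEI around each key frame (NF = 8 and the window width are illustrative assumptions, not values from the paper):

```python
import numpy as np

def key_gait_geis(cycle_silhouettes, nf=8, window=3):
    """Evenly pick nf key silhouettes from one gait cycle and build a GEI
    around each key frame (window frames on each side) as the observed states."""
    cycle = np.asarray(cycle_silhouettes, dtype=np.float32)   # (T, H, W)
    T = cycle.shape[0]
    key_idx = np.linspace(0, T - 1, nf).round().astype(int)   # evenly spaced key frames
    states = []
    for k in key_idx:
        lo, hi = max(0, k - window), min(T, k + window + 1)
        states.append(cycle[lo:hi].mean(axis=0))               # local (time-domain) GEI
    return np.stack(states)                                    # (nf, H, W)
```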
Fig. 2. An example of a complete gait cycle.
Fig. 3. GEIs of three different conditions at 90º: (a) normal condition, (b) carrying a bag, and (c) with a coat.
From Fig. 3, we can see that the outermost outline of the human body changes when a person walks while carrying a backpack or other objects, which degrades the robustness of gait recognition [19]. Since the upper part of the body changes very little while the lower part (such as the legs) changes noticeably across normal, carrying, and clothing conditions, this paper adopts the method in [20] and represents the gait feature with both the gait silhouette image and the energy image of the local leg region. Fig. 4 shows the energy image in the local region of the legs (marked with a box).
Suppose the above feature extraction method is used, for each gait video G, to extract features from the whole-video gait silhouette images and from the corresponding energy image in the local region of the legs. Then G can be expressed as

[TeX:] $$G=\left(F_{1}, F_{2}\right)$$

where [TeX:] $$F_{1} \text { and } F_{2}$$ are the features extracted from the whole gait silhouette images and from the local region of the legs, respectively. [TeX:] $$F_{1} \text { and } F_{2}$$ can be calculated by the eigenmatrix mapping method, which is discussed in detail in [20] and is not analyzed further here.
Fig. 4. Image in the local outline of the legs: (a) normal, (b) carrying a bag, (c) clothing, and (d) walking at night.
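A sketch of splitting an energy image into the whole-body feature and the local leg-region feature; the split ratio is an illustrative assumption, and simple vectorization stands in for the eigenmatrix mapping of [20]:

```python
import numpy as np

def whole_and_leg_features(gei, leg_ratio=0.55):
    """Return the vectorized whole-body energy image (F1) and the vectorized
    lower-body (leg region) crop (F2) of a single energy image."""
    gei = np.asarray(gei, dtype=np.float32)
    split = int(gei.shape[0] * leg_ratio)       # rows above 'split' are head/torso
    f1 = gei.flatten()                          # whole-body feature
    f2 = gei[split:, :].flatten()               # local leg-region feature
    return f1, f2
```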
2.3 Kernel Fisher Discriminant Analysis
The KFDA method maps the input samples into a high-dimensional feature space by a nonlinear mapping and then finds the projection space that makes the inter-class scatter matrix largest and the within-class scatter matrix smallest. The main idea of the LDA algorithm is to find the optimal projection matrix [TeX:] $$W_{o p t}$$ by the Fisher criterion, whose objective is to determine the discriminant vectors by maximizing the inter-class scatter matrix [TeX:] $$S_{B}^{\Phi}$$ while minimizing the within-class scatter matrix [TeX:] $$S_{W}^{\Phi}.$$
Suppose there are n samples [TeX:] $$\left\{x_{1}, x_{2}, \ldots, x_{n}\right\}$$ belonging to c classes [TeX:] $$\left\{X_{1}, X_{2}, \ldots, X_{c}\right\}.$$ Following Fisher LDA, a nonlinear function [TeX:] $$\Phi(x)$$ maps the samples into the feature space F, and the optimal subspace [TeX:] $$W_{o p t}$$ is obtained as:
Similarly, [TeX:] $$S_{B}^{\Phi} \text { and } S_{W}^{\Phi}$$ represent the inter-class and within-class scatter matrices of the feature space F,
where [TeX:] $$\mu_{i}^{\Phi}=\frac{1}{n_{i}} \sum_{x_{k} \in X_{i}} \Phi\left(x_{k}\right), \mu^{\Phi}=\frac{1}{n} \sum_{k=1}^{n} \Phi\left(x_{k}\right),$$ and [TeX:] $$w_{i}$$ is obtained from the generalized eigenvalue problem [TeX:] $$S_{B}^{\Phi} w_{i}=\lambda_{i} S_{W}^{\Phi} w_{i}.$$
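For reference, the standard kernel-space scatter matrices consistent with the means above are given below; the unweighted form of the between-class sum is an assumption chosen to match the M matrix defined in the next step (some formulations weight each class term by [TeX:] $$n_{i}$$):

[TeX:] $$S_{B}^{\Phi}=\sum_{i=1}^{c}\left(\mu_{i}^{\Phi}-\mu^{\Phi}\right)\left(\mu_{i}^{\Phi}-\mu^{\Phi}\right)^{T}, \quad S_{W}^{\Phi}=\sum_{i=1}^{c} \sum_{x_{k} \in X_{i}}\left(\Phi\left(x_{k}\right)-\mu_{i}^{\Phi}\right)\left(\Phi\left(x_{k}\right)-\mu_{i}^{\Phi}\right)^{T}$$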
According to kernel function theory, [TeX:] $$w_{i}$$ is constrained to lie in the space spanned by the mapped training samples, i.e., [TeX:] $$w_{i} \in \operatorname{span}\left\{\Phi\left(x_{1}\right), \Phi\left(x_{2}\right), \ldots, \Phi\left(x_{n}\right)\right\},$$ so it can be linearly represented by the training samples as [TeX:] $$w_{i}=\sum_{j=1}^{n} \alpha_{i}^{j} \Phi\left(x_{j}\right).$$ Thus the numerator in (4) can be converted to

[TeX:] $$w_{i}^{T} S_{B}^{\Phi} w_{i}=\alpha_{i}^{T} M \alpha_{i}$$
where [TeX:] $$M=\sum_{i=1}^{C}\left(M_{i}-\bar{M}\right)\left(M_{i}-\bar{M}\right)^{T},$$ with [TeX:] $$\left(M_{i}\right)_{j}=\frac{1}{n_{i}} \sum_{x_{k} \in X_{i}} k\left(x_{j}, x_{k}\right), \bar{M}_{j}=\frac{1}{n} \sum_{k=1}^{n} k\left(x_{j}, x_{k}\right),$$ and [TeX:] $$\alpha_{i}=\left[\alpha_{i}^{1}, \alpha_{i}^{2}, \ldots, \alpha_{i}^{n}\right]^{T}.$$
Similarly, the denominator in (3) can be converted to

[TeX:] $$w_{i}^{T} S_{W}^{\Phi} w_{i}=\alpha_{i}^{T} L \alpha_{i}$$

where [TeX:] $$L=\sum_{j=1}^{C} K_{j}\left(I-I_{n_{j}}\right) K_{j}^{T},$$ [TeX:] $$K_{j}$$ is the [TeX:] $$n \times n_{j}$$ kernel matrix with [TeX:] $$\left(K_{j}\right)_{l m}=k\left(x_{l}, x_{m}\right), x_{m} \in X_{j},$$ I denotes the [TeX:] $$n_{j} \times n_{j}$$ identity matrix, and [TeX:] $$I_{n_{j}}$$ is the [TeX:] $$n_{j} \times n_{j}$$ matrix whose entries are all [TeX:] $$1 / n_{j}.$$
Eq. (3) can therefore be simplified to

[TeX:] $$J\left(\alpha_{i}\right)=\frac{\alpha_{i}^{T} M \alpha_{i}}{\alpha_{i}^{T} L \alpha_{i}}$$
The optimal subspace [TeX:] $$W_{o p t}$$ is then spanned by [TeX:] $$w_{i}=\sum_{j=1}^{n} \alpha_{i}^{j} \Phi\left(x_{j}\right),$$ where the [TeX:] $$\alpha_{i}$$ are the leading eigenvectors of the generalized eigenvalue problem [TeX:] $$M \alpha_{i}=\lambda_{i} L \alpha_{i}.$$
When a nonlinear mapping function [TeX:] $$\Phi(x)$$ is given, the projection of a sample x onto the feature space F is

[TeX:] $$w_{i}^{T} \Phi(x)=\sum_{j=1}^{n} \alpha_{i}^{j} k\left(x_{j}, x\right)$$
The KFDA method converts each class X into a (c - 1)-dimensional vector, and we can choose the feature vectors corresponding to the first c - 1 eigenvectors to reduce the dimension of the feature space and improve the processing speed. Since the number of training samples in the recognition process is usually small, smaller than the number of pixels in an image, the within-class scatter matrix [TeX:] $$S_{W}^{\Phi}$$ may become singular, that is, [TeX:] $$\operatorname{rank}(Q)=\operatorname{rank}\left(S_{W}^{\Phi}\right) \leq n-1,$$ so the generalized eigenvalue equation cannot be used directly to solve the Rayleigh extremum problem. To address this problem, we add [TeX:] $$\mu I$$ (I represents the identity matrix and μ is a small coefficient) to the Q matrix, [TeX:] $$Q_{\mu}=Q+\mu I,$$ which makes Q nonsingular so that the generalized eigenvalue equation can be solved.
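As a minimal sketch of this KFDA training and projection step (an RBF kernel and the regularized denominator described above are assumed; the kernel width gamma and coefficient mu are illustrative values, not parameters from the paper):

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(A, B, gamma=1e-3):
    # Pairwise RBF kernel between rows of A (n_a x d) and B (n_b x d).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kfda_fit(X, y, gamma=1e-3, mu=1e-3):
    """X: (n, d) training features (e.g., vectorized GCI-GEIs); y: (n,) class labels.
    Returns the kernel coefficient matrix alpha (n, c-1) plus the data needed to project."""
    classes = np.unique(y)
    n, c = X.shape[0], len(classes)
    K = rbf_kernel(X, X, gamma)                     # n x n kernel matrix

    M_bar = K.mean(axis=1)                          # global kernel mean, \bar{M}_j
    M = np.zeros((n, n))                            # between-class matrix
    L = np.zeros((n, n))                            # within-class matrix
    for ci in classes:
        idx = np.where(y == ci)[0]
        n_i = len(idx)
        K_i = K[:, idx]                             # n x n_i block K_j
        M_i = K_i.mean(axis=1)                      # per-class kernel mean
        diff = (M_i - M_bar)[:, None]
        M += diff @ diff.T                          # M = sum_i (M_i - M_bar)(M_i - M_bar)^T
        center = np.eye(n_i) - np.full((n_i, n_i), 1.0 / n_i)
        L += K_i @ center @ K_i.T                   # L = sum_j K_j (I - I_{n_j}) K_j^T

    # Regularize the denominator (Q_mu = Q + mu*I) and solve M a = lambda (L + mu I) a.
    evals, evecs = eigh(M, L + mu * np.eye(n))
    alpha = evecs[:, -(c - 1):]                     # top c-1 discriminant directions
    return alpha, X, gamma

def kfda_project(model, Z):
    # Project new samples Z onto the learned discriminant directions:
    # y_i(x) = sum_j alpha_i^j k(x_j, x).
    alpha, X_train, gamma = model
    return rbf_kernel(Z, X_train, gamma) @ alpha
```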
Suppose there are q individuals, each corresponding to gait sequences captured at different view angles. The proposed algorithm extracts the gait information contained in each individual's GEI as an input sample vector. According to the KFDA method, assuming individual i has [TeX:] $$n_{i}, i=1,2, \ldots, q,$$ gait sequences, the sample set is [TeX:] $$\left\{x_{1,1}, x_{1,2}, \ldots, x_{1, n_{1}}, x_{2,1}, \ldots, x_{2, n_{2}}, \ldots, x_{q, n_{q}}\right\}$$ and the number of input samples is [TeX:] $$n=n_{1}+n_{2}+\ldots+n_{q}.$$ To classify a gait sequence of unknown class, we first train on the gait sequences of known classes to find the optimal feature space [TeX:] $$W_{o p t} \text { and } \alpha_{o p t}.$$ Then the projection onto [TeX:] $$\alpha_{o p t}$$ and its projected trajectory are calculated. The detailed gait recognition algorithm is described in Algorithm 1.
Algorithm 1. The gait recognition algorithm based on the combination of KFDA and GCI-GEIs
3. Experiments and Analysis
3.1 Experiment on CASIA Gait Database
The CASIA Dataset A [21], formerly known as the NLPR gait database, contains 20 subjects; each subject has 12 image sequences captured from 3 walking directions (0º, 45º, and 90º), with 4 image sequences in each direction and 2 gait cycles in each sequence. Our experiments select 2 gait sequences (4 gait cycles) for training and another 2 gait sequences (4 gait cycles) for testing. We carried out 6 experiments on CASIA Dataset A and report the average.
The CASIA Dataset B is a large multi-view gait dataset containing 124 subjects, captured from 11 views ranging from 0º to 180º with 18º between adjacent view directions. There are 10 sequences for each subject: 6 for normal walking (normal), 2 for walking with a bag (bag), and 2 for walking in a coat (coat). Our experiments use 60 subjects at 90º under the normal condition; each subject thus has 6 sequences, each sequence contains 2 gait cycles, and every subject has 12 gait cycles in total. In the experiments, 2 gait sequences (4 gait cycles) are selected for training and another 4 gait sequences (8 gait cycles) for testing. We carried out 15 experiments on CASIA Dataset B and report the average.
Our main focus is on feature extraction and identification from human silhouette images. The input samples for the experiments are obtained by expanding each subject's GEI into a column vector. The training and testing sets are divided at a ratio of 1:1. The KFDA method is then applied to train the features and obtain [TeX:] $$\alpha_{o p t} \text { and } W_{o p t}.$$ Finally, the sample vectors are projected onto the optimal feature space [TeX:] $$W_{o p t}$$ to obtain the training features Train_Features and the testing features Test_Features, respectively.
Due to the small number of samples, leave-one-out cross-validation is adopted to obtain an unbiased estimate of the correct recognition rate. The paper adopts the cumulative match score (CMS) [22] to evaluate the experimental results and reports the recognition rates at Rank 1 and Rank 5. To evaluate the GCI-GEI feature extraction and the dimension reduction and classification ability of KFDA, the recognition rate of our method is compared with other existing algorithms; the results are shown in Table 1.
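As a sketch, the rank-k recognition rates reported below can be computed from the projected training (gallery) and testing (probe) features by nearest-neighbour matching; the Euclidean distance and per-sample ranking are illustrative choices, not the paper's exact protocol:

```python
import numpy as np

def cumulative_match_score(train_feats, train_labels, test_feats, test_labels, rank=5):
    """Rank-k CMS: a probe counts as recognized at rank k if its true class
    appears among the k nearest gallery samples (by Euclidean distance)."""
    train_feats = np.asarray(train_feats)
    train_labels = np.asarray(train_labels)
    hits = np.zeros(rank)
    for f, label in zip(np.asarray(test_feats), test_labels):
        d = np.linalg.norm(train_feats - f, axis=1)
        ranked = train_labels[np.argsort(d)]
        pos = np.where(ranked == label)[0]          # first position of the correct class
        if len(pos) and pos[0] < rank:
            hits[pos[0]:] += 1                      # counts toward all ranks >= pos[0]+1
    return hits / len(test_feats)                   # CMS[0] = Rank-1 rate, CMS[4] = Rank-5 rate
```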
Table 1. Recognition rate of Rank 1 and Rank 5
As can be seen from Table 1, the average recognition rate of our algorithm is the best. Our proposed algorithm achieves more than 90% recognition at both Rank 1 and Rank 5, which shows that the method based on GCI-GEI and KFDA distinguishes different human gaits better and achieves a better recognition effect than the other algorithms on this small-sample gait database. Furthermore, the experimental data show that the recognition rate of GEI+LBP+DCV is lower than that of GEI+PCA, which suggests that reducing the dimension by PCA after LBP feature extraction from the GEI loses important gait recognition information.
3.2 Experiments on USF Human ID Database
A database consisting of 1,870 sequences from the 122 subjects of the USF Human ID database [23] is divided into one set for training and 10 probe sets labeled A to J for testing, based on 3 covariates: normal, walking, and carrying conditions. Unlike the experiment on CASIA, this experiment adopts the weighted mean recognition rate to evaluate the results. To demonstrate the advantages of KFDA, we choose the traditional algorithms LDA, PCA, and DCV, and the manifold learning algorithm LPP, for comparison. Table 2 reports the recognition rates of this group of experiments on the 10 probes.
Table 2. Recognition rates (%) of different algorithms
As can be seen from Table 2, KFDA achieves a higher recognition rate than traditional GEI combined with different dimension reduction algorithms, including PCA, LDA, LPP, and DCV; the KFDA features present more discriminating power than the original features. Compared with the other algorithms, the GCI-GEI+KFDA approach improves the recognition rate by about 6%. In addition, since the experiment covers the normal, walking, and carrying conditions, the results demonstrate that the proposed algorithm can reduce the effects of walking and carrying conditions and has strong robustness.