4. Experimental Results and Discussions
Features are extracted using the MFCC and LPCC techniques, with feature dimensions of 13 and 39. The speakers are modeled using the GMM and GMM-UBM techniques. An ideal speaker verification system should accept all true speakers and reject all false speakers [4]. Speaker verification performance is measured in terms of the equal error rate (EER), the operating point at which the false rejection rate (FRR) equals the false acceptance rate (FAR) [31]. The NIST-2003 database is used to test the trained models.
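As an illustration of the EER definition, the following minimal Python sketch sweeps a decision threshold over a set of trial scores and reports the point at which FAR and FRR are closest. The function name `equal_error_rate`, the toy scores, and the convention that a higher score indicates a more likely true speaker are assumptions made for this example; they are not taken from the experimental setup.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the threshold where FAR and FRR are closest.

    A trial is accepted when its score is >= threshold (assumed convention).
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # FRR: fraction of true-speaker trials that are rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # FAR: fraction of impostor trials that are accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

With a finite set of trials, FAR and FRR may never be exactly equal, so the sketch reports the mean of the two at the closest point; interpolating between neighboring thresholds is a common alternative.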
In the first experiment, 13-dimensional features are extracted using the MFCC and LPCC techniques and modeled using GMM; the results are shown in Fig. 2(a) and (b), respectively. The analysis technique used is SFSR. The experiment is conducted for 3, 4, 5, 6, 9, and 12 seconds of data with different numbers of Gaussian mixtures. For MFCC-SFSR, the minimum EER is 45.17%, 44.21%, 42.36%, 41.59%, 38.25%, and 36.68% for 3, 4, 5, 6, 9, and 12 seconds, respectively. The lowest of these, 36.68%, is obtained for 12 seconds of data with 16 Gaussian mixtures. The average EER is calculated from the minimum EER obtained at each data size across the different Gaussian mixtures. The average EER of MFCC-SFSR is 41.39%.
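The averaging step can be sketched as follows, using the six minimum EER values reported above for MFCC-SFSR with GMM modeling. The variable names are illustrative, and the sketch assumes a plain arithmetic mean over the data sizes; it gives approximately 41.4%, in line with the reported average.

```python
from statistics import mean

# Minimum EER (%) per train/test duration (seconds) for MFCC-SFSR with GMM,
# as reported in the text.
min_eer_by_duration = {3: 45.17, 4: 44.21, 5: 42.36, 6: 41.59, 9: 38.25, 12: 36.68}

# Average EER: arithmetic mean of the per-duration minima (assumed definition).
average_eer = mean(list(min_eer_by_duration.values()))

# Data size that yields the lowest minimum EER.
best_duration = min(min_eer_by_duration, key=min_eer_by_duration.get)

print(f"average EER = {average_eer:.2f}%, best data size = {best_duration} s")
```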
In LPCC-SFSR, the minimum EER for 3, 4, 5, 6, 9, and 12 seconds of data is 43.08%, 41.41%, 39.97%, 38.7%, 31.34%, and 28.18%, respectively, across the different Gaussian mixtures. Among these data sizes, the lowest minimum EER is obtained for 12 seconds of data. The average EER of LPCC-SFSR is 37.21%, which is 4.18% lower than that of MFCC-SFSR. For both MFCC-SFSR and LPCC-SFSR, the minimum EER across all data sizes is obtained for 12 seconds of data. Under the limited-data condition, both the training and the testing data are limited to 15 seconds or less. Therefore, the remaining experimental results are analyzed only for 12 seconds of data. In SFSR analysis, both FS and FR are fixed, and the available train/test data are also limited. As a result, only a small number of feature vectors are extracted, which leads to poorly trained speaker models and less accurate testing.
To overcome this problem, the number of feature vectors must be increased. This can be done using the MFR, MFS, and MFSR analysis techniques.
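The effect on the number of feature vectors can be illustrated with a simple frame count. The sketch below assumes that FS and FR denote the frame size and the frame rate (expressed here as a frame shift in milliseconds), that MFS varies the size, that MFR varies the shift, and that MFSR pools the configurations of both. The specific values (20 ms and 10 ms for SFSR, and the alternatives listed) are hypothetical and chosen only for illustration.

```python
def frame_count(duration_ms, size_ms, shift_ms):
    """Number of full frames of size_ms taken every shift_ms."""
    if duration_ms < size_ms:
        return 0
    return 1 + (duration_ms - size_ms) // shift_ms


def total_frames(duration_ms, configs):
    """Total feature vectors over a set of (size_ms, shift_ms) configurations."""
    return sum(frame_count(duration_ms, size, shift) for size, shift in configs)


# Hypothetical (frame size, frame shift) settings in milliseconds.
sfsr_configs = {(20, 10)}
mfs_configs = {(10, 10), (15, 10), (20, 10)}   # several sizes, one shift
mfr_configs = {(20, 5), (20, 10)}              # one size, several shifts
mfsr_configs = mfs_configs | mfr_configs       # pooled MFS and MFR settings

duration_ms = 3000  # 3 seconds of speech
counts = {
    "SFSR": total_frames(duration_ms, sfsr_configs),
    "MFS": total_frames(duration_ms, mfs_configs),
    "MFR": total_frames(duration_ms, mfr_configs),
    "MFSR": total_frames(duration_ms, mfsr_configs),
}
for name, n in counts.items():
    print(f"{name}: {n} feature vectors")
```

Under these assumptions, every multi-configuration scheme yields more feature vectors than SFSR from the same 3 seconds of speech, and MFSR yields the most.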
Fig. 2. Performance of SFSR using (a) MFCC and (b) LPCC features and GMM modeling.
Fig. 3. Performance comparison of the speaker verification system using (a) MFCC-MFR, (b) MFCC-MFS, and (c) MFCC-MFSR for GMM modeling.
In the second experiment, the MFR, MFS, and MFSR analysis techniques are evaluated using 13-dimensional MFCC features, and the results are plotted in Fig. 3(a)–(c), respectively.
The modeling is done using GMM. For MFCC-MFR, the minimum EER of 39.7% is obtained for 12 seconds of data with 64 Gaussian mixtures. The average EER of MFCC-MFR is 42.82%, which is 1.44% higher than that of MFCC-SFSR. The experimental results therefore indicate that MFR alone does not improve the performance.
In MFCC-MFS, the minimum EER is 36.04%, obtained for 12 seconds of data with 32 Gaussian mixtures. The average EER is 40.79%. MFCC-MFS performs better than MFCC-MFR for all data sizes, and its average EER is 2.04% lower than that of MFCC-MFR. This is because the magnitude spectra, and hence the feature vectors, extracted from the speech data differ across frame sizes owing to their different frequency resolutions [10].
MFCC-MFSR gives a minimum EER of 35.9%, obtained for 12 seconds of data with 32 Gaussian mixtures. The average EER is 40.38%, which is 2.45% and 0.41% lower than that of MFCC-MFR and MFCC-MFS, respectively. This is because MFSR combines MFS and MFR and therefore yields more feature vectors than either MFS or MFR alone.
In the third experiment, 13-dimensional LPCC features are extracted using the MFR, MFS, and MFSR analysis techniques, GMM is used to build the speaker models, and the results are shown in Fig. 4(a)–(c), respectively. The minimum EER of LPCC-MFR is 27.95%, obtained for 12 seconds of data with 16 Gaussian mixtures. The average EER is 37.14%, which is 0.07% lower than that of LPCC-SFSR (37.21%).
In LPCC-MFS analysis, the minimum EER is 27.64%, obtained for 12 seconds of train/test data with 16 Gaussian mixtures. The average EER is 36.33%, which is 0.81% lower than that of LPCC-MFR. LPCC-MFS also has a lower EER than LPCC-MFR for all data sizes.
LPCC-MFSR analysis further improves the EER compared with LPCC-MFS and LPCC-MFR. The minimum EER of LPCC-MFSR is 27.5%, obtained for 12 seconds of data with 16 Gaussian mixtures. The average EER of LPCC-MFSR is 35.95%, which is 1.19% and 0.38% lower than that of LPCC-MFR and LPCC-MFS, respectively. Besides the average EER, the individual EER of LPCC-MFSR is also lower than that of LPCC-MFR and LPCC-MFS for all data sizes.
Fig. 4. Performance comparison of the speaker verification system using (a) LPCC-MFR, (b) LPCC-MFS, and (c) LPCC-MFSR for GMM modeling.
From these experiments, we observe that when both the training and the testing data are small, the MFSR analysis technique improves the verification performance over SFSR for both feature extraction methods. Further, the average EER of LPCC-MFSR is 4.43% lower than that of MFCC-MFSR.
To study the significance of GMM-UBM modeling, we repeated the experiments using GMM-UBM. The UBM is usually constructed from the data of a large number of speakers and is trained using the EM algorithm. The speaker-dependent model is then created by MAP adaptation [25]. The UBM should contain an equal number of male and female speakers; here, the total duration of the male and of the female speech is 1,506 seconds each. The NIST-2003 database is also used for training the UBM. In this experiment too, features are extracted using LPCC and MFCC with the different speech analysis techniques, including MFR, MFS, and MFSR.
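The adaptation step can be sketched in a few lines. The example below is a minimal illustration that assumes mean-only MAP adaptation with a relevance factor of 16, a common choice that the text does not confirm, and uses scalar features with a two-mixture UBM for brevity; the actual features are 13- or 39-dimensional. All names and values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(weights, means, variances, data, relevance=16.0):
    """Mean-only MAP adaptation of a 1-D GMM (the UBM) to speaker data."""
    k = len(weights)
    soft_count = [0.0] * k    # sum of posteriors per mixture
    weighted_sum = [0.0] * k  # posterior-weighted sum of the data
    for x in data:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # sample is numerically outside every mixture
        for i, like in enumerate(likes):
            post = like / total
            soft_count[i] += post
            weighted_sum[i] += post * x
    adapted = []
    for i in range(k):
        # Data-dependent mixing coefficient: more data moves the mean further.
        alpha = soft_count[i] / (soft_count[i] + relevance)
        data_mean = weighted_sum[i] / soft_count[i] if soft_count[i] > 0.0 else means[i]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[i])
    return adapted


# Hypothetical UBM and speaker data, for illustration only.
ubm_weights = [0.5, 0.5]
ubm_means = [-2.0, 2.0]
ubm_vars = [1.0, 1.0]
speaker_data = [2.5, 3.0, 3.5, 2.8, 3.2] * 4  # 20 samples centered near 3
speaker_means = map_adapt_means(ubm_weights, ubm_means, ubm_vars, speaker_data)
print(speaker_means)
```

In this toy case, the mixture that explains the speaker data moves part of the way from the UBM mean toward the data mean, while the mixture that sees almost no data stays close to its UBM value. This behavior is what makes adaptation attractive when only a few seconds of training speech are available.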
Fig. 5(a) and (b) show the results of SFSR analysis with MFCC and LPCC features, respectively, using GMM-UBM modeling. The minimum EER of MFCC-SFSR is 27.28%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER of MFCC-SFSR is 35.12%.
In LPCC-SFSR, the minimum EER is 26.91%, obtained for 12 seconds of data with 32 Gaussian mixtures. The average EER of LPCC-SFSR is 34.19%, which is 0.93% lower than that of MFCC-SFSR.
Fig. 5. Performance of SFSR using (a) MFCC and (b) LPCC features and GMM-UBM modeling.
Fig. 6. Performance comparison of the speaker verification system using (a) MFCC-MFR, (b) MFCC-MFS, and (c) MFCC-MFSR for GMM-UBM modeling.
The results of the MFR, MFS, and MFSR analysis techniques with MFCC features and GMM-UBM modeling are shown in Fig. 6(a)–(c), respectively. For MFCC-MFR, the minimum EER is 26.1%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER of MFCC-MFR is 34.29%, which is 0.83% lower than that of MFCC-SFSR.
For MFCC-MFS, the minimum EER is 25.38%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER is 33.58%. MFCC-MFS performs better than MFCC-MFR for all data sizes, and its average EER is 0.71% lower than that of MFCC-MFR.
MFCC-MFSR gives a minimum EER of 25.57%, obtained for 12 seconds of data with 256 Gaussian mixtures. The average EER is 33.42%, which is 0.87% and 0.16% lower than that of MFCC-MFR and MFCC-MFS, respectively.
The LPCC features are extracted using the MFR, MFS, and MFSR analysis techniques and modeled using GMM-UBM; the results are shown in Fig. 7(a)–(c), respectively. The minimum EER of LPCC-MFR is 26.12%, obtained for 12 seconds of data with 64 Gaussian mixtures. The average EER is 34.02%, which is 0.17% lower than that of LPCC-SFSR (34.19%).
For LPCC-MFS analysis, the minimum EER of 26.64% is obtained for 12 seconds of training and testing data with 64 Gaussian mixtures, and the average EER is 33.48%. The average EER of LPCC-MFS is 0.54% lower than that of LPCC-MFR (34.02%). LPCC-MFS also has a lower EER than LPCC-MFR for all data sizes.
LPCC-MFSR analysis further improves the EER compared with LPCC-MFS and LPCC-MFR. The minimum EER of LPCC-MFSR is 26.0%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER of LPCC-MFSR is 33.39%, which is 0.63% and 0.09% lower than that of LPCC-MFR (34.02%) and LPCC-MFS (33.48%), respectively. Besides the average EER, the individual EER of LPCC-MFSR is also lower than that of LPCC-MFR and LPCC-MFS for all data sizes.
Fig. 7. Performance comparison of the speaker verification system using (a) LPCC-MFR, (b) LPCC-MFS, and (c) LPCC-MFSR for GMM-UBM modeling.
Table 1 presents the minimum and average EER of the MFCC and LPCC analysis techniques. With GMM modeling, LPCC gives better performance than MFCC in terms of both minimum and average EER. The minimum EER of LPCC-SFSR, LPCC-MFR, LPCC-MFS, and LPCC-MFSR is 8.5%, 11.75%, 8.4%, and 8.4% lower than that of MFCC-SFSR, MFCC-MFR, MFCC-MFS, and MFCC-MFSR, respectively, and the average EER of LPCC-SFSR, LPCC-MFR, LPCC-MFS, and LPCC-MFSR is 4.18%, 5.62%, 4.46%, and 4.43% lower than that of MFCC-SFSR, MFCC-MFR, MFCC-MFS, and MFCC-MFSR, respectively.
Table 1. Comparison of average EER (%) of GMM and GMM-UBM modeling for 13-dimensional features using SFSR and MFSR analysis techniques.
Another interesting observation from Figs. 6 and 7 for GMM-UBM modeling is that LPCC-based MFR, MFS, and MFSR have a lower EER than MFCC-based MFR, MFS, and MFSR for 3, 4, 5, and 6 seconds of data. When the train/test speech data are increased to 9 and 12 seconds, MFCC-based MFR, MFS, and MFSR have a lower EER than their LPCC-based counterparts. Thus, when both the training and the testing data are limited (3–6 seconds), LPCC performs better than MFCC. This is because LPCC is able to capture more of the information in the speech data that distinguishes between speakers [32]. When the training and testing data are increased (above 6 seconds), MFCC-based feature extraction performs better than LPCC-based feature extraction. The minimum EER of LPCC-SFSR and LPCC-MFR is 1.18% and 0.02% lower than that of MFCC-SFSR and MFCC-MFR, respectively, whereas the minimum EER of MFCC-MFS and MFCC-MFSR is 1.26% and 0.44% lower than that of LPCC-MFS and LPCC-MFSR, respectively. The average EER of LPCC-SFSR, LPCC-MFR, LPCC-MFS, and LPCC-MFSR is 0.93%, 0.27%, 0.10%, and 0.03% lower than that of MFCC-SFSR, MFCC-MFR, MFCC-MFS, and MFCC-MFSR, respectively.
To verify the above observation, 39-dimensional MFCC and LPCC features are extracted for the different analysis techniques and modeled using GMM and GMM-UBM.
These feature vectors contain both the static and the transitional characteristics of the speaker-specific information [28]. The Δ and ΔΔ coefficients capture the transitional characteristics.
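One common way to obtain 39-dimensional vectors is to append Δ and ΔΔ coefficients to the 13 static coefficients. The sketch below assumes the standard regression formula with a window of two frames on each side and edge frames repeated at the boundaries; the text does not state which formula or window was used, so these choices and the function names are illustrative only.

```python
def delta(frames, window=2):
    """Regression-based delta of a list of feature vectors.

    Frames beyond either end are replaced by the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(static):
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: 6 frames of 13 static coefficients that grow linearly with time.
static = [[float(t)] * 13 for t in range(6)]
full = add_deltas(static)
print(len(full), "frames of dimension", len(full[0]))
```

For the linearly increasing toy input, the Δ value of an interior frame equals the slope of the input, which is a quick sanity check on the regression formula.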
Fig. 8(a) and (b) show the results of SFSR analysis using MFCC and LPCC features, respectively, with GMM modeling. The minimum EER of MFCC-SFSR is 30.35%, obtained for 12 seconds of data with 16 Gaussian mixtures. The average EER of MFCC-SFSR is 39.52%.
For LPCC-SFSR, the minimum EER is 29.72%, obtained for 12 seconds of data with 32 Gaussian mixtures. The average EER of LPCC-SFSR is 37.95%, which is 1.57% lower than that of MFCC-SFSR with GMM modeling.
Fig. 8. Performance of SFSR using (a) ΔΔMFCC and (b) ΔΔLPCC features and GMM modeling.
The results of the MFR, MFS, and MFSR analysis techniques with MFCC features and GMM modeling are shown in Fig. 9(a)–(c), respectively. In MFCC-MFR, the minimum EER is 30.30%, obtained for 12 seconds of data with 16 Gaussian mixtures. The average EER of MFCC-MFR is 39.39%, which is 0.13% lower than that of MFCC-SFSR.
Fig. 9. Performance comparison of the speaker verification system for ΔΔ using (a) MFCC-MFR, (b) MFCC-MFS, and (c) MFCC-MFSR for GMM modeling.
In MFCC-MFS, the minimum EER is 30.26%, obtained for 12 seconds of data with 16 Gaussian mixtures. The average EER is 38.25%. MFCC-MFS performs better than MFCC-MFR for all data sizes, and its average EER is 1.14% lower than that of MFCC-MFR.
MFCC-MFSR gives a minimum EER of 30.26%, obtained for 12 seconds of data with 32 Gaussian mixtures. The average EER is 37.89%, which is 1.5% and 0.36% lower than that of MFCC-MFR and MFCC-MFS, respectively.
The LPCC features are extracted using the MFR, MFS, and MFSR analysis techniques and modeled using GMM; the results are shown in Fig. 10(a)–(c), respectively. The minimum EER of LPCC-MFR is 30.08%, obtained for 12 seconds of data with 32 Gaussian mixtures. The average EER is 37.79%, which is 0.16% lower than that of LPCC-SFSR.
For LPCC-MFS analysis, the minimum EER of 29.31% is obtained for 12 seconds of training and testing data with 16 Gaussian mixtures. The average EER is 37.49%, which is 0.30% lower than that of LPCC-MFR. LPCC-MFS also has a lower EER than LPCC-MFR for all data sizes.
LPCC-MFSR analysis further improves the EER compared with LPCC-MFS and LPCC-MFR. The minimum EER of LPCC-MFSR is 29.44%, obtained for 12 seconds of data with 64 Gaussian mixtures. The average EER of LPCC-MFSR is 36.81%, which is 0.98% and 0.68% lower than that of LPCC-MFR and LPCC-MFS, respectively. Besides the average EER, the individual EER of LPCC-MFSR is also lower than that of LPCC-MFR and LPCC-MFS for all data sizes.
Fig. 10. Performance comparison of the speaker verification system for ΔΔ using (a) LPCC-MFR, (b) LPCC-MFS, and (c) LPCC-MFSR for GMM modeling.
Fig. 11(a) and (b) show the results of SFSR analysis with MFCC and LPCC features, respectively, using GMM-UBM modeling. The minimum EER of MFCC-SFSR is 24.48%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER of MFCC-SFSR is 33.58%.
For LPCC-SFSR, the minimum EER is 23.71%, obtained for 12 seconds of data with 64 Gaussian mixtures. The average EER of LPCC-SFSR is 32.72%, which is 0.86% lower than that of MFCC-SFSR.
Fig. 11. Performance of SFSR using (a) ΔΔMFCC and (b) ΔΔLPCC features and GMM-UBM modeling.
The results of the MFR, MFS, and MFSR analysis techniques with MFCC features and GMM-UBM modeling are shown in Fig. 12(a)–(c), respectively. For MFCC-MFR, the minimum EER is 22.4%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER of MFCC-MFR is 32.66%, which is 0.92% lower than that of MFCC-SFSR (33.58%).
In MFCC-MFS, the minimum EER is 23.25%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER is 32.13%. MFCC-MFS performs better than MFCC-MFR for all data sizes, and its average EER is 0.53% lower than that of MFCC-MFR.
MFCC-MFSR gives a minimum EER of 22%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER is 31.83%, which is 0.83% and 0.3% lower than that of MFCC-MFR and MFCC-MFS, respectively.
Fig. 12. Performance comparison of the speaker verification system for ΔΔ using (a) MFCC-MFR, (b) MFCC-MFS, and (c) MFCC-MFSR for GMM-UBM modeling.
Fig. 13(a)–(c) show the performance of MFR, MFS, and MFSR, respectively, using LPCC features and GMM-UBM modeling. The minimum EER of LPCC-MFR is 23.69%, obtained for 12 seconds of data with 64 Gaussian mixtures. The average EER is 31.91%, which is 0.81% lower than that of LPCC-SFSR.
In LPCC-MFS analysis, the minimum EER of 23.7% is obtained for 12 seconds of training and testing data with 64 Gaussian mixtures. The average EER is 31.7%, which is 0.21% lower than that of LPCC-MFR. LPCC-MFS also has a lower EER than LPCC-MFR for all data sizes.
Fig. 13. Performance comparison of the speaker verification system for ΔΔ using (a) LPCC-MFR, (b) LPCC-MFS, and (c) LPCC-MFSR for GMM-UBM modeling.
LPCC-MFSR analysis further improves the EER compared with LPCC-MFS and LPCC-MFR. The minimum EER of LPCC-MFSR is 23.33%, obtained for 12 seconds of data with 128 Gaussian mixtures. The average EER of LPCC-MFSR is 31.36%, which is 0.55% and 0.34% lower than that of LPCC-MFR and LPCC-MFS, respectively. Besides the average EER, the individual EER of LPCC-MFSR is also lower than that of LPCC-MFR and LPCC-MFS for all data sizes.
In this experiment too, ΔΔLPCC-based MFR, MFS, and MFSR have a lower EER than MFCC-based MFR, MFS, and MFSR for 3, 4, 5, and 6 seconds of data. When the train/test speech data are increased to 9 and 12 seconds, MFCC-based MFR, MFS, and MFSR have a lower EER than LPCC-based MFR, MFS, and MFSR. Thus, when both the training and the testing data are limited (3–6 seconds), LPCC performs better than MFCC.
Table 2 shows that, even with 39-dimensional features, the LPCC-based analysis techniques have a lower average EER than the MFCC-based analysis techniques with GMM modeling, whereas with GMM-UBM modeling the MFCC-based techniques have a lower minimum EER than the LPCC-based techniques in most cases. With GMM modeling, the minimum EER of LPCC-SFSR, LPCC-MFR, LPCC-MFS, and LPCC-MFSR is 0.63%, 0.22%, 0.95%, and 0.82% lower than that of MFCC-SFSR, MFCC-MFR, MFCC-MFS, and MFCC-MFSR, respectively, and the average EER of LPCC-SFSR, LPCC-MFR, LPCC-MFS, and LPCC-MFSR is 1.57%, 1.6%, 0.76%, and 1.08% lower than that of MFCC-SFSR, MFCC-MFR, MFCC-MFS, and MFCC-MFSR, respectively. With GMM-UBM modeling, the minimum EER of LPCC-SFSR is 0.77% lower than that of MFCC-SFSR, but the minimum EER of MFCC-MFR, MFCC-MFS, and MFCC-MFSR is 1.29%, 0.45%, and 1.33% lower than that of LPCC-MFR, LPCC-MFS, and LPCC-MFSR, respectively. The average EER of LPCC-SFSR, LPCC-MFR, LPCC-MFS, and LPCC-MFSR with GMM-UBM is 0.86%, 0.75%, 0.43%, and 0.47% lower than that of MFCC-SFSR, MFCC-MFR, MFCC-MFS, and MFCC-MFSR, respectively.
Table 2. Comparison of average EER (%) of GMM and GMM-UBM modeling for 39-dimensional features using SFSR and MFSR analysis techniques.