## Danyang Cao*, Zhixin Chen* and Xue Gao*

| No. | Initial SNR (dB) | LMS (dB) | SS (dB) | WN (dB) | WT (dB) | LMSSS (dB) |
|---|---|---|---|---|---|---|
| 1 | 15 | 17.9384 | 20.6200 | 17.2421 | 17.2312 | 16.6912 |
| 2 | 10 | 17.3406 | 16.7576 | 13.2935 | 15.3530 | 16.4608 |
| 3 | 5 | 15.9026 | 12.9901 | 9.8536 | 14.0289 | 15.6716 |
| 4 | 0 | 13.1954 | 9.3721 | 6.9636 | 12.0622 | 13.9622 |
| 5 | -5 | 9.3054 | 6.0238 | 3.9639 | 8.6432 | 11.1432 |
| 6 | -10 | 4.7390 | 2.6939 | 2.3632 | 4.9161 | 7.9161 |
| 7 | -15 | -0.1151 | -0.7819 | -0.7200 | -0.1062 | 4.1819 |

Table 2.

| No. | Initial SNR (dB) | LMS (dB) | SS (dB) | WN (dB) | WT (dB) | LMSSS (dB) |
|---|---|---|---|---|---|---|
| 1 | 15 | 2.9384 | 5.6200 | 2.2412 | 2.2312 | 1.6912 |
| 2 | 10 | 7.3406 | 6.7576 | 3.2935 | 5.3530 | 6.4608 |
| 3 | 5 | 10.9026 | 7.9901 | 4.8536 | 9.0289 | 10.6716 |
| 4 | 0 | 13.1954 | 9.3721 | 6.9636 | 12.5622 | 13.9622 |
| 5 | -5 | 14.3054 | 11.0238 | 8.9639 | 13.1432 | 16.1432 |
| 6 | -10 | 14.7390 | 12.6939 | 12.3632 | 14.9161 | 17.9161 |
| 7 | -15 | 14.8849 | 14.2181 | 14.2800 | 14.8938 | 19.1819 |

The following further compares the effects of these algorithms. Table 2 lists the difference in SNR before and after noise reduction, i.e., how much each algorithm improves the SNR, and Fig. 5 plots the trend of this SNR improvement for each algorithm.
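The SNR improvement reported in Table 2 is simply the SNR after processing minus the SNR before. As a hedged illustration (not the authors' code), the computation can be sketched as follows; the constant "noise" and the halving "denoiser" are placeholders chosen so the numbers are exact:

```python
import numpy as np

def snr_db(clean, degraded):
    """SNR (dB) of `degraded` measured against the clean reference signal."""
    noise = degraded - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Toy check: a unit signal plus a constant 0.1 "noise" gives exactly 20 dB.
clean = np.ones(1000)
noisy = clean + 0.1
before = snr_db(clean, noisy)            # 20.0 dB

# A placeholder "denoiser" that halves the noise amplitude.
denoised = clean + (noisy - clean) / 2
improvement = snr_db(clean, denoised) - before   # 20*log10(2) ~= 6.02 dB
```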

It can be seen from Fig. 5 that when the original SNR is below 5 dB, the LMSSS algorithm provides the largest SNR improvement. When the SNR drops below 0 dB, the improvement curves of the LMS algorithm and the wavelet threshold algorithm gradually flatten as the SNR decreases further. The improvement obtained with the spectral subtraction algorithm alone or the Wiener filter algorithm alone remains consistently low. The combination of LMS and spectral subtraction not only yields a larger SNR improvement, but its curve also keeps rising.

As described above, in noise environments with low SNR (< 5 dB), the LMSSS algorithm not only outperforms the LMS algorithm or spectral subtraction used alone, but also adapts better to environments with stronger noise.
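The LMSSS idea of running LMS adaptive filtering first and then spectral subtraction can be sketched in a few lines of numpy. This is a minimal illustration under simplifying assumptions (a separate reference noise channel for the LMS stage, and a leading noise-only segment for the spectral floor); the function names and parameters are illustrative, not the paper's implementation:

```python
import numpy as np

def lms_filter(noisy, reference, order=16, mu=0.01):
    """LMS adaptive filter: predict the noise in `noisy` from the correlated
    `reference` channel and return the error signal (the enhanced speech)."""
    w = np.zeros(order)
    out = np.zeros_like(noisy)
    for n in range(order - 1, len(noisy)):
        x = reference[n - order + 1:n + 1][::-1]  # most recent sample first
        e = noisy[n] - w @ x                      # error = enhanced sample
        w += 2 * mu * e * x                       # LMS weight update
        out[n] = e
    return out

def spectral_subtraction(signal, noise_est, frame=256):
    """Frame-wise magnitude spectral subtraction with half-wave rectification,
    using one noise-only frame for the noise magnitude estimate."""
    out = np.zeros_like(signal)
    noise_mag = np.abs(np.fft.rfft(noise_est[:frame]))
    for start in range(0, len(signal) - frame + 1, frame):
        spec = np.fft.rfft(signal[start:start + frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # subtract, floor at 0
        out[start:start + frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), frame)
    return out
```

The LMSSS chain is then `spectral_subtraction(lms_filter(noisy, reference), noise_est)`: the second stage suppresses the residual noise that the LMS stage leaves behind, which is why no overlapping windows or filter fine-tuning are attempted in this sketch.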

Figs. 6–9 compare the waveforms of the processed speech. Since the amount of experimental data is large, only a few typical diagrams are shown.

The above experiments show that the LMSSS algorithm can effectively improve the SNR and thus achieve speech enhancement. As the initial SNR decreases, the LMSSS algorithm performs increasingly better than the other two algorithms.

However, SNR is not the only quantitative measure of speech recognition and speech enhancement; the degree of damage to the original speech after noise reduction is also an important factor. Hence, considering both the SNR and the degree of damage to the output speech, we designed experiments to verify that the LMSSS algorithm also improves the accuracy of speech recognition.

The experimental data were divided into two groups, A and B, with identical speech content. Group A was used for recognition and group B for comparison. Each group was further divided into 13 subgroups according to SNR, A1, A2, ..., A13 and B1, B2, ..., B13, with corresponding SNRs of 20 dB, 18 dB, 16 dB, ..., 4 dB, 2 dB, 0 dB, -2 dB, and -4 dB. Each subgroup contained 1,000 speech samples to which random Gaussian white noise had been added.
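Generating such test data amounts to scaling white noise so that each file reaches a target SNR. A small sketch (illustrative, not the authors' script):

```python
import numpy as np

def add_noise_at_snr(speech, target_snr_db, rng=None):
    """Add Gaussian white noise scaled so the result has the target SNR (dB)."""
    rng = rng if rng is not None else np.random.default_rng()
    signal_power = np.mean(speech ** 2)
    noise_power = signal_power / (10.0 ** (target_snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=speech.shape)
    return speech + noise

# The 13 SNR levels used in the experiments, from 20 dB down to -4 dB.
snr_levels = list(range(20, -5, -2))
```

Looping `snr_levels` over the 1,000 clean files of each group would then produce the subgroups A1–A13 and B1–B13.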

Traditional speech recognition comprises four steps: preprocessing of the speech data, voice activity detection, feature extraction, and template matching [15]. To verify the validity of the LMSSS algorithm, we inserted it into this pipeline and compared it with the LMS algorithm and the spectral subtraction algorithm, verifying that the recognition rate is further improved. The speech recognition experiments consist of the following steps:


Step 1: Preprocess the speech signal w(i), including pre-emphasis, framing, and windowing;

Step 2: Apply the MFCC_COS algorithm to the preprocessed data to detect endpoints and obtain the starting point A and the end point B;

Step 3: Apply the LMSSS algorithm (or the LMS algorithm and the spectral subtraction algorithm) to denoise the speech signal x(i) and obtain the processed speech signal y(i);

Step 4: Extract the MFCC feature parameters from the speech signal y(i) to obtain the feature matrix R;

Step 5: According to the starting point A and the end point B, intercept the feature matrix R to obtain the feature matrix W corresponding to the endpoint detection result;

Step 6: Compute the similarity between the feature matrix W and each feature matrix in the template library using the dynamic time warping (DTW) algorithm, and output the final matching result.
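The template matching of Step 6 can be sketched with a textbook DTW distance; the `recognize` helper and its template dictionary are hypothetical stand-ins for the paper's template library, shown only to make the matching step concrete:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between two feature sequences (frames x coefficients)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame cost
            D[i, j] = cost + min(D[i - 1, j],     # insertion
                                 D[i, j - 1],     # deletion
                                 D[i - 1, j - 1]) # match
    return D[n, m]

def recognize(feature_matrix, templates):
    """Return the label of the template closest to `feature_matrix` under DTW."""
    return min(templates, key=lambda label: dtw_distance(feature_matrix,
                                                         templates[label]))
```

DTW tolerates the differing speaking rates of the query and the templates, which is why a plain frame-by-frame Euclidean distance would not suffice here.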

The speech recognition process is illustrated in Fig. 10.

The accuracy of speech recognition is shown in Table 3.

The comparison of accuracy rates is shown in Fig. 11.

As shown in Table 3 and Fig. 11, the accuracy of speech recognition based on the LMSSS algorithm is significantly higher than that based on the LMS algorithm or the spectral subtraction algorithm alone. Although the accuracy of all three drops sharply once the initial SNR falls below 0 dB, the LMSSS algorithm still stays ahead of LMS and spectral subtraction. The reason is that the added spectral subtraction step compensates for the filtering delay of the LMS algorithm: the LMS stage can focus on the filtering effect without having to keep reducing the filter parameters, while the spectral subtraction stage further reduces the noise in the initial part of the speech file.

Table 3.

| No. | Initial SNR (dB) | LMS (%) | SS (%) | LMSSS (%) |
|---|---|---|---|---|
| 1 | 20 | 80.4 | 96.2 | 96.7 |
| 2 | 18 | 91.5 | 97.1 | 99.5 |
| 3 | 16 | 85.5 | 97.3 | 95.5 |
| 4 | 14 | 96.1 | 100 | 100 |
| 5 | 12 | 93.7 | 99.1 | 100 |
| 6 | 10 | 76.6 | 95.7 | 98.4 |
| 7 | 8 | 73.5 | 93.8 | 98.5 |
| 8 | 6 | 83.3 | 95.4 | 97.6 |
| 9 | 4 | 76.3 | 93.1 | 95.4 |
| 10 | 2 | 65.9 | 80.9 | 88.2 |
| 11 | 0 | 60.4 | 83.8 | 88.7 |
| 12 | -2 | 22.4 | 30.3 | 33.8 |
| 13 | -4 | 18.3 | 20.8 | 22.6 |

Table 4 and Fig. 12 show the comparison of time consumption for 1,000 speech recognitions.

The comparison of time consumption shows that the LMSSS algorithm requires more computation: its time is almost double that of the spectral subtraction algorithm, while it remains relatively close to that of the LMS algorithm. Considering time alone, at an SNR of 0 dB the LMSSS algorithm takes 345.5355 seconds for 1,000 speech recognitions, i.e., about 0.346 seconds per recognition, compared with about 0.326 seconds for the LMS algorithm and about 0.14 seconds for spectral subtraction. For a single recognition the difference is therefore not obvious. Moreover, these times include all operations: adding Gaussian white noise, preprocessing, framing, endpoint detection, feature extraction, noise reduction, template matching based on the DTW algorithm, and so on. Thus, in systems with low real-time requirements, LMSSS can meet the needs of effective speech recognition. On the whole, although the LMSSS algorithm takes more time, its advantages in speech enhancement and recognition are more significant.

Table 4.

| No. | Initial SNR (dB) | LMS (s) | SS (s) | LMSSS (s) |
|---|---|---|---|---|
| 1 | 20 | 367.8192 | 189.1658 | 391.8110 |
| 2 | 18 | 355.8416 | 178.2789 | 379.0001 |
| 3 | 16 | 361.0980 | 169.7724 | 368.2490 |
| 4 | 14 | 362.3975 | 175.4268 | 374.6448 |
| 5 | 12 | 364.0179 | 154.9321 | 376.1151 |
| 6 | 10 | 381.0179 | 158.9518 | 386.0327 |
| 7 | 8 | 359.3360 | 152.7651 | 375.0003 |
| 8 | 6 | 334.8769 | 151.5179 | 354.4190 |
| 9 | 4 | 339.6121 | 156.1250 | 364.1827 |
| 10 | 2 | 337.7034 | 145.0781 | 343.9932 |
| 11 | 0 | 325.9069 | 139.7418 | 345.5355 |
| 12 | -2 | 323.4367 | 137.8965 | 343.9680 |
| 13 | -4 | 320.6598 | 128.9805 | 344.9089 |

In the worst case, as the original SNR continues to decrease, and especially once it falls below 0 dB, the advantage of the LMSSS algorithm gradually shrinks and may even be overtaken by the other two algorithms. The two consecutive noise reduction operations (LMS filtering followed by spectral subtraction) inflict more damage on the content of the original speech: although the SNR keeps increasing, the speech itself becomes very blurred. The LMSSS algorithm also takes more time than the other algorithms when processing a large number of speech files in real time.

In this study, we analyzed the spectral subtraction algorithm, which has higher computational efficiency, and the LMS adaptive filtering algorithm, which has a better noise reduction effect, and then proposed the LMSSS speech enhancement method based on their noise reduction principles. The LMSSS algorithm applies spectral subtraction to the noisy speech after LMS processing, which simplifies the adjustment of the filter parameters and avoids both musical noise and the filtering delay problem. In the speech recognition experiments, the LMSSS algorithm achieved the highest recognition accuracy. Although its computational cost is larger, its speech enhancement results remain favorable within the experimental range (-4 dB to 20 dB).

Experiments show that the LMSSS algorithm improves the SNR of the original speech most when the original SNR is below 5 dB. When the SNR lies within the range 0 dB to 15 dB, the LMSSS algorithm is more suitable for speech recognition, and its speech enhancement effect is also better than that of the LMS algorithm, the spectral subtraction algorithm, and other commonly used algorithms.

This paper is supported by the National Natural Science Foundation of China (No. 41471303), the Basic Scientific Research Plan Project of the Beijing Municipal Commission of Education (2018), the Special Research Foundation of the North China University of Technology (No. PXM2017_014212_000014), the Yuyou Talents Support Program of North China University of Technology, and the Beijing Natural Science Foundation (No. 4162022).

He received B.S. and M.S. degrees in Computer Science and Technology from North China University of Technology in 2000 and 2006, respectively. He received his Ph.D. degree in Computer Application Technology in 2012 from the University of Science and Technology Beijing, China. He is currently an Associate Professor in the Department of Computer Science at the North China University of Technology, China. His research interests cover artificial intelligence and data mining.

- 1 X. Wang, L. Li, and C. Liu, "Study of speech enhancement algorithm based on spectral subtraction," *Journal of the Staff and Worker's University*, vol. 2013, no. 6, pp. 85-87, 2013.
- 2 S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," *IEEE Transactions on Acoustics, Speech, and Signal Processing*, vol. 27, no. 2, pp. 113-120, 1979.
- 3 D. Tsoukalas, M. Paraskevas, and J. Mourjopoulos, "Speech enhancement using psychoacoustic criteria," in *Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing*, Minneapolis, MN, 1993, pp. 359-362.
- 4 J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," *Proceedings of the IEEE*, vol. 67, no. 12, pp. 1586-1604, 1979.
- 5 B. Widrow and M. E. Hoff, "Adaptive switching circuits," Stanford Electronics Labs, Stanford University, CA, Report No. TR-1553-1, 1960.
- 6 D. L. Donoho and J. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," *Biometrika*, vol. 81, no. 3, pp. 425-455, 1994.
- 7 S. Thomas Alexander, *Adaptive Signal Processing: Theory and Applications*. New York, NY: Springer, 1986.
- 8 W. Xu, G. Wang, Y. Geng, F. Bai, and T. Fei, "Speech enhancement algorithm based on spectral subtraction and variable-step LMS algorithm," *Computer Engineering and Applications*, vol. 51, no. 1, pp. 213-217, 2015.
- 9 H. Chen and X. H. Qiu, "Research on speech enhancement of improved spectral subtraction algorithm," *Computer Technology and Development*, vol. 24, no. 4, pp. 70-76, 2014.
- 10 A. D. Poularikas and Z. M. Ramadan, *Adaptive Filtering Primer with MATLAB*. Boca Raton, FL: CRC Press, 2006.
- 11 Z. Song, *The Application of MATLAB in Speech Signal Analysis and Synthesis*. Beijing: Beihang University Press, 2013.
- 12 J. Han, C. Wang, C. Lu, L. Zhang, W. Ren, and Y. Ma, "Robust speech recognition system in noisy environment," *Audio Engineering*, vol. 2002, no. 1, pp. 27-29, 2002.
- 13 J. Bai, L. H. Yang, and X. Y. Zhang, "An antinoise SVM parameter optimization method for speech recognition," *Journal of Central South University: Science and Technology*, vol. 44, no. 2, pp. 604-611, 2013.
- 14 R. Wang and P. Chai, "A method for speech enhancement based on improved spectral subtraction," *Pattern Recognition and Artificial Intelligence*, vol. 16, no. 2, pp. 247-251, 2003.
- 15 Y. Yang and W. Shi, "Implementation of adaptive filter on wave-generated magnetic noise based on LMS algorithm," *Journal of Jiangsu Institute of Education (Natural Science)*, vol. 27, no. 1, pp. 9-10, 2011.