As a fundamental part of single-microphone speech quality enhancement, noise power spectrum estimation is particularly challenging in adverse environments with low signal-to-noise ratio (SNR) and highly non-stationary background noise. In this paper, we propose a novel scheme that incorporates properties of human speech, such as the pitch properties of voiced speech and the statistical properties of unvoiced speech durations, into subband spectral tracking to estimate the power spectrum of non-stationary noise. We show that the proposed method estimates the noise power spectrum more accurately and more quickly when the noise is highly non-stationary, tracking bursts of noise 4-6 times faster than competing methods. We also show that the mean square error of the noise spectrum estimated by the proposed method is on average 15% lower than that of competing methods. The proposed algorithm is then combined with the conventional MMSE-STSA estimator, and its overall performance is tested in a speech enhancement application. Simulation results show that the segmental SNR improvement of the proposed system is on average 0.9 dB higher than that of the competing system, and that the mean opinion score (MOS) improvement is on average 0.17 higher.
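
For readers unfamiliar with the segmental SNR figure quoted above, the following is a minimal sketch of how such a measure is commonly computed: the per-frame SNR between a clean reference and an enhanced signal is expressed in dB, clamped to a fixed range, and averaged over frames. The function name `segmental_snr`, the frame length of 256 samples, and the clamping limits of -10 dB and 35 dB are illustrative assumptions, not values taken from the paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a clean signal and an enhanced one.

    frame_len, lo_db and hi_db are assumed defaults for illustration only.
    """
    n = min(len(clean), len(enhanced))
    eps = 1e-12  # guards against division by zero and log of zero
    frame_snrs = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, n - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        y = enhanced[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((a - b) ** 2 for a, b in zip(s, y))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

An improvement figure such as the one reported would then be the difference between this measure evaluated on the enhanced output and on the unprocessed noisy input, both against the same clean reference.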

Additional Metadata
Keywords Non-stationary noise estimation, Pitch estimation, Speech enhancement, Voice activity detection
Persistent URL dx.doi.org/10.1016/j.specom.2006.10.002
Journal Speech Communication
Citation
Lin, Z. (Zhong), Goubran, R., & Dansereau, R. (2007). Noise estimation using speech/non-speech frame decision and subband spectral tracking. Speech Communication, 49(7-8), 542–557. doi:10.1016/j.specom.2006.10.002