Model-based single-channel speech separation techniques commonly use trained patterns of the individual speakers to separate the speech signals. Most recently proposed techniques assume that the data used in the training and test phases have the same energy level, a prerequisite that is rarely met in real situations. To address this limitation, we propose a technique that estimates the gain associated with each speaker from the mixture and thus obviates the need for this assumption. The basic idea is to express the probability density function (PDF) of the mixture in terms of the individual speakers' PDFs and corresponding gains. The patterns and gains that maximize the mixture's PDF are then selected and used to recover the speech signals. Experiments conducted on a wide variety of mixtures with signal-to-signal ratios ranging from 0 to 18 dB show that the proposed technique estimates the speakers' gains with 95% accuracy within ±20% of the actual gain. Comparing the separated speech signals with the originals in terms of SNR, with and without the gain estimation stage, we observe a significant SNR improvement (5.73 dB on average) when gain estimation is included.
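The selection step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the mixture PDF, so the sketch assumes a log-max approximation of the mixture log-spectrum, an isotropic Gaussian observation model, a single analysis frame (the paper estimates long-term gains), and an exhaustive search over a small grid of candidate gains in dB. The function name `estimate_gains`, the codebooks, and all parameter values are hypothetical.

```python
import itertools
import numpy as np


def estimate_gains(y, codebook_a, codebook_b, gain_grid_db, sigma=1.0):
    """Jointly pick codebook patterns and gains that maximize a toy mixture likelihood.

    ASSUMPTIONS (not taken from the paper): patterns are log-spectra in dB,
    the mixture follows the log-max approximation, and the observation noise
    is Gaussian with standard deviation `sigma`.
    Returns (log_likelihood, idx_a, idx_b, gain_a_db, gain_b_db).
    """
    best = None
    for g_a, g_b in itertools.product(gain_grid_db, repeat=2):
        # Predicted mixture for every pattern pair, shape (K_a, K_b, D).
        pred = np.maximum(codebook_a[:, None, :] + g_a,
                          codebook_b[None, :, :] + g_b)
        # Gaussian log-likelihood up to an additive constant, shape (K_a, K_b).
        loglik = -np.sum((y - pred) ** 2, axis=-1) / (2.0 * sigma ** 2)
        i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
        if best is None or loglik[i, j] > best[0]:
            best = (float(loglik[i, j]), int(i), int(j), float(g_a), float(g_b))
    return best


# Synthetic check: build a mixture from known patterns and gains, then recover them.
rng = np.random.default_rng(0)
codebook_a = rng.normal(0.0, 10.0, size=(8, 32))  # hypothetical speaker A patterns (dB)
codebook_b = rng.normal(0.0, 10.0, size=(8, 32))  # hypothetical speaker B patterns (dB)
gain_grid_db = np.arange(-9.0, 10.0, 3.0)         # candidate gains: -9, -6, ..., 9 dB

y = np.maximum(codebook_a[2] + 3.0, codebook_b[5] - 3.0)
result = estimate_gains(y, codebook_a, codebook_b, gain_grid_db)
print(result[1:])
```

On this noise-free synthetic mixture the search should recover the pattern indices and gains used to build `y`. A practical system would replace the exhaustive grid with a more efficient search and accumulate the likelihood over many frames.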
2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA
Department of Systems and Computer Engineering

Radfar, M.H., & Dansereau, R. (2007). Long-term gain estimation in model-based single channel speech separation. Presented at the 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA. doi:10.1109/ASPAA.2007.4393019