We present a new approach for separating two speech signals when only a single recording of their additive mixture is available. In this approach, the log spectra of the sources are estimated by maximum a posteriori (MAP) estimation, given the log spectrum of the mixture and the probability density functions of the sources. We show that this estimation leads to a two-state nonlinear filter whose states are controlled by the means of the sources. The first state of the filter is expressed as a combination of two Wiener filters whose parameters are controlled by the means and variances of the sources and by the noise variance; the second state is expressed by the means of the sources. In experiments conducted on a wide variety of mixtures, we show that the MAP-based estimator outperforms methods that use binary mask filtering or Wiener filtering for the separation task.
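
For orientation, the two baseline filters named in the abstract can be sketched per frequency bin as follows. This is a minimal illustration of generic binary mask and Wiener filtering only, not the authors' MAP estimator, whose exact form is not given here. The function names, the `eps` guard, and the example power values are hypothetical, and the per-bin source powers are assumed to be known (for instance, from trained source models).

```python
def binary_mask(power_1, power_2):
    """Hard mask for source 1: 1.0 where source 1 has at least as much power, else 0.0."""
    return [1.0 if p1 >= p2 else 0.0 for p1, p2 in zip(power_1, power_2)]


def wiener_gain(power_1, power_2, eps=1e-12):
    """Soft gain for source 1: its share of the total expected power in each bin."""
    # eps is an assumed guard against division by zero in empty bins.
    return [p1 / (p1 + p2 + eps) for p1, p2 in zip(power_1, power_2)]


def apply_gain(mixture_magnitude, gain):
    """Scale each bin of the mixture magnitude spectrum by the corresponding gain."""
    return [g * m for g, m in zip(gain, mixture_magnitude)]


# Hypothetical expected powers of the two sources in three frequency bins.
power_1 = [4.0, 1.0, 0.25]
power_2 = [1.0, 1.0, 1.0]
mixture_magnitude = [2.0, 1.0, 1.0]

hard_estimate = apply_gain(mixture_magnitude, binary_mask(power_1, power_2))
soft_estimate = apply_gain(mixture_magnitude, wiener_gain(power_1, power_2))
```

The binary mask assigns each bin entirely to one source, while the Wiener gain splits it in proportion to expected power; the abstract's two-state filter is described as combining Wiener-type filtering in one state with the source means in the other.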

Additional Metadata
Keywords: Binary mask, Maximum a posteriori estimation, Single channel speech separation, Source separation
Conference: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Citation: Radfar, M.H., & Dansereau, R. (2007). Single channel speech separation using maximum a posteriori estimation. Presented at the 8th Annual Conference of the International Speech Communication Association, Interspeech 2007.