International Journal of Image, Graphics and Signal Processing (IJIGSP)
ISSN: 2074-9074 (Print), ISSN: 2074-9082 (Online)
Published By: MECS Press
IJIGSP Vol.11, No.7, Jul. 2019
Performance Analysis of Statistical Approaches and NMF Approaches for Speech Enhancement
pp. 9-38
Super-Gaussian based Bayesian estimators play a significant role in noise reduction. However, traditional Bayesian estimators process only the DFT spectral amplitude of noisy speech and leave the phase unprocessed. Taking phase information into account when deriving Bayesian estimators improves results. The objective of this paper is twofold. First, super-Gaussian based Bayesian estimators of Complex speech coefficients given Uncertain Phase (CUP) are compared under different noise conditions: white noise, babble noise, pink noise, modulated pink noise, factory noise, car noise, street noise, F16 noise and M109 noise. Second, a novel speech enhancement method is proposed that combines CUP estimators with different NMF approaches and an online bases update. Statistical estimators are less effective under fully non-stationary noise assumptions, whereas Non-negative Matrix Factorization (NMF) based algorithms perform better for non-stationary noise. The drawback of NMF is that it requires training and/or clean speech and noise signals. This drawback can be overcome by combining the advantages of statistical and NMF approaches. The following approaches are considered for comparison: Posteriori Regularized NMF (PR-NMF), Weibull Rayleigh NMF (WR-NMF), Nakagami Rayleigh NMF (NR-NMF), the CUP estimator with Gamma and Generalized Gamma distributions + NMF + Online bases Update (CUP-GG + NMF + OU), and CUP-GG + WR-NMF / NR-NMF + OU. Overall, the paper analyzes the performance of speech enhancement methods based on Bayesian estimators, NMF approaches, and combinations of statistical and NMF approaches. The objective performance measures used for comparison are Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal to Noise Ratio (SNR), Signal to Distortion Ratio (SDR) and Segmental SNR (Seg SNR).
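As a rough illustration of the plain NMF building block that the compared methods extend, the following minimal sketch factors a non-negative magnitude spectrogram V into bases W and activations H using the standard multiplicative updates for the generalized Kullback-Leibler divergence. It is not the PR-NMF, WR-NMF, NR-NMF or CUP-GG method of the paper. The choice of cost function, the function names `nmf_kl` and `kl_divergence`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the generalized KL divergence.
    W holds spectral bases (freq x rank), H holds activations (rank x frames).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random strictly positive initialisation (an illustrative choice).
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Update activations with the bases held fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update bases with the activations held fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between V and its approximation V_hat."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


# Usage on a synthetic low-rank "spectrogram" (hypothetical data).
rng = np.random.default_rng(1)
V = rng.random((64, 4)) @ rng.random((4, 100))
W, H = nmf_kl(V, rank=4)
print("KL divergence after fitting:", kl_divergence(V, W @ H))
```

In a supervised enhancement setting, speech and noise bases would typically be learned separately from training data and then used together to decompose the noisy spectrogram. That dependence on training material is the drawback of NMF noted in the abstract.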
Cite This Paper
Ravi Kumar Kandagatla, P V Subbaiah, "Performance Analysis of Statistical Approaches and NMF Approaches for Speech Enhancement", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.11, No.7, pp. 9-38, 2019. DOI: 10.5815/ijigsp.2019.07.02