Journal of Zhejiang University

Journal of Zhejiang University SCIENCE C 2014 Vol.15 No.12 P.1154-1163

Speech enhancement with a GSC-like structure employing sparse coding

Author(s): Li-chun Yang, Yun-tao Qian
Affiliation(s): 1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): lichun_y@126.com, ytqian@zju.edu.cn
Key Words: Generalized sidelobe canceller, Speech enhancement, Voice activity detection, Dictionary learning, Sparse coding


Li-chun Yang, Yun-tao Qian. Speech enhancement with a GSC-like structure employing sparse coding[J]. Journal of Zhejiang University Science C, 2014, 15(12): 1154-1163.

@article{title="Speech enhancement with a GSC-like structure employing sparse coding",
author="Li-chun Yang, Yun-tao Qian",
journal="Journal of Zhejiang University Science C",
volume="15",
number="12",
pages="1154-1163",
year="2014",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1400085"
}

%0 Journal Article
%T Speech enhancement with a GSC-like structure employing sparse coding
%A Li-chun Yang
%A Yun-tao Qian
%J Journal of Zhejiang University SCIENCE C
%V 15
%N 12
%P 1154-1163
%@ 1869-1951
%D 2014
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1400085

TY - JOUR
T1 - Speech enhancement with a GSC-like structure employing sparse coding
A1 - Li-chun Yang
A1 - Yun-tao Qian
JO - Journal of Zhejiang University Science C
VL - 15
IS - 12
SP - 1154
EP - 1163
SN - 1869-1951
Y1 - 2014
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1400085
ER -


Abstract: Speech communication is often degraded by various types of interfering signals. To improve the quality of the desired signal, the generalized sidelobe canceller (GSC), which uses a reference signal to estimate the interfering signal, has attracted considerable research attention. However, the interference suppression of the GSC is limited because a small amount of residual desired signal leaks into the reference signal. To overcome this problem, we use sparse coding to suppress the residual desired signal while preserving the reference signal. Sparse coding with a learned dictionary is usually used to reconstruct the desired signal. However, because training samples of the desired signal are not observable in a real environment, the reconstructed desired signal may contain a large amount of residual interference. In contrast, training samples of the interfering signal can be obtained for interferer dictionary learning during the absence of the desired signal, as identified by voice activity detection (VAD). Since the interfering component of the reference signal is coherent with the interferer dictionary, it can be reconstructed well by sparse coding, while the residual desired signal is removed. The performance of the GSC improves because the interfering signal is estimated more accurately from the proposed reference signal. Simulations and experiments in a real acoustic environment show that the proposed method is effective in suppressing interfering signals.
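The reconstruction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a sparse solver, so plain matching pursuit is used here for brevity, and the dictionary is a hypothetical hand-made set of unit-norm atoms rather than one learned from VAD-selected, speech-free frames. The leaked desired signal is placed orthogonal to every atom, which is an idealized case; real leaked speech would only be weakly coherent with the interferer dictionary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def matching_pursuit(frame, atoms, n_iter):
    """Greedy sparse coding of `frame` over unit-norm `atoms`.

    Returns (coefficients, reconstruction, residual).
    """
    residual = list(frame)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the current residual with every atom and keep the best match.
        corr = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < 1e-12:
            break  # nothing left that the dictionary can explain
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    reconstruction = [f - r for f, r in zip(frame, residual)]
    return coeffs, reconstruction, residual

# Hypothetical toy interferer dictionary (assumption, not learned from data).
atoms = [normalize([1, 1, 0, 0]), normalize([1, -1, 0, 0]), [0.0, 0.0, 1.0, 0.0]]

# Reference frame = interference (sparse in the dictionary) + leaked desired signal.
interference = [2.0 * a - 1.0 * c for a, c in zip(atoms[0], atoms[2])]
leak = [0.0, 0.0, 0.0, 0.3]
reference = [i + l for i, l in zip(interference, leak)]

coeffs, cleaned, residual = matching_pursuit(reference, atoms, n_iter=3)
```

In this toy case `cleaned` keeps the interference component and the leaked part ends up in `residual`, which is the behaviour the abstract attributes to coding the reference signal over an interferer dictionary.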

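Speech enhancement algorithm for a generalized sidelobe canceller based on sparse coding

In a generalized sidelobe canceller, a blocking matrix blocks the target signal to produce a reference interference signal, from which the interference is estimated; the target signal that leaks into the reference interference signal must therefore be kept as small as possible. This paper uses sparse coding to reconstruct the reference interference signal obtained from the blocking matrix, suppressing target-signal leakage and thereby achieving more effective speech enhancement with little speech distortion. Interference from non-speech segments serves as training samples for learning an interferer dictionary, which is used to reconstruct the interference component of the reference interference signal, while the target signal, being uncorrelated with the interference, is suppressed. The algorithm adds interferer sparse coding to the conventional blocking matrix in order to suppress the residual target signal. The sparse-coding-based generalized sidelobe canceller has two characteristics. First, it exploits the relatively stable structure of non-random interference: the interferer dictionary is learned from non-speech segments and is correlated with the structural features of the reference interference signal, so it can be used for sparse reconstruction of that signal. Second, it overcomes the difficulty that conventional methods have in avoiding target-signal leakage with the blocking matrix alone, by sparse coding with the interferer dictionary to suppress the effect of the small amount of leaked residual target signal. The proposed sparse-coding-based reconstruction of the reference interference signal effectively suppresses the small amount of leaked target signal in it; adaptive filtering then yields a more accurate estimate of the original interference, so that the interference is suppressed as far as possible while distortion of the target speech stays small.
Generalized sidelobe canceller; Speech enhancement; Voice activity detection; Dictionary learning; Sparse coding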
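The adaptive-filtering stage mentioned in the summary, which estimates the interference in the primary channel from the reference interference signal, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: normalized LMS (NLMS) is chosen only because it is a common adaptive filter, and the tap count, step size, two-tap interference path and synthetic Gaussian reference are all hypothetical. No target speech is added to the primary channel here, so the output should decay towards zero once the filter has converged; with speech present, the output would be the enhanced signal.

```python
import random

def nlms_cancel(primary, reference, n_taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `primary` (NLMS)."""
    weights = [0.0] * n_taps
    taps = [0.0] * n_taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        taps = [x] + taps[:-1]
        estimate = sum(w * t for w, t in zip(weights, taps))  # interference estimate
        error = d - estimate                                  # output sample
        power = sum(t * t for t in taps) + eps                # input power for normalization
        weights = [w + mu * error * t / power for w, t in zip(weights, taps)]
        output.append(error)
    return output, weights

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Hypothetical two-tap path from the reference to the interference in the primary channel.
primary = [
    0.8 * reference[n] - (0.3 * reference[n - 1] if n > 0 else 0.0)
    for n in range(len(reference))
]

output, weights = nlms_cancel(primary, reference)
```

After adaptation, `weights` approaches the assumed path `[0.8, -0.3, 0, 0]`. A cleaner reference signal matters because any target speech left in it would also be filtered and subtracted, distorting the output.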


