CLC number: TN912
Pejman Mowlaee, Abolghasem Sayadian, Hamid Sheikhzadeh. Split vector quantization for sinusoidal amplitude and frequency[J]. Journal of Zhejiang University Science C, 2011, 12(2): 140-154.
@article{title="Split vector quantization for sinusoidal amplitude and frequency",
author="Pejman Mowlaee, Abolghasem Sayadian, Hamid Sheikhzadeh",
journal="Journal of Zhejiang University Science C",
volume="12",
number="2",
pages="140-154",
year="2011",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1000020"
}
%0 Journal Article
%T Split vector quantization for sinusoidal amplitude and frequency
%A Pejman Mowlaee
%A Abolghasem Sayadian
%A Hamid Sheikhzadeh
%J Journal of Zhejiang University SCIENCE C
%V 12
%N 2
%P 140-154
%@ 1869-1951
%D 2011
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1000020
TY - JOUR
T1 - Split vector quantization for sinusoidal amplitude and frequency
A1 - Pejman Mowlaee
A1 - Abolghasem Sayadian
A1 - Hamid Sheikhzadeh
JO - Journal of Zhejiang University Science C
VL - 12
IS - 2
SP - 140
EP - 154
SN - 1869-1951
Y1 - 2011
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1000020
ER -
Abstract: In this paper, we propose applying a tree structure to the quantization of sinusoidal parameters. The proposed sinusoidal coder selects the coded sinusoidal parameters by minimizing a likelihood function in a least squares (LS) sense. From a rate-distortion standpoint, we address the problem of how to allocate the available bits among different frequency bands when coding the sinusoids of each frame. To further analyze the quantization behavior of the proposed method, we compare its quantization performance with that of two other methods: the short-time Fourier transform (STFT) based coder commonly used for speech enhancement and separation, and the line spectral frequency (LSF) coder used in speech coding. Through extensive simulations, we show that, compared with previous STFT-based methods, the proposed quantizer yields lower spectral distortion and higher perceived quality for signals re-synthesized from the coded parameters in a model-based approach. The proposed method also lowers the complexity and, owing to its tree structure, enables rapid search. It offers flexibility for many speaker-independent applications by finding the most likely frequency vectors from a list of frequency candidates. The proposed quantizer is therefore an attractive candidate for model-based speech applications in both speaker-dependent and speaker-independent scenarios.
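The Python sketch below is a minimal illustration of the split vector quantization (split-VQ) idea summarized in the abstract: the amplitude and frequency sub-vectors of a frame are quantized against separate codebooks, each codeword chosen by minimizing a squared (LS) error. The codebook sizes, the LBG-style training, the exhaustive (rather than tree-structured) search, and all names are assumptions made purely for illustration; the sketch does not reproduce the authors' coder or its rate-distortion bit allocation.

# Illustrative split-VQ sketch for sinusoidal amplitudes and frequencies.
# All sizes and names are assumptions; this is not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def train_codebook(train_vectors, n_codewords, n_iter=20):
    """Plain k-means (LBG-style) codebook training on a set of sub-vectors."""
    idx = rng.choice(len(train_vectors), n_codewords, replace=False)
    codebook = train_vectors[idx].copy()
    for _ in range(n_iter):
        # Assign each training vector to its nearest codeword (LS distance).
        d = ((train_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # Update each codeword as the centroid of its cell.
        for k in range(n_codewords):
            members = train_vectors[assign == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def split_vq_encode(amps, freqs, amp_codebook, freq_codebook):
    """Quantize amplitude and frequency sub-vectors with separate codebooks,
    selecting each codeword by minimizing the squared (LS) error."""
    amp_idx = ((amp_codebook - amps) ** 2).sum(axis=1).argmin()
    freq_idx = ((freq_codebook - freqs) ** 2).sum(axis=1).argmin()
    return amp_idx, freq_idx

def split_vq_decode(amp_idx, freq_idx, amp_codebook, freq_codebook):
    return amp_codebook[amp_idx], freq_codebook[freq_idx]

if __name__ == "__main__":
    L = 8  # sinusoids per frame (assumed)
    # Synthetic training data standing in for analyzed sinusoidal parameters.
    train_amps = np.abs(rng.normal(size=(2000, L)))
    train_freqs = np.sort(rng.uniform(0, np.pi, size=(2000, L)), axis=1)
    amp_cb = train_codebook(train_amps, n_codewords=64)     # 6 bits (assumed)
    freq_cb = train_codebook(train_freqs, n_codewords=128)  # 7 bits (assumed)

    amps = np.abs(rng.normal(size=L))
    freqs = np.sort(rng.uniform(0, np.pi, size=L))
    ai, fi = split_vq_encode(amps, freqs, amp_cb, freq_cb)
    amps_hat, freqs_hat = split_vq_decode(ai, fi, amp_cb, freq_cb)
    print("amplitude MSE:", np.mean((amps - amps_hat) ** 2))
    print("frequency MSE:", np.mean((freqs - freqs_hat) ** 2))

In the full coder described by the paper, a tree-structured codebook would replace the exhaustive argmin search to speed up lookup, and the number of codewords per band would follow the rate-distortion bit allocation; both are omitted here for brevity.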