CLC number: TP391; TN912.34
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2013-10-15
Junhong Zhao, Ji Xu, Wei-qiang Zhang, Hua Yuan, Jia Liu, Shanhong Xia. Exploiting articulatory features for pitch accent detection[J]. Journal of Zhejiang University Science C, 2013, 14(11): 835-844.
@article{Zhao2013pitchaccent,
title="Exploiting articulatory features for pitch accent detection",
author="Junhong Zhao and Ji Xu and Wei-qiang Zhang and Hua Yuan and Jia Liu and Shanhong Xia",
journal="Journal of Zhejiang University Science C",
volume="14",
number="11",
pages="835-844",
year="2013",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1300104"
}
%0 Journal Article
%T Exploiting articulatory features for pitch accent detection
%A Junhong Zhao
%A Ji Xu
%A Wei-qiang Zhang
%A Hua Yuan
%A Jia Liu
%A Shanhong Xia
%J Journal of Zhejiang University SCIENCE C
%V 14
%N 11
%P 835-844
%@ 1869-1951
%D 2013
%I Zhejiang University Press & Springer
%R 10.1631/jzus.C1300104
TY - JOUR
T1 - Exploiting articulatory features for pitch accent detection
A1 - Junhong Zhao
A1 - Ji Xu
A1 - Wei-qiang Zhang
A1 - Hua Yuan
A1 - Jia Liu
A1 - Shanhong Xia
JO - Journal of Zhejiang University Science C
VL - 14
IS - 11
SP - 835
EP - 844
SN - 1869-1951
Y1 - 2013
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1300104
ER -
Abstract: Articulatory features describe how the articulators are involved in producing sounds. Speakers often pronounce accented phonemes in a more exaggerated way, so articulatory features can be helpful in pitch accent detection. Instead of using actual articulatory features obtained by direct measurement of the articulators, we use the posterior probabilities produced by multi-layer perceptrons (MLPs) as articulatory features. The inputs to the MLPs are frame-level acoustic features pre-processed with the split temporal context-2 (STC-2) approach; the outputs are the posterior probabilities of a set of articulatory attributes. These posterior probabilities are averaged piecewise within the range of each syllable and eventually serve as syllable-level articulatory features. This work is the first to introduce articulatory features into pitch accent detection. Using the articulatory features extracted in this way, together with traditional acoustic features, improves the accuracy of pitch accent detection by about 2%.
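To make the feature extraction concrete, the following minimal Python sketch illustrates the piecewise averaging described in the abstract: frame-level MLP posteriors over articulatory attributes are split into consecutive pieces within a syllable and averaged per piece. The NumPy interface, the choice of three pieces, and all function and variable names here are illustrative assumptions, not details taken from the paper.

import numpy as np

def syllable_articulatory_features(posteriors, start, end, n_pieces=3):
    # posteriors: (n_frames, n_attributes) array of per-frame MLP posterior
    # probabilities over articulatory attributes (hypothetical interface).
    # start, end: frame indices delimiting one syllable.
    frames = posteriors[start:end]
    # Split the syllable's frames into consecutive, roughly equal pieces.
    pieces = np.array_split(frames, n_pieces)
    # Average each piece over time, then concatenate the piece means into
    # one syllable-level vector of length n_pieces * n_attributes.
    return np.concatenate([p.mean(axis=0) for p in pieces])

# Toy usage: 20 frames of posteriors over 8 articulatory attributes.
post = np.random.rand(20, 8)
feat = syllable_articulatory_features(post, start=2, end=17)
print(feat.shape)  # (24,) = 3 pieces x 8 attributes

A syllable-level vector of this kind would then be combined with traditional acoustic features before classification, as the abstract describes.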
[1]Ananthakrishnan, S., Narayanan, S., 2008. Automatic prosodic event detection using acoustic, lexical and syntactic evidence. IEEE Trans. Audio Speech Lang. Process., 16(1):216-228.
[2]Black, A.W., Bunnell, H.T., Dou, Y., Muthukumar, P.K., Metze, F., Perry, D., Polzehl, T., Prahallad, K., Steidl, S., Vaughn, C., 2012. Articulatory Features for Expressive Speech Synthesis. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p.4005-4008.
[3]Chao, H., Yang, Z.L., Liu, W.J., 2012. Improved Tone Modeling by Exploiting Articulatory Features for Mandarin Speech Recognition. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p.4741-4744.
[4]Cho, T., 2006. Manifestation of prosodic structure in articulatory variation: evidence from lip kinematics in English. Lab. Phonol., 8:519-548.
[5]Erickson, D., 2002. Articulation of extreme formant patterns for emphasized vowels. Phonetica, 59(2-3):134-149.
[6]Fan, R.E., Chen, P.H., Lin, C.J., 2005. Working set selection using second order information for training support vector machines. J. Mach. Learn. Res., 6:1889-1918.
[7]Fougeron, C., 1999. Prosodically Conditioned Articulatory Variations: a Review. UCLA Working Papers in Phonetics, p.1-74.
[8]Hall, M.A., 1999. Correlation-Based Feature Selection for Machine Learning. PhD Thesis, The University of Waikato, New Zealand.
[9]Hall, M.A., Smith, L.A., 1999. Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper. Proc. 12th Int. Florida Artificial Intelligence Research Society Conf., p.235-239.
[10]Iribe, Y., Mori, T., Katsurada, K., Nitta, T., 2010. Pronunciation Instruction Using CG Animation Based on Articulatory Features. Proc. Int. Conf. on Computers in Education, p.501-508.
[11]Iribe, Y., Mori, T., Katsurada, K., Kawai, G., Nitta, T., 2012. Real-Time Visualization of English Pronunciation on an IPA Chart Based on Articulatory Feature Extraction. Proc. Interspeech, p.1271-1274.
[12]Jeon, J.H., Liu, Y., 2009a. Automatic Prosodic Events Detection Using Syllable-Based Acoustic and Syntactic Features. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p.4565-4568.
[13]Jeon, J.H., Liu, Y., 2009b. Semi-supervised Learning for Automatic Prosodic Event Detection Using Co-training Algorithm. Proc. ACL-IJCNLP, p.540-548.
[14]Jeon, J.H., Liu, Y., 2010. Syllable-Level Prominence Detection with Acoustic Evidence. Proc. Interspeech, p.1772-1775.
[15]Jeon, J.H., Liu, Y., 2012. Automatic prosodic event detection using a novel labeling and selection method in co-training. Speech Commun., 54(3):445-458.
[16]Kirchhoff, K., Fink, G.A., Sagerer, G., 2002. Combining acoustic and articulatory feature information for robust speech recognition. Speech Commun., 37(3-4):303-319.
[17]Krstulovic, S., 1999. LPC-Based Inversion of the DRM Articulatory Model. Proc. European Conf. on Speech Communication and Technology, p.125-128.
[18]Meng, H., Tseng, C.Y., Kondo, M., Harrison, A., Visceglia, T., 2009. Studying L2 Suprasegmental Features in Asian Englishes: a Position Paper. Proc. Interspeech, p.1715-1718.
[19]Ostendorf, M., Price, P.J., Shattuck-Hufnagel, S., 1995. The Boston University Radio News Corpus. Linguistic Data Consortium.
[20]Papcun, G., Hochberg, J., Thomas, T.R., Larouche, F., Zacks, J., Levy, S., 1992. Inferring articulation and recognizing gestures from acoustics with a neural network trained on X-ray microbeam data. J. Acoust. Soc. Am., 92(2):688-700.
[21]Qian, Y.M., Liu, J., 2012a. Articulatory Feature Based Multilingual MLPs for Low-Resource Speech Recognition. Proc. Interspeech, p.2602-2605.
[22]Qian, Y.M., Liu, J., 2012b. Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition. Proc. Interspeech, p.2582-2585.
[23]Qian, Y.M., Povey, D., Liu, J., 2011. State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs. Proc. Interspeech, p.553-560.
[24]Qian, Y.M., Xu, J., Liu, J., 2013. Multi-stream posterior features and combining subspace GMMs for low resource LVCSR. Chin. J. Electron., 22(2):291-295.
[25]Richards, H.B., Mason, J.S., Hunt, M., Bridle, J., 1996. Deriving Articulatory Representations of Speech with Various Excitation Modes. Proc. 4th Int. Conf. on Spoken Language, p.1233-1236.
[26]Richards, H.B., Bridle, J., Hunt, M., Mason, J.S., 1997. Vocal Tract Shape Trajectory Estimation Using MLP Analysis-by-Synthesis. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p.1287-1290.
[27]Sangwan, A., Hansen, J.H.L., 2012. Automatic analysis of Mandarin accented English using phonological features. Speech Commun., 54(1):40-54.
[28]Sangwan, A., Mehrabani, M., Hansen, J.H.L., 2010. Automatic Language Analysis and Identification Based on Speech Production Knowledge. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p.5006-5010.
[29]Schroeter, J., Sondhi, M.M., 1994. Techniques for estimating vocal-tract shapes from the speech signal. IEEE Trans. Speech Audio Process., 2(1):133-150.
[30]Schwarz, P., Matejka, P., Cernocky, J., 2006. Hierarchical Structure of Neural Networks for Phoneme Recognition. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p.325-328.
[31]Siniscalchi, S.M., Svendsen, T., Lee, C.H., 2008. Toward a Detector-Based Universal Phone Recognizer. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p.4261-4264.
[32]Sluijter, A.M.C., van Heuven, V.J., 1996. Acoustic Correlates of Linguistic Stress and Accent in Dutch and American English. Proc. 4th Int. Conf. on Spoken Language, p.630-633.
[33]Sun, X.J., 2002. Pitch Accent Prediction Using Ensemble Machine Learning. Proc. ICSLP, p.953-956.
[34]Taylor, P., 1994. The rise/fall/connection model of intonation. Speech Commun., 15(1-2):169-186.
[35]Taylor, P., 1998. The Tilt Intonation Model. Proc. ICSLP, p.1383-1386.
[36]Witten, I.H., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington, Massachusetts.
[37]Zhao, J., Yuan, H., Liu, J., Xia, S., 2011. Automatic Lexical Stress Detection Using Acoustic Features for Computer Assisted Language Learning. Proc. APSIPA ASC, p.247-251.