CLC number: TN912.326, O432
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
YU Zhen-li, CHING Pak-chung. PITCH-SYNCHRONOUS ARTICULATORY SYNTHESIS INCORPORATED WITH THE INVERSE SOLUTION OF SPEECH PRODUCTION[J]. Journal of Zhejiang University Science A, 2000, 1(4): 388-393.
@article{title="PITCH-SYNCHRONOUS ARTICULATORY SYNTHESIS INCORPORATED WITH THE INVERSE SOLUTION OF SPEECH PRODUCTION",
author="YU Zhen-li, CHING Pak-chung",
journal="Journal of Zhejiang University Science A",
volume="1",
number="4",
pages="388-393",
year="2000",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.2000.0388"
}
%0 Journal Article
%T PITCH-SYNCHRONOUS ARTICULATORY SYNTHESIS INCORPORATED WITH THE INVERSE SOLUTION OF SPEECH PRODUCTION
%A YU Zhen-li
%A CHING Pak-chung
%J Journal of Zhejiang University SCIENCE A
%V 1
%N 4
%P 388-393
%@ 1869-1951
%D 2000
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2000.0388
TY - JOUR
T1 - PITCH-SYNCHRONOUS ARTICULATORY SYNTHESIS INCORPORATED WITH THE INVERSE SOLUTION OF SPEECH PRODUCTION
A1 - YU Zhen-li
A1 - CHING Pak-chung
JO - Journal of Zhejiang University Science A
VL - 1
IS - 4
SP - 388
EP - 393
SN - 1869-1951
Y1 - 2000
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.2000.0388
ER -
Abstract: This paper proposes a method for synthesizing natural-sounding speech with fewer control parameters by combining an inverse solution of speech production with pitch-synchronous articulatory synthesis. A pitch-synchronously excited Reflection-Type Line Analog (RTLA) model serves as the synthesis filter and is controlled by vocal-tract (VT) area functions. Multi-rate system sampling and dynamic scattering-wave adjustment handle the variable VT length and maintain acoustic continuity. Given target formant trajectories, the dynamic VT area function, modeled with a time-variant VT length, is derived using the inverse solution of speech production. A distinguishing feature of the method is that an artificially specified formant trajectory can be targeted precisely in the synthesized sound. Experimental results show that the synthetic sounds match the formant targets well. The potential application of the method to text-to-speech conversion is discussed.
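As a rough illustration of the tube-model quantities the abstract refers to, the sketch below computes junction reflection coefficients from a piecewise-constant area function, and the sampling rate implied by a given VT length in a Kelly-Lochbaum-style line analog. It is not the authors' implementation: the function names, the sign convention for the reflection coefficient, and the sound speed of 35000 cm/s are assumptions made for this example.

```python
# Illustrative sketch only; names, sign convention, and constants are assumptions.

SOUND_SPEED_CM_PER_S = 35000.0  # assumed speed of sound in the vocal tract


def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tube sections.

    `areas` lists cross-sectional areas (cm^2) from glottis to lips.
    Assumed convention: r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}).
    """
    if any(a <= 0 for a in areas):
        raise ValueError("all section areas must be positive")
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def implied_sampling_rate(vt_length_cm, n_sections, c=SOUND_SPEED_CM_PER_S):
    """Sampling rate (Hz) at which one sample period equals the round-trip
    travel time through a single section: fs = N * c / (2 * L)."""
    if vt_length_cm <= 0 or n_sections <= 0:
        raise ValueError("length and section count must be positive")
    return n_sections * c / (2.0 * vt_length_cm)


if __name__ == "__main__":
    # A uniform tube has no internal reflections.
    print(reflection_coefficients([2.0, 2.0, 2.0]))
    # Ten sections over 17.5 cm versus a shorter 16 cm tract.
    print(implied_sampling_rate(17.5, 10))
    print(implied_sampling_rate(16.0, 10))
```

With the number of sections held fixed, a change in VT length changes the implied sampling rate, which is one way to see why a variable-length tract calls for some form of rate conversion.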