CLC number: TP391.42
Crosschecked: 2011-01-31
Hai-hua Xu, Jie Zhu. An iterative approach to Bayes risk decoding and system combination[J]. Journal of Zhejiang University Science C, 2011, 12(3): 204-212.
@article{title="An iterative approach to Bayes risk decoding and system combination",
author="Hai-hua Xu, Jie Zhu",
journal="Journal of Zhejiang University Science C",
volume="12",
number="3",
pages="204-212",
year="2011",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1000045"
}
%0 Journal Article
%T An iterative approach to Bayes risk decoding and system combination
%A Hai-hua Xu
%A Jie Zhu
%J Journal of Zhejiang University SCIENCE C
%V 12
%N 3
%P 204-212
%@ 1869-1951
%D 2011
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1000045
TY - JOUR
T1 - An iterative approach to Bayes risk decoding and system combination
A1 - Hai-hua Xu
A1 - Jie Zhu
JO - Journal of Zhejiang University Science C
VL - 12
IS - 3
SP - 204
EP - 212
SN - 1869-1951
Y1 - 2011
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1000045
ER -
Abstract: We describe a novel approach to Bayes risk (BR) decoding for speech recognition, in which we attempt to find the hypothesis that minimizes an estimate of the BR with respect to the minimum word error (MWE) metric. To achieve this, we propose improved forward and backward algorithms on lattices, and the whole procedure is optimized recursively. Notable characteristics of the proposed approach are that the optimization procedure is expectation-maximization (EM)-like and that the form of the updated result is similar to that obtained with the confusion network (CN) decoding method. Experimental results indicate that the proposed method reduces errors in both lattice rescoring and lattice-based system combination, compared with the CN decoding, confusion network combination (CNC), and ROVER methods.
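The core idea of BR decoding can be illustrated with a simplified N-best approximation (as in Stolcke et al., 1997): choose the hypothesis whose posterior-weighted expected edit distance to all competing hypotheses is smallest. This sketch is not the paper's iterative lattice algorithm; the `mbr_decode` function and the toy N-best list are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word sequences (Levenshtein, 1966)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def mbr_decode(nbest):
    """Pick the hypothesis minimizing the expected word error rate.

    nbest: list of (word_sequence, posterior) pairs.
    The risk of a candidate is the posterior-weighted average edit
    distance to every hypothesis in the list (each acting as a
    possible reference).
    """
    total = sum(p for _, p in nbest)
    best, best_risk = None, float("inf")
    for hyp, _ in nbest:
        risk = sum(p * edit_distance(ref, hyp) for ref, p in nbest) / total
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best

# Toy example: three competing hypotheses with posteriors.
nbest = [
    (["the", "cat", "sat"], 0.40),
    (["the", "cat", "sad"], 0.35),
    (["a", "cat", "sat"], 0.25),
]
print(mbr_decode(nbest))  # -> ['the', 'cat', 'sat']
```

The paper's contribution replaces this quadratic N-best search with forward-backward passes over the full lattice, refined iteratively in an EM-like loop, so that the far larger hypothesis space encoded by the lattice can be searched efficiently.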
[1]Evermann, G., Woodland, P.C., 2000. Posterior Probability Decoding, Confidence Estimation, and System Combination. Proc. NIST Speech Transcription Workshop, College Park, MD.
[2]Fiscus, J.G., 1997. A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER). Proc. IEEE Workshop Automatic Speech Recognition and Understanding, p.347-354.
[3]Goel, V., Kumar, S., Byrne, W.J., 2000. Minimum Bayes-risk automatic speech recognition. Comput. Speech Lang., 14(2):115-135.
[4]Goel, V., Kumar, S., Byrne, W., 2004. Segmental minimum Bayes-risk decoding for automatic speech recognition. IEEE Trans. Speech Audio Process., 12(3):234-249.
[5]Hakkani-Tur, D., Riccardi, G., 2003. A General Algorithm for Word Graph Matrix Decomposition. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 1:I-596-I-599.
[6]Heigold, G., Macherey, W., Schluter, R., Ney, H., 2005. Minimum Exact Word Error Training. Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, p.186-190.
[7]Hoffmeister, B., Klein, T., Schluter, R., Ney, H., 2006. Frame Based System Combination and a Comparison with Weighted Rover and CNC. Int. Conf. on Spoken Language Processing, p.1-4.
[8]Levenshtein, V.I., 1966. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl., 10:707-710.
[9]Mangu, L., Brill, E., Stolcke, A., 2000. Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Comput. Speech Lang., 14(4):373-400.
[10]Ortmanns, S., Ney, H., 1997. A word graph algorithm for large vocabulary continuous speech recognition. Comput. Speech Lang., 11(1):43-72.
[11]Povey, D., 2004. Discriminative Training for Large Vocabulary Speech Recognition. PhD Thesis, Cambridge University.
[12]Povey, D., Woodland, P.C., 2002. Minimum Phone Error and I-Smoothing for Improved Discriminative Training. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, p.105-108.
[13]Povey, D., Kanevsky, D., Kingsbury, B., 2008. Boosted MMI for Model and Feature-Space Discriminative Training. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, p.4057-4060.
[14]Schwenk, H., Gauvain, J.L., 2000. Combining Multiple Speech Recognizers Using Voting and Language Model Information. Int. Conf. on Spoken Language Processing, 2:915-918.
[15]Stolcke, A., Konig, Y., Weintraub, M., 1997. Explicit Word Error Minimization in N-Best List Rescoring. Proc. 5th European Conf. on Speech Communication and Technology, 1:163-166.
[16]Wessel, F., Schluter, R., Ney, H., 2001. Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 1:33-36.
[17]Xu, H., Povey, D., Zhu, J., Wu, G., 2009a. Minimum Hypothesis Phone Error as a Decoding Method for Speech Recognition. INTERSPEECH, 10th Annual Conf. Int. Speech Communication Association, p.76-79.
[18]Xu, H., Zhu, J., Wu, G., 2009b. An Efficient Multistage Rover Method for Automatic Speech Recognition. IEEE Int. Conf. on Multimedia and Expo, p.894-897.
[19]Xu, H., Povey, D., Mangu, L., Zhu, J., 2010. An Improved Consensus-Like Method for Minimum Bayes Risk Decoding and Lattice Combination. IEEE Int. Conf. on Acoustics Speech and Signal Processing, p.4938-4941.
[20]Young, S., Evermann, G., Gales, M., et al., 2008. The HTK Book. Version 3.4, Cambridge University. Available from http://htk.eng.cam.ac.uk/