Journal of Zhejiang University SCIENCE C

ISSN 1869-1951 (Print), 1869-196X (Online), Monthly

Mismatched feature detection with finer granularity for emotional speaker recognition

Abstract: The shapes of speakers' vocal organs change with their emotional states, so the acoustic space of short-time features under emotion deviates from the neutral acoustic space and speaker recognition performance degrades. Features that deviate greatly from the neutral acoustic space are considered mismatched features, and they negatively affect speaker recognition systems. Emotion variation deforms features differently for different phonemes, so it is reasonable to build a finer model that detects mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three kinds of acoustic class recognition are proposed to replace phoneme recognition: phoneme classes, Gaussian mixture model (GMM) tokenizer, and probabilistic GMM tokenizer. We propose feature pruning and feature regulation methods that process the mismatched features to improve speaker recognition performance. For feature regulation, the transformation matrix that regulates the mismatched features is trained by maximizing the between-class distance and minimizing the within-class distance. Experiments on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, respectively, compared with the baseline GMM-UBM (universal background model) algorithm. Corresponding IR increases of 2.09% and 3.32% are obtained when our methods are applied to the state-of-the-art i-vector algorithm.

Key words: Emotional speaker recognition, Mismatched feature detection, Feature regulation
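
The abstract names a GMM tokenizer and a probabilistic GMM tokenizer as acoustic classes but does not define them on this page. The sketch below shows one plausible reading, stated here as an assumption: the GMM tokenizer labels each frame with its most likely Gaussian component, and the probabilistic variant keeps the full vector of component posteriors. The function names, the diagonal-covariance choice, and the toy parameters are illustrative and are not taken from the paper.

```python
import numpy as np

def component_posteriors(frames, weights, means, variances):
    """Posterior probability of each diagonal-covariance Gaussian per frame.

    frames: (T, D); weights: (K,); means, variances: (K, D).
    Returns an array of shape (T, K) whose rows sum to 1.
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                  # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)      # normalisation term
    )
    log_joint = np.log(weights) + log_gauss                    # (T, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def hard_tokens(post):
    """Assumed GMM tokenizer: index of the most likely component per frame."""
    return post.argmax(axis=1)

# Toy two-component model (made-up parameters, not from the paper).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.ones((2, 2))
frames = np.array([[0.1, -0.2], [4.8, 5.3], [2.5, 2.5]])

post = component_posteriors(frames, weights, means, variances)  # soft (probabilistic) tokens
tokens = hard_tokens(post)                                      # hard tokens
```

Under this reading, a frame lying midway between two components, such as the third toy frame, is forced into one class by the hard tokenizer while the probabilistic tokenizer records that it belongs equally to both.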

Chinese Summary (translated): Fine-grained mismatched feature detection and regulation for emotional speaker recognition

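Objective: When a speaker's emotional state changes, the vocal organs deform, which shifts the distribution of some speech features away from their distribution under neutral conditions. These shifted features, called "mismatched features", substantially degrade speaker recognition performance and must be pruned or regulated to improve emotional speaker recognition systems.
Innovation: Because the distribution changes of mismatched features differ from phoneme to phoneme, fine-grained mismatched feature detection models and regulation methods are proposed over three kinds of acoustic classes: phoneme classes, GMM tokenizer, and probabilistic GMM tokenizer.
Methods: Manifold analysis is used to observe the distribution of mismatched features, leading to the conclusion that the farther a feature lies from the neutral feature space, the weaker its ability to discriminate speakers. A feature whose speaker-discriminating ability falls below a threshold is detected as a mismatched feature (Fig. 1). For acoustic classes represented by phoneme classes and the GMM tokenizer, a support vector machine (SVM) is used to build the reliable/mismatched feature detection model; for acoustic classes represented by the probabilistic GMM tokenizer, a fuzzy SVM is used. To ensure that regulated features approach the true neutral condition without losing speaker characteristics, the mismatched feature space is mapped onto the reliable feature space under the constraint that the distance between the transformed mismatched feature space and the reliable feature spaces of other speakers does not decrease.
Conclusions: Emotion changes the distribution of some of a speaker's speech features, turning them into mismatched features. Fine-grained detection and regulation of mismatched features over the three acoustic classes handles these features effectively and improves recognition performance. The best-performing method, mismatched feature regulation under the probabilistic GMM tokenizer, raises the identification rate of the baseline GMM-UBM algorithm by 6.77% and that of the i-vector algorithm by 3.32%.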

Key words: Emotional speaker recognition; Fuzzy support vector machine; Mismatched feature detection; Feature regulation
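
The summary describes one reliable/mismatched detector per acoustic class, with the frames flagged as mismatched then pruned or regulated. A minimal sketch of the pruning path is given below. It assumes each frame already carries an acoustic class label, and it uses a hand-set linear decision function per class as a stand-in for the trained SVMs; the weights, biases, and the name `prune_mismatched` are hypothetical and do not come from the paper.

```python
import numpy as np

def prune_mismatched(frames, tokens, class_w, class_b):
    """Keep only frames that their own class's detector scores as reliable.

    frames: (T, D) features; tokens: (T,) acoustic class index per frame;
    class_w: (K, D) and class_b: (K,) hold one linear detector per class
    (stand-ins for trained SVMs). A non-negative score means "reliable".
    """
    # Score each frame with the detector belonging to its own acoustic class.
    scores = np.einsum("td,td->t", frames, class_w[tokens]) + class_b[tokens]
    reliable = scores >= 0.0
    return frames[reliable], reliable

# Toy data: two acoustic classes with made-up detectors.
frames = np.array([[2.0, 0.0], [0.0, 3.0], [5.0, 2.0], [5.0, 0.5]])
tokens = np.array([0, 0, 1, 1])
class_w = np.array([[1.0, 0.0], [0.0, 1.0]])
class_b = np.array([-1.0, -1.0])

kept, reliable = prune_mismatched(frames, tokens, class_w, class_b)
```

Selecting the detector by class label is what makes the detection fine-grained: the same feature vector can be judged reliable under one acoustic class and mismatched under another. Feature regulation would transform the flagged frames rather than drop them.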


DOI: 10.1631/jzus.C1400002

CLC number: TP391.4

On-line Access: 2014-10-09

Received: 2014-01-05

Revision Accepted: 2014-05-20

Crosschecked: 2014-09-17

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE