Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2015 Vol.16 No.5 P.358-366
Speech emotion recognition with unsupervised feature learning
Abstract: Emotion-based features are critical for achieving high performance in a speech emotion recognition (SER) system. In general, such features are difficult to develop because the ground truth is ambiguous. In this paper, we apply several unsupervised feature learning algorithms (K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which show promise for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the performance of the proposed approach and present a detailed analysis of the effect of two important factors in the model setup: the content window size and the number of hidden layer nodes. Experimental results show that larger content windows and more hidden nodes contribute to higher performance. We also show that a two-layer network does not clearly improve performance compared to a single-layer network.
Key words: Speech emotion recognition, Unsupervised feature learning, Neural network, Affective computing
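Innovation: A data-driven, unsupervised method for learning emotion features is proposed. The method automatically learns emotion-related feature mapping functions from unlabeled speech data, which are then used to extract speech emotion features.
Methods: Three unsupervised learning algorithms (K-means clustering, the sparse auto-encoder, and the sparse restricted Boltzmann machine) are used to learn task-related feature extractors from a number of unlabeled speech patches. Features are then extracted from each whole speech sample (convolution and pooling), and finally a linear support vector machine is trained to classify unseen samples. The hyperparameters of the model (patch size and number of hidden-layer nodes) are also selected.
Conclusions: Compared with conventional raw features, the learned features are somewhat sparse and somewhat robust to speaker variation and other disturbing factors. Experimental results show that larger patches and more hidden-layer nodes help improve system performance (Figs. 4 and 5).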
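The feature-learning steps summarized above (collect unlabeled patches, learn extractors, then convolve and pool over each utterance) might be sketched as follows for the K-means case. This is a minimal illustration rather than the authors' implementation: the function names, the "triangle" soft activation, mean pooling over time, and the random arrays standing in for frame-level acoustic features are all assumptions made for the example. The final linear support vector machine is omitted.

```python
import numpy as np

def extract_patches(frames, window):
    """Stack `window` consecutive frames into one flat patch per time step."""
    n_frames, _ = frames.shape
    return np.stack([frames[t:t + window].ravel()
                     for t in range(n_frames - window + 1)])

def learn_centroids(patches, k, n_iter=10, seed=0):
    """Plain K-means on unlabeled patches; returns a (k, patch_dim) array."""
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance of every patch to every centroid: shape (n_patches, k).
        dists = np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = patches[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

def encode_utterance(frames, centroids, window):
    """Slide over the utterance, encode each patch, then mean-pool over time."""
    patches = extract_patches(frames, window)
    dists = np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=2)
    # Assumed "triangle" activation: zero for centroids farther than average.
    activations = np.maximum(0.0, dists.mean(axis=1, keepdims=True) - dists)
    return activations.mean(axis=0)

# Hypothetical data: 200 unlabeled frames and one 50-frame utterance, 13-dim each.
rng = np.random.default_rng(0)
unlabeled_frames = rng.normal(size=(200, 13))
centroids = learn_centroids(extract_patches(unlabeled_frames, window=5), k=8)
utterance_feature = encode_utterance(rng.normal(size=(50, 13)), centroids, window=5)
print(utterance_feature.shape)
```

Here `window` plays the role of the patch (content window) size and `k` the number of hidden nodes, the two hyperparameters the summary says were tuned. The pooled vector has one entry per centroid regardless of utterance length, which is what would let a linear classifier be trained on utterances of varying duration.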
DOI: 10.1631/FITEE.1400323
CLC number: TP391.4
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2015-04-10