JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2021 Vol.22 No.5 P.697-708

Latent discriminative representation learning for speaker recognition

Duolin Huang, Qirong Mao, Zhongchen Ma, Zhishen Zheng, Sidheswar Routray, Elias-Nii-Noi Ocquaye

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China; Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang 212013, China

2211708034@stmail.ujs.edu.cn, mao_qr@ujs.edu.cn, zhongchen_ma@ujs.edu.cn, 1209103822@qq.com, sidheswar69@gmail.com, eocquaye@ujs.edu.cn

Abstract: Extracting discriminative speaker-specific representations from speech signals and transforming them into fixed-length vectors are key steps in speaker identification and verification systems. In this study, we propose a latent discriminative representation learning method for speaker recognition, in which the learned representations are not only discriminative but also relevant. Specifically, we introduce an additional speaker embedding lookup table to explore the relevance between different utterances from the same speaker. Moreover, a reconstruction constraint, intended to learn a linear mapping matrix, is introduced to make the representations discriminative. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on the Apollo dataset used in the Fearless Steps Challenge at INTERSPEECH 2019 and on the TIMIT dataset.

Key words: Speaker recognition, Latent discriminative representation learning, Speaker embedding lookup table, Linear mapping matrix
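
The two components named in the abstract, a speaker embedding lookup table and a reconstruction constraint on a linear mapping matrix, can be illustrated with a minimal sketch. The abstract does not give the loss functions, so everything below is an assumption made for illustration: the squared-distance relevance term, the tied-weight reconstruction form ||x - WᵀWx||², the weights `alpha` and `beta`, and all function names are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relevance_loss(table: Matrix, speaker_id: int, z: Vector) -> float:
    """Assumed relevance term: distance between an utterance representation z
    and the lookup-table row of its speaker, so that utterances from the same
    speaker are pulled toward a shared entry."""
    return squared_distance(table[speaker_id], z)


def reconstruction_loss(w: Matrix, x: Vector) -> float:
    """Assumed reconstruction constraint: ||x - W^T W x||^2 for a linear
    mapping W (hypothetical tied-weight form, not the paper's equation)."""
    z = matvec(w, x)                   # project input into the latent space
    x_hat = matvec(transpose(w), z)    # map back with the transposed matrix
    return squared_distance(x, x_hat)


def total_loss(table: Matrix, w: Matrix, speaker_id: int, x: Vector,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of the two assumed terms; alpha and beta are hypothetical."""
    z = matvec(w, x)
    return (alpha * relevance_loss(table, speaker_id, z)
            + beta * reconstruction_loss(w, x))
```

Calling `total_loss(table, w, speaker_id, x)` with a table holding one row per speaker and a matrix `w` of shape (latent dimension × input dimension) returns a single scalar. In a real system both the table and `w` would be trained jointly with the speaker classifier, which this sketch omits.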

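Chinese Summary: Latent discriminative representation learning for speaker recognition

Duolin HUANG¹, Qirong MAO¹,², Zhongchen MA¹, Zhishen ZHENG¹, Sidheswar ROUTRAY¹, Elias-Nii-Noi OCQUAYE¹
¹School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
²Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang 212013, China

Abstract: Extracting speaker-specific discriminative representations from speech signals and converting them into fixed-length vectors are key steps in speaker identification and verification systems. A latent discriminative representation learning method is proposed for speaker recognition. We hold that the learned representations are not only discriminative but also relevant. Specifically, an additional speaker embedding lookup table is introduced to explore the relevance between different utterances of the same speaker. In addition, a reconstruction constraint is introduced to learn a linear mapping matrix, making the representations more discriminative. Experimental results show that the proposed method outperforms current state-of-the-art methods on the Apollo dataset of the Fearless Steps Challenge at INTERSPEECH 2019 and on the TIMIT dataset.

Key words: Speaker recognition; Latent discriminative representation learning; Speaker embedding lookup table; Linear mapping matrix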

DOI: 10.1631/FITEE.1900690

CLC number: TP391.4

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2020-11-18

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
