
Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder

Abstract: Much recent progress in monaural speech separation (MSS) has been achieved with deep learning architectures based on autoencoders. These use an encoder to condense the input signal into compressed features and a decoder to construct a specific audio source of interest from those features. However, these approaches can neither learn the generative factors of the original input for MSS nor construct every audio source in the mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS. It adds a regularization loss to the objective function so that the target source is isolated and does not contain the other sources. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE learns source-specific generative factors and a set of discriminative features for each source, which improves MSS performance. Experiments on benchmark datasets show that our approach outperforms existing methods. In terms of three important metrics, WFAE performs particularly well on a relatively challenging MSS case: speaker-independent MSS.
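
The abstract describes WFAE only at a high level. The NumPy sketch below is a hedged, toy illustration of the general ideas it names. These are an encoder that maps the mixture to shared latent factors, per-source attention weights over those factors, one decoder per source, and a loss that rewards reconstructing the target while penalizing resemblance to the other sources. It is not the authors' WFAE. The linear layers, the softmax factor weighting, the product-based leakage penalty, and every name in the code (`separate`, `separation_loss`, `lam`) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def separate(mixture, w_enc, attn_logits, w_dec):
    """Toy forward pass (hypothetical, not the published WFAE).

    mixture:     (T, F) nonnegative spectrogram frames of the mixed speech
    w_enc:       (F, K) encoder weights mapping each frame to K latent factors
    attn_logits: (S, K) one row of factor-attention logits per source
    w_dec:       (S, K, F) one linear decoder per source
    returns:     (S, T, F) one estimated spectrogram per source
    """
    h = np.maximum(mixture @ w_enc, 0.0)         # (T, K) shared latent factors (ReLU)
    a = softmax(attn_logits, axis=-1)             # (S, K) per-source weights over factors
    weighted = h[None, :, :] * a[:, None, :]      # (S, T, K) source-specific weighted factors
    return np.einsum("stk,skf->stf", weighted, w_dec)


def separation_loss(estimates, targets, lam=0.1):
    """Reconstruction error plus a hypothetical cross-source leakage penalty.

    The first term asks estimate i to match target i. The second term, weighted
    by lam, penalizes the mean overlap between estimate i and each target j != i.
    """
    num_sources = estimates.shape[0]
    recon = np.mean((estimates - targets) ** 2)
    leak = 0.0
    if num_sources > 1:
        for i in range(num_sources):
            for j in range(num_sources):
                if i != j:
                    leak += np.mean(estimates[i] * targets[j])
        leak /= num_sources * (num_sources - 1)
    return recon + lam * leak


# Usage: 2 sources, 3 frames, 4 frequency bins, 5 latent factors (all toy sizes).
rng = np.random.default_rng(0)
est = separate(rng.random((3, 4)), rng.random((4, 5)), np.zeros((2, 5)), rng.random((2, 5, 4)))
print(est.shape)  # (2, 3, 4)
```

Normalizing each source's weights with a softmax is one simple way to let a source emphasize a subset of the shared latent factors. The published model may weight factors and define its regularization loss differently.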

Key words: Speech separation, Generative factors, Autoencoder, Deep learning

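Chinese Summary: Monaural speech separation based on a weighted-factor autoencoder and latent source-specific generative factor learning

Jingjing CHEN1, Qirong MAO1,2, Youcai QIN1, Shuangqing QIAN1, Zhishen ZHENG1
1School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
2Jiangsu Key Laboratory of Industrial Network Security Technology, Zhenjiang 212013, China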

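Abstract: Monaural speech separation has recently made considerable progress through a series of autoencoder-based deep learning architectures. These use an encoder to compress the input signal into intermediate features and then feed these features into a decoder to reconstruct the specific audio source of interest. However, these methods can neither learn the generative factors of the original input for monaural speech separation nor construct all the audio sources in the mixed speech. This paper proposes a new weighted-factor autoencoder model. It introduces a regularization loss into the objective function to constrain the target source and exclude the other sources. By introducing a latent attention mechanism and a supervised source constructor into the separation layer, the weighted-factor autoencoder learns source-specific generative factors and a set of discriminative features for each source, thereby improving monaural speech separation performance. Experiments on benchmark datasets show that the proposed method outperforms existing methods. In terms of three important metrics, the weighted-factor autoencoder performs particularly well on a relatively more challenging task: speaker-independent monaural speech separation.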

Key words: Speech separation; Generative factors; Autoencoder; Deep learning


DOI: 10.1631/FITEE.2000019
CLC number: TN912.3

On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2020-09-08

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE