Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2020 Vol.21 No.11 P.1639-1650
Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder
Abstract: Much recent progress in monaural speech separation (MSS) has been achieved through a series of deep learning architectures based on autoencoders, which use an encoder to condense the input signal into compressed features and then feed these features into a decoder to reconstruct a specific audio source of interest. However, these approaches can neither learn generative factors of the original input for MSS nor construct every audio source in the mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS, which introduces a regularization loss in the objective function to constrain the target source so that it excludes the other sources. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE can learn source-specific generative factors and a set of discriminative features for each source, which improves MSS performance. Experiments on benchmark datasets show that our approach outperforms existing methods. On three important metrics, WFAE performs particularly well in the relatively challenging case of speaker-independent MSS.
Key words: Speech separation, Generative factors, Autoencoder, Deep learning
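Jing-jing Chen1, Qi-rong Mao1,2, You-cai Qin1, Shuang-qing Qian1, Zhi-shen Zheng1
1School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
2Jiangsu Province Key Laboratory of Industrial Network Security Technology, Zhenjiang 212013, China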
DOI: 10.1631/FITEE.2000019
CLC number: TN912.3
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2020-09-08