Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Federated unsupervised representation learning

Abstract: To leverage the enormous amount of unlabeled data on distributed edge devices, we formulate a new problem in federated learning, called federated unsupervised representation learning (FURL), whose goal is to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges: (1) the data distribution shift (non-independent and identically distributed, non-IID data) among clients makes local models focus on different categories, leading to inconsistent representation spaces; (2) without unified information across clients, the representations learned by different clients become misaligned. To address these challenges, we propose the federated contrastive averaging with dictionary and alignment (FedCA) algorithm. FedCA consists of two key modules: a dictionary module, which aggregates the representations of samples from each client and shares them with all clients to keep representation spaces consistent, and an alignment module, which aligns each client's representations with those of a base model trained on public data. We adopt a contrastive approach for local model training. Through extensive experiments with three evaluation protocols in IID and non-IID settings, we demonstrate that FedCA outperforms all baselines by significant margins.

Key words: Federated learning; Unsupervised learning; Representation learning; Contrastive learning
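The recipe described in the abstract, contrastive local training with negatives drawn from a dictionary shared across clients, followed by federated averaging of client weights on the server, can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: the function names, the NumPy formulation, and the temperature value are assumptions, and the alignment module is omitted.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project rows onto the unit sphere, as is standard in contrastive learning
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_loss_with_dictionary(q, k, dictionary, temperature=0.1):
    """InfoNCE-style loss (hypothetical formulation): q and k hold the
    representations of two augmented views of the same local batch, while
    negatives come from the dictionary of representations shared by all
    clients, which is what keeps the representation spaces consistent."""
    q, k, d = l2_normalize(q), l2_normalize(k), l2_normalize(dictionary)
    pos = np.sum(q * k, axis=1, keepdims=True)   # (B, 1) positive similarities
    neg = q @ d.T                                # (B, K) dictionary negatives
    logits = np.concatenate([pos, neg], axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Cross-entropy with the positive pair at index 0
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())

def fed_avg(client_weights, client_sizes):
    """Server step: size-weighted average of client parameter lists (FedAvg)."""
    total = float(sum(client_sizes))
    return [sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
            for i in range(len(client_weights[0]))]
```

In each round, every client would minimize the contrastive loss on its own data, upload its model weights and dictionary entries, and receive back the `fed_avg` of all clients' weights together with the merged dictionary.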

Chinese Summary (translated): Federated unsupervised representation learning

Fengda Zhang1, Kun Kuang1, Long Chen1, Zhaoyang You1, Tao Shen1, Jun Xiao1,
Yin Zhang1, Chao Wu2, Fei Wu1, Yueting Zhuang1, Xiaolin Li3,4,5
1College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2School of Public Affairs, Zhejiang University, Hangzhou 310027, China
3Tongdun Technology, Hangzhou 310000, China
4Institute of Basic Medicine and Cancer, Chinese Academy of Sciences, Hangzhou 310018, China
5Hangzhou Lingsi Zhikang Technology Co., Ltd., Hangzhou 310018, China
Abstract: To leverage the large amount of unlabeled data on distributed edge devices, we formulate a new problem in federated learning, called federated unsupervised representation learning (FURL), to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges: (1) the data distribution shift (non-IID data) among clients makes local models focus on different categories, leading to inconsistent representation spaces; (2) without unified information among clients in FURL, the representations across clients become misaligned. To address these challenges, we propose the federated contrastive averaging with dictionary and alignment (FedCA) algorithm. FedCA consists of two key modules: a dictionary module, which aggregates the representations of samples from each client and shares them with all clients to keep representation spaces consistent, and an alignment module, which aligns each client's representations with those of a base model trained on public data. We adopt a contrastive approach for local model training. Through extensive experiments on three datasets under IID and non-IID settings, we demonstrate that FedCA outperforms all baseline methods by significant margins.

Key words: Federated learning; Unsupervised learning; Representation learning; Contrastive learning



DOI: 10.1631/FITEE.2200268
CLC number: TP183
Full text downloaded: 3937
Summary downloaded: 234
Clicked: 1133
Cited: 0
On-line Access: 2023-08-29
Received: 2022-06-21
Revision Accepted: 2023-08-29
Crosschecked: 2022-10-27

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE