Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2023 Vol.24 No.8 P.1181-1193
Federated unsupervised representation learning
Abstract: To leverage the enormous amount of unlabeled data on distributed edge devices, we formulate a new problem in federated learning called federated unsupervised representation learning (FURL). Its goal is to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges. (1) Data distribution shift (non-independent and identically distributed, non-IID) among clients can make local models focus on different categories, which leads to inconsistent representation spaces. (2) Without unified information among clients, the representations learned by different clients would be misaligned. To address these challenges, we propose the federated contrastive averaging with dictionary and alignment (FedCA) algorithm. FedCA is composed of two key modules. The dictionary module aggregates the representations of samples from each client and shares them with all clients, which keeps the representation space consistent. The alignment module aligns the representations of each client with those of a base model trained on public data. We adopt a contrastive approach for local model training. We ran extensive experiments with three evaluation protocols in IID and non-IID settings, and they show that FedCA outperforms all baselines by significant margins.
Key words: Federated learning; Unsupervised learning; Representation learning; Contrastive learning
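The abstract describes FedCA only at a high level. The NumPy snippet below is a minimal sketch of the kinds of computations it names, built on our own assumptions. The contrastive term is a MoCo-style InfoNCE loss whose negatives come from a dictionary of representations pooled from all clients. The alignment term is a squared distance to a base model's representations of the same public samples. The server combines client parameters with a FedAvg-style weighted average. All function names, the temperature `tau`, and the exact loss forms are hypothetical illustrations and not the authors' implementation. In real training, neural encoders would produce these representations and the losses would be backpropagated through them.

```python
import numpy as np

# Hypothetical sketch of FedCA-style losses; NOT the authors' reference implementation.


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(q, k, dictionary, tau=0.1):
    """InfoNCE loss: q[i] should match its positive k[i] rather than any dictionary entry.

    q, k: (B, D) representations of two augmented views of the same B local samples.
    dictionary: (K, D) representations shared with all clients, used here as negatives.
    tau: temperature (assumed value).
    """
    q, k, dictionary = l2_normalize(q), l2_normalize(k), l2_normalize(dictionary)
    pos = np.sum(q * k, axis=1, keepdims=True)          # (B, 1) positive similarities
    neg = q @ dictionary.T                               # (B, K) negative similarities
    logits = np.concatenate([pos, neg], axis=1) / tau    # the positive sits in column 0
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()


def alignment_loss(z_local, z_base):
    """Mean squared distance between normalized local and base-model representations
    of the same public samples (assumed form of the alignment term)."""
    diff = l2_normalize(z_local) - l2_normalize(z_base)
    return np.mean(np.sum(diff ** 2, axis=1))


def aggregate_dictionary(client_reps):
    """Server side: pool the normalized representations uploaded by every client."""
    return np.concatenate([l2_normalize(r) for r in client_reps], axis=0)


def fedavg(client_params, client_sizes):
    """FedAvg-style average of client parameter arrays, weighted by local dataset size."""
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, client_params))
```

In a local update, a client could combine the two terms as `info_nce_loss(q, k, dictionary) + lam * alignment_loss(z_pub_local, z_pub_base)`, where `lam`, `z_pub_local`, and `z_pub_base` are hypothetical names. Because every client draws its negatives from the same shared dictionary, all clients contrast against one common set of representations. This matches the abstract's stated motivation for the dictionary module: keeping representation spaces consistent under non-IID data.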
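Yin ZHANG1, Chao WU2, Fei WU1, Yueting ZHUANG1, Xiaolin LI3,4,5
1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2 School of Public Affairs, Zhejiang University, Hangzhou 310027, China
3 Tongdun Technology, Hangzhou 310000, China
4 Institute of Basic Medicine and Cancer, Chinese Academy of Sciences, Hangzhou 310018, China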
5 Hangzhou Lingsi Zhikang Technology Co., Ltd., Hangzhou 310018, China
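Abstract (Chinese version): To exploit the large amount of unlabeled data on distributed edge devices, we propose a new problem in federated learning called federated unsupervised representation learning (FURL). FURL learns a common representation model without supervision while protecting data privacy. It raises two new challenges. (1) Data distribution shift (non-independent and identically distributed data) among clients makes local models focus on different categories, which leads to inconsistent representation spaces. (2) If clients in FURL share no unified information, their representations become misaligned. To address these challenges, we propose the federated contrastive averaging with dictionary and alignment (FedCA) algorithm. FedCA consists of two key modules. The dictionary module aggregates the sample representations from each client and shares them with all clients to achieve a consistent representation space. The alignment module aligns the representations of each client with a base model trained on public data. We adopt a contrastive approach for local model training. Extensive experiments on three datasets in IID and non-IID settings show that FedCA outperforms all baseline methods by significant margins.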
DOI: 10.1631/FITEE.2200268
CLC number: TP183
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-10-27