JZUS - Journal of Zhejiang University SCIENCE

Journal of Zhejiang University SCIENCE A

ISSN 1673-565X(Print), 1862-1775(Online), Monthly

2009 Vol.10 No.4 P.504-511

Regularized canonical correlation analysis with unlabeled data

Xi-chuan ZHOU, Hai-bin SHEN

Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China

zhouxc@vlsi.zju.edu.cn; shenhb@yahoo.cn

Abstract: In standard canonical correlation analysis (CCA), the canonical correlation is estimated from data whose set membership is known. In real applications, such as bilingual text retrieval, a large portion of the data may come with no indication of which set it belongs to. We call this portion unlabeled data and the remainder, whose set membership is known, labeled data. We propose a novel method, regularized canonical correlation analysis (RCCA), which makes use of both labeled and unlabeled samples. Specifically, we learn to approximate the canonical correlation as if all data were labeled. We then describe a generalization of RCCA to the multi-set situation. Experiments on four real-world datasets (Yeast, Cloud, Iris, and Haberman) demonstrate that incorporating the unlabeled data points can improve the accuracy of the correlation coefficients by over 30%.

Key words: Canonical correlation analysis (CCA), Regularization, Unlabeled data, Generalized canonical correlation analysis (GCCA)
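
For orientation, standard CCA, the baseline that the abstract contrasts with RCCA, can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions and is not the authors' RCCA: the function name `canonical_correlations` and the `ridge` parameter are illustrative, and the small ridge term serves only as a numerical safeguard, not as the regularizer proposed in the paper. It assumes two fully labeled, row-paired data matrices.

```python
import numpy as np


def canonical_correlations(X, Y, ridge=1e-8):
    """Canonical correlations between paired views X (n x p) and Y (n x q).

    Illustrative sketch of standard CCA only; the ridge term is a
    numerical safeguard and NOT the regularization proposed in the paper.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]

    # Center each view so that the products below are covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Within-set and between-set sample covariance matrices.
    Cxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted from largest to smallest.
    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    s = np.linalg.svd(T, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    # Y is an invertible linear transform of X, so every canonical
    # correlation should be numerically close to 1.
    A = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 3.0]])
    print(canonical_correlations(X, X @ A))
```

The sketch returns min(p, q) correlations. It requires every sample to appear in both views, which is exactly the labeled-data assumption that the abstract relaxes.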


DOI: 10.1631/jzus.A0820221

CLC number: TP301


On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2008-12-26

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
