CLC number: TP301

Received: 2008-03-25

Revision Accepted: 2008-06-01

Crosschecked: 2008-12-26

Journal of Zhejiang University SCIENCE A 2009 Vol.10 No.4 P.504-511


Regularized canonical correlation analysis with unlabeled data

Author(s):  Xi-chuan ZHOU, Hai-bin SHEN

Affiliation(s):  Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   zhouxc@vlsi.zju.edu.cn, shenhb@yahoo.cn

Key Words:  Canonical correlation analysis (CCA), Regularization, Unlabeled data, Generalized canonical correlation analysis (GCCA)

Xi-chuan ZHOU, Hai-bin SHEN. Regularized canonical correlation analysis with unlabeled data[J]. Journal of Zhejiang University Science A, 2009, 10(4): 504-511.

@article{ZHOU2009,
title="Regularized canonical correlation analysis with unlabeled data",
author="Xi-chuan ZHOU and Hai-bin SHEN",
journal="Journal of Zhejiang University Science A",
volume="10",
number="4",
pages="504-511",
year="2009",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.A0820221"
}

%0 Journal Article
%T Regularized canonical correlation analysis with unlabeled data
%A Xi-chuan ZHOU
%A Hai-bin SHEN
%J Journal of Zhejiang University SCIENCE A
%V 10
%N 4
%P 504-511
%@ 1673-565X
%D 2009
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.A0820221

TY - JOUR
T1 - Regularized canonical correlation analysis with unlabeled data
A1 - Xi-chuan ZHOU
A1 - Hai-bin SHEN
JO - Journal of Zhejiang University Science A
VL - 10
IS - 4
SP - 504
EP - 511
SN - 1673-565X
Y1 - 2009
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.A0820221
ER -

Standard canonical correlation analysis (CCA) estimates the canonical correlation between datasets using only data whose set membership is known. In real applications, such as bilingual text retrieval, there may be a large portion of data for which we do not know which set each sample belongs to. Such data are called unlabeled data, while the remaining data, whose sets are known, are called labeled data. We propose a novel method, regularized canonical correlation analysis (RCCA), which makes use of both labeled and unlabeled samples. Specifically, we learn to approximate the canonical correlation as if all data were labeled. We then describe a generalization of RCCA to the multi-set situation. Experiments on four real-world datasets (Yeast, Cloud, Iris, and Haberman) demonstrate that incorporating the unlabeled data points can improve the accuracy of the correlation coefficients by over 30%.
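
For orientation, the following is a minimal sketch of the standard CCA baseline that the abstract starts from: estimating canonical correlations from paired, labeled samples only. It is not the RCCA method proposed in the paper, whose formulation is not given on this page. The function name `canonical_correlations` and the small ridge term `eps` (added only for numerical stability) are assumptions made for this illustration.

```python
import numpy as np


def canonical_correlations(X, Y, eps=1e-8):
    """Estimate canonical correlations between paired samples X (n x p) and Y (n x q).

    Illustrative standard CCA only; `eps` is an assumed ridge term that keeps
    the covariance matrices positive definite for the Cholesky factorization.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]

    # Center each view.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariance and cross-covariance matrices.
    Cxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    # Whiten both views: with Cxx = Lx Lx^T and Cyy = Ly Ly^T,
    # form M = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T

    # The singular values of M are the canonical correlations.
    rho = np.linalg.svd(M, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    Y = rng.normal(size=(200, 2))  # independent of X in this toy example
    print(canonical_correlations(X, Y))
```

With few labeled pairs the sample covariances above are noisy, which is the estimation problem that, according to the abstract, RCCA addresses by also drawing on unlabeled samples.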

