
CLC number: TP37; TP391

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08


Journal of Zhejiang University SCIENCE A 2008 Vol.9 No.2 P.241-249

http://doi.org/10.1631/jzus.A071191


Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval


Author(s):  Hong ZHANG, Yan-yun WANG, Hong PAN, Fei WU

Affiliation(s):  College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China

Corresponding email(s):   zhanghong_zju@yahoo.com.cn

Key Words:  Heterogeneity, Cross-media retrieval, Subspace optimization, Dynamic correlation update


Hong ZHANG, Yan-yun WANG, Hong PAN, Fei WU. Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval[J]. Journal of Zhejiang University Science A, 2008, 9(2): 241-249.

@article{Zhang2008,
title="Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval",
author="Hong ZHANG and Yan-yun WANG and Hong PAN and Fei WU",
journal="Journal of Zhejiang University Science A",
volume="9",
number="2",
pages="241--249",
year="2008",
issn="1673-565X",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/jzus.A071191"
}

%0 Journal Article
%T Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval
%A Hong ZHANG
%A Yan-yun WANG
%A Hong PAN
%A Fei WU
%J Journal of Zhejiang University SCIENCE A
%V 9
%N 2
%P 241-249
%@ 1673-565X
%D 2008
%I Zhejiang University Press & Springer
%R 10.1631/jzus.A071191

TY  - JOUR
T1  - Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval
A1  - Hong ZHANG
A1  - Yan-yun WANG
A1  - Hong PAN
A1  - Fei WU
JF  - Journal of Zhejiang University Science A
VL  - 9
IS  - 2
SP  - 241
EP  - 249
SN  - 1673-565X
Y1  - 2008
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.A071191
ER  - 


Abstract: 
Cross-media retrieval is an interesting research topic that seeks to remove the barriers among different modalities. Enabling cross-media retrieval requires measuring the correlation between heterogeneous low-level features and judging their semantic similarity. This paper presents a novel approach to learning cross-media correlation between visual and auditory features for image-audio retrieval. A semi-supervised correlation preserving mapping (SSCPM) method is described to construct an isomorphic SSCPM subspace in which the canonical correlations between the original visual and auditory features are further preserved. A subspace optimization algorithm is proposed to improve the quality of local image clusters and audio clusters interactively. A unique relevance feedback strategy is developed to update the knowledge of cross-media correlation by learning from user behavior, so that retrieval performance improves progressively. Experimental results show that our approach is effective.
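The abstract says the SSCPM subspace preserves the canonical correlations between visual and auditory features. As background only, here is a minimal sketch of classical canonical correlation analysis (CCA, due to Hotelling), not the paper's SSCPM method. It assumes, purely for illustration, that each modality has already been reduced to a 2-D feature vector, so the eigenvalue problem for K = Cxx^-1 Cxy Cyy^-1 Cyx has a closed form and no third-party library is needed. All function names and sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vectors = Sequence[Sequence[float]]  # n samples, each a 2-D feature vector (toy assumption)


def _cov(a: Vectors, b: Vectors) -> Matrix:
    """Sample cross-covariance (n - 1 denominator) between two 2-D feature sets."""
    n = len(a)
    mean_a = [sum(row[i] for row in a) / n for i in range(2)]
    mean_b = [sum(row[j] for row in b) / n for j in range(2)]
    return [
        [
            sum((a[k][i] - mean_a[i]) * (b[k][j] - mean_b[j]) for k in range(n)) / (n - 1)
            for j in range(2)
        ]
        for i in range(2)
    ]


def _inv2(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix; fails loudly if it is (numerically) singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular covariance matrix; regularize the features first")
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def _mul2(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def canonical_correlations(visual: Vectors, audio: Vectors) -> Tuple[float, float]:
    """Return the two canonical correlations (largest first) between paired
    2-D visual and 2-D audio feature vectors (hypothetical toy setting)."""
    if len(visual) != len(audio) or len(visual) < 3:
        raise ValueError("need at least 3 paired samples")
    cxx = _cov(visual, visual)
    cyy = _cov(audio, audio)
    cxy = _cov(visual, audio)
    cyx = [[cxy[j][i] for j in range(2)] for i in range(2)]  # transpose of cxy
    # Eigenvalues of K = Cxx^-1 Cxy Cyy^-1 Cyx are the squared canonical correlations.
    k = _mul2(_mul2(_inv2(cxx), cxy), _mul2(_inv2(cyy), cyx))
    tr = k[0][0] + k[1][1]
    det = k[0][0] * k[1][1] - k[0][1] * k[1][0]
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))  # clamp tiny negative rounding noise
    squared = ((tr + root) / 2.0, (tr - root) / 2.0)
    # Clamp each eigenvalue to [0, 1] before taking the square root.
    rho1, rho2 = (math.sqrt(min(max(s, 0.0), 1.0)) for s in squared)
    return rho1, rho2
```

If every audio vector is an invertible linear map of its paired visual vector, both canonical correlations come out as 1 up to rounding; if only one audio dimension is a linear function of the visual features, only the first one is guaranteed to be 1. Real visual and auditory descriptors are high-dimensional, so in practice one would use a general, regularized solver (for example scikit-learn's `sklearn.cross_decomposition.CCA`) rather than this 2-D closed form.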




Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE