CLC number: TP37; TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Hong ZHANG, Yan-yun WANG, Hong PAN, Fei WU. Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval[J]. Journal of Zhejiang University Science A, 2008, 9(2): 241-249.
@article{title="Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval",
author="Hong ZHANG, Yan-yun WANG, Hong PAN, Fei WU",
journal="Journal of Zhejiang University Science A",
volume="9",
number="2",
pages="241-249",
year="2008",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.A071191"
}
%0 Journal Article
%T Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval
%A Hong ZHANG
%A Yan-yun WANG
%A Hong PAN
%A Fei WU
%J Journal of Zhejiang University SCIENCE A
%V 9
%N 2
%P 241-249
%@ 1673-565X
%D 2008
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.A071191
TY - JOUR
T1 - Understanding visual-auditory correlation from heterogeneous features for cross-media retrieval
A1 - Hong ZHANG
A1 - Yan-yun WANG
A1 - Hong PAN
A1 - Fei WU
JO - Journal of Zhejiang University Science A
VL - 9
IS - 2
SP - 241
EP - 249
SN - 1673-565X
Y1 - 2008
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.A071191
ER -
Abstract: Cross-media retrieval is an interesting research topic that seeks to remove the barriers among different modalities. To enable cross-media retrieval, it is necessary to find correlation measures between heterogeneous low-level features and to judge semantic similarity. This paper presents a novel approach to learning the cross-media correlation between visual features and auditory features for image-audio retrieval. A semi-supervised correlation preserving mapping (SSCPM) method is described to construct the isomorphic SSCPM subspace, where canonical correlations between the original visual and auditory features are further preserved. A subspace optimization algorithm is proposed to improve the local image cluster and audio cluster quality in an interactive way. A unique relevance feedback strategy is developed to update the knowledge of cross-media correlation by learning from user behaviors, so retrieval performance is enhanced in a progressive manner. Experimental results show that our approach is effective.
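As a rough illustration of the correlation step that the abstract builds on, the sketch below fits plain linear canonical correlation analysis on paired image and audio feature vectors, then ranks audio clips for an image query by cosine similarity in the shared subspace. It is not the SSCPM method itself: the semi-supervised mapping, the subspace optimization, and the relevance feedback are all omitted. The function names (`fit_cca`, `rank_audio_for_image`), the ridge term `reg`, and the synthetic data are assumptions made only for this example.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T


def fit_cca(X, Y, k, reg=1e-6):
    """Fit plain linear CCA on paired rows of X (n, p) and Y (n, q).

    Returns both feature means, both projection matrices, and the top-k
    canonical correlations. `reg` is a small ridge term (an assumption
    of this sketch) that keeps the covariance matrices invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Sxx_is, Syy_is = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    Wx = Sxx_is @ U[:, :k]
    Wy = Syy_is @ Vt.T[:, :k]
    return mx, my, Wx, Wy, s[:k]


def rank_audio_for_image(image_vec, audio_feats, model):
    """Return audio indices ordered from most to least similar to the image."""
    mx, my, Wx, Wy, _ = model
    q = (image_vec - mx) @ Wx        # image query in the shared subspace
    A = (audio_feats - my) @ Wy      # all audio clips in the shared subspace
    q = q / (np.linalg.norm(q) + 1e-12)
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(A @ q))      # descending cosine similarity


# Synthetic stand-ins for visual and auditory features that share a
# 2-dimensional latent "semantic" factor (hypothetical data).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
images = z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
audio = z @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(200, 4))

model = fit_cca(images, audio, k=2)
ranking = rank_audio_for_image(images[0], audio, model)
```

Projecting both modalities into one subspace is what makes heterogeneous features directly comparable; the cosine ranking here is only one possible similarity measure in that subspace.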