Institute of Computer Science and Technology, Peking University, Beijing 100871, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
pengyuxin@pku.edu.cn, wwzhu@tsinghua.edu.cn
Abstract: Cross-media analysis and reasoning is an active research area in computer science and a promising direction for artificial intelligence. However, to the best of our knowledge, no existing work has summarized the state-of-the-art methods for cross-media analysis and reasoning or presented the advances, challenges, and future directions of the field. To fill this gap, we provide an overview organized around seven topics: (1) theory and model for cross-media uniform representation; (2) cross-media correlation understanding and deep mining; (3) cross-media knowledge graph construction and learning methodologies; (4) cross-media knowledge evolution and reasoning; (5) cross-media description and generation; (6) cross-media intelligent engines; and (7) cross-media intelligent applications. By presenting approaches, advances, and future directions in cross-media analysis and reasoning, we aim not only to draw more attention to the state of the art in the field, but also to provide technical insights by discussing the challenges and research directions in these areas.