On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2021-04-29
Yueting Zhuang, Siliang Tang. Visual knowledge: an attempt to explore machine creativity[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(5): 619-624.
@article{zhuang2021visual,
title="Visual knowledge: an attempt to explore machine creativity",
author="Yueting Zhuang and Siliang Tang",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="22",
number="5",
pages="619-624",
year="2021",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2100116"
}
%0 Journal Article
%T Visual knowledge: an attempt to explore machine creativity
%A Yueting Zhuang
%A Siliang Tang
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 5
%P 619-624
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100116
TY  - JOUR
T1  - Visual knowledge: an attempt to explore machine creativity
A1  - Yueting Zhuang
A1  - Siliang Tang
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 22
IS  - 5
SP  - 619
EP  - 624
SN  - 2095-9184
Y1  - 2021
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2100116
ER  - 
Abstract: One question that has long puzzled the artificial intelligence (AI) community is whether AI can be creative, or, more specifically, whether the reasoning process can be creative. Starting from noetic science, this paper discusses visual knowledge representation and its potential applications to machine creativity. We first survey related research on reasoning based on imagery thinking, then focus on a special type of visual knowledge representation, the visual scene graph, and finally review in detail the problem of visual scene graph construction and its potential applications. The evidence reviewed suggests that visual knowledge and visual thinking can not only improve the performance of current AI tasks but also be applied in the practice of machine creativity.
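To make the notion of a visual scene graph concrete, here is a minimal Python sketch under stated assumptions. It uses the common formulation of a scene graph as object nodes that carry attributes, linked by directed (subject, predicate, object) relationship triples. The class `SceneGraph` and its methods are hypothetical and for illustration only; they are not the authors' representation or any library's API, and the sketch says nothing about how such a graph is constructed from an image.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical, illustrative scene graph: object nodes with attribute
    sets, plus directed (subject, predicate, object) relationship edges."""
    objects: dict[str, set[str]] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, name: str, *attributes: str) -> None:
        # Create the node if it is new, then merge in any given attributes.
        self.objects.setdefault(name, set()).update(attributes)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # A relationship may only connect objects already in the graph.
        for node in (subject, obj):
            if node not in self.objects:
                raise KeyError(f"unknown object: {node}")
        self.relations.append((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        # Return every triple that matches all fields that are not None.
        return [
            (s, p, o)
            for s, p, o in self.relations
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        ]


# Usage: a toy graph for the scene "a man riding a brown horse on green grass".
g = SceneGraph()
g.add_object("man")
g.add_object("horse", "brown")
g.add_object("grass", "green")
g.relate("man", "riding", "horse")
g.relate("horse", "on", "grass")

print(g.query(predicate="riding"))  # [('man', 'riding', 'horse')]
print(sorted(g.objects["horse"]))   # ['brown']
```

Keeping relationships as explicit triples is what makes the structure queryable by pattern, which is the property that downstream uses such as visual question answering or scene-graph-to-image generation rely on.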