Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2024 Vol.25 No.1 P.42-63
Prompt learning in computer vision: a survey
Abstract: Prompt learning has attracted broad attention in computer vision since the rise of large pre-trained vision-language models (VLMs). Building on the close relationship between visual and linguistic information that VLMs establish, prompt learning has become a crucial technique in many important applications, such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as it relates to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. We then review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we outline some promising research directions for prompt learning.
Key words: Prompt learning; Visual prompt tuning (VPT); Image generation; Image classification; Artificial intelligence generated content (AIGC)
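1 Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200438, China
2 Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
3 Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
4 Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 201210, China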
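Chinese abstract (translated): Since the surge of large pre-trained vision-language models (VLMs), prompt learning has drawn wide attention in computer vision. Building on the close relationship between visual and linguistic information established by VLMs, prompt learning has become a key technique in many important application areas, such as artificial intelligence generated content (AIGC). This survey gives a progressive and comprehensive summary of visual prompt learning as it relates to AIGC. It first introduces VLMs, the foundation of visual prompt learning. It then reviews visual prompt learning methods and prompt-guided generative models, and discusses how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, it offers some promising research directions for prompt learning.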
DOI: 10.1631/FITEE.2300389
CLC number: TP181
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-10-17