JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2024 Vol.25 No.1 P.42-63

Prompt learning in computer vision: a survey

Yiming LEI, Jingqi LI, Zilong LI, Yuan CAO, Hongming SHAN

Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200438, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 201210, China

ymlei@fudan.edu.cn, hmshan@fudan.edu.cn

Abstract: Prompt learning has attracted broad attention in computer vision since the rapid rise of large pre-trained vision-language models (VLMs). Building on the close relationship between visual and language information established by VLMs, prompt learning has become a crucial technique in many important applications, such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as it relates to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. We then review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we outline some promising research directions for prompt learning.

Key words: Prompt learning; Visual prompt tuning (VPT); Image generation; Image classification; Artificial intelligence generated content (AIGC)

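Chinese Summary (translated): Prompt learning in computer vision: a survey

Yiming LEI¹, Jingqi LI¹, Zilong LI¹, Yuan CAO¹, Hongming SHAN²,³,⁴
¹Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200438, China
²Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
³MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
⁴Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 201210, China
Abstract: Since the rapid rise of large pre-trained vision-language models (VLMs), prompt learning has attracted broad attention in computer vision. Building on the close relationship between visual and language information established by VLMs, prompt learning has become a key technique in many important application areas, such as artificial intelligence generated content (AIGC). This survey gives a progressive and comprehensive summary of visual prompt learning as it relates to AIGC. It first introduces VLMs, the foundation of visual prompt learning. It then reviews visual prompt learning methods and prompt-guided generative models, and discusses how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, it offers some promising research directions for prompt learning.

Key words: Prompt learning; Visual prompt tuning; Image generation; Image classification; Artificial intelligence generated content (AIGC)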


DOI: 10.1631/FITEE.2300389

CLC number: TP181


On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2023-10-17

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
