Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Few-shot exemplar-driven inpainting with parameter-efficient diffusion fine-tuning

Abstract: Text-to-image diffusion models have demonstrated impressive capabilities in image generation and have been applied effectively to image inpainting. While a text prompt provides intuitive guidance for conditional inpainting, users often want to inpaint a specific object with a customized appearance by providing an exemplar image. Unfortunately, existing methods struggle to achieve high fidelity in exemplar-driven inpainting. To address this, we add a plug-and-play low-rank adaptation (LoRA) module to a pretrained text-driven inpainting model. The LoRA module is dedicated to learning exemplar-specific concepts through few-shot fine-tuning, which improves the fit to customized exemplar images without intensive training on large-scale datasets. Additionally, we introduce GPT-4V prompting and prior noise initialization techniques to further improve the fidelity of the inpainting results. In brief, the denoising diffusion process starts from noise derived from a composite exemplar–background image and is subsequently guided by an expressive prompt generated from the exemplar by the GPT-4V model. Extensive experiments demonstrate that our method achieves state-of-the-art performance, both qualitatively and quantitatively, offering users an exemplar-driven inpainting tool with enhanced customization capability.
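The two mechanisms named in the abstract, a low-rank adapter on a frozen layer and a denoising start point derived from a composite image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The names `LoRALinear`, `composite`, and `prior_noise_init` are hypothetical, and the noise step is assumed to be the standard forward-diffusion formula, which the abstract does not specify.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                 # pretrained weight, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))                   # trainable up-projection, zero at start
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, in_dim); the low-rank branch is added to the frozen output.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T


def composite(exemplar, background, mask):
    """Paste the exemplar into the background where mask == 1 (assumed mask convention)."""
    return mask * exemplar + (1.0 - mask) * background


def prior_noise_init(image, alpha_bar_t, rng):
    """Assumed forward diffusion: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(image.shape)
    return np.sqrt(alpha_bar_t) * image + np.sqrt(1.0 - alpha_bar_t) * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = LoRALinear(rng.standard_normal((8, 16)), rank=2)
    x = rng.standard_normal((3, 16))
    # With B initialized to zero, the adapted layer reproduces the frozen layer.
    print(np.allclose(layer(x), x @ layer.weight.T))

    exemplar, background = np.ones((4, 4)), np.zeros((4, 4))
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0
    start = prior_noise_init(composite(exemplar, background, mask), 0.5, rng)
    print(start.shape)
```

Because `B` starts at zero, attaching the adapter leaves the pretrained model's output unchanged until fine-tuning updates `A` and `B`. That is one common way to make such a module plug-and-play.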

Key words: Diffusion model; Image inpainting; Exemplar-driven; Few-shot fine-tuning

Chinese Summary: Few-shot exemplar-driven image inpainting based on parameter-efficient diffusion fine-tuning

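Shiyuan Yang1, Zheng Gu2, Wenyue Hao1, Yi Wang1, Huaiyu Cai1, Xiaodong Chen1
1Key Laboratory of Opto-Electronics Information Technology of Ministry of Education, School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China
2State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210008, China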
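Abstract: Text-to-image diffusion models have shown outstanding capability in image generation and have been widely applied to image inpainting. Although text prompts provide intuitive guidance for conditional inpainting, users often want to inpaint a specific object with a customized appearance by supplying an exemplar image. However, existing exemplar-driven inpainting methods struggle to achieve high-fidelity results. To address this, we propose a plug-and-play low-rank adaptation (LoRA) module built on a pretrained text-driven inpainting model. The module learns exemplar-specific features through few-shot fine-tuning, which markedly improves the fit to customized exemplar images without extensive training on large-scale datasets. In addition, GPT-4V prompting and prior noise initialization are introduced to further improve the fidelity of the inpainting results. In brief, the denoising diffusion process starts from initial noise derived from a composite exemplar-background image and is then guided by a rich prompt that GPT-4V generates from the exemplar. Extensive experiments show that our method reaches the state of the art on both qualitative and quantitative measures, giving users an exemplar-driven inpainting tool with stronger customization capability.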

Key words: Diffusion model; Image inpainting; Exemplar-driven; Few-shot fine-tuning


DOI: 10.1631/FITEE.2400395

CLC number: TP183

On-line Access: 2025-06-04

Received: 2024-03-14

Revision Accepted: 2024-10-25

Crosschecked: 2025-09-04

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE