JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2023 Vol.24 No.10 P.1403-1415

Robust cross-modal retrieval with alignment refurbishment

Jinyi GUO, Jieyu DING

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; School of Mathematics and Statistics, Qingdao University, Qingdao 266071, China

jinyi_g@njust.edu.cn, djy@qdu.edu.cn

Abstract: Cross-modal retrieval aims to achieve mutual retrieval between modalities by establishing consistent alignments across data of different modalities. Many cross-modal retrieval methods have been proposed and have achieved excellent results. However, they are trained on clean cross-modal pairs, which are semantically matched but costly to collect compared with readily available data with noisy alignment (i.e., paired but mismatched in semantics). When these methods are trained on noise-aligned data, their performance degrades dramatically. Therefore, we propose robust cross-modal retrieval with alignment refurbishment (RCAR), which significantly reduces the impact of noise on the model. Specifically, RCAR first conducts multi-task learning to slow down overfitting to the noise, so that clean and noisy data become separable. Then, RCAR uses a two-component beta-mixture model to divide the data into clean and noise alignments and refurbishes the alignment labels according to the posterior probability of the noise-alignment component. In addition, we define partial and complete noise in the noise-alignment paradigm. Experimental results show that, compared with popular cross-modal retrieval methods, RCAR achieves more robust performance under both types of noise.
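The clean/noise division and label refurbishment described above can be sketched as follows. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions: per-pair training losses are normalized to [0, 1]; the component with the higher mean is treated as the noise-alignment component; the mixture is fitted by EM with a weighted method-of-moments M-step; and `refurbish_alignment` is a hypothetical rule that sets each pair's soft alignment label to 1 - p(noise | loss). All function names are illustrative.

```python
import math

import numpy as np


def beta_pdf(x: np.ndarray, a: float, b: float) -> np.ndarray:
    """Element-wise Beta(a, b) density, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x))


def fit_beta(x: np.ndarray, w: np.ndarray) -> tuple[float, float]:
    """Weighted method-of-moments estimate of Beta(a, b) parameters."""
    total = w.sum() + 1e-12
    mean = float((w * x).sum() / total)
    var = max(float((w * (x - mean) ** 2).sum() / total), 1e-6)
    common = mean * (1.0 - mean) / var - 1.0
    # Clamp so that degenerate weights never yield non-positive parameters.
    return max(mean * common, 1e-2), max((1.0 - mean) * common, 1e-2)


def fit_bmm(losses: np.ndarray, n_iter: int = 50, eps: float = 1e-4) -> np.ndarray:
    """Fit a two-component beta mixture to per-pair losses by EM.

    Returns the posterior probability that each pair belongs to the
    higher-mean component, assumed here to be the noise-alignment component.
    """
    x = np.clip(np.asarray(losses, dtype=float), eps, 1.0 - eps)
    # Soft initialization: low loss leans clean (column 0), high loss leans noisy (column 1).
    resp = np.stack([1.0 - x, x], axis=1)
    for _ in range(n_iter):
        # M-step: mixing weights and Beta parameters from current responsibilities.
        pi = resp.mean(axis=0)
        params = [fit_beta(x, resp[:, k]) for k in range(2)]
        # E-step: posterior responsibility of each component for each loss.
        lik = np.stack([pi[k] * beta_pdf(x, *params[k]) for k in range(2)], axis=1)
        resp = lik / np.clip(lik.sum(axis=1, keepdims=True), 1e-300, None)
    # Identify the noise component as the one with the larger Beta mean a / (a + b).
    means = [a / (a + b) for a, b in params]
    return resp[:, int(np.argmax(means))]


def refurbish_alignment(p_noise: np.ndarray) -> np.ndarray:
    """Hypothetical refurbishment rule: soft alignment label = 1 - p(noise | loss)."""
    return 1.0 - p_noise
```

For example, calling `fit_bmm` on losses drawn from a low-mean Beta for clean pairs and a high-mean Beta for mismatched pairs yields posteriors near 0 for the former and near 1 for the latter, so thresholding at 0.5 gives a clean/noise split while `refurbish_alignment` gives soft labels. A beta mixture is a natural fit here because normalized losses live on [0, 1] and the clean-loss distribution is typically skewed toward zero, which a Gaussian mixture models poorly.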

Key words: Cross-modal retrieval; Robust learning; Alignment correction; Beta-mixture model

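Chinese Summary: Robust cross-modal retrieval based on alignment self-correction

GUO Jinyi¹, DING Jieyu²
¹School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
²School of Mathematics and Statistics, Qingdao University, Qingdao 266071, China
Abstract: Cross-modal retrieval achieves mutual retrieval between modalities by establishing consistent alignments for data of different modalities. Many cross-modal retrieval methods have been proposed and achieve good performance. These methods are trained on cleanly aligned cross-modal data. Although such data are semantically matched, their annotation cost is high compared with noise-aligned data that are easily obtained from the Internet (i.e., paired but semantically mismatched). When these models are trained with noise-aligned data, their performance drops sharply. Therefore, this paper proposes a robust cross-modal retrieval algorithm with alignment self-correction (RCAR), which significantly reduces the impact of noisy data on the model. Specifically, RCAR first performs multi-task learning to slow the model's overfitting to noisy data, making the data separable. Then, a two-component beta-mixture model is used to divide the data into clean and noisy data, and the alignment labels are corrected according to the posterior probability. In addition, two noise types are defined in the noise-alignment paradigm: partially noisy data and completely noisy data. Experimental results show that, compared with currently popular cross-modal retrieval methods, RCAR achieves more robust performance under both types of noise.

Key words: Cross-modal retrieval; Robust learning; Alignment correction; Beta-mixture model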


DOI: 10.1631/FITEE.2200514

CLC number: TP391


On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2023-02-16

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
