Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2023 Vol.24 No.10 P.1403-1415
Robust cross-modal retrieval with alignment refurbishment
Abstract: Cross-modal retrieval aims to enable mutual retrieval between modalities by establishing consistent alignments for data from different modalities. Many cross-modal retrieval methods have been proposed and achieve excellent results; however, they are trained with clean cross-modal pairs, which are semantically matched but costly to annotate, in contrast to easily available data with noise alignment (i.e., paired but semantically mismatched). When these methods are trained on noisily aligned data, their performance degrades dramatically. We therefore propose robust cross-modal retrieval with alignment refurbishment (RCAR), which significantly reduces the impact of noise on the model. Specifically, RCAR first performs multi-task learning to slow down overfitting to the noise so that clean and noisy alignments become separable. It then uses a two-component beta-mixture model to divide the training pairs into clean and noise alignments and refurbishes the alignment labels according to the posterior probability of the noise-alignment component. In addition, we define two types of noise in the noise-alignment paradigm: partial noise and complete noise. Experimental results show that, compared with popular cross-modal retrieval methods, RCAR achieves more robust performance under both types of noise.
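As a rough illustration of the alignment-refurbishment step described above, the following Python sketch fits a two-component beta-mixture model to per-pair alignment scores with EM and turns the posterior probability of the noise component into soft (refurbished) alignment labels. The score definition, initialization, and refurbishment rule here are illustrative assumptions and do not reproduce the exact RCAR formulation.

# Hedged sketch: fit a two-component beta-mixture model (BMM) to per-pair
# alignment scores in (0, 1) via EM, then refurbish labels using the
# posterior of the noise component. Assumed setup, not the exact RCAR method.
import numpy as np
from scipy.stats import beta as beta_dist


def fit_beta_mixture(scores, n_iter=50, eps=1e-4):
    """EM for a 2-component beta mixture over scores in (0, 1)."""
    x = np.clip(scores, eps, 1.0 - eps)
    # Initialize: component 0 = "noise" (low scores), component 1 = "clean" (high scores).
    a = np.array([2.0, 5.0])
    b = np.array([5.0, 2.0])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pair.
        lik = np.stack([pi[k] * beta_dist.pdf(x, a[k], b[k]) for k in range(2)], axis=1)
        resp = lik / (lik.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: weighted method-of-moments update of (a_k, b_k) and mixing weights.
        for k in range(2):
            w = resp[:, k]
            mu = np.average(x, weights=w)
            var = np.average((x - mu) ** 2, weights=w) + eps
            common = mu * (1.0 - mu) / var - 1.0
            a[k] = max(mu * common, eps)
            b[k] = max((1.0 - mu) * common, eps)
        pi = resp.mean(axis=0)
    return a, b, pi


def refurbish_labels(scores, a, b, pi, eps=1e-4):
    """Soft alignment labels: one minus the posterior of the noise component."""
    x = np.clip(scores, eps, 1.0 - eps)
    p_noise = pi[0] * beta_dist.pdf(x, a[0], b[0])
    p_clean = pi[1] * beta_dist.pdf(x, a[1], b[1])
    return p_clean / (p_noise + p_clean + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic per-pair scores: 70% clean (high) and 30% noisy (low) pairs.
    scores = np.concatenate([rng.beta(6, 2, 700), rng.beta(2, 6, 300)])
    a, b, pi = fit_beta_mixture(scores)
    soft_labels = refurbish_labels(scores, a, b, pi)
    print("mixing weights:", pi, "mean soft label:", soft_labels.mean())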
Key words: Cross-modal retrieval; Robust learning; Alignment correction; Beta-mixture model
1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
2 School of Mathematics and Statistics, Qingdao University, Qingdao 266071, China
DOI: 10.1631/FITEE.2200514
CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-02-16