Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Temporal fidelity enhancement for video action recognition

Abstract: Temporal attention mechanisms are essential for video action recognition, enabling models to focus on semantically informative moments. However, these models frequently exhibit temporal infidelity—misaligned attention weights caused by limited training diversity and the absence of fine-grained temporal supervision. While video-level labels provide coarse-grained action guidance, the lack of detailed constraints allows attention noise to persist, especially in complex scenarios with distracting spatial elements. To address this issue, we propose temporal fidelity enhancement (TFE), a competitive learning paradigm based on the disentangled information bottleneck (DisenIB) theory. TFE mitigates temporal infidelity by decoupling action-relevant semantics from spurious correlations through adversarial feature disentanglement. Using pre-trained representations for initialization, TFE establishes an adversarial process in which segments with elevated temporal attention compete against contexts with diminished action relevance. This mechanism ensures temporal consistency and enhances the fidelity of attention patterns without requiring explicit fine-grained supervision. Extensive studies on UCF101, HMDB-51, and Charades benchmarks validate the effectiveness of our method, with significant improvements in action recognition accuracy.

Key words: Action recognition; Disentangled information bottleneck; Temporal modeling; Temporal fidelity
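
The abstract describes segments with elevated temporal attention competing against contexts with diminished action relevance. As a rough illustration only, the sketch below shows one way a video's segment features could be split into an attention-weighted summary and a complementary context summary. It is not the authors' TFE implementation: the function names (`softmax`, `weighted_pool`, `split_by_attention`), the complement weighting (1 - a_t)/(T - 1), and the toy features are all assumptions made for this example, and the adversarial DisenIB objective itself is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pool(features, weights):
    """Weighted sum of per-segment feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def split_by_attention(features, scores):
    """Return (attention, attended summary, context summary).

    Assumption for illustration: the context weights are the normalized
    complement of the attention weights, (1 - a_t) / (T - 1), which also
    sum to 1 because the attention weights do.
    """
    attention = softmax(scores)
    num_segments = len(attention)
    if num_segments < 2:
        raise ValueError("need at least two segments to form a context")
    complement = [(1.0 - a) / (num_segments - 1) for a in attention]
    attended = weighted_pool(features, attention)
    context = weighted_pool(features, complement)
    return attention, attended, context


# Toy input (hypothetical): three segments with 2-D features.
# The first segment gets the highest raw attention score.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
scores = [2.0, 0.0, 0.0]

attention, attended, context = split_by_attention(features, scores)
print("attention:", attention)
print("attended summary:", attended)
print("context summary:", context)
```

In this toy case the attended summary leans toward the first segment's feature, while the context summary leans toward the remaining segments. A disentanglement objective of the kind the abstract mentions would then operate on two such summaries.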

Chinese Summary: Temporal fidelity enhancement for video action recognition

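Shaowu XU1, Xibin JIA1, Qianmei SUN2, Jing CHANG2
1Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
2Beijing Chaoyang Hospital, Capital Medical University, Beijing 100020, China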
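Abstract: Temporal attention mechanisms are essential for video action recognition, allowing models to focus on key segments rich in semantic information. However, these models often exhibit temporal infidelity, that is, attention weights misaligned with the semantic content, owing to limited training diversity and a lack of fine-grained temporal supervision. Although video-level labels provide coarse-grained action guidance, the absence of detailed constraints lets attention noise persist, especially in complex scenes containing distracting spatial elements. To address this problem, this paper proposes temporal fidelity enhancement (TFE), an adversarial learning paradigm based on disentangled information bottleneck (DisenIB) theory. TFE separates action-relevant semantics from spurious correlations through adversarial feature disentanglement, thereby mitigating temporal infidelity. The method uses pre-trained representations for initialization and sets up an adversarial learning process in which segments with high temporal attention compete against contexts with weakened action relevance. It ensures temporal consistency and improves the fidelity of attention weights without fine-grained supervision labels. Extensive experiments on the UCF101, HMDB-51, and Charades benchmark datasets validate the effectiveness of the method, showing that TFE significantly improves action recognition accuracy.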

Key words: Action recognition; Disentangled information bottleneck; Temporal modeling; Temporal fidelity


DOI: 10.1631/FITEE.2500164

CLC number: TP391.41

On-line Access: 2025-06-04

Received: 2025-05-14

Revision Accepted: 2025-06-04

Crosschecked: 2025-09-04

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE