On-line Access: 2025-08-15

Received: 2025-03-14

Revision Accepted: 2025-06-04

Journal of Zhejiang University SCIENCE C 1998 Vol.-1 No.-1 P.

http://doi.org/10.1631/FITEE.2500164


Temporal fidelity enhancement for video action recognition


Author(s):  Shaowu XU1, Xibin JIA1, Qianmei SUN2, Jing CHANG2

Affiliation(s):  1Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Corresponding email(s):   swxu@emails.bjut.edu.cn, jiaxibin@bjut.edu.cn, sunqianmei5825@126.com, cj006006@126.com

Key Words:  Action recognition, Disentangled information bottleneck (DisenIB), Temporal modeling, Temporal fidelity


Shaowu XU, Xibin JIA, Qianmei SUN, Jing CHANG. Temporal fidelity enhancement for video action recognition[J]. Frontiers of Information Technology & Electronic Engineering, 1998, -1(-1).

@article{FITEE2500164,
title="Temporal fidelity enhancement for video action recognition",
author="Shaowu XU and Xibin JIA and Qianmei SUN and Jing CHANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="-1",
number="-1",
pages="",
year="1998",
issn="2095-9184",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2500164"
}

Abstract: 
Temporal attention mechanisms are essential for video action recognition, enabling models to focus on semantically informative moments. However, these models frequently exhibit temporal infidelity, i.e., misaligned attention weights caused by limited training diversity and the absence of fine-grained temporal supervision. While video-level labels provide coarse-grained action guidance, the lack of detailed constraints allows attention noise to persist, especially in complex scenarios with distracting spatial elements. To address this issue, we propose temporal fidelity enhancement (TFE), a competitive learning paradigm based on disentangled information bottleneck (DisenIB) theory. TFE mitigates temporal infidelity by decoupling action-relevant semantics from spurious correlations through adversarial feature disentanglement. Using pre-trained representations for initialization, TFE establishes an adversarial process in which segments with elevated temporal attention compete against contexts with diminished action relevance. This mechanism ensures temporal consistency and enhances the fidelity of attention patterns without requiring explicit fine-grained supervision. Extensive experiments on the UCF-101, HMDB-51, and Charades benchmarks validate the effectiveness of our method, showing significant improvements in action recognition accuracy.
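To make the idea of high-attention segments "competing" against low-relevance context more concrete, the toy sketch below illustrates one possible form of such a competition. It is not the authors' TFE implementation, and it does not reproduce the DisenIB objective. Every component is an assumption chosen for illustration: the softmax over per-segment scores, the top-k split into "salient" and "context" segments, the hypothetical linear class head `class_weights`, and the hinge-style `margin` loss. The loss is zero when the attention-pooled salient feature scores higher for the ground-truth class than the pooled context feature does, by at least the margin.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical illustration only: not the TFE/DisenIB objective from the paper.


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-segment attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pool(features: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Attention-weighted average of segment features (weights renormalized)."""
    total = sum(weights)
    pooled = [0.0] * len(features[0])
    for feat, w in zip(features, weights):
        for d, value in enumerate(feat):
            pooled[d] += (w / total) * value
    return pooled


def split_by_attention(attn: Sequence[float], k: int) -> Tuple[List[int], List[int]]:
    """Return indices of the top-k attended segments ("salient") and the rest ("context")."""
    order = sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)
    return sorted(order[:k]), sorted(order[k:])


def class_score(feature: Vector, class_weights: Vector) -> float:
    """Logit of the ground-truth class under an assumed linear classifier head."""
    return sum(f * w for f, w in zip(feature, class_weights))


def competition_loss(
    features: Sequence[Vector],
    scores: Sequence[float],
    class_weights: Vector,
    k: int = 2,
    margin: float = 1.0,
) -> float:
    """Hinge loss: the salient-pooled logit should beat the context-pooled logit by `margin`."""
    if not 1 <= k < len(scores):
        raise ValueError("k must leave at least one segment in each group")
    attn = softmax(scores)
    salient, context = split_by_attention(attn, k)
    z_sal = weighted_pool([features[i] for i in salient], [attn[i] for i in salient])
    z_ctx = weighted_pool([features[i] for i in context], [attn[i] for i in context])
    gap = class_score(z_sal, class_weights) - class_score(z_ctx, class_weights)
    return max(0.0, margin - gap)


if __name__ == "__main__":
    # Four segments with 2-D toy features; class_weights favor the first dimension.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    head = [1.0, -1.0]
    # Attention already highlights the action-relevant segments: no penalty.
    print(competition_loss(feats, [2.0, 1.5, -1.0, -0.5], head))  # prints 0.0
    # Attention highlights distracting segments: a positive penalty results.
    print(competition_loss(feats, [-1.0, -0.5, 2.0, 1.5], head))
```

In this sketch the penalty grows when attention concentrates on segments that carry less evidence for the labeled action than the segments it neglects, which is one simple way to express "fidelity" using only video-level labels. A real system would learn the features, scores, and head jointly; the fixed lists here only keep the example self-contained.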

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE