On-line Access: 2025-04-17

Received: 2024-10-29

Revision Accepted: 2025-02-09

Frontiers of Information Technology & Electronic Engineering (formerly Journal of Zhejiang University SCIENCE C)

http://doi.org/10.1631/FITEE.2400960


End-to-end object detection using a query-selection encoder with hierarchical feature-aware attention


Author(s):  Zuyi WANG¹, Zhimeng ZHENG¹, Jun MENG¹,², Li XU¹,²

Affiliation(s):  ¹College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China; more

Corresponding email(s):   junmeng@zju.edu.cn, xupower@zju.edu.cn

Key Words:  End-to-end object detection, Query-selection encoder, Hierarchical feature-aware attention


Zuyi WANG, Zhimeng ZHENG, Jun MENG, Li XU. End-to-end object detection using a query-selection encoder with hierarchical feature-aware attention[J]. Frontiers of Information Technology & Electronic Engineering. http://doi.org/10.1631/FITEE.2400960


Abstract: 
End-to-end object detection methods have attracted extensive interest recently because they remove the need for complicated human-designed components and simplify the detection pipeline. However, these methods converge more slowly in training and detect less accurately than conventional detectors, because their feature fusion and selection processes are constrained by insufficient positive supervision. To address this issue, we introduce a novel query-selection encoder (QSE) designed for end-to-end object detectors to improve training convergence speed and detection accuracy. The QSE is composed of multiple encoder layers stacked on top of the backbone. A lightweight head network is added after each encoder layer to continuously optimize features in a cascading manner, providing more positive supervision for efficient training. Additionally, a hierarchical feature-aware attention (HFA) mechanism is incorporated in each encoder layer, including in-level feature attention and cross-level feature attention, to enhance the interaction between features from different levels. HFA can effectively suppress similar feature representations and highlight discriminative ones, thereby accelerating the feature selection process. Our method is highly versatile and accommodates both CNN-based and transformer-based detectors. Extensive experiments were conducted on the popular benchmark datasets MS COCO, CrowdHuman, and PASCAL VOC to demonstrate the effectiveness of our method. The results showed that CNN-based and transformer-based detectors using QSE can achieve better end-to-end performance with a shorter training schedule.
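The layer structure described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the HFA formulation, so plain scaled dot-product attention with a residual connection stands in for both the in-level and cross-level steps, and the lightweight head is reduced to a single linear score per token. All function names (`attend`, `encoder_layer`, `light_head`, `qse_forward`) and the toy shapes are assumptions made for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Stand-in attention (assumption): scaled dot-product with values == keys,
    followed by a residual connection. Not the paper's HFA formula."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        ctx = [sum(wi * k[j] for wi, k in zip(w, keys)) for j in range(d)]
        out.append([qi + ci for qi, ci in zip(q, ctx)])
    return out


def encoder_layer(levels):
    """One encoder layer: in-level attention, then cross-level attention."""
    # In-level: tokens of each feature level attend to tokens of the same level.
    levels = [attend(tokens, tokens) for tokens in levels]
    # Cross-level: tokens of each level attend to tokens of all other levels.
    mixed = []
    for i, tokens in enumerate(levels):
        others = [t for j, lv in enumerate(levels) if j != i for t in lv]
        mixed.append(attend(tokens, others))
    return mixed


def light_head(levels, weights):
    """Hypothetical lightweight head: one sigmoid score per token."""
    return [
        [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(weights, t)))) for t in tokens]
        for tokens in levels
    ]


def qse_forward(levels, heads):
    """Stack one encoder layer per head; collect each head's scores so that
    every layer could receive its own supervision signal during training."""
    aux_scores = []
    for weights in heads:
        levels = encoder_layer(levels)
        aux_scores.append(light_head(levels, weights))
    return levels, aux_scores


# Toy input: three feature levels with 6, 3, and 2 tokens of dimension 4.
random.seed(0)
dim = 4
levels = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)] for n in (6, 3, 2)]
heads = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]  # two layers
features, aux_scores = qse_forward(levels, heads)
```

The point of the sketch is the control flow: each stacked layer refines the multi-level features and immediately exposes per-token scores through its own head, which is where the additional supervision mentioned in the abstract would attach. How those scores are matched to ground truth, and how HFA suppresses similar features, is not specified in the abstract and is left out here.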
