On-line Access: 2021-06-25
Received: 2020-12-04
Revision Accepted: 2021-05-26
Wanpeng XU, Ling ZOU, Lingda WU, Yue QI, Zhaoyong QIAN. Depth estimation using an improved stereo network[J]. Frontiers of Information Technology & Electronic Engineering, 1998.
@article{xu_improved_stereo_network,
title="Depth estimation using an improved stereo network",
author="Wanpeng XU and Ling ZOU and Lingda WU and Yue QI and Zhaoyong QIAN",
journal="Frontiers of Information Technology \& Electronic Engineering",
year="1998",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2000676"
}
%0 Journal Article
%T Depth estimation using an improved stereo network
%A Wanpeng XU
%A Ling ZOU
%A Lingda WU
%A Yue QI
%A Zhaoyong QIAN
%J Frontiers of Information Technology & Electronic Engineering
%@ 2095-9184
%D 1998
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2000676
TY - JOUR
T1 - Depth estimation using an improved stereo network
A1 - Wanpeng XU
A1 - Ling ZOU
A1 - Lingda WU
A1 - Yue QI
A1 - Zhaoyong QIAN
JO - Frontiers of Information Technology & Electronic Engineering
SN - 2095-9184
Y1 - 1998
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2000676
ER -
Abstract: Self-supervised depth estimators achieve results comparable to those of fully supervised approaches
to monocular depth estimation by employing view synthesis between the target and reference images in
the training data. ResNet, which commonly serves as the backbone network, has structural deficiencies when applied
to downstream tasks because it was originally designed for classification problems. Low-texture areas
also degrade performance considerably. To address these problems, we propose a set of improvements that lead to
superior predictions. First, we boost the information flow in the network and improve its ability to learn spatial
structures by modifying the network structure. Second, we use a binary mask to remove the pixels in low-texture
areas between the target and reference images so that the image is reconstructed more accurately. Finally, we input
the target and reference images randomly to expand the dataset and pre-train the model on ImageNet, so that it obtains
a favorable general feature representation. We demonstrate state-of-the-art performance on the Eigen split of the
KITTI driving dataset using stereo pairs.
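
The binary-mask idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how low-texture pixels are detected, so the criterion below (a pixel is kept only if it differs from a 4-neighbour by more than a threshold) is an assumption chosen for illustration, as are the function names `texture_mask` and `masked_l1` and the threshold value. It is not the authors' implementation.

```python
def texture_mask(image, threshold=0.05):
    """Binary mask over a grayscale image given as a list of rows.

    A pixel gets 1 if it differs from any 4-neighbour by more than
    `threshold`, else 0. Both the criterion and the default threshold
    are illustrative assumptions, not taken from the paper.
    """
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and abs(image[y][x] - image[ny][nx]) > threshold:
                    mask[y][x] = 1
                    break
    return mask


def masked_l1(target, reconstructed, mask):
    """Mean absolute reconstruction error over the pixels kept by `mask`."""
    kept = sum(map(sum, mask))
    if kept == 0:
        return 0.0  # nothing to compare; avoid division by zero
    total = sum(
        abs(t - r) * m
        for t_row, r_row, m_row in zip(target, reconstructed, mask)
        for t, r, m in zip(t_row, r_row, m_row)
    )
    return total / kept


# Toy image: flat (low-texture) on the left, an intensity ramp on the right.
row = [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.6, 0.8]
target = [row[:] for _ in range(4)]
mask = texture_mask(target)

# A reconstruction that is wrong only in the flat region.
reconstructed = [[v + 0.5 if x < 3 else v for x, v in enumerate(r)] for r in target]
loss = masked_l1(target, reconstructed, mask)
```

In this toy case the reconstruction errors fall entirely inside the flat region, so the masked loss ignores them. A real training pipeline would compute the mask on image tensors and apply it to the photometric loss between the target image and the view synthesized from the reference image.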