On-line Access: 2021-06-25

Received: 2020-12-04

Revision Accepted: 2021-05-26

Journal of Zhejiang University SCIENCE C


Depth estimation using an improved stereo network

Author(s):  Wanpeng XU, Ling ZOU, Lingda WU, Yue QI, Zhaoyong QIAN

Affiliation(s):  Science and Technology on Complex Electronic System Simulation Laboratory, Space Engineering University, Beijing 101416, China; more

Corresponding email(s):   xuwp@pcl.ac.cn, zouling@bfa.edu.cn

Key Words:  Monocular depth estimation, Self-supervised, Image reconstruction

Wanpeng XU, Ling ZOU, Lingda WU, Yue QI, Zhaoyong QIAN. Depth estimation using an improved stereo network[J]. Frontiers of Information Technology & Electronic Engineering. Zhejiang University Press & Springer. ISSN 2095-9184. DOI: 10.1631/FITEE.2000676

Self-supervised depth estimators achieve results comparable to those of fully supervised approaches to monocular depth estimation by employing view synthesis between the target and reference images in the training data. ResNet, which has served as the backbone network, has some structural deficiencies when applied to downstream tasks, because it was originally designed for classification problems. Low-texture areas also degrade performance considerably. To address these problems, we propose a set of improvements that lead to superior predictions. First, we boost the information flow in the network and improve its ability to learn spatial structures by improving the network structure. Second, we use a binary mask to remove the pixels in low-texture areas between the target and reference images so that the image is reconstructed more accurately. Finally, we input the target and reference images randomly to expand the data set and pre-train the model on ImageNet, so that it obtains a favorable general feature representation. We demonstrate state-of-the-art performance on the Eigen split of the KITTI driving dataset using stereo pairs.
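
The binary-mask step can be illustrated with a small sketch. The abstract does not state how the mask is computed, so the criterion below is an assumption borrowed from a common practice in self-supervised depth estimation: a pixel is kept only if the warped (view-synthesized) reference image matches the target better than the unwarped reference does. In low-texture regions the unwarped reference already matches the target, so those pixels are dropped from the reconstruction loss. The names `abs_error`, `binary_mask`, and `masked_photometric_loss` are hypothetical, the images are tiny grayscale grids, and the warped image is taken as a given input rather than synthesized from a predicted depth map.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities in [0, 1]


def abs_error(a: Image, b: Image) -> Image:
    """Per-pixel absolute intensity difference (a simple photometric error)."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def binary_mask(target: Image, reference: Image, warped: Image) -> List[List[int]]:
    """Return 1 where warping the reference lowers the error against the target.

    Assumed criterion (not taken from the paper): if the unwarped reference
    is already at least as close to the target as the warped one, the pixel
    carries no useful signal (for example, a low-texture area) and gets 0.
    """
    err_warped = abs_error(warped, target)
    err_identity = abs_error(reference, target)
    return [[1 if w < i else 0 for w, i in zip(row_w, row_i)]
            for row_w, row_i in zip(err_warped, err_identity)]


def masked_photometric_loss(target: Image, reference: Image, warped: Image) -> float:
    """Mean reconstruction error over the pixels kept by the binary mask."""
    mask = binary_mask(target, reference, warped)
    err = abs_error(warped, target)
    kept = sum(sum(row) for row in mask)
    if kept == 0:
        return 0.0  # nothing survives the mask, so there is no loss signal
    total = sum(e * m
                for row_e, row_m in zip(err, mask)
                for e, m in zip(row_e, row_m))
    return total / kept


# Toy example: the first two pixels are textureless (identical in all images),
# the last two are textured and well explained by the warped reference.
target = [[0.5, 0.5, 0.2, 0.8]]
reference = [[0.5, 0.5, 0.8, 0.2]]
warped = [[0.5, 0.5, 0.25, 0.75]]

mask = binary_mask(target, reference, warped)
loss = masked_photometric_loss(target, reference, warped)
```

In this toy case only the two textured pixels contribute to the loss, so the textureless pixels cannot dilute or bias the training signal. A real implementation would operate on batched tensors and would typically use a richer photometric error than the absolute difference.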


Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2022 Journal of Zhejiang University-SCIENCE