The U-Net-like coarse-to-fine network design is currently the dominant choice for dense prediction tasks. Although this design often achieves competitive performance, it has inherent limitations, such as the propagation of training errors from low to high resolution and a dependence on deeper, heavier backbones. To address these limitations, we instead propose Recurrent Multiscale Feature Modulation (R-MSFM), a new lightweight network design for self-supervised monocular depth estimation. R-MSFM extracts per-pixel features, builds a multiscale feature modulation module, and performs recurrent depth refinement through a parameter-shared decoder at a fixed resolution. This design keeps R-MSFM lightweight and fundamentally avoids the error propagation caused by the coarse-to-fine design. Furthermore, we introduce a mask geometry consistency loss that encourages R-MSFM to learn geometry-consistent depth. This loss penalizes the inconsistency of the estimated depths between adjacent views within the nonoccluded and nonstationary regions. Experimental results demonstrate the superiority of R-MSFM in both model size and inference speed, and show state-of-the-art results on two datasets: KITTI and Make3D.
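As a rough illustration of the loss described above, the following sketch computes a masked depth-inconsistency penalty between two views. The abstract does not state the formula, so the normalized-difference form used here is an assumption (a form common in earlier self-supervised depth work), not necessarily the paper's definition. The function name `masked_geometry_consistency_loss` and its parameters are hypothetical, and the sketch assumes the adjacent view's depth has already been warped into the target view and that a boolean mask marking nonoccluded, nonstationary pixels is available.

```python
import numpy as np


def masked_geometry_consistency_loss(depth_warped, depth_target, mask, eps=1e-7):
    """Mean normalized depth disagreement over the pixels selected by `mask`.

    depth_warped: (H, W) depth of the adjacent view, assumed to be already
                  projected into the target view.
    depth_target: (H, W) depth predicted for the target view.
    mask:         (H, W) boolean array, True where a pixel is assumed to be
                  nonoccluded and nonstationary.
    """
    depth_warped = np.asarray(depth_warped, dtype=np.float64)
    depth_target = np.asarray(depth_target, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)

    # Assumed form: |d_a - d_b| / (d_a + d_b), which lies in [0, 1) for
    # positive depths and does not depend on the overall depth scale.
    diff = np.abs(depth_warped - depth_target) / (depth_warped + depth_target + eps)

    n_valid = mask.sum()
    if n_valid == 0:
        # No valid pixels: return zero rather than dividing by zero.
        return 0.0
    return float(diff[mask].sum() / n_valid)


# Toy example: the two views disagree at exactly one pixel.
d_target = np.full((2, 2), 10.0)
d_warped = np.array([[10.0, 10.0], [30.0, 10.0]])
mask_all = np.ones((2, 2), dtype=bool)
mask_excl = np.array([[True, True], [False, True]])  # drop the disagreeing pixel

loss_all = masked_geometry_consistency_loss(d_warped, d_target, mask_all)
loss_excl = masked_geometry_consistency_loss(d_warped, d_target, mask_excl)
```

In this toy case the disagreeing pixel contributes to `loss_all` but not to `loss_excl`, which shows how the mask limits the penalty to regions where the two depth estimates are expected to agree.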
DOI: http://dx.doi.org/10.1109/TPAMI.2024.3420165