This article presents a simple yet effective multilayer perceptron (MLP) architecture, namely CycleMLP, a versatile neural backbone network capable of solving various dense visual prediction tasks such as object detection, segmentation, and human pose estimation. Compared to recent advanced MLP architectures such as MLP-Mixer (Tolstikhin et al. 2021), ResMLP (Touvron et al. 2021), and gMLP (Liu et al. 2021), whose architectures are sensitive to image size and are therefore infeasible for dense prediction tasks, CycleMLP has three appealing advantages: 1) CycleMLP can cope with various spatial sizes of images. 2) CycleMLP achieves linear computational complexity with respect to the image size by using local windows; in contrast, previous MLPs have O(N²) computational complexity due to their full connections in space. 3) The relationship between convolution, multi-head self-attention in Transformers, and CycleMLP is discussed through an intuitive theoretical analysis. We build a family of models that surpass state-of-the-art MLP and Transformer models, e.g., Swin Transformer (Liu et al. 2021), while using fewer parameters and FLOPs. CycleMLP expands the applicability of MLP-like models, making them versatile backbone networks that achieve competitive results on dense prediction tasks. For example, CycleMLP-Tiny outperforms Swin-Tiny by 1.3% mIoU on the ADE20K dataset with fewer FLOPs. Moreover, CycleMLP also shows excellent zero-shot robustness on the ImageNet-C dataset.
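
The two efficiency properties above come from CycleMLP's Cycle Fully-Connected (Cycle FC) layer, which mixes channels while reading each channel at a spatial position shifted by an offset that cycles with the channel index. Below is a minimal PyTorch sketch of that idea, assuming a simplified torch.roll-based shift along one spatial axis; the class name CycleChannelFC and the pseudo_kernel argument are illustrative stand-ins, not the paper's official implementation (which realizes the shifts with deformable-convolution-style sampling).

```python
import torch
import torch.nn as nn


class CycleChannelFC(nn.Module):
    """Simplified Cycle FC-style layer: each channel is read at a spatial
    position shifted by an offset that cycles with the channel index, then
    a plain channel FC (pointwise projection) mixes the channels."""

    def __init__(self, dim: int, pseudo_kernel: int = 3):
        super().__init__()
        self.pseudo_kernel = pseudo_kernel
        self.proj = nn.Linear(dim, dim)  # channel-mixing FC, no spatial weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C). Shift each channel along H by a cyclic offset in
        # {-(k//2), ..., +(k//2)} so the following FC sees a local window.
        B, H, W, C = x.shape
        center = self.pseudo_kernel // 2
        shifted = torch.empty_like(x)
        for c in range(C):
            offset = (c % self.pseudo_kernel) - center  # cycles -1, 0, +1, ...
            shifted[..., c] = torch.roll(x[..., c], shifts=offset, dims=1)
        return self.proj(shifted)


# The layer is agnostic to spatial size and costs O(H * W), not O((H * W)**2).
layer = CycleChannelFC(dim=64)
for h, w in [(14, 14), (20, 32)]:
    print(layer(torch.randn(2, h, w, 64)).shape)  # (2, h, w, 64)
```

Because the shifts and the channel projection touch each pixel a constant number of times, the cost grows linearly with H × W, and nothing in the layer is tied to a fixed input resolution, which is why the same weights can serve detection and segmentation backbones.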

Source: http://dx.doi.org/10.1109/TPAMI.2023.3303397

Publication Analysis

Top Keywords

cyclemlp (8), dense visual (8), visual predictions (8), liu 2021 (8), image size (8), dense prediction (8), prediction tasks (8), computational complexity (8), flops cyclemlp (8), cyclemlp mlp-like (4)
