Frequency-Assisted Local Attention in Lower Layers of Visual Transformers.

Int J Neural Syst

School of Mechanical Engineering and Automation, Northeastern University, Wenhua Road, Shenyang, Liaoning, P. R. China.

Published: April 2025

Since vision transformers excel at establishing global relationships between features, they play an important role in current vision tasks. However, the global attention mechanism restricts the capture of local features, making convolutional assistance necessary. This paper shows that, with a suitable initialization method, transformer-based models can attend to local information much as convolutional kernels do, without using convolutional blocks. It therefore proposes a novel hybrid multi-scale model called the Frequency-Assisted Local Attention Transformer (FALAT). FALAT introduces a Frequency-Assisted Window-based Positional Self-Attention (FWPSA) module that limits the attention distance of query tokens, enabling the capture of local content in the early stages, while information from value tokens in the frequency domain increases information diversity during the self-attention computation. In the later stages, the traditional convolutional downsampling in the spatial-reduction attention module is replaced with a depth-wise separable convolution to capture long-distance content. Experimental results demonstrate that FALAT-S achieves 83.0% accuracy on IN-1k with an input size of [Formula: see text], using 29.9 M parameters and 5.6 G FLOPs. The model outperforms Next-ViT-S by 0.9 AP/0.8 AP with Mask R-CNN [Formula: see text] on COCO, and surpasses the recent FastViT-SA36 by 3.1% mIoU with FPN on ADE20k.
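The core idea of FWPSA, as described above, is two-fold: each query token attends only within a limited local window, and the value tokens are enriched with frequency-domain information. The following is a minimal NumPy sketch of that idea, not the paper's implementation: the identity projections, the window size, and the crude low-pass filter on the value spectrum are all illustrative assumptions.

```python
import numpy as np

def fwpsa_sketch(x, window=4):
    """Hypothetical simplification of window-limited self-attention
    with a frequency-assisted value branch.

    x: (N, C) array of N tokens with C channels.
    Each query attends only to tokens inside its local window, and the
    values are mixed with a low-pass-filtered frequency-domain copy.
    """
    n, c = x.shape
    q = k = x                        # identity q/k projections for brevity
    # Frequency-assisted values: add back the real part of the inverse
    # FFT of a low-pass-filtered spectrum (filter choice is an assumption).
    spec = np.fft.fft(x, axis=0)
    spec[n // 2:] = 0                # zero out the upper half of the spectrum
    v = x + np.fft.ifft(spec, axis=0).real

    out = np.empty_like(x)
    for i in range(n):
        # Restrict the attention distance of query i to its local window.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(c)
        w = np.exp(scores - scores.max())
        w /= w.sum()                 # softmax over the window only
        out[i] = w @ v[lo:hi]
    return out
```

With a small window this behaves like a convolution-sized receptive field in the early stages; growing the window (or switching to the spatial-reduction attention of the later stages) recovers longer-range interactions.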


Source
http://dx.doi.org/10.1142/S0129065725500157


Similar Publications


Laparoscopic application of radio frequency energy enables in situ renal tumor ablation and partial nephrectomy.

J Urol

January 2003

Department of Urology, Clinical Center for Minimally Invasive Urologic Cancer Treatment, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9110, USA.

Purpose: To our knowledge we present the initial series of renal mass in situ laparoscopic radio frequency ablation. We also discuss the indications for and results of subsequent laparoscopic partial nephrectomy.

Materials and Methods: Laparoscopic radio frequency ablation was performed in 13 patients with a mean age of 59 years (range 18 to 81) and a total of 17 small enhancing renal masses.

