Recently, substantial research effort has focused on applying CNNs or RNNs to better capture temporal patterns in videos and thereby improve video classification accuracy. In this paper, we investigate the potential of purely attention-based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the outputs of multiple attention units applied in parallel, and we introduce a shifting operation to capture more diverse signals. Experiments show that BAC achieves excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring sequential relationships. To improve on BAC, we further propose a channel pyramid attention schema, which splits features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and a temporal pyramid attention schema, which divides the feature sequence into ordered sub-sequences of multiple lengths to account for sequential order. Our final model, Pyramid×Pyramid Attention Clusters (PPAC), combines channel pyramid attention and temporal pyramid attention to focus on the most important sub-features while preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results on all of them, showing that the proposed framework can consistently outperform existing local feature integration methods across a range of different scenarios.
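The toy NumPy sketch below shows how the three integration schemes could fit together. It is not the authors' implementation, and several parts are assumptions:

- the attention scorer is a one-hidden-layer MLP;
- the shifting operation is modelled as a learnable scale-and-shift followed by L2 normalisation scaled by 1/sqrt(N);
- the pyramid scales are (1, 2, 4);
- `ppac` composes the two pyramids in one particular way;
- every helper name is hypothetical.

The weights are random and untrained, and the sketch omits the coarse-to-fine sub-feature interaction modelling.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(z - z.max())
    return e / e.sum()

def make_unit(d, hidden=16):
    """Hypothetical, randomly initialised parameters for one attention unit over d-dim features.
    alpha and beta stand in for the learnable scale and shift (fixed here, learned in training)."""
    W1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
    w2 = rng.standard_normal(hidden)
    return W1, w2, 1.0, 0.1

def attend(X, unit, n_units):
    """One attention unit plus the assumed shifting operation.
    X: (L, d) sequence of local features. Returns a (d,) vector with norm 1/sqrt(n_units)."""
    W1, w2, alpha, beta = unit
    a = softmax(np.tanh(X @ W1) @ w2)                   # attention weights over the L features
    v = alpha * (a @ X) + beta                          # scale-and-shift the attended feature
    return v / (np.sqrt(n_units) * np.linalg.norm(v))   # L2-normalise, scaled by 1/sqrt(N)

def bac(X, n_units=4):
    """Basic Attention Clusters: N parallel units over all channels, outputs concatenated."""
    units = [make_unit(X.shape[1]) for _ in range(n_units)]
    return np.concatenate([attend(X, u, n_units) for u in units])

def channel_pyramid(X, scales=(1, 2, 4)):
    """At each scale k, split the D channels into k sub-features and attend to each separately.
    Requires D to be divisible by every k."""
    outs = []
    for k in scales:
        for sub in np.split(X, k, axis=1):
            outs.append(attend(sub, make_unit(sub.shape[1]), k))
    return np.concatenate(outs)

def temporal_pyramid(X, levels=(1, 2, 4), n_units=2):
    """At each level k, split the L frames into k contiguous, ordered segments,
    run BAC on each segment, and concatenate the results in time order."""
    outs = []
    for k in levels:
        for seg in np.array_split(X, k, axis=0):
            outs.append(bac(seg, n_units))
    return np.concatenate(outs)

def ppac(X, levels=(1, 2, 4), scales=(1, 2, 4)):
    """Assumed combination: channel pyramid attention inside each ordered temporal segment."""
    return np.concatenate([channel_pyramid(seg, scales)
                           for k in levels
                           for seg in np.array_split(X, k, axis=0)])
```

Take a clip of L = 20 local features of dimension D = 8:

- `bac(X, 4)` returns a 32-dimensional vector.
- `channel_pyramid(X)` returns a 24-dimensional one, since each of the 3 scales covers all 8 channels.
- `temporal_pyramid(X)` returns a 112-dimensional one: 7 ordered segments × 2 units × 8 channels.
- `ppac(X)` returns a 168-dimensional one.

`np.array_split` slices the sequence contiguously, so each segment's output keeps a fixed time position in the concatenation. That is how the temporal pyramid retains the order information that plain BAC discards.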

Source: http://dx.doi.org/10.1109/TPAMI.2020.3029554

Similar Publications

Peripapillary pachychoroid syndrome (PPS) is a recently described condition, classified within the pachychoroid disease spectrum characterized by focal or diffuse thickening of the choroid due to dilation of choroidal vessels in the Haller's layer (pachyvessels), thinning of the choriocapillaris and the Sattler's layer, and accompanied by increased choroidal permeability and damage to the retinal pigment epithelium. Unlike other pachychoroid diseases that involve changes in the central retina, PPS presents with choroidal thickening and intra- or subretinal fluid located nasally in the macular region, near the optic disc. This review aims to summarize and analyze current data on the clinical features, pathogenesis, and treatment options for PPS found in the literature.

Background: The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet lab techniques like RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. Researchers have developed several computational methods to predict mRNA subcellular localization to address this.

Social media generates vast amounts of spatio-temporal sequential data. However, current methods often ignore the complex spatio-temporal correlations within these data. This oversight makes it difficult to fully capture the dynamic features of the data.

The number of cable conductors is a crucial parameter in cable manufacturing, and accurately detecting it can effectively promote the digital transformation of the cable manufacturing industry. Challenges such as high density, adhesion, and knife-mark interference in cable conductor images make intelligent detection of the conductor count particularly difficult. To address these challenges, this study proposes the YOLO-cable model, an improved version of the YOLOv10 model.

A new prediction model based on deep learning for pig house environment.

Sci Rep

December 2024

School of Mechanical and Electrical Engineering, Qiqihar University, Qiqihar, 161006, China.

A prediction model of the pig house environment based on Bayesian optimization (BO), a squeeze-and-excitation (SE) block, a convolutional neural network (CNN), and a gated recurrent unit (GRU) is proposed to improve prediction accuracy and animal welfare and to allow control measures to be taken in advance. To find the optimal model configuration, the model uses a BO algorithm to fine-tune hyperparameters such as the number of GRUs, the initial learning rate, and the L2 regularization factor. The environmental data are fed into the SE-CNN block, which extracts local features of the data through convolutional operations.
