A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification.

Comput Intell Neurosci

Air Defense and Anti-Missile College, Air Force Engineering University, Xi'an 710051, China.

Published: August 2018

One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification; a well-designed feature representation method and classifier can improve classification accuracy. In this paper, we construct a new two-stream deep architecture for aerial scene classification. First, we use two pretrained convolutional neural networks (CNNs) as feature extractors to learn deep features from the original aerial image and from a processed version of the image obtained through saliency detection. Second, two feature fusion strategies are adopted to fuse the two different types of deep convolutional features extracted by the original RGB stream and the saliency stream. Finally, we use an extreme learning machine (ELM) classifier for final classification on the fused features. The effectiveness of the proposed architecture is tested on four challenging datasets: the UC-Merced dataset with 21 scene categories, the WHU-RS dataset with 19 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 scene categories. The experimental results demonstrate that our architecture achieves a significant improvement in classification accuracy over state-of-the-art methods.
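The pipeline described in the abstract — two streams of deep features, a fusion step, and an ELM classifier — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the deep features are simulated with random arrays standing in for pretrained-CNN activations, the two fusion strategies shown (concatenation and element-wise sum) are common choices that may differ from the paper's, and the ELM is a standard single-hidden-layer variant with a closed-form least-squares solution for the output weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for deep features from the two streams (in the paper these
# would come from pretrained CNNs applied to the original RGB image and
# to its saliency-detected counterpart).
n_samples, feat_dim, n_classes = 120, 64, 4
rgb_feats = rng.normal(size=(n_samples, feat_dim))   # original RGB stream
sal_feats = rng.normal(size=(n_samples, feat_dim))   # saliency stream
labels = rng.integers(0, n_classes, size=n_samples)

# Two simple fusion strategies:
fused_concat = np.concatenate([rgb_feats, sal_feats], axis=1)  # concatenation
fused_sum = rgb_feats + sal_feats                              # element-wise sum

class ELM:
    """Minimal extreme learning machine: a fixed random hidden layer
    followed by output weights solved in closed form via least squares."""
    def __init__(self, n_hidden=256, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # random nonlinear projection
        T = np.eye(n_classes)[y]              # one-hot targets
        self.beta = np.linalg.pinv(H) @ T     # least-squares output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta).argmax(axis=1)

clf = ELM().fit(fused_concat, labels)
pred = clf.predict(fused_concat)
print(fused_concat.shape)  # (120, 128)
print(pred.shape)          # (120,)
```

Because the hidden-layer weights are never trained, the only cost of fitting the ELM is one pseudo-inverse, which is why it is attractive as a fast final classifier on top of fixed deep features.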

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5822919
DOI: http://dx.doi.org/10.1155/2018/8639367

Publication Analysis

Top Keywords

scene categories (16), aerial scene (12), scene classification (12), dataset scene (12), two-stream deep (8), classification accuracy (8), aerial image (8), scene (7), classification (6), aerial (5)

Similar Publications

Disgust is a basic emotion that motivates avoidance behaviors to protect organisms from pathogens. Objects of disgust are acquired through classical conditioning mechanisms. Oculomotor avoidance serves as an objective marker of disgust, yet previous studies have relied on repeated presentations to establish disgust conditioning.

To correctly parse the visual scene, one must detect edges and determine their underlying cause. Previous work has demonstrated that image-computable neural networks trained to differentiate natural shadow and occlusion edges exhibited sensitivity to boundary sharpness and texture differences. Although these models showed a strong correlation with human performance on an edge classification task, this previous study did not directly investigate whether humans actually make use of boundary sharpness and texture cues when classifying edges as shadows or occlusions.

In this paper, a new method for producing movie trailers is presented. In the proposed method, the problem is divided into two sub-problems: "genre identification" and "genre-based trailer production". To solve the first sub-problem, a strategy based on processing the poster image and subtitle text is used, in which a convolutional neural network (CNN) model extracts features related to the movie genre from the poster image.

Visible-thermal small object detection (RGBT SOD) is a significant yet challenging task with a wide range of applications, including video surveillance, traffic monitoring, and search and rescue. However, existing studies mainly focus on either the visible or the thermal modality, while RGBT SOD is rarely explored. Although some RGBT datasets have been developed, their insufficient quantity, limited diversity, narrow application scope, misaligned images, and large target sizes cannot provide an impartial benchmark to evaluate RGBT SOD algorithms.

Does object-to-scene binding depend on object and scene consistency?

Atten Percept Psychophys

March 2025

Department of Psychology, California State University San Marcos, 333 S Twin Oaks Valley Rd, San Marcos, CA, 92096, USA.

Memory for semantically inconsistent objects in scenes is greater than that for semantically consistent objects - a phenomenon known as the inconsistent object advantage (Hollingworth & Henderson, Visual Cognition, 7(1-3), 213-235, 2000). Semantically inconsistent objects are also fixated longer and more often than consistent objects (Henderson et al., Journal of Experimental Psychology: Human Perception and Performance, 25(1), 210-228, 1999), potentially leaving less time for encoding the rest of the scene in which the objects occur.
