Ice environments pose challenges for conventional underwater acoustic localization techniques due to their multipath and non-linear nature. In this paper, we compare different deep learning networks, such as Transformers, Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Vision Transformers (ViTs), for passive localization and tracking of a single moving on-ice acoustic source using two underwater acoustic vector sensors. We incorporate ordinal classification as a localization approach and compare the results with other standard methods. We conduct experiments passively recording the acoustic signature of an anthropogenic source on the ice and analyze these data. The results demonstrate that Vision Transformers are a strong contender for tracking moving acoustic sources on ice. Additionally, we show that classification as a localization technique can outperform regression for networks better suited to classification, such as CNNs and ViTs.
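Ordinal classification treats localization as predicting ordered position bins rather than a continuous coordinate. A common formulation (a minimal sketch, not necessarily the exact encoding used in this paper) represents bin k as k cumulative binary targets, so the network learns "is the source past bin j?" for each threshold; the predicted bin is recovered by counting confident thresholds:

```python
import numpy as np

def ordinal_encode(label: int, num_classes: int) -> np.ndarray:
    """Encode an ordered class label as cumulative binary targets.

    Class k over num_classes bins becomes a vector of length
    num_classes - 1 with k leading ones, e.g. 2 of 5 -> [1, 1, 0, 0].
    """
    return (np.arange(num_classes - 1) < label).astype(np.float32)

def ordinal_decode(probs: np.ndarray, threshold: float = 0.5) -> int:
    """Recover the class index by counting thresholds the model
    is confident the target lies beyond."""
    return int((probs > threshold).sum())

# Example: a source binned into position 2 of 5 range bins.
target = ordinal_encode(2, 5)            # [1., 1., 0., 0.]
pred = ordinal_decode(np.array([0.9, 0.8, 0.3, 0.1]))  # 2
```

Each binary output can be trained with an independent sigmoid and binary cross-entropy, which preserves the ordering of bins in a way plain one-hot classification does not.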


Source

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9269127
DOI: http://dx.doi.org/10.3390/s22134703


Similar Publications

Objective: The objective of this research is to enhance pneumonia detection in chest X-rays by leveraging a novel hybrid deep learning model that combines Convolutional Neural Networks (CNNs) with modified Swin Transformer blocks. This study aims to significantly improve diagnostic accuracy, reduce misclassifications, and provide a robust, deployable solution for underdeveloped regions where access to conventional diagnostics and treatment is limited.

Methods: The study developed a hybrid model architecture integrating CNNs with modified Swin Transformer blocks to work seamlessly within the same model.


A systematic review of deep learning in MRI-based cerebral vascular occlusion-based brain diseases.

Neuroscience

January 2025

Department of Computer Engineering, Faculty of Engineering, Igdir University, 76000, Igdir, Turkey.

Neurological disorders, including cerebral vascular occlusions and strokes, present a major global health challenge due to their high mortality rates and long-term disabilities. Early diagnosis, particularly within the first hours, is crucial for preventing irreversible damage and improving patient outcomes. Although neuroimaging techniques like magnetic resonance imaging (MRI) have advanced significantly, traditional methods often fail to fully capture the complexity of brain lesions.


Recent advances of artificial intelligence (AI) in retinal imaging have found application in two major categories: discriminative and generative AI. For discriminative tasks, conventional convolutional neural networks (CNNs) are still the major AI techniques. Vision transformers (ViTs), inspired by the transformer architecture in natural language processing, have emerged as useful techniques for discriminating retinal images.


Study Design: Systematic review.

Objective: Artificial intelligence (AI) and deep learning (DL) models have recently emerged as tools to improve fracture detection, mainly through imaging modalities such as computed tomography (CT) and radiographs. This systematic review evaluates the diagnostic performance of AI and DL models in detecting cervical spine fractures and assesses their potential role in clinical practice.


A Comparison Study of Person Identification Using IR Array Sensors and LiDAR.

Sensors (Basel)

January 2025

Faculty of Science and Technology, Keio University, Yokohama 223-8522, Japan.

Person identification is a critical task in applications such as security and surveillance, requiring reliable systems that perform robustly under diverse conditions. This study evaluates the Vision Transformer (ViT) and ResNet34 models across three modalities (RGB, thermal, and depth) using datasets collected with infrared array sensors and LiDAR sensors in controlled scenarios and varying resolutions (16 × 12 to 640 × 480) to explore their effectiveness in person identification. Preprocessing techniques, including YOLO-based cropping, were employed to improve subject isolation.

