With rapidly growing demand for security surveillance, assisted driving, and remote sensing, object detection networks that combine robust environmental perception with high detection accuracy have become a research focus. Single-modality image detection, however, adapts poorly to changing environments: lighting conditions, fog, rain, and occlusions such as vegetation cause information loss and reduce detection accuracy. To address these challenges, we propose IV-YOLO, an object detection network that integrates features from visible-light and infrared images. The network is based on YOLOv8 (You Only Look Once v8) and uses a dual-branch fusion structure that exploits the complementary features of infrared and visible-light images for target detection. We designed a Bidirectional Pyramid Feature Fusion structure (Bi-Fusion) to integrate multimodal features effectively, reducing errors caused by feature redundancy and extracting fine-grained features for small object detection. We also developed a Shuffle-SPP structure that combines channel and spatial attention to sharpen the focus on deep features and extract richer information through upsampling. For model optimization, we designed a loss function tailored to multi-scale object detection, which accelerates convergence during training. Compared with the current state-of-the-art Dual-YOLO model, IV-YOLO achieves mAP improvements of 2.8%, 1.1%, and 2.2% on the Drone Vehicle, FLIR, and KAIST datasets, respectively. On the Drone Vehicle and FLIR datasets, IV-YOLO has a parameter count of 4.31 M and achieves a frame rate of 203.2 fps, significantly outperforming YOLOv8n (5.92 M parameters, 188.6 fps on the Drone Vehicle dataset) and YOLO-FIR (7.1 M parameters, 83.3 fps on the FLIR dataset), which had previously achieved the best performance on these datasets. This demonstrates that IV-YOLO achieves higher real-time detection performance while maintaining lower parameter complexity, making it highly promising for applications in autonomous driving, public safety, and beyond.
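The efficiency comparison above can be made concrete with a short calculation. The sketch below only re-derives relative differences from the parameter counts and frame rates quoted in the abstract; the function and variable names are illustrative assumptions, not part of the IV-YOLO code base.

```python
# Relative efficiency of IV-YOLO versus the two baselines named in the abstract.
# All numeric values are copied from the abstract; names are hypothetical.

def percent_fewer(ours: float, baseline: float) -> float:
    """Percentage by which `ours` is smaller than `baseline`."""
    return (baseline - ours) / baseline * 100.0


def speedup(ours_fps: float, baseline_fps: float) -> float:
    """Ratio of our frame rate to the baseline frame rate."""
    return ours_fps / baseline_fps


IV_YOLO = {"params_m": 4.31, "fps": 203.2}
BASELINES = {
    "YOLOv8n (Drone Vehicle)": {"params_m": 5.92, "fps": 188.6},
    "YOLO-FIR (FLIR)": {"params_m": 7.1, "fps": 83.3},
}


def compare() -> dict:
    """Return {baseline name: (percent fewer parameters, fps ratio)}."""
    rows = {}
    for name, base in BASELINES.items():
        fewer = percent_fewer(IV_YOLO["params_m"], base["params_m"])
        ratio = speedup(IV_YOLO["fps"], base["fps"])
        rows[name] = (round(fewer, 1), round(ratio, 2))
    return rows


if __name__ == "__main__":
    for name, (fewer, ratio) in compare().items():
        print(f"{name}: {fewer}% fewer parameters, {ratio}x frame rate")
```

Under these figures, IV-YOLO uses roughly 27% fewer parameters than YOLOv8n and roughly 39% fewer than YOLO-FIR, while running at about 1.08 and 2.44 times their frame rates, respectively. Note that the two baselines were measured on different datasets, so the ratios are not directly comparable with each other.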
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11478367 | PMC |
http://dx.doi.org/10.3390/s24196181 | DOI Listing |
Sci Rep
January 2025
School of Cyberspace Security, Hebei University of Engineering Science, Shijiazhuang, 050091, China.
Aerial images can cover a wide area and capture rich scene information. These images are often taken from a high altitude and contain many small objects. It is difficult to detect small objects accurately because their features are not obvious and are susceptible to background interference.
Sci Rep
January 2025
College of Computer and Data Science, Minjiang University, Fuzhou, 350018, China.
This study presents a novel approach to identifying meters and their pointers in modern industrial scenarios using deep learning. We developed a neural network model that can detect gauges and one or more of their pointers in low-quality images. We use an encoder network, skip connections, and a modified Convolutional Block Attention Module (CBAM) to detect gauge panels and pointer keypoints in images.
Talanta
January 2025
Instituto de Historia (IH-CCHS), CSIC, C/ Albasanz 26-28, 28037, Madrid, Spain.
Analysis of glass-based artworks is important for authentication purposes. In recent years, there have been rapid advancements and improvements in the characterization of glass objects using different analytical approaches. The present study presents an interdisciplinary and multi-analytical authentication approach that provides useful tools and markers to unmask possible imitations, counterfeiting, and forgeries in Cultural Heritage glass beads by comparing the composition of historical and modern glass beads.
Neural Netw
December 2024
Institute of Automation, Chinese Academy of Sciences, MAIS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 101408, China.
In the rapidly evolving field of deep learning, Convolutional Neural Networks (CNNs) retain their unique strengths and applicability in processing grid-structured data such as images, despite the surge of Transformer architectures. This paper explores alternatives to the standard convolution, with the objective of augmenting its feature extraction prowess while maintaining a similar parameter count. We propose innovative solutions targeting depthwise separable convolution and standard convolution, culminating in our Multi-scale Progressive Inference Convolution (MPIC).
Afr J Reprod Health
November 2024
Department of Obstetrics and Gynecology, Wuxi No.2 People's Hospital, Wuxi 214002, Jiangsu Province, China.
Cervical cancer (CC) is a malignant tumor in females characterized by high incidence and mortality rates, often resulting in a poor prognosis for patients. Zoledronic acid (ZA), a third-generation bisphosphonate, exhibits anti-tumor properties across various types of tumors. To better understand the effect of ZA in the treatment of CC, this study used two types of human CC cells (CCCs) as research subjects and examined the impact of varying levels of ZA on the cells' biological properties.