YOLOv5, a widely used single-stage object detector based on neural networks, has found extensive application in the industrial domain; however, it still exhibits certain design limitations. To address these issues, this paper proposes Efficient Scale Fusion YOLO (ESF-YOLO). First, a Multi-Sampling Conv Module (MSCM) is designed, which enhances the backbone network's capacity to learn low-level features through multi-scale receptive fields and cross-scale feature fusion. Second, to tackle occlusion, a new Block-wise Channel Attention Module (BCAM) is designed, which assigns greater weights to channels carrying critical information. Third, a lightweight Decoupled Head (LD-Head) is devised. In addition, the loss function is redesigned to address the asynchrony between labels and confidences, alleviating the imbalance between positive and negative samples during training. Finally, an adaptive scale factor for Intersection over Union (IoU) calculation is proposed, which adjusts bounding box sizes to accommodate targets of different sizes in the dataset. Experimental results on the SODA10M and CBIA8K datasets show that ESF-YOLO increases Average Precision at 0.50 IoU (AP50) by 3.93% and 2.24%, Average Precision at 0.75 IoU (AP75) by 4.77% and 4.85%, and mean Average Precision (mAP) by 4% and 5.39%, respectively, supporting the model's broad applicability.
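For readers unfamiliar with the IoU thresholds behind AP50 and AP75, the following is a minimal sketch of the standard IoU computation for axis-aligned boxes. The `scale_box` helper and its `ratio` parameter are hypothetical: they only illustrate the general idea of resizing boxes about their centers before computing IoU, and are not the adaptive scale factor defined in the paper, whose formulation the abstract does not give.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_box(box: Box, ratio: float) -> Box:
    """Hypothetical helper: resize a box about its center by `ratio`.

    This is an illustrative assumption, not the paper's adaptive factor.
    """
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * ratio / 2
    half_h = (box[3] - box[1]) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


if __name__ == "__main__":
    pred = (0.0, 0.0, 10.0, 10.0)
    truth = (0.0, 0.0, 10.0, 6.0)
    score = iou(pred, truth)
    # A prediction counts as a match for AP50 when IoU >= 0.50,
    # and for AP75 when IoU >= 0.75.
    print(score, score >= 0.50, score >= 0.75)
    # IoU after shrinking both boxes about their centers (illustrative only).
    print(iou(scale_box(pred, 0.5), scale_box(truth, 0.5)))
```

In this example the prediction overlaps the ground truth with an IoU of 0.6, so it would count as a match under the AP50 criterion but not under the stricter AP75 criterion.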

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11033406
DOI: http://dx.doi.org/10.3389/fnins.2024.1371418
