Optimizing sequence data analysis using convolution neural network for the prediction of CNV bait positions.

BMC Bioinformatics

Albert Szent-Györgyi Health Centre, University of Szeged, Korányi fasor 14-15, Szeged, H-6725, Csongrád-Csanád, Hungary.

Published: December 2024

Background: Accurate prediction of copy number variations (CNVs) from targeted capture next-generation sequencing (NGS) data relies on effective normalization of read coverage profiles. Normalization is particularly challenging because of hidden systemic biases such as GC bias, which can significantly affect the sensitivity and specificity of CNV detection. In many cases, the kit manifests provide only the genome coordinates of the targeted regions, and the exact design of the oligo capture baits is not available. Although the on-target regions overlap substantially with the bait design, the missing bait coordinates make normalization of the coverage data less accurate. In this study, we propose a novel approach that uses a 1D convolution neural network (CNN) model to predict the positions of capture baits in complex whole-exome sequencing (WES) kits. By accurately identifying the bait coordinates, our model enables precise normalization of GC bias across target regions, thereby improving CNV data normalization.
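To make the GC-bias point concrete, the following is a minimal sketch of one common normalization scheme: group regions by GC content and divide each region's coverage by the median coverage of its GC bin. It is not taken from the paper. The function names, the tuple layout, and the choice of 10 bins are all assumptions made for illustration.

```python
from statistics import median


def gc_fraction(seq):
    """Fraction of G/C among called bases (A, C, G, T); ambiguous bases are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def gc_normalize(regions, n_bins=10):
    """Divide each region's coverage by the median coverage of its GC bin.

    regions: list of (sequence, mean_coverage) tuples, one per bait or target
             (hypothetical layout, chosen for this sketch).
    Returns normalized coverage values in the same order as the input.
    """
    keys = []
    bins = {}
    for seq, cov in regions:
        # Bin index in [0, n_bins - 1]; a GC fraction of exactly 1.0 goes in the last bin.
        key = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        keys.append(key)
        bins.setdefault(key, []).append(cov)

    bin_medians = {key: median(values) for key, values in bins.items()}

    normalized = []
    for (_, cov), key in zip(regions, keys):
        m = bin_medians[key]
        normalized.append(cov / m if m > 0 else 0.0)
    return normalized
```

Because GC content is computed over whichever interval is supplied, using on-target coordinates in place of the true bait coordinates changes the GC value assigned to each region, which is the source of inaccuracy the abstract describes.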

Results: We evaluated the optimal hyperparameters, model architecture, and complexity to predict the likely positions of the oligo capture baits. Our analysis shows that the CNN models outperform the Dense NN for bait predictions. Batch normalization is the most important parameter for the stable training of CNN models. Our results indicate that the spatiality of the data plays an important role in the prediction performance. We have shown that combined input data, including experimental coverage, on-target information, and sequence data, are critical for bait prediction. Furthermore, comparison with the on-target information indicated that the CNN models performed better in predicting bait positions that exhibited a high degree of overlap (>90%) with the true bait positions.
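The abstract does not define how the ">90%" overlap is measured. As an illustration only, the sketch below assumes overlap means the fraction of a true bait interval covered by the predicted interval, with half-open (start, end) coordinates. The function names and the pairing of predictions with true baits are hypothetical.

```python
def overlap_fraction(pred, true):
    """Fraction of the true interval covered by the predicted interval.

    Both intervals are half-open (start, end) pairs on the same contig.
    This definition of overlap is an assumption, not the paper's stated metric.
    """
    p_start, p_end = pred
    t_start, t_end = true
    if t_end <= t_start:
        raise ValueError("true interval must have positive length")
    shared = min(p_end, t_end) - max(p_start, t_start)
    return max(shared, 0) / (t_end - t_start)


def count_high_overlap(pairs, threshold=0.9):
    """Count (predicted, true) pairs whose overlap fraction exceeds the threshold."""
    return sum(1 for pred, true in pairs if overlap_fraction(pred, true) > threshold)
```

For example, a prediction of (105, 225) against a true 120 bp bait at (100, 220) shares 115 bases, giving an overlap of about 0.96, so it would count as a high-overlap prediction under this definition.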

Conclusions: This study highlights the potential of utilizing CNN-based approaches to optimize coverage data analysis and improve copy number data normalization. Subsequent CNV detection based on these predicted coordinates facilitates more accurate measurement of coverage profiles and better normalization for GC bias. As a result, this approach could reduce systemic bias and improve the sensitivity and specificity of CNV detection in genomic studies.
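A small sketch of why the choice of coordinates matters when measuring coverage profiles: mean depth is computed per interval, so an interval wider than the actual bait dilutes the signal. The per-base depth list, the half-open intervals, and the function name are assumptions for illustration and are not described in the abstract.

```python
def mean_coverage(depth, intervals):
    """Mean read depth over each half-open (start, end) interval.

    depth: per-base read depth for one contig, indexed by 0-based position
           (hypothetical input layout for this sketch).
    Intervals are clipped to the contig; empty intervals yield 0.0.
    """
    # Prefix sums make each interval query O(1).
    prefix = [0]
    for d in depth:
        prefix.append(prefix[-1] + d)

    means = []
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, len(depth))
        if end <= start:
            means.append(0.0)
        else:
            means.append((prefix[end] - prefix[start]) / (end - start))
    return means
```

With a toy depth profile of `[0, 0, 10, 10, 10, 10, 0, 0]`, the interval (2, 6) that matches the covered bases has a mean depth of 10.0, whereas the wider interval (0, 8) averages to 5.0.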

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11669243 (PMC)
http://dx.doi.org/10.1186/s12859-024-06006-y (DOI)

Publication Analysis

Top Keywords

cnv detection: 12
capture baits: 12
cnn models: 12
data: 9
sequence data: 8
data analysis: 8
convolution neural: 8
neural network: 8
bait: 8
bait positions: 8

Similar Publications

Low prevalence of copy number variation in pfmdr1 and pfpm2 in Plasmodium falciparum isolates from southern Angola.

Malar J

January 2025

Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation Towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Rua da Junqueira 100, 1349-008, Lisbon, Portugal.

Background: Malaria is the parasitic disease with the highest global morbidity and mortality. According to estimates from the World Health Organization (WHO), there were around 249 million cases in 2022, with 3.4% occurring in Angola.

Pathological variants in HPV-independent vulvar tumours.

Sci Rep

January 2025

Department of Laboratory Medicine, Clinical Pathology and Genetics, Faculty of Medicine and Health, Örebro University, Örebro, Sweden.

Vulvar cancer is a rare gynaecological disease that can be caused by infection with human papillomavirus (HPV). HPV-associated and HPV-independent vulvar tumours are thought to develop through two distinct pathways, with different mutational frequencies and landscapes, and more detailed knowledge of targetable biological mechanisms is needed to guide future individualized treatments. The study included formalin-fixed paraffin-embedded (FFPE) samples from 32 cancer patients (16 HPV-negative and 16 HPV-associated), treated in Örebro, Sweden from 1988 to 2008.

Background: loss of function manifests across a broad spectrum of phenotypes, ranging from severe prenatal onset to asymptomatic cases. Bilateral periventricular nodular heterotopia (BPNH) consistently occurs in affected individuals. This retrospective study involving French patients with BPNH evaluates the prevalence of gene dosage anomalies and investigates genotype-phenotype correlations in a large cohort of French patients with BPNH.

Background: Juvenile granulosa cell tumor (JGCT) of the ovary is a rare tumor with distinct clinicopathological and hormonal features primarily affecting young women and children. We conducted a complex clinicopathological, immunohistochemical, and molecular analysis of five cases of JGCT.

Methods: The immunohistochemical examination was performed with 32 markers, including markers that have not been previously investigated.

Canine coronavirus (CCoV), canine respiratory coronavirus (CRCoV), canine adenovirus type 2 (CAV-2), and canine norovirus (CNV) are important pathogens in canine viral gastrointestinal and respiratory diseases. In particular, co-infections with these viruses exacerbate disease severity. In this study, four pairs of primers and probes were designed to specifically amplify the conserved regions of the CCoV M gene, CRCoV N gene, CAV-2 hexon gene, and CNV RdRp gene.
