Since 2017, we have used the IonTorrent next-generation sequencing (NGS) platform in our hospital for cancer diagnosis and treatment. Analyzing the variants from each run takes considerable time, and we still struggle with variants that look correct on the quality metrics at first but prove to be negative on further investigation. Can a machine learning (ML) algorithm help us classify NGS variants? This question led us to investigate which ML algorithms fit our NGS data and to develop a tool that can be used routinely to help biologists.
One of the greatest current challenges in medicine is processing large quantities of data. This is particularly true in molecular biology with the advent of NGS for the molecular profiling of tumors and for guiding their treatment. In addition to bioinformatics pipelines, artificial intelligence (AI) can be valuable in helping to analyze variants. Generating sequencing data from patient DNA samples has become easy in clinical trials. However, analyzing the massive quantities of genomic or transcriptomic data and extracting the key biomarkers associated with a clinical response to a specific therapy requires a combination of scientific expertise, biomolecular skills, and a panel of bioinformatic and biostatistical tools, to which AI now contributes successfully in developing future routine diagnostics. Moreover, cancer genome complexity and technical artifacts make identifying real variants challenging. We present a machine learning method for classifying pathogenic single nucleotide variants (SNVs), single nucleotide polymorphisms (SNPs), multiple nucleotide variants (MNVs), insertions, and deletions detected by NGS in different tumor types: colorectal cancer, melanoma, lung cancer, and glioma.
We compared different ML algorithms, evaluated by k-fold cross-validation, and neural networks (deep learning) on our NGS data to measure their performance and determine which one is a valid model for confirming NGS variant calls in cancer diagnosis. We trained each model on 70% of the 102,011 variants extracted from our local database and validated it on the remaining 30%. Each variant was described by 7 parameters: chromosome, position, exon, variant allele frequency, minor allele frequency, coverage, and protein description. The model offering the best accuracy was chosen and implemented in the routine NGS analysis. The models were developed in R version 3.6.0.
The lowest error rate (0.22%) was obtained with a random forest (RF; ntree = 500, mtry = 4), with an AUC of 0.99. Neural networks also scored well: the final trained neural network achieved an accuracy of 98% and an ROC-AUC of 0.99 on the validation data. We then used our RF model to interpret more than 2000 variants from our NGS database: 20 variants were misclassified (error rate < 1%). The errors were nomenclature problems and false positives. After we added the false positives to our training database and implemented the RF model routinely, the error rate remained below 0.5%.
The RF model shows excellent results for oncosomatic NGS interpretation and can easily be implemented in other molecular biology laboratories. AI is becoming increasingly important in molecular biomedical analysis and can be very helpful in processing medical data. Neural networks show a good capacity for variant classification and may in the future be useful in predicting more complex variants.
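The evaluation protocol described in the abstract (a 70/30 hold-out split, k-fold cross-validation, error rate, and ROC-AUC) can be sketched as follows. This is a minimal illustration, not the authors' code: the study was done in R 3.6.0, whereas this sketch uses Python with only the standard library and leaves the classifier itself out. The function names, the random seed, and k = 10 are assumptions made for the example; the abstract does not state the value of k.

```python
import random


def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train/validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, k=10):
    """Yield (train, test) index lists for k roughly equal, disjoint folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores above a random
    negative (ties count as 0.5). Quadratic in sample size, fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hold-out split on the number of variants reported in the abstract.
train_idx, valid_idx = train_test_split(102_011, train_fraction=0.7)

# Cross-validation folds built on the training part only (k = 10 is assumed).
folds = list(k_fold_indices(train_idx, k=10))

# Toy labels and scores, purely illustrative (not data from the study).
toy_error = error_rate([1, 0, 1, 1], [1, 0, 0, 1])
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a full pipeline, each `(train, test)` pair from `k_fold_indices` would be used to fit and score a candidate model, and the retained model would then be scored once on `valid_idx`.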
Source | Link |
---|---|
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8575902 |
DOI | http://dx.doi.org/10.1038/s41598-021-01253-y |