Background: Rare disease diagnoses are often delayed by years, during which patients go through multiple doctor visits and may receive imprecise or incorrect diagnoses before the correct one. Machine learning could help solve this problem by flagging potential patients whom doctors should examine more closely.

Methods: To make the prediction setting as close as possible to the real-world situation, we tested different masking sizes. In the masking phase, data were removed for all data points from the day of the first rare disease diagnosis onward and, in addition, for a selected number of days before the initial diagnosis. The performance of the machine learning models was compared using positive predictive value (PPV), negative predictive value (NPV), prevalence PPV (pPPV), prevalence NPV (pNPV), accuracy (ACC) and area under the receiver operating characteristic curve (AUC).
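The masking step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (a list of dictionaries with a `date` field), the function name `mask_records`, and the example dates are hypothetical and do not come from the study. The sketch keeps only records dated strictly before the masking window, so the day of the first diagnosis, everything after it, and the chosen number of days before it are all removed.

```python
from datetime import date, timedelta


def mask_records(records, first_diagnosis, masking_days):
    """Return only the records dated strictly before the masking window.

    The window covers the day of the first rare disease diagnosis, every
    later day, and `masking_days` days before the diagnosis.
    (Hypothetical helper; the record layout is an assumption.)
    """
    cutoff = first_diagnosis - timedelta(days=masking_days)
    return [r for r in records if r["date"] < cutoff]


# Hypothetical patient history, for illustration only.
records = [
    {"date": date(2020, 1, 1), "code": "A"},
    {"date": date(2020, 5, 25), "code": "B"},
    {"date": date(2020, 6, 15), "code": "C"},
    {"date": date(2020, 6, 30), "code": "D"},  # day of first diagnosis
    {"date": date(2020, 7, 10), "code": "E"},
]
first_diagnosis = date(2020, 6, 30)

# 30-day masking: records C, D and E fall inside the window and are dropped.
kept_30 = [r["code"] for r in mask_records(records, first_diagnosis, 30)]

# 0-day masking: only the diagnosis day and later records are dropped.
kept_0 = [r["code"] for r in mask_records(records, first_diagnosis, 0)]
```

Larger masking sizes simply move the cutoff earlier, which leaves the model with less of the patient history to work from.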

Results: XGBoost had PPVs over 90 % in all masking settings; InceptionVasGloMyotides had most of its PPVs over 90 %, but less consistently. When the prevalence of the diseases was taken into account, XGBoost achieved its highest value of 8.8 % in binary classification with 30 days of masking, and InceptionVasGloMyotides achieved its best value of 6 %, also in binary classification, with 2160 and 4320 days of masking. ACC varied between 89 % and 98 % for XGBoost and between 79 % and 94 % for InceptionVasGloMyotides. AUC varied between 72.6 % and 94.5 % for InceptionVasGloMyotides and between 69.9 % and 96.4 % for XGBoost.
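The gap between PPVs above 90 % and prevalence-adjusted values below 10 % follows from how rare the target conditions are. The abstract does not state how pPPV and pNPV were computed, so the sketch below assumes the standard Bayes-rule adjustment from sensitivity, specificity and population prevalence; the function names and the example numbers are hypothetical and are not results from the study.

```python
def prevalence_adjusted_ppv(sensitivity, specificity, prevalence):
    """PPV expected in a population with the given disease prevalence.

    Assumes the standard Bayes-rule formula; the study may define pPPV
    differently.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def prevalence_adjusted_npv(sensitivity, specificity, prevalence):
    """NPV expected in a population with the given disease prevalence."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Illustrative numbers only: a balanced test set versus a rare condition.
balanced_ppv = prevalence_adjusted_ppv(0.9, 0.9, 0.5)
rare_ppv = prevalence_adjusted_ppv(0.9, 0.95, 0.001)
rare_npv = prevalence_adjusted_npv(0.9, 0.95, 0.001)
```

With the same classifier quality, the adjusted PPV falls sharply as prevalence shrinks, because false positives from the large healthy population outnumber the few true positives, while the adjusted NPV stays close to 1.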

Conclusions: XGBoost and InceptionVasGloMyotides could successfully predict rare diseases for patients at least 30 days before the initial rare disease diagnosis. In addition, we managed to build a well-performing custom deep learning model.

Source: http://dx.doi.org/10.1016/j.cmpb.2023.107917
