Improving machine learning-based bitewing segmentation with synthetic data.

J Dent

Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Goethestraße 70, 80336 Munich, Germany.

Published: March 2025

Objectives: Class imbalance in datasets is one of the challenges of machine learning (ML) in medical image analysis. We employed synthetic data to overcome class imbalance when segmenting bitewing radiographs as an exemplary task for using ML.

Methods: Bitewings were segmented into classes, i.e. dental structures, restorations, and background. The pixel-level representation of implants in the training set (1543 bitewings) and testing set (177 bitewings) was 0.03% and 0.07%, respectively. A diffusion model and a generative adversarial network (pix2pix) were used to generate a dataset synthetically enriched in implants. A U-Net segmentation model was trained on (1) the original dataset, (2) the synthetic dataset, (3) the synthetic dataset followed by fine-tuning on the original dataset, or (4) a dataset naïvely oversampled with images containing implants.
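The two dataset-level ideas in the Methods, pixel-level class representation and naïve oversampling, can be illustrated with a minimal sketch. This is not the authors' code: the label id `IMPLANT`, the helper names `pixel_fraction` and `oversample_indices`, the oversampling `factor`, and the toy masks are all assumptions made for illustration, and the abstract does not state how often implant images were repeated.

```python
import random

IMPLANT = 3  # hypothetical label id for the implant class (assumption)

def pixel_fraction(masks, class_id):
    """Share of all pixels across `masks` that carry `class_id`."""
    total = sum(len(row) for mask in masks for row in mask)
    hits = sum(row.count(class_id) for mask in masks for row in mask)
    return hits / total if total else 0.0

def oversample_indices(masks, class_id, factor, seed=0):
    """Training indices in which every image containing `class_id`
    appears `factor` times and every other image appears once."""
    indices = []
    for i, mask in enumerate(masks):
        contains = any(class_id in row for row in mask)
        indices.extend([i] * (factor if contains else 1))
    random.Random(seed).shuffle(indices)  # mix duplicates into the epoch
    return indices

# Toy 2x2 label masks; only the second one contains an implant pixel.
masks = [
    [[0, 0], [1, 1]],
    [[0, 3], [1, 2]],
    [[0, 0], [0, 2]],
]

print(pixel_fraction(masks, IMPLANT))                       # 1 of 12 pixels
print(sorted(oversample_indices(masks, IMPLANT, factor=4)))
```

Oversampling of this kind only repeats existing implant images, whereas the synthetic dataset described above adds newly generated ones.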

Results: The U-Net trained on the original dataset was unable to segment implants in the testing set. Naïve oversampling significantly improved model performance and achieved the highest precision. The model trained only on synthetic data performed worse than naïve oversampling in all metrics, but when fine-tuned on original data it achieved the highest Dice score, recall, F1 score, and ROC AUC. Performance on classes other than implants was similar for all strategies except training only on synthetic data, which tended to perform worse.
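For readers unfamiliar with the reported metrics, the following sketch shows how precision, recall, and the Dice score could be computed for one class from a predicted and a reference binary mask. It is an illustration under assumptions, not the study's evaluation code: the function names and toy masks are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here. Note that for a single pair of binary masks the pixel-wise F1 score equals the Dice score; the abstract does not state how the two were computed separately.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives
    between two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn

def segmentation_metrics(pred, target):
    """Precision, recall and Dice for one class; 0.0 if undefined."""
    tp, fp, fn = confusion_counts(pred, target)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    dice = 2 * tp / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "dice": dice}

target = [[1, 0, 0], [1, 0, 0]]   # reference implant mask (toy data)
partial = [[1, 1, 0], [0, 0, 0]]  # one hit, one false alarm, one miss
empty = [[0, 0, 0], [0, 0, 0]]    # model that never predicts the class

print(segmentation_metrics(partial, target))
print(segmentation_metrics(empty, target))
```

The `empty` case mirrors a model that cannot segment a class at all: every metric is zero even though almost all pixels are labelled correctly, which is why plain pixel accuracy is uninformative for heavily underrepresented classes.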

Conclusions: The use of synthetic data alone may degrade the performance of segmentation models. However, fine-tuning on original data could significantly enhance model performance, especially for heavily underrepresented classes.

Clinical Significance: This study explored the use of synthetic data to enhance segmentation of bitewing radiographs, focusing on underrepresented classes like implants. Pre-training on synthetic data followed by fine-tuning on original data yielded the best results, highlighting the potential of synthetic data to advance AI-driven dental imaging and ultimately support clinical decision-making.

Source
http://dx.doi.org/10.1016/j.jdent.2025.105679


