Predictive modeling holds a large potential in clinical decision-making, yet its effectiveness can be hindered by inherent data imbalances in clinical datasets. This study investigates the utility of synthetic data for improving the performance of predictive modeling on realistic small imbalanced clinical datasets. We compared various synthetic data generation methods including Generative Adversarial Networks, Normalizing Flows, and Variational Autoencoders to the standard baselines for correcting for class underrepresentation on four clinical datasets.
View Article and Find Full Text PDFClinical notes contain valuable information for research and monitoring quality of care. Named Entity Recognition (NER) is the process for identifying relevant pieces of information such as diagnoses, treatments, side effects, etc., and bring them to a more structured form.
View Article and Find Full Text PDFStud Health Technol Inform
August 2024
Generative machine learning models such as Generative Adversarial Networks (GANs) have been shown to be especially successful in generating realistic synthetic data in image and tabular domains. However, it has been shown that such generative models, as well as the generated synthetic data, can reveal information contained in their privacy-sensitive training data, and therefore must be carefully evaluated before being used. The gold standard method through which such privacy leakage can be estimated is simulating membership inference attacks (MIAs), in which an attacker attempts to learn whether a given sample was part of the training data of a generative model.
View Article and Find Full Text PDFBackground: Reference intervals (RIs) for patient test results are in standard use across many medical disciplines, allowing physicians to identify measurements indicating potentially pathological states with relative ease. The process of inferring cohort-specific RIs is, however, often ignored because of the high costs and cumbersome efforts associated with it. Sophisticated analysis tools are required to automatically infer relevant and locally specific RIs directly from routine laboratory data.
View Article and Find Full Text PDF