The number of papers presenting machine learning (ML) models that are being submitted to and published in the Journal of Medical Internet Research and other JMIR Publications journals has steadily increased. Editors and peer reviewers involved in the review process for such manuscripts often go through multiple review cycles to enhance the quality and completeness of reporting. The use of reporting guidelines or checklists can help ensure consistency in the quality of submitted (and published) scientific manuscripts and, for example, avoid instances of missing information.
Background: Electronic health records are a valuable source of patient information that must be properly deidentified before being shared with researchers. This process requires expertise and time. In addition, synthetic data have considerably reduced the restrictions on the use and sharing of real data, allowing researchers to access it more rapidly with far fewer privacy constraints.
Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in standard regression models. We compare the performance of commonly used bias-mitigating approaches (resampling, algorithmic, and post hoc approaches) against a synthetic data-augmentation method that utilizes sequential boosted decision trees to synthesize under-represented groups.
Objective: To provide a brief overview of artificial intelligence (AI) application within the field of eating disorders (EDs) and propose focused solutions for research.
Method: An overview and summary of AI application pertinent to EDs with focus on AI's ability to address issues relating to data sharing and pooling (and associated privacy concerns), data augmentation, as well as bias within datasets is provided.
Results: In addition to clinical applications, AI can provide useful tools to help combat commonly encountered challenges in ED research, including issues relating to low prevalence of specific subpopulations of patients, small overall sample sizes, and bias within datasets.
Patients, families, healthcare providers and funders face multiple comparable treatment options without knowing which provides the best quality of care. As a step towards improving this, the REthinking Clinical Trials (REaCT) pragmatic trials program started in 2014 to break down many of the traditional barriers to performing clinical trials. However, until other innovative methodologies become widely used, the impact of this program will remain limited.
Synthetic data generation is being increasingly used as a privacy preserving approach for sharing health data. In addition to protecting privacy, it is important to ensure that generated data has high utility. A common way to assess utility is the ability of synthetic data to replicate results from the real data.
JCO Clin Cancer Inform
September 2023
Purpose: There is strong interest from patients, researchers, the pharmaceutical industry, medical journal editors, funders of research, and regulators in sharing clinical trial data for secondary analysis. However, data access remains a challenge because of concerns about patient privacy. It has been argued that synthetic data generation (SDG) is an effective way to address these privacy concerns.
Introduction: The burden of metabolic syndrome (MetS) and its components has been increasing mainly amongst male individuals. Nevertheless, clinical outcomes related to MetS (i.e.
Background: It is evident that COVID-19 will remain a public health concern in the coming years, largely driven by variants of concern (VOC). It is critical to continuously monitor vaccine effectiveness as new variants emerge and new vaccines and/or boosters are developed. Systematic surveillance of the scientific evidence base is necessary to inform public health action and identify key uncertainties.
Background: The reporting of machine learning (ML) prognostic and diagnostic modeling studies is often inadequate, making it difficult to understand and replicate such studies. To address this issue, multiple consensus and expert reporting guidelines for ML studies have been published. However, these guidelines cover different parts of the analytics lifecycle, and individually, none of them provide a complete set of reporting requirements.
A status update on applying generative AI to synthetic data generation.
Getting access to administrative health data for research purposes is a difficult and time-consuming process due to increasingly demanding privacy regulations. An alternative method for sharing administrative health data would be to share synthetic datasets where the records do not correspond to real individuals, but the patterns and relationships seen in the data are reproduced. This paper assesses the feasibility of generating synthetic administrative health data using a recurrent deep learning model.
Aims: The aim of this study was to elucidate whether sex and gender factors influence access to health care and/or are associated with cardiovascular (CV) outcomes of individuals with diabetes mellitus (DM) across different countries.
Methods: Using data from the Canadian Community Health Survey (8.4% of respondents reporting DM) and the European Health Interview Survey (7.
With many anonymization algorithms developed for structured medical health data (SMHD) in the last decade, our systematic review provides a comprehensive bird's eye view of algorithms for SMHD anonymization. This systematic review was conducted according to the recommendations in the Cochrane Handbook for Reviews of Interventions and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Eligible articles from the PubMed, ACM digital library, Medline, IEEE, Embase, Web of Science Collection, Scopus, ProQuest Dissertation, and Theses Global databases were identified through systematic searches.
Synthetic data generation is the process of using machine learning methods to train a model that captures the patterns in a real dataset. Then new or synthetic data can be generated from that trained model. The synthetic data does not have a one-to-one mapping to the original data or to real patients, and therefore has the potential of privacy preserving properties.
Background: One of the increasingly accepted methods to evaluate the privacy of synthetic data is by measuring the risk of membership disclosure. This is a measure of the F1 accuracy that an adversary would correctly ascertain that a target individual from the same population as the real data is in the dataset used to train the generative model, and is commonly estimated using a data partitioning methodology with a 0.5 partitioning parameter.
JMIR AI is a new journal with a focus on publishing applied artificial intelligence and machine learning research. This editorial provides an overview of the primary objectives, the focus areas of the journal, and the types of articles that are within scope.
Background: One common way to share health data for secondary analysis while meeting increasingly strict privacy regulations is to de-identify it. To demonstrate that the risk of re-identification is acceptably low, re-identification risk metrics are used. There is a dearth of good risk estimators modeling the attack scenario where an adversary selects a record from the microdata sample and attempts to match it with individuals in the population.
Purpose: Machine learning (ML) is a powerful tool for interrogating datasets and learning relationships between multiple variables. We utilized a ML model to identify those early breast cancer (EBC) patients at highest risk of developing severe vasomotor symptoms (VMS).
Methods: A gradient boosted decision model utilizing cross-sectional survey data from 360 EBC patients was created.
Objective: To examine sex and gender roles in COVID-19 test positivity and hospitalisation in sex-stratified predictive models using machine learning.
Design: Cross-sectional study.
Setting: UK Biobank prospective cohort.
Background: A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods.
Background: Despite the frequency of vasomotor symptoms (VMS) in patients with early breast cancer (EBC), their optimal management remains unknown. A patient survey was performed to determine perspectives on this important clinical challenge.
Methods: Patients with EBC experiencing VMS participated in an anonymous survey.
This article provides a state-of-the-art summary of location privacy issues and geoprivacy-preserving methods in public health interventions and health research involving disaggregate geographic data about individuals. Synthetic data generation (from real data using machine learning) is discussed in detail as a promising privacy-preserving approach. To fully achieve their goals, privacy-preserving methods should form part of a wider comprehensive socio-technical framework for the appropriate disclosure, use and dissemination of data containing personal identifiable information.