AI Article Synopsis

  • The study explores the use of real-world clinical databases to assess breast cancer genetics, focusing on the prevalence and effectiveness of prevention strategies and treatments, while also highlighting the biases and issues in these data sets that can affect analysis.
  • Using a large health network's data, the research involved cleaning and cross-referencing information on variants in genes related to breast cancer, ultimately aiming to improve the accuracy of genetic variant assessments according to established guidelines.
  • The findings revealed demographic imbalances in the patient cohort and emphasized that incorrect designations of genetic variants were a major source of data loss, but that manual curation and reassessment can significantly enhance data quality and interpretation.

Article Abstract

Purpose: The emergence of large real-world clinical databases and tools to mine electronic medical records has allowed for an unprecedented look at large data sets with clinical and epidemiologic correlates. In clinical cancer genetics, real-world databases allow for the investigation of prevalence and effectiveness of prevention strategies and targeted treatments and for the identification of barriers to better outcomes. However, real-world data sets have inherent biases and problems (eg, selection bias, incomplete data, measurement error) that may hamper adequate analysis and affect statistical power.

Methods: Here, we leverage a real-world clinical data set from a large health network for patients with breast cancer tested for variants in two genes related to breast cancer (N = 12,423). We conducted data cleaning and harmonization, cross-referenced the data with publicly available databases, performed variant reassessment and functional assays, and used the functional data to inform each variant's clinical significance by applying the American College of Medical Genetics and Genomics and Association for Molecular Pathology guidelines.

Results: In the cohort, White and Black patients were over-represented, whereas non-White Hispanic and Asian patients were under-represented. Incorrect or missing variant designations were the most significant contributor to data loss. While manual curation corrected many incorrect designations, a sizable fraction of patient carriers remained with incorrect or missing variant designations. Despite the large number of patients with clinical significance not reported, original reported clinical significance assessments were accurate. Reassessment of variants in which clinical significance was not reported led to a marked improvement in data quality.

Conclusion: We identify the most common issues with genetic testing data entry and suggest approaches to minimize data loss and keep the interpretation of variants' clinical significance up to date.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11161245
DOI: http://dx.doi.org/10.1200/CCI.23.00251
