AMIA Jt Summits Transl Sci Proc
June 2023
This reproducibility study presents an algorithm that factors the race distribution of clinical research study samples into the training of biomedical embeddings. We extracted 12,864 PubMed abstracts published between January 1, 2000 and January 1, 2022 and weighted them based on the race distribution data extracted from their corresponding clinical trials registered on ClinicalTrials.gov.
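The abstract does not state how the weights are computed, so the following is only a minimal sketch of one way an abstract could be weighted by its trial's race distribution. It assumes, purely for illustration, that the weight is the normalized Shannon entropy of the reported race counts and that weighting is applied by repeating abstracts in the training corpus. The function names, the `max_repeats` parameter, and the entropy-based scheme are all assumptions and not the study's method.

```python
import math


def diversity_weight(race_counts):
    """Hypothetical weight: normalized Shannon entropy, in [0, 1], of a
    mapping from race category to participant count."""
    total = sum(race_counts.values())
    if total == 0 or len(race_counts) < 2:
        return 0.0
    entropy = 0.0
    for count in race_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    # Divide by the maximum possible entropy for this number of categories.
    return entropy / math.log(len(race_counts))


def repeat_counts(weights, max_repeats=5):
    """Map weights in [0, 1] to integer repetition counts (assumed scheme).
    Every abstract appears at least once; the most diverse appear most often."""
    return [1 + round(w * (max_repeats - 1)) for w in weights]


# Illustrative counts only, not data from the study.
trials = [
    {"Group A": 50, "Group B": 50},
    {"Group A": 100, "Group B": 0},
]
weights = [diversity_weight(t) for t in trials]
repeats = repeat_counts(weights)
```

Under this assumed scheme, an abstract whose trial enrolled a single group is kept once, while an abstract from an evenly balanced trial is repeated up to `max_repeats` times before embedding training.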
Objective: To develop a computable representation for medical evidence and to contribute a gold standard dataset of annotated randomized controlled trial (RCT) abstracts, along with a natural language processing (NLP) pipeline for transforming free-text RCT evidence in PubMed into the structured representation.
Materials and Methods: Our representation, EvidenceMap, consists of 3 levels of abstraction (Medical Evidence Entity, Proposition, and Map) that represent the hierarchical structure of medical evidence composition. Randomly selected RCT abstracts were annotated following EvidenceMap, based on the consensus of 2 independent annotators, and used to train an NLP pipeline.
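To make the three-level hierarchy concrete, here is a minimal sketch of how Entity, Proposition, and Map could nest as data structures. Only the three level names come from the abstract. Every field name (`text`, `kind`, `relation`, `source_id`), the example labels, and the helper method are hypothetical and are not taken from the EvidenceMap specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Lowest level: a span of text from the abstract (fields are assumed)."""
    text: str
    kind: str  # hypothetical label, e.g. "Intervention" or "Outcome"


@dataclass
class Proposition:
    """Middle level: a statement that ties several entities together."""
    entities: List[Entity]
    relation: str  # hypothetical description of how the entities relate


@dataclass
class EvidenceMap:
    """Top level: all propositions extracted from one RCT abstract."""
    source_id: str  # hypothetical identifier for the source abstract
    propositions: List[Proposition] = field(default_factory=list)

    def all_entities(self) -> List[Entity]:
        """Flatten the hierarchy into a single list of entities."""
        return [e for p in self.propositions for e in p.entities]


# Illustrative content only, not drawn from the annotated dataset.
prop = Proposition(
    entities=[Entity("drug X", "Intervention"), Entity("blood pressure", "Outcome")],
    relation="reduces",
)
evidence = EvidenceMap(source_id="example-abstract", propositions=[prop])
```

Nesting the levels this way lets a pipeline emit entities first, group them into propositions, and then collect the propositions per abstract, which mirrors the bottom-up composition the abstract describes.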