Tens of thousands of RNA-sequencing experiments comprising hundreds of thousands of individual samples have now been performed. These data represent a broad range of experimental conditions, sequencing technologies, and hypotheses under study. The Recount project has aggregated and uniformly processed hundreds of thousands of publicly available RNA-seq samples.
View Article and Find Full Text PDFPurpose: Real-world data (RWD) derived from electronic health records (EHRs) are often used to understand population-level relationships between patient characteristics and cancer outcomes. Machine learning (ML) methods enable researchers to extract characteristics from unstructured clinical notes, and represent a more cost-effective and scalable approach than manual expert abstraction. These extracted data are then used in epidemiologic or statistical models as if they were abstracted observations.
View Article and Find Full Text PDFProc Natl Acad Sci U S A
December 2020
Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes postprediction inference.
View Article and Find Full Text PDF