Publications by authors named "Siruo Wang"

Tens of thousands of RNA-sequencing experiments comprising hundreds of thousands of individual samples have now been performed. These data represent a broad range of experimental conditions, sequencing technologies, and hypotheses under study. The Recount project has aggregated and uniformly processed hundreds of thousands of publicly available RNA-seq samples.

Purpose: Real-world data (RWD) derived from electronic health records (EHRs) are often used to understand population-level relationships between patient characteristics and cancer outcomes. Machine learning (ML) methods enable researchers to extract characteristics from unstructured clinical notes and represent a more cost-effective and scalable approach than manual expert abstraction. These extracted data are then used in epidemiologic or statistical models as if they were abstracted observations.

Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes postprediction inference.

Article Synopsis
  • Most newly identified junctions were found in older samples, suggesting that recent RNA-seq data have contributed few novel, well-supported junctions and that annotation updates may be reaching a plateau.
  • Gene annotations need to be improved, especially for less frequently expressed genes, by drawing on the stable body of data in the Sequence Read Archive (SRA) to capture previously missed splicing variation.