Publications by authors named "Yannick Pouliot"

Background: There are known disparities in incidence and outcomes of colorectal cancer (CRC) by race and ethnicity. Some of these disparities may be mediated by molecular changes in tumors that occur at different rates across populations. Genetic ancestry is a measure complementary to race and ethnicity that can overcome missing data issues and better capture genetic similarity in admixed populations.

The incompleteness of race and ethnicity information in real-world data (RWD) hampers its utility in promoting healthcare equity. This study introduces two methods, one heuristic and the other machine learning-based, to impute race and ethnicity from genetic ancestry using tumor profiling data. Analyzing de-identified data from over 100,000 cancer patients sequenced with the Tempus xT panel, we demonstrate that both methods outperform existing geolocation and surname-based methods, with the machine learning approach achieving high recall (range: 0.

Summary: RNA sequencing (RNA-seq) can be applied to diverse tasks including quantifying gene expression, discovering quantitative trait loci and identifying gene fusion events. Although RNA-seq can detect germline variants, the complexities of variable transcript abundance, target capture and amplification introduce challenging sources of error. Here, we extend DeepVariant, a deep-learning-based variant caller, to learn and account for the unique challenges presented by RNA-seq data.

Next-generation deep sequencing of gene panels is being adopted as a diagnostic test to identify actionable mutations in cancer patient samples. However, clinical samples, such as formalin-fixed, paraffin-embedded specimens, frequently provide low quantities of degraded, poor-quality DNA. To overcome these issues, many sequencing assays rely on extensive PCR amplification, leading to an accumulation of bias and artifacts.

Respiratory viral infections are a significant burden to healthcare worldwide. Many whole genome expression profiles have identified different respiratory viral infection signatures, but these have not translated to clinical practice. Here, we performed two integrated, multi-cohort analyses of publicly available transcriptional data of viral infections.

Introduction: In the present study, we sought to identify markers in patients with anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) that distinguish those achieving remission at 6 months following rituximab or cyclophosphamide treatment from those for whom treatment failed in the Rituximab in ANCA-Associated Vasculitis (RAVE) trial.

Methods: Clinical and flow cytometry data from the RAVE trial were downloaded from the Immunology Database and Analysis Portal and Immune Tolerance Network TrialShare public repositories. Flow cytometry data were analyzed using validated automated gating and joined with clinical data.

The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates.

With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine.

Background: Using computational database searches, we have demonstrated previously that no gene sequences could be found for at least 36% of enzyme activities that have been assigned an Enzyme Commission number. Here we present a follow-up literature-based survey involving a statistically significant sample of such "orphan" activities. The survey was intended to determine whether sequences for these enzyme activities are truly unknown, or whether these sequences are absent from the public sequence databases but can be found in the literature.

Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases.

Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining.
