Publications by authors named "David Bujold"

Article Synopsis
  • The COVID-19 pandemic spurred global efforts to sequence SARS-CoV-2 genomes to monitor its evolution and guide public health decisions, resulting in millions of genome sequences being shared worldwide.
  • The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq) launched the Canadian VirusSeq Data Portal to provide open access to genomic sequences and standardized contextual data while adhering to FAIR standards.
  • The portal emphasizes data quality, privacy compliance, and security, and is used alongside tools such as Viral AI and CoVaRR-Net to support ongoing research and analysis of SARS-CoV-2 variants in Canada.

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts.


Motivation: Human epigenomic data has been generated by large consortia for thousands of cell types to be used as a reference map of normal and disease chromatin states. Since epigenetic data contains potentially identifiable information, similarly to genetic data, most raw files generated by these consortia are stored in controlled-access databases. It is important to protect identifiable information, but this should not hinder secure sharing of these valuable datasets.


Humans display remarkable interindividual variation in their immune response to identical challenges. Yet, our understanding of the genetic and epigenetic factors contributing to such variation remains limited. Here we performed in-depth genetic, epigenetic and transcriptional profiling on primary macrophages derived from individuals of European and African ancestry before and after infection with influenza A virus.


We present the Canadian Open Neuroscience Platform (CONP) portal to answer the research community's need for flexible data sharing resources and provide advanced tools for search and processing infrastructure capacity. This portal differs from previous data sharing projects as it integrates datasets originating from a number of already existing platforms or databases through DataLad, a file level data integrity and access layer. The portal is also an entry point for searching and accessing a large number of standardized and containerized software and links to a computing infrastructure.


Summary: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays.
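The matrix slicing described above can be illustrated with a minimal local sketch. This is not the RNAget API itself: the `slice_matrix` function, its parameters, and the dictionary layout (`features`, `samples`, `values`) are hypothetical names chosen for illustration, and the example only shows the general idea of extracting a subset of rows (features) and columns (samples) from an expression matrix.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory layout for an expression matrix:
# rows are features (e.g. genes), columns are samples.
Matrix = Dict[str, list]


def slice_matrix(
    matrix: Matrix,
    feature_ids: Optional[Set[str]] = None,
    sample_ids: Optional[Set[str]] = None,
) -> Matrix:
    """Return the sub-matrix restricted to the requested features and samples.

    Passing None for either argument keeps every row or column on that axis.
    The original row and column order is preserved.
    """
    features: List[str] = matrix["features"]
    samples: List[str] = matrix["samples"]
    rows = [i for i, f in enumerate(features)
            if feature_ids is None or f in feature_ids]
    cols = [j for j, s in enumerate(samples)
            if sample_ids is None or s in sample_ids]
    return {
        "features": [features[i] for i in rows],
        "samples": [samples[j] for j in cols],
        "values": [[matrix["values"][i][j] for j in cols] for i in rows],
    }


# Illustrative data only; the identifiers and values are made up.
example = {
    "features": ["geneA", "geneB", "geneC"],
    "samples": ["s1", "s2", "s3"],
    "values": [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0],
    ],
}

subset = slice_matrix(example, feature_ids={"geneA", "geneC"}, sample_ids={"s2"})
```

A server-side implementation would typically apply the same row and column filters before returning data, so that a client only downloads the subset it asked for.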


The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution.


We present the Canadian Distributed Infrastructure for Genomics (CanDIG) platform, which enables federated querying and analysis of human genomics and linked biomedical data. CanDIG leverages the standards and frameworks of the Global Alliance for Genomics and Health (GA4GH) and currently hosts data for five pan-Canadian projects. We describe CanDIG's key design decisions and features as a guide for other federated data systems.


Background: Québec was the Canadian province most impacted by COVID-19, with 401,462 cases as of September 24th, 2021, and 11,347 deaths due mostly to a very severe first pandemic wave. In April 2020, we assembled the Coronavirus Sequencing in Québec (CoVSeQ) consortium to sequence SARS-CoV-2 genomes in Québec to track viral introduction events and transmission within the province.

Methods: Using genomic epidemiology, we investigated the arrival of SARS-CoV-2 to Québec.


In the past decade, there has been a surge in the number of sensitive human genomic and health datasets available to researchers via Data Access Agreements (DAAs) and managed by Data Access Committees (DACs). As this form of sharing increases, so do the challenges of achieving a reasonable level of data protection, particularly in the context of international data sharing. Here, we consider how excessive variation across DAAs can hinder these goals, and suggest a core set of clauses that could prove useful in future attempts to harmonize data governance.

Article Synopsis
  • The decreasing cost of sequencing and advancements in genomics technologies are increasing the demand for validated bioinformatics software for large-scale data processing.
  • GenPipes is a Python-based framework designed for developing and deploying complex workflows, optimized for high-performance computing and the cloud.
  • It offers 12 validated pipelines for various genomics applications, is open source, and allows researchers to easily analyze and customize their workflows based on their specific needs.

Summary: In recent years, major initiatives such as the International Human Epigenome Consortium have generated thousands of high-quality genome-wide datasets for a large variety of assays and cell types. This data can be used as a reference to assess whether the signal from a user-provided dataset corresponds to its expected experiment, as well as to help reveal unexpected biological associations. We have developed the epiGenomic Efficient Correlator (epiGeEC) tool to enable genome-wide comparisons of very large numbers of datasets.
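As a rough illustration of what a genome-wide comparison between datasets can look like, the sketch below computes pairwise Pearson correlations between signal vectors that have already been averaged over the same genomic bins. The abstract does not state which metric or binning epiGeEC uses, so the choice of Pearson correlation, the function names `pearson` and `correlation_table`, and the toy data are all assumptions made for this example.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length signal vectors."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant signal.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(
    datasets: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], float]:
    """Correlate every unordered pair of datasets, keyed by sorted name pair."""
    return {
        (a, b): pearson(datasets[a], datasets[b])
        for a, b in combinations(sorted(datasets), 2)
    }


# Toy binned signals; the names and values are made up for illustration.
signals = {
    "user_sample": [1.0, 2.0, 3.0],
    "reference_1": [2.0, 4.0, 6.0],
    "reference_2": [3.0, 2.0, 1.0],
}
table = correlation_table(signals)
```

In this toy table, a user-provided dataset that correlates strongly with reference datasets from the same assay would support the claim that the experiment behaved as expected.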

Article Synopsis
  • Researchers conducted an epigenome-wide association study of DNA methylation across 406,365 sites in immune cells from 52 twin pairs, revealing significant differences in twins affected by type 1 diabetes (T1D) compared to their healthy co-twins and unrelated individuals.
  • The identified differentially variable CpG positions (DVPs) are stable over time, linked to key regulatory elements, and highlight immune cell pathways, suggesting that epigenetic changes may begin after birth and could play a role in T1D pathogenesis.

The International Human Epigenome Consortium (IHEC) coordinates the production of reference epigenome maps through the characterization of the regulome, methylome, and transcriptome from a wide range of tissues and cell types. To define conventions ensuring the compatibility of datasets and establish an infrastructure enabling data integration, analysis, and sharing, we developed the IHEC Data Portal (http://epigenomesportal.ca/ihec).


Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants.

Article Synopsis
  • The study investigates how genetic and epigenetic factors influence disease traits in human immune cells by profiling three major cell types from nearly 200 individuals.
  • Researchers quantitatively analyze the contributions of these factors to gene transcription, identifying potential confounding influences in epigenome-wide association studies.
  • The findings reveal coordinated genetic effects on gene expression and highlight 345 immune disease loci, providing insights into the relationship between genomic elements and disease risk.