The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts.
Motivation: Human epigenomic data has been generated by large consortia for thousands of cell types to be used as a reference map of normal and disease chromatin states. Like genetic data, epigenetic data contains potentially identifiable information, so most raw files generated by these consortia are stored in controlled-access databases. It is important to protect identifiable information, but this should not hinder secure sharing of these valuable datasets.
Humans display remarkable interindividual variation in their immune response to identical challenges. Yet, our understanding of the genetic and epigenetic factors contributing to such variation remains limited. Here we performed in-depth genetic, epigenetic and transcriptional profiling on primary macrophages derived from individuals of European and African ancestry before and after infection with influenza A virus.
We present the Canadian Open Neuroscience Platform (CONP) portal to meet the research community's need for flexible data-sharing resources and to provide advanced search tools and processing infrastructure capacity. This portal differs from previous data-sharing projects because it integrates datasets from a number of existing platforms or databases through DataLad, a file-level data integrity and access layer. The portal is also an entry point for searching and accessing a large number of standardized, containerized software tools, and it links to a computing infrastructure.
Summary: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget supports slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays.
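To illustrate what "slicing" an expression matrix means, here is a minimal in-memory sketch: a matrix with feature (row) and sample (column) labels is subset to requested IDs. The class name `ExpressionMatrix`, the method `subset`, and the gene/sample IDs are hypothetical; this is not the RNAget API, which exposes slicing over HTTP.

```python
from dataclasses import dataclass


@dataclass
class ExpressionMatrix:
    """Hypothetical in-memory expression matrix: rows are features, columns are samples."""
    features: list[str]
    samples: list[str]
    values: list[list[float]]  # values[i][j] = expression of features[i] in samples[j]

    def subset(self, features=None, samples=None):
        """Return a sub-matrix restricted to the requested feature and sample IDs.

        None means "keep all". IDs are returned in the order requested;
        an unknown ID raises KeyError from the index lookup.
        """
        feat_idx = {f: i for i, f in enumerate(self.features)}
        samp_idx = {s: j for j, s in enumerate(self.samples)}
        keep_f = list(self.features) if features is None else list(features)
        keep_s = list(self.samples) if samples is None else list(samples)
        rows = [feat_idx[f] for f in keep_f]
        cols = [samp_idx[s] for s in keep_s]
        sub = [[self.values[i][j] for j in cols] for i in rows]
        return ExpressionMatrix(keep_f, keep_s, sub)


m = ExpressionMatrix(
    ["GENE_A", "GENE_B", "GENE_C"],
    ["s1", "s2"],
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
)
sub = m.subset(features=["GENE_C", "GENE_A"], samples=["s2"])
print(sub.values)  # [[6.0], [2.0]]
```

Returning a new labeled matrix (rather than bare values) keeps the row and column identifiers attached to the slice, which is what a client needs to interpret the extracted subset.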
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution.
We present the Canadian Distributed Infrastructure for Genomics (CanDIG) platform, which enables federated querying and analysis of human genomics and linked biomedical data. CanDIG leverages the standards and frameworks of the Global Alliance for Genomics and Health (GA4GH) and currently hosts data for five pan-Canadian projects. We describe CanDIG's key design decisions and features as a guide for other federated data systems.
Background: Québec was the Canadian province most impacted by COVID-19, with 401,462 cases as of September 24th, 2021, and 11,347 deaths due mostly to a very severe first pandemic wave. In April 2020, we assembled the Coronavirus Sequencing in Québec (CoVSeQ) consortium to sequence SARS-CoV-2 genomes in Québec to track viral introduction events and transmission within the province.
Methods: Using genomic epidemiology, we investigated the arrival of SARS-CoV-2 to Québec.
In the past decade, there has been a surge in the number of sensitive human genomic and health datasets available to researchers via Data Access Agreements (DAAs) and managed by Data Access Committees (DACs). As this form of sharing increases, so do the challenges of achieving a reasonable level of data protection, particularly in the context of international data sharing. Here, we consider how excessive variation across DAAs can hinder these goals, and suggest a core set of clauses that could prove useful in future attempts to harmonize data governance.
Summary: In recent years, major initiatives such as the International Human Epigenome Consortium have generated thousands of high-quality genome-wide datasets for a large variety of assays and cell types. This data can be used as a reference to assess whether the signal from a user-provided dataset corresponds to its expected experiment, as well as to help reveal unexpected biological associations. We have developed the epiGenomic Efficient Correlator (epiGeEC) tool to enable genome-wide comparisons of very large numbers of datasets.
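A genome-wide comparison of many datasets can be sketched as pairwise correlation of signal tracks averaged into fixed-size bins. The sketch below assumes Pearson correlation and mean-per-bin aggregation, with all tracks covering the same positions in the same order; the function names and toy tracks are hypothetical, and this is not epiGeEC's implementation.

```python
import math
from itertools import combinations


def bin_signal(signal, bin_size):
    """Average a per-position signal into consecutive bins of bin_size positions.

    The last bin may be shorter if len(signal) is not a multiple of bin_size.
    """
    return [
        sum(signal[i:i + bin_size]) / len(signal[i:i + bin_size])
        for i in range(0, len(signal), bin_size)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (nan if either is constant)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def pairwise_correlations(datasets, bin_size):
    """Pearson correlation for every unordered pair of binned signal tracks.

    datasets maps a dataset name to a per-position signal list. Each pair is
    stored under both (a, b) and (b, a) for convenient lookup.
    """
    binned = {name: bin_signal(sig, bin_size) for name, sig in datasets.items()}
    result = {}
    for a, b in combinations(sorted(binned), 2):
        r = pearson(binned[a], binned[b])
        result[(a, b)] = result[(b, a)] = r
    return result


tracks = {
    "A": [0, 0, 1, 1, 4, 4, 9, 9],
    "B": [0, 0, 2, 2, 8, 8, 18, 18],  # B is exactly 2 * A
    "C": [9, 9, 4, 4, 1, 1, 0, 0],    # C is A reversed
}
corr = pairwise_correlations(tracks, bin_size=2)
print(corr[("A", "B")])  # 1.0
```

Binning first reduces each track to a short vector, so the cost of each pairwise comparison depends on the number of bins rather than on the number of base positions.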
The International Human Epigenome Consortium (IHEC) coordinates the production of reference epigenome maps through the characterization of the regulome, methylome, and transcriptome from a wide range of tissues and cell types. To define conventions ensuring the compatibility of datasets and establish an infrastructure enabling data integration, analysis, and sharing, we developed the IHEC Data Portal (http://epigenomesportal.ca/ihec).
Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants.