SeqHBase: a big data toolset for family based sequencing data analysis.

J Med Genet

Utah Foundation for Biomedical Research, Provo, Utah, USA; Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California, USA; Department of Psychiatry, University of Southern California, Los Angeles, California, USA.

Published: April 2015

Background: Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. Processing such data can be a significant challenge, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, and to conduct family based sequencing data analysis.

Methods: Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation).
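To make the three inheritance patterns named above concrete, the following is a minimal sketch of how they could be checked for a single child and both parents. It is not SeqHBase's implementation, which the abstract does not describe. The `TrioVariant` record, the function names and the encoding of genotypes as alternate-allele counts (0, 1 or 2) are all assumptions made for illustration, as is the requirement that both parents be heterozygous carriers for an inherited homozygous call.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class TrioVariant(NamedTuple):
    """Hypothetical record: one variant site genotyped in a child and both parents."""
    gene: str
    variant_id: str
    child: int   # number of alternate alleles: 0, 1 or 2
    father: int
    mother: int


def is_de_novo(v: TrioVariant) -> bool:
    # The child carries the alternate allele and neither parent does.
    return v.child >= 1 and v.father == 0 and v.mother == 0


def is_inherited_homozygous(v: TrioVariant) -> bool:
    # Assumption: the child is homozygous alternate and both parents
    # are heterozygous carriers.
    return v.child == 2 and v.father == 1 and v.mother == 1


def compound_het_genes(variants: Iterable[TrioVariant]) -> Dict[str, List[str]]:
    """Return genes in which the child is heterozygous for at least one
    variant seen only in the father and at least one seen only in the mother."""
    paternal = defaultdict(set)
    maternal = defaultdict(set)
    for v in variants:
        if v.child != 1:
            continue
        if v.father == 1 and v.mother == 0:
            paternal[v.gene].add(v.variant_id)
        elif v.father == 0 and v.mother == 1:
            maternal[v.gene].add(v.variant_id)
    return {
        gene: sorted(paternal[gene] | maternal[gene])
        for gene in paternal
        if gene in maternal
    }


# Made-up example data; gene and variant names are placeholders.
example = [
    TrioVariant("GENE_A", "v1", child=1, father=0, mother=0),
    TrioVariant("GENE_B", "v2", child=2, father=1, mother=1),
    TrioVariant("GENE_C", "v3", child=1, father=1, mother=0),
    TrioVariant("GENE_C", "v4", child=1, father=0, mother=1),
    TrioVariant("GENE_D", "v5", child=1, father=1, mother=0),
]
```

The sketch ignores read depth. In practice, a de novo call is only credible if both parents are adequately covered at the site, which is presumably why per-site coverage from BAM files is one of the inputs.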

Results: We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times scaled almost linearly with the number of data nodes. With 20 data nodes, SeqHBase took about 5 seconds to analyse WES familial data and approximately 1 minute to analyse WGS familial data.

Conclusions: These results demonstrate SeqHBase's high efficiency and scalability, which are necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4382803
DOI: http://dx.doi.org/10.1136/jmedgenet-2014-102907

