Ehapp2: Estimate haplotype frequencies from pooled sequencing data with prior database information.

J Bioinform Comput Biol

State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, P. R. China.

Published: August 2016

To reduce the cost of large-scale re-sequencing, DNA from multiple individuals can be pooled and sequenced together, a strategy known as pooled sequencing. Pooled sequencing offers a cost-effective alternative to sequencing individuals separately. To apply pooled sequencing in haplotype-based disease association analysis, the critical step is to estimate haplotype frequencies accurately from pooled samples. Here we present Ehapp2, which estimates haplotype frequencies from pooled sequencing data using a database that provides prior information on known haplotypes. We first recast the problem of estimating the frequency of each haplotype as finding a sparse solution to a system of linear equations, which is solved with the NNREG algorithm. Simulation experiments show that Ehapp2 is robust to sequencing errors. For pooled sequencing of mixtures of real Drosophila haplotypes at 50× total coverage, it estimates haplotype frequencies with an average relative difference below 3%, even when the sequencing error rate is as high as 0.05. Because the proportions of local haplotypes spanning multiple SNPs are calculated accurately first, Ehapp2 still estimates recombinant haplotypes arising from chromosomal crossover accurately. Comparisons with existing methods show that Ehapp2 is state-of-the-art for many sequencing study designs and better suited to current massively parallel sequencing.
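The abstract's core idea can be sketched in a few lines. It recasts haplotype frequency estimation as a non-negative, sparse solution of a linear system built from local haplotypes. This is a minimal illustration under stated assumptions, not Ehapp2's implementation. NNREG is not reproduced here; a plain projected-gradient non-negative least-squares solver stands in for it. The following are all hypothetical and chosen only for illustration:

- the helper names (`build_local_system`, `nonneg_least_squares`, `average_relative_difference`);
- the two-SNP window and the 0/1 allele coding;
- the toy five-haplotype database;
- the definition of average relative difference.

The observed local-pattern frequencies are computed noise-free from known frequencies rather than counted from reads.

```python
import numpy as np


def build_local_system(haplotypes, window=2):
    """Design matrix for local-haplotype equations (hypothetical sketch).

    Rows: one per (window start, allele pattern) occurring among the database
    haplotypes. Columns: one per database haplotype.
    A[r, k] = 1 when haplotype k carries that pattern in that window.
    """
    H = np.asarray(haplotypes, dtype=int)
    n_snp = H.shape[1]
    rows, keys = [], []
    for start in range(n_snp - window + 1):
        segment = H[:, start:start + window]
        # Each distinct local pattern in this window contributes one equation.
        for pattern in sorted({tuple(int(a) for a in s) for s in segment}):
            rows.append(np.all(segment == pattern, axis=1).astype(float))
            keys.append((start, pattern))
    return np.vstack(rows), keys


def nonneg_least_squares(A, b, n_iter=5000):
    """Minimise 0.5 * ||A p - b||^2 subject to p >= 0 by projected gradient.

    A stand-in for NNREG, which is not reproduced here.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = largest eigenvalue of A^T A
    p = np.full(A.shape[1], 1.0 / A.shape[1])  # start from uniform frequencies
    for _ in range(n_iter):
        grad = A.T @ (A @ p - b)
        p = np.maximum(p - step * grad, 0.0)  # gradient step, then project onto p >= 0
    total = p.sum()
    return p / total if total > 0 else p  # rescale so frequencies sum to 1


def average_relative_difference(estimated, true):
    """Mean of |estimated - true| / true over haplotypes with true > 0 (assumed definition)."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    present = true > 0
    return float(np.mean(np.abs(estimated[present] - true[present]) / true[present]))


# Hypothetical toy database: 5 known haplotypes over 5 SNPs (0 = ref, 1 = alt).
database = np.array([
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
])
true_freqs = np.array([0.5, 0.3, 0.15, 0.05, 0.0])  # last haplotype absent from the pool

A, keys = build_local_system(database)
observed = A @ true_freqs  # noise-free stand-in for read-derived local pattern frequencies
est = nonneg_least_squares(A, observed)
print(np.round(est, 3), average_relative_difference(est, true_freqs))
```

Every database haplotype carries exactly one pattern per window. The equations for each window therefore already force the frequencies to sum to 1, so no separate constraint row is needed, and the final rescaling only matters with noisy input. In this toy example, the non-negativity constraint returns a frequency of zero for the haplotype absent from the pool. This is one simple way to obtain the sparse solutions the abstract describes. Real pooled data would add sequencing errors and local patterns missing from the database, which this sketch does not model.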

Source: http://dx.doi.org/10.1142/S0219720016500177
