Decoding Genetic Variations: Communications-Inspired Haplotype Assembly.

IEEE/ACM Trans Comput Biol Bioinform

Published: September 2017

High-throughput DNA sequencing technologies allow fast and affordable sequencing of individual genomes and thus enable unprecedented studies of genetic variations. Information about variations in the genome of an individual is provided by haplotypes, ordered collections of single nucleotide polymorphisms. Knowledge of haplotypes is instrumental in finding genes associated with diseases, drug development, and evolutionary studies. Haplotype assembly from high-throughput sequencing data is challenging due to errors and limited lengths of sequencing reads. The key observation made in this paper is that the minimum error-correction formulation of the haplotype assembly problem is identical to the task of deciphering a coded message received over a noisy channel, a classical problem in the mature field of communication theory. Exploiting this connection, we develop novel haplotype assembly schemes that rely on the bit-flipping and belief propagation algorithms often used in communication systems. The latter algorithm is then adapted to the haplotype assembly of polyploids. We demonstrate on both simulated and experimental data that the proposed algorithms compare favorably with state-of-the-art haplotype assembly methods in terms of accuracy, while being scalable and computationally efficient.
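To make the minimum error-correction (MEC) objective concrete, the following is a minimal illustrative sketch for the diploid case. It is not the authors' algorithm: it is a plain greedy search that flips one haplotype bit at a time, loosely analogous to bit-flipping decoding. The read encoding (0/1 per SNP, -1 where a read does not cover the SNP) and the function names `mismatches`, `mec_score`, and `bit_flip_assemble` are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed encoding: 0/1 allele per SNP position, -1 where the read has no coverage.
Read = List[int]


def mismatches(read: Read, hap: List[int]) -> int:
    """Count covered positions where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r != -1 and r != h)


def mec_score(reads: List[Read], hap: List[int]) -> int:
    """MEC cost: each read is charged against the closer of hap and its complement."""
    comp = [1 - h for h in hap]
    return sum(min(mismatches(r, hap), mismatches(r, comp)) for r in reads)


def bit_flip_assemble(
    reads: List[Read], hap: List[int], max_iters: int = 100
) -> Tuple[List[int], int]:
    """Greedy heuristic: keep any single-bit flip that lowers the MEC cost."""
    hap = list(hap)
    best = mec_score(reads, hap)
    for _ in range(max_iters):
        improved = False
        for j in range(len(hap)):
            hap[j] ^= 1  # tentatively flip SNP j
            score = mec_score(reads, hap)
            if score < best:
                best = score
                improved = True
            else:
                hap[j] ^= 1  # undo the flip
        if not improved:
            break  # local minimum reached
    return hap, best


# Toy data: four SNPs, five reads, one sequencing error in the last read.
reads = [
    [0, 1, -1, -1],
    [-1, 1, 1, 0],
    [1, 0, 0, -1],
    [-1, -1, 0, 1],
    [0, 1, 0, -1],
]
hap, cost = bit_flip_assemble(reads, [0, 1, 0, 0])
print(hap, cost)
```

A greedy search of this kind can stall in a local minimum, which is one reason a message-passing method such as belief propagation is attractive for the same objective.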

Source
http://dx.doi.org/10.1109/TCBB.2015.2462367

Publication Analysis

Top Keywords

haplotype assembly: 24
genetic variations: 8
assembly high-throughput: 8
haplotype: 6
assembly: 6
decoding genetic: 4
variations communications-inspired: 4
communications-inspired haplotype: 4
high-throughput dna: 4
sequencing: 4

Similar Publications

The genome sequence of the tawny cockroach, Ectobius (Ectobius) pallidus (Olivier, 1789).

Wellcome Open Res

January 2025

Entomology Section, World Museum, Liverpool, England, UK.

We present a genome assembly from a specimen of Ectobius pallidus (tawny cockroach; Arthropoda; Insecta; Blattodea; Ectobiidae). The assembly contains two haplotypes with total lengths of 2,087.55 megabases and 2,124.

More than 50% of families with suspected rare monogenic diseases remain unsolved after whole-genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing and providing haplotype-resolved methylation profiling. To evaluate LRS's additional diagnostic yield, we sequenced a rare-disease cohort of 98 samples from 41 families, using nanopore sequencing, achieving per sample ∼36× average coverage and 32-kb read N50 from a single flow cell.

Some unique asexual species persist over time and contradict the consensus that sex is a prerequisite for long-term evolutionary survival. How they escape the dead-end fate remains enigmatic. Here, we generated a haplotype-resolved genome assembly on the basis of a single individual and collected genomic data from worldwide populations of the parthenogenetic diploid oribatid mite to identify signatures of persistence without sex.

Haplotype-resolved phased assemblies aim to capture the full allelic diversity in heterozygous and polyploid species to enable accurate genetic analyses. However, building non-collapsed references still presents a challenge. Here, we used long-range interaction Hi-C reads (high-throughput chromatin conformation capture) and HiFi PacBio reads to assemble the genome of the apomictic cultivar Basilisks from Urochloa decumbens (2n = 4x = 36), an outcrossed tetraploid Paniceae grass widely cropped to feed livestock in the tropics.

is a well-known edible and medicinal fungus with significant economic value. However, the available whole-genome information is lacking for this species. The chromosome-scale reference genome (Monop) and two haploid genomes (Hap1 and Hap2) of , each assembled into 11 pseudochromosomes, were constructed using Illumina, PacBio-HiFi long-read sequencing, and Hi-C technology.
