Publications by authors named "Bernhard Haubold"

Article Synopsis
  • The paper identifies long unique genomic regions in mammals that are highly enriched in developmental genes, using a fast string-matching method for detection.
  • It quantifies the method’s efficiency and accuracy, applying it to the genomes of 18 mammals and annotating unique regions of at least 10 kb.
  • The study highlights a significant anonymous unique region in the Tasmanian devil, containing an essential gene, suggesting that these unique regions should be prioritized in mammalian genome annotations.
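
Once each position of a genome has been classified as unique or not (the paper does this with a fast string-matching method, which is not reproduced here), annotating regions of at least 10 kb comes down to collecting runs of unique positions above a length threshold. A minimal Python sketch, assuming a hypothetical per-position boolean list `is_unique` as input; `unique_regions` is an illustrative name:

```python
def unique_regions(is_unique, min_len):
    """Return half-open, 0-based intervals (start, end) covering runs of
    True values in is_unique whose length is at least min_len."""
    regions = []
    start = None  # start of the current run of unique positions, if any
    for i, flag in enumerate(is_unique):
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start, i))  # run is long enough to keep
            start = None
    # A run may extend to the end of the sequence.
    if start is not None and len(is_unique) - start >= min_len:
        regions.append((start, len(is_unique)))
    return regions
```

For a 10 kb threshold, `min_len` would be `10_000`.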

Motivation: Markers for diagnostic polymerase chain reactions are routinely constructed by taking the regions common to the target genomes and subtracting the regions found in the targets' closest relatives, their neighbors. This approach is implemented in the published package Fur, which originally required memory proportional to the number of nucleotides in the neighborhood. This does not scale well as neighborhoods grow.
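
The intersect-then-subtract idea can be illustrated with k-mer sets. This is a simplified analogue, not Fur's actual implementation; the k-mer length `k` and all function names are hypothetical. The first version builds the whole neighborhood set at once, so its memory grows with the neighborhood; the second keeps only the candidate markers and subtracts one neighbor at a time.

```python
from functools import reduce


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def marker_kmers(targets, neighbors, k):
    """k-mers shared by all targets and absent from every neighbor.

    Materializes the whole neighborhood at once, so memory grows
    with the total size of the neighborhood.
    """
    common = reduce(set.intersection, (kmers(t, k) for t in targets))
    neighborhood = set().union(*(kmers(n, k) for n in neighbors))
    return common - neighborhood


def marker_kmers_streaming(targets, neighbors, k):
    """Same result, but only the candidate markers stay in memory;
    each neighbor's k-mers are generated and discarded on the fly."""
    markers = reduce(set.intersection, (kmers(t, k) for t in targets))
    for n in neighbors:
        markers.difference_update(n[i:i + k] for i in range(len(n) - k + 1))
    return markers
```

For example, `marker_kmers(["ACGTAC", "TACGTA"], ["CGTT"], 3)` keeps `ACG`, `GTA`, and `TAC`, because `CGT` also occurs in the neighbor.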

Non-invasive clinical diagnosis of bladder cancer is feasible via a set of chemically distinct molecules, including macromolecular tumor markers such as polypeptides and nucleic acids. In terms of tumor-related aberrant gene expression, RNA transcripts are the primary indicator of tumor-specific gene expression, as polypeptides and their metabolic products arise only subsequently. Thus, in the case of bladder cancer, urine RNA is a potentially useful early diagnostic marker.

By tracking pathogen outbreaks using whole genome sequencing, medical microbiology is currently being transformed into genomic epidemiology. This change in technology is leading to the rapid accumulation of large samples of closely related genome sequences. Summarizing such samples into phylogenies can be computationally challenging.

Motivation: Unique marker sequences are highly sought after in molecular diagnostics. Nevertheless, there are only a few programs available to search for marker sequences, compared to the many programs for similarity search. We therefore wrote the program Fur for Finding Unique genomic Regions.

Motivation: Tracking disease outbreaks by whole-genome sequencing leads to the collection of large samples of closely related sequences. Five years ago, we published a method to accurately compute all pairwise distances for such samples by indexing each sequence. Since indexing is slow, we now ask whether it is possible to achieve similar accuracy when indexing only a single sequence.
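
Distances between closely related genomes are usually reported as substitutions per site. As an illustration of that final step only, the sketch below converts the mismatches between two homologous, equal-length segments into a Jukes-Cantor corrected distance. How the published method locates homologous segments through its index, and which correction it actually applies, is not shown; `jc_distance` is a hypothetical name.

```python
import math


def jc_distance(a, b):
    """Jukes-Cantor corrected substitutions per site between two
    homologous, equal-length DNA segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    p = sum(x != y for x, y in zip(a, b)) / len(a)  # observed mismatch proportion
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)
```

One mismatch in ten sites gives p = 0.1 and a corrected distance of about 0.107, slightly above the raw proportion because the correction accounts for multiple hits.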

With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events.
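
The site-frequency spectrum itself is simple to tabulate. A minimal sketch, assuming derived-allele counts per site in a sample of `n` haploid sequences are already known (ancestral states resolved); the constant-size neutral expectation θ/i is included only as a reference point. The branch-segment theory used to estimate population sizes is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Unfolded SFS for a sample of n sequences.

    Entry i-1 is the number of sites carrying i derived alleles,
    for i = 1..n-1; monomorphic sites (count 0 or n) are skipped.
    """
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally[i] for i in range(1, n)]


def expected_neutral_sfs(theta, n):
    """Expected SFS under the standard neutral coalescent with
    constant population size: theta / i for i = 1..n-1."""
    return [theta / i for i in range(1, n)]
```

Deviations of an observed spectrum from the θ/i shape are what demographic inference methods try to explain.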

Motivation: Unique sequence regions are associated with genetic function in vertebrate genomes. However, measuring uniqueness, or absence of long repeats, along a genome is conceptually and computationally difficult. Here we use a variant of the Lempel-Ziv complexity, the match complexity, Cm, and augment it by deriving its null distribution for random sequences.
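
To make match-based complexity concrete, here is a naive sketch of a match factorization. The exact definition of Cm and its normalization are assumptions here. Each factor is taken to be the longest prefix of the remaining sequence that also starts at some other position, extended by one character, and complexity is reported as factors per nucleotide. Repeats produce long factors and therefore low values, while unique sequence yields many short factors and therefore high values. The brute-force search stands in for the index-based matching a real tool would use, and the null distribution is not derived.

```python
def longest_match_elsewhere(s, i):
    """Length of the longest prefix of s[i:] that also starts at some j != i."""
    best = 0
    for j in range(len(s)):  # brute force over all other start positions
        if j == i:
            continue
        length = 0
        while (i + length < len(s) and j + length < len(s)
               and s[i + length] == s[j + length]):
            length += 1
        best = max(best, length)
    return best


def match_factors(s):
    """Split s into factors: longest match elsewhere plus one character."""
    factors = []
    i = 0
    while i < len(s):
        step = longest_match_elsewhere(s, i) + 1
        factors.append(s[i:i + step])  # the last factor may be shorter
        i += step
    return factors


def match_complexity(s):
    """Factors per nucleotide: high for unique sequence, low for repeats."""
    return len(match_factors(s)) / len(s)
```

For `"ACGT"` every factor is a single nucleotide (complexity 1.0), whereas the tandem repeat `"ACGTACGT"` splits into just `"ACGTA"` and `"CGT"` (complexity 0.25).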

Motivation: In many organisms, including humans, recombination clusters within recombination hotspots. The standard method for de novo detection of recombinants at hotspots is sperm typing. This relies on allele-specific PCR at single nucleotide polymorphisms.

We have recently developed a distance metric for efficiently estimating the number of substitutions per site between unaligned genome sequences. These estimates are called "anchor distances" and can be used for phylogeny reconstruction. Most phylogenies come with bootstrap support values, which are computed by resampling, with replacement, columns of homologous residues from the original alignment.
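
The classical bootstrap described above resamples alignment columns with replacement. Each replicate is then used to rebuild a tree, and the support for a split is the fraction of replicate trees that contain it. A minimal sketch of the resampling step only, assuming the alignment is a list of equal-length strings; tree building is not shown, and how resampling is adapted to unaligned anchors is not covered. `bootstrap_replicate` is a hypothetical name.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: list of equal-length strings, one row per sequence.
    Columns are drawn uniformly with replacement, so some original
    columns appear several times and others not at all.
    """
    ncol = len(alignment[0])
    picks = [rng.randrange(ncol) for _ in range(ncol)]
    return ["".join(row[c] for c in picks) for row in alignment]


# Usage: 100 replicates from a seeded generator for reproducibility.
rng = random.Random(42)
replicates = [bootstrap_replicate(["ACGT", "ACGA", "TCGA"], rng)
              for _ in range(100)]
```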

Wild house mice form social hierarchies with aggressive males defending territories, in which females, young mice and submissive adult males share nests. In contrast, socially excluded males are barred from breeding groups, have numerous bite wounds and patches of thinning fur. Since their feeding times are often disrupted, we investigated whether social exclusion leads to changes in epigenetic marks of metabolic genes in liver tissue.

Motivation: A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes.

Although the analysis of linkage disequilibrium (LD) plays a central role in many areas of population genetics, the sampling variance of LD is known to be very large with high sensitivity to numbers of nucleotide sites and individuals sampled. Here we show that a genome-wide analysis of the distribution of heterozygous sites within a single diploid genome can yield highly informative patterns of LD as a function of physical distance. The proposed statistic, the correlation of zygosity, is closely related to the conventional population-level measure of LD, but is agnostic with respect to allele frequencies and hence likely less prone to outlier artifacts.
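
The idea of reading LD from a single diploid genome can be illustrated by correlating heterozygosity indicators at sites a fixed distance apart. This is a simplified stand-in using the Pearson correlation, not the paper's correlation-of-zygosity estimator; `z` is a hypothetical 0/1 list (1 = heterozygous) over consecutive sites.

```python
import math


def zygosity_correlation(z, d):
    """Pearson correlation between z[i] and z[i + d] over all valid i.

    z: 0/1 heterozygosity indicators for consecutive sites.
    Returns nan when either series is constant.
    """
    if not 0 < d < len(z):
        raise ValueError("distance must satisfy 0 < d < len(z)")
    x, y = z[:-d], z[d:]  # pairs of sites separated by distance d
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)
```

Evaluating this for increasing `d` gives a decay curve of correlation against physical distance.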

In mammals, exposure to toxic or disease-causing environments can change epigenetic marks that are inherited independently of the intrauterine environment. Such inheritance of molecular phenotypes may be adaptive. However, studies demonstrating molecular evidence for epigenetic inheritance have so far relied on extreme treatments, and are confined to inbred animals.

Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on comparative data, today usually DNA sequences. These have become so plentiful that alignment-free sequence comparison is of growing importance in the race between scientists and sequencing machines.
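
One of the simplest alignment-free comparisons counts the words of length k in each sequence and measures the distance between the resulting count profiles. A sketch using Euclidean distance, which is one choice among many; the word length `k` and the function names are illustrative.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k):
    """Euclidean distance between the k-mer count profiles of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = ca.keys() | cb.keys()  # Counter returns 0 for absent words
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))
```

No alignment is computed at any point: each sequence is reduced to a word-count profile in a single linear pass.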

Motivation: "Why recombination?" is one of the central questions in biology. This has led to a host of methods for quantifying recombination from sequence data. These methods are usually based on aligned DNA sequences.
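
A classic alignment-based recombination signal is the four-gamete test. Under the infinite-sites assumption, two biallelic sites that show all four combinations of their alleles imply at least one recombination event between them. A sketch, assuming an alignment given as equal-length strings; this illustrates the aligned approaches mentioned above, not the paper's own method, and `four_gamete_pairs` is a hypothetical name.

```python
from itertools import combinations


def four_gamete_pairs(alignment):
    """Return index pairs (i, j) of biallelic columns showing all four gametes."""
    ncol = len(alignment[0])
    cols = [[row[c] for row in alignment] for c in range(ncol)]
    biallelic = [c for c in range(ncol) if len(set(cols[c])) == 2]
    hits = []
    for i, j in combinations(biallelic, 2):
        gametes = set(zip(cols[i], cols[j]))  # observed two-site haplotypes
        if len(gametes) == 4:
            hits.append((i, j))
    return hits
```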

The origins of crop diseases are linked to the domestication of plants. Most crops were domesticated centuries, even millennia, ago, thus limiting the opportunity to understand the concomitant emergence of disease. Kiwifruit (Actinidia spp.

Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes based on a coalescent approach combined with advanced algorithms for indexing sequences.
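
A quantity that sequence indexes provide cheaply, and that is useful in alignment-free comparison, is the shortest unique substring starting at each position. The brute-force sketch below computes these lengths directly; a suffix tree or a suffix array with LCP information yields the same values in linear time. It is offered only as an example of an index-derived quantity: how the paper combines indexing with coalescent theory is not shown, and `shustring_lengths` is a hypothetical name.

```python
def shustring_lengths(s):
    """For each position i, the length of the shortest prefix of s[i:]
    that occurs exactly once in s, or 0 if every such prefix is repeated.

    Brute force for illustration only.
    """
    n = len(s)
    result = []
    for i in range(n):
        found = 0
        for length in range(1, n - i + 1):
            word = s[i:i + length]
            occurrences = sum(1 for j in range(n - length + 1)
                              if s[j:j + length] == word)
            if occurrences == 1:
                found = length
                break
        result.append(found)
    return result
```

In `"ACGTACGT"`, positions 0 to 3 have unique prefixes of lengths 5, 4, 3, and 2, while every prefix starting in the second copy of `ACGT` is repeated, so those positions get 0.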

Under neutrality, polymorphisms are maintained through the balance between mutation and drift. Under selection, a variety of mechanisms may be involved in the maintenance of polymorphisms, for example, sexual selection or host-parasite coevolution on the population level or heterozygote advantage in diploid individuals. Here we address the emergence of polymorphisms in a population of interacting haploid individuals.

Bacterial epidemics are often caused by strains that have acquired their increased virulence through horizontal gene transfer. Due to this association with disease, the detection of horizontal gene transfer continues to receive attention from microbiologists and bioinformaticians alike. Most software for detecting transfer events is based on alignments of sets of genes or of entire genomes.

Understanding the processes and conditions under which populations diverge to give rise to distinct species is a central question in evolutionary biology. Since recently diverged populations have high levels of shared polymorphisms, it is challenging to distinguish between recent divergence with no (or very low) inter-population gene flow and older splitting events with subsequent gene flow. Recently published methods to infer speciation parameters under the isolation-migration framework are based on summarizing polymorphism data at multiple loci in two species using the joint site-frequency spectrum (JSFS).
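
The joint site-frequency spectrum used by these methods is a two-dimensional histogram of derived-allele counts. A minimal sketch, assuming per-site derived counts in samples of sizes `n1` and `n2` from the two populations; the isolation-migration inference itself is not shown, and `joint_sfs` is a hypothetical name.

```python
def joint_sfs(counts1, counts2, n1, n2):
    """Return an (n1 + 1) x (n2 + 1) matrix whose entry [i][j] is the number
    of sites with i derived copies in sample 1 and j in sample 2."""
    jsfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts1, counts2):
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError("derived count out of range")
        jsfs[i][j] += 1
    return jsfs
```

Shared polymorphisms fall in the interior of the matrix, while alleles private to one population fall on its first row or column.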

Motivation: Bacterial and viral genomes are often affected by horizontal gene transfer observable as abrupt switching in local homology. In addition to the resulting mosaic genome structure, they frequently contain regions not found in close relatives, which may play a role in virulence mechanisms. Due to this connection to medical microbiology, there are numerous methods available to detect horizontal gene transfer.

Motivation: Sequencing capacity is currently growing more rapidly than CPU speed, leading to an analysis bottleneck in many genome projects. Alignment-free sequence analysis methods tend to be more efficient than their alignment-based counterparts. They may, therefore, be important in the long run for keeping sequence analysis abreast with sequencing.

Improvements in sequencing technology over the past 5 years are leading to routine application of shotgun sequencing in the fields of ecology and evolution. However, the theory to estimate evolutionary parameters from these data is still being worked out. Here we present an extension and implementation of part of this theory, mlRho.