Plant mitogenomes can be difficult to assemble because they are structurally dynamic and prone to intergenomic DNA transfers, leading to the unusual situation where an organelle genome is far outnumbered by its nuclear counterparts. As a result, comparative mitogenome studies are in their infancy and some key aspects of genome evolution are still known mainly from pregenomic, qualitative methods. To help address these limitations, we combined machine learning and in silico enrichment of mitochondrial-like long reads to assemble the bacterial-sized mitogenome of Norway spruce (Pinaceae: Picea abies).
Mol Phylogenet Evol
October 2019
The oomycetes are filamentous eukaryotic microorganisms, distinct from true fungi, many of which act as crop or fish pathogens that cause devastating losses in agriculture and aquaculture. Chitin is present in all true fungi, but it occurs in only small amounts in some Saprolegniomycetes and it is absent in Peronosporomycetes. However, the growth of several oomycetes is severely impacted by competitive chitin synthase (CHS) inhibitors.
Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, and loss), along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but often in different formats that vary in the types of events considered or in whether the species tree is dated.
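To make the notion of a reconciliation concrete, here is a minimal sketch of the classic lowest-common-ancestor (LCA) mapping, which handles only speciation and duplication. It is not the format or software described in the abstract; the toy trees, the `gene|species` leaf encoding, and all function names are assumptions made for illustration.

```python
# Toy species tree as a child -> parent map (hypothetical data).
SPECIES_PARENT = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC", "ABC": None}


def ancestors(node):
    """Return the path from a species-tree node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = SPECIES_PARENT[node]
    return path


def lca(x, y):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(node, events):
    """Map a gene-tree node onto the species tree and record its event.

    A leaf is a string "gene|species"; an internal node is a pair
    (left_subtree, right_subtree). Returns the species-tree node that
    the gene-tree node maps to.
    """
    if isinstance(node, str):
        gene, species = node.split("|")
        events.append((gene, species, "leaf"))
        return species
    left, right = node
    s_left = reconcile(left, events)
    s_right = reconcile(right, events)
    s_here = lca(s_left, s_right)
    # A node mapped to the same species node as one of its children
    # is inferred to be a duplication; otherwise it is a speciation.
    kind = "duplication" if s_here in (s_left, s_right) else "speciation"
    events.append((f"({s_left},{s_right})", s_here, kind))
    return s_here


# Hypothetical gene tree with two copies of the gene in species A.
gene_tree = (("a1|A", "b1|B"), ("a2|A", "c1|C"))
events = []
root_species = reconcile(gene_tree, events)
```

Each entry in `events` pairs a gene-tree node with a species-tree node and an event label, which is the kind of annotation a reconciliation format has to store. Transfers, losses, and dated species trees need richer models than this sketch.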
BMC Bioinformatics
February 2017
Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses.
Background: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge.
Reads from paired-end and mate-pair libraries are often utilized to find structural variation in genomes, and one common approach is to use their fragment length for detection. After aligning read pairs to the reference, read pair distances are analyzed for statistically significant deviations. However, previously proposed methods are based on a simplified model of observed fragment lengths that does not agree with data.
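The fragment-length test mentioned above can be sketched as a z-test on the mean read-pair distance in a region. This illustrates only the simplified normal model that the abstract criticizes, not the authors' improved method; the function names and the toy numbers are assumptions.

```python
import math
import statistics


def deviation_z(library_lengths, local_lengths):
    """z-score of the local mean fragment length against the library.

    Assumes (simplistically) that fragment lengths are independent and
    normally distributed with the library-wide mean and deviation.
    """
    mu = statistics.mean(library_lengths)
    sigma = statistics.stdev(library_lengths)
    n = len(local_lengths)
    local_mean = statistics.mean(local_lengths)
    return (local_mean - mu) / (sigma / math.sqrt(n))


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: library-wide fragment lengths, and the lengths of
# read pairs spanning one candidate region.
library = [280, 300, 320, 290, 310, 300, 295, 305]
spanning = [400, 410, 390, 405]

z = deviation_z(library, spanning)
p = two_sided_p(z)
```

Read pairs that map much further apart than expected, as in this toy region, hint at a deletion relative to the reference; pairs that map closer together hint at an insertion.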
Background: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity.
Motivation: Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than what is expected.
Over the last decade, methods have been developed for the reconstruction of gene trees that take into account the species tree. Many of these methods have been based on the probabilistic duplication-loss model, which describes how a gene tree evolves over a species tree with respect to duplications and losses, as well as extensions of this model, e.g.
Background: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability.
The discriminatory power of the noncoding control region (CR) of domestic dog mitochondrial DNA alone is relatively low. The extent to which the discriminatory power could be increased by analyzing additional highly variable coding regions of the mitochondrial genome (mtGenome) was therefore investigated. Genetic variability across the mtGenome was evaluated by phylogenetic analysis, and the three most variable ~1 kb coding regions identified.
Background: Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.
Lateral gene transfer (LGT), which transfers DNA between two non-vertically related individuals belonging to the same or different species, is recognized as a major force in prokaryotic evolution, and evidence of its impact on eukaryotic evolution is ever increasing. LGT has attracted much public attention for its potential to transfer pathogenic elements and antibiotic resistance in bacteria, and to transfer pesticide resistance from genetically modified crops to other plants. In a wider perspective, there is a growing body of studies highlighting the role of LGT in enabling organisms to occupy new niches or adapt to environmental changes.
Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances.
Genetic markers, defined as variable regions of DNA, can be utilized for distinguishing individuals or populations. As long as markers are independent, it is easy to combine the information they provide. For nonrecombinant sequences like mtDNA, choosing the right set of markers for forensic applications can be difficult and requires careful consideration.
Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures.
Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock.
Summary: PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters.
Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions.
The cystatin family comprises a group of generally broadly expressed protease inhibitors. The Cres/Testatin subgroup (CTES) genes within the type 2 cystatins differ from the classical type 2 cystatins in having a strikingly reproductive tissue-specific expression, and putative functions in reproduction have therefore been discussed. We have performed evolutionary studies of the CTES genes based on gene searches in genomes from 11 species.
Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed.
Results: A novel algorithm, fast and accurate classification of sequences (FACSs), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence.
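As a rough illustration of classifying a read as belonging or not belonging to a reference, the sketch below indexes the reference's k-mers in a plain Python set and calls a read a match when enough of its k-mers are found. This is a generic k-mer membership scheme, not the FACS algorithm itself; the function names, the value of `k`, the threshold, and the toy sequences are all assumptions.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference, k):
    """Index a reference sequence by its k-mers."""
    return kmers(reference, k)


def classify(read, index, k, threshold=0.5):
    """Return True if at least `threshold` of the read's distinct k-mers
    occur in the reference index. Reads shorter than k never match."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False
    shared = sum(1 for km in read_kmers if km in index)
    return shared / len(read_kmers) >= threshold


# Hypothetical toy data.
K = 5
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_index(reference, K)

from_reference = classify("GTTAGCCGAT", index, K)  # substring of the reference
novel = classify("TTTTTTTTTT", index, K)           # shares no 5-mer with it
```

The sketch ignores reverse complements and sequencing errors, and an exact set uses far more memory than a compact probabilistic structure would on genome-scale references.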
Cellulose biosynthesis is a vital yet poorly understood biochemical process in Oomycetes. Here, we report the identification and characterization of the cellulose synthase genes (CesA) from Saprolegnia monoica. Southern blot experiments revealed the occurrence of three CesA homologues in this species and phylogenetic analyses confirmed that Oomycete CesAs form a clade of their own.
Proc Natl Acad Sci U S A
April 2009
We present GSR, a probabilistic model integrating gene duplication, sequence evolution, and a relaxed molecular clock for substitution rates, that enables genomewide analysis of gene families. The gene duplication and loss process is a major cause for incongruence between gene and species tree, and deterministic methods have been developed to explain such differences through tree reconciliations. Although probabilistic methods for phylogenetic inference have been around for decades, probabilistic reconciliation methods are far less established.
We have identified a gene, denoted PttMAP20, which is strongly up-regulated during secondary cell wall synthesis and tightly coregulated with the secondary wall-associated CESA genes in hybrid aspen (Populus tremula x tremuloides). Immunolocalization studies with affinity-purified antibodies specific for PttMAP20 revealed that the protein is found in all cell types in developing xylem and that it is most abundant in cells forming secondary cell walls. This PttMAP20 protein sequence contains a highly conserved TPX2 domain first identified in a microtubule-associated protein (MAP) in Xenopus laevis.