Recent studies have demonstrated that mass spectrometry-based variant detection is feasible. Typically, either genomic variant databases or transcript data are used to construct customized target databases for the identification of single amino acid variants (SAAVs) in mass spectrometry data. However, both approaches require additional data to identify SAAVs.
Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information at the genomic and transcript levels. In proteogenomics, these multi-omics data are combined to analyze unannotated organisms and to allow more accurate sample-specific predictions. Existing analysis methods still mainly depend on six-frame translations or reference protein databases that are extended by transcriptomic information or known single nucleotide polymorphisms (SNPs).
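For readers unfamiliar with the term, a six-frame translation can be sketched as follows. This is a minimal illustration, not the method used in the work summarized above: it assumes the standard genetic code and plain A/C/G/T input, and the names `translate` and `six_frame_translation` are hypothetical helpers introduced here.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons of seq; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

Stop codons are rendered as `*`; a search database built this way would typically split each frame at stop codons before peptide matching, which is left out of this sketch.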
Background: Gene prediction is a challenging but crucial part of most genome analysis pipelines. Various methods have evolved that predict genes either ab initio on reference sequences or in an evidence-based manner with the help of additional information, such as RNA-Seq reads or EST libraries. However, none of these strategies is bias-free, and one method alone does not necessarily provide a complete set of accurate predictions.
While initial phylogenetic analyses concluded that Guinea 2014 EBOV falls outside the Zaïre lineage (ZEBOV), a recent re-analysis of the same dataset by Dudas and Rambaut (2014) suggested that Guinea 2014 EBOV actually is ZEBOV. Under the same hypothesis as used by these authors (the molecular clock hypothesis), we reinforce their conclusion by providing a statistical assessment of the location of the root of the Zaïre lineage. Our analysis unambiguously supports Guinea 2014 EBOV as a member of the Zaïre lineage.
Motivation: The reliable identification of genes is a major challenge in genome research, as further analysis depends on the correctness of this initial step. With high-throughput RNA-Seq data reflecting currently expressed genes, a particularly meaningful source of information has become commonly available for gene finding. However, practical application in automated gene identification is still not the standard case.
Motivation: Accurate estimation, comparison and evaluation of read mapping error rates is a crucial step in the processing of next-generation sequencing data, as further analysis steps and interpretation assume the correctness of the mapping results. Current approaches are either focused on sensitivity estimation and thereby disregard specificity or are based on read simulations. Although continuously improving, read simulations are still prone to introduce a bias into the mapping error quantitation and cannot capture all characteristics of an individual dataset.
Motivation: Genome coverage, the number of sequencing reads mapped to a position in a genome, is an insightful indicator of irregularities within sequencing experiments. While the average genome coverage is frequently used within algorithms in computational genomics, the complete information available in coverage profiles (i.e.
Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database.
Motivation: In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is an increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events, such as horizontal gene transfer, hybridization or reassortment.