Background: During the last decade, a great number of highly valuable large-scale genomics and proteomics datasets have become available to the research community. In addition, falling costs of high-throughput sequencing experiments, and the option to outsource them, have considerably increased the number of researchers active in this field. Although various computational approaches have been developed to analyze these data, analysis remains laborious: it requires careful integration of many heterogeneous and frequently updated data sources, which creates a barrier for interested scientists who want to carry out their own analyses.
Results: We have implemented Dintor, a data integration framework that provides over 30 tools to help researchers explore genomics and proteomics datasets. Each tool solves a particular task, and several tools can be combined into data processing pipelines. Dintor covers a wide range of frequently required functionality, from gene identifier conversion and orthology mapping, through functional annotation of proteins and genetic variants, to candidate gene prioritization and Gene Ontology-based gene set enrichment analysis. Because the tools operate on constantly changing datasets, we provide a mechanism that unambiguously links tools to specific versions of archived datasets, which guarantees reproducible results for future tool invocations. We demonstrate a selection of Dintor's capabilities by analyzing datasets from four representative publications. The open-source software can be downloaded and installed on a local Unix machine and, for reasons of data privacy, can be configured to retrieve local data only. In addition, the Dintor tools are available on our public Galaxy web service at http://dintor.eurac.edu.
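Gene Ontology-based enrichment is commonly computed with a one-sided hypergeometric test, which asks whether a study gene set contains more genes annotated with a GO term than expected by chance from a background set. The abstract does not state which test Dintor uses, so the Python sketch below assumes the hypergeometric test; the function names and the `annotations` input format (GO term ID mapped to a set of gene identifiers) are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background (universe)
    K: background genes annotated with the GO term
    n: number of genes in the study set
    k: study-set genes annotated with the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probability of drawing exactly i annotated genes, for i = k..min(n, K).
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def go_enrichment(study, background, annotations):
    """Hypothetical helper: p-value for each GO term annotating >= 1 study gene.

    study, background: sets of gene identifiers
    annotations: dict mapping GO term ID -> set of annotated gene identifiers
    """
    study = study & background          # only genes in the universe can be counted
    N, n = len(background), len(study)
    pvalues = {}
    for term, genes in annotations.items():
        term_genes = genes & background
        k = len(term_genes & study)
        if k == 0:
            continue                    # a term with no study genes cannot be enriched
        pvalues[term] = hypergeom_upper_tail(k, n, len(term_genes), N)
    return pvalues


if __name__ == "__main__":
    background = {f"g{i}" for i in range(10)}
    annotations = {"GO:A": {"g0", "g1", "g2"}, "GO:B": {"g9"}}
    print(go_enrichment({"g0", "g1", "g2"}, background, annotations))
```

Because many GO terms are tested at once, a real analysis would also correct the resulting p-values for multiple testing, for example with the Benjamini-Hochberg procedure.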
Conclusions: Dintor is a computational annotation framework for the analysis of genomic and proteomic datasets that provides a rich set of tools covering the most frequently encountered tasks. A major advantage is its ability to consistently handle multiple versions of tool-associated datasets, which helps researchers deliver reproducible results.
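The abstract does not describe how Dintor links tools to archived dataset versions. One simple way to obtain this kind of reproducibility is to archive immutable, dated snapshots and have every tool invocation resolve, and record, an explicit snapshot version. The Python sketch below illustrates only that idea; `ArchivedDataset`, `DatasetRegistry`, and `resolve` are hypothetical names, not Dintor's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ArchivedDataset:
    """Hypothetical immutable snapshot of an external data source."""
    name: str        # dataset identifier, e.g. a gene annotation source
    version: str     # release label of the archived snapshot
    released: date   # date the snapshot was archived


@dataclass
class DatasetRegistry:
    """Hypothetical registry keeping every archived snapshot, oldest first."""
    snapshots: dict[str, list[ArchivedDataset]] = field(default_factory=dict)

    def archive(self, ds: ArchivedDataset) -> None:
        self.snapshots.setdefault(ds.name, []).append(ds)
        self.snapshots[ds.name].sort(key=lambda d: d.released)

    def resolve(self, name: str, version: str | None = None,
                as_of: date | None = None) -> ArchivedDataset:
        """Return an explicitly pinned version, else the latest snapshot
        released on or before `as_of`, else the latest snapshot overall."""
        snaps = self.snapshots.get(name, [])
        if not snaps:
            raise KeyError(f"no archived snapshots for {name!r}")
        if version is not None:
            for s in snaps:
                if s.version == version:
                    return s
            raise KeyError(f"{name!r} has no version {version!r}")
        if as_of is not None:
            eligible = [s for s in snaps if s.released <= as_of]
            if not eligible:
                raise KeyError(f"no snapshot of {name!r} on or before {as_of}")
            return eligible[-1]
        return snaps[-1]


if __name__ == "__main__":
    reg = DatasetRegistry()
    reg.archive(ArchivedDataset("genes", "v1", date(2014, 1, 1)))
    reg.archive(ArchivedDataset("genes", "v2", date(2015, 6, 1)))
    # A tool run would store the resolved version alongside its results.
    print(reg.resolve("genes"), reg.resolve("genes", version="v1"))
```

Re-running a tool with the recorded version returns the same snapshot even after newer releases have been archived, which is what keeps earlier results reproducible.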
Full text | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4687148 | PMC |
http://dx.doi.org/10.1186/s12864-015-2279-5 | DOI |
Bioinformatics
January 2025
Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, W12 0NN, United Kingdom.
Unlabelled: Metabolomics makes extensive use of Nuclear Magnetic Resonance (NMR) spectroscopy because of its excellent reproducibility and high throughput. Both one-dimensional (1D) and two-dimensional (2D) NMR spectra provide crucial information for metabolite annotation and quantification, but they contain complex, overlapping patterns that may require sophisticated machine learning algorithms to decipher. Unfortunately, the limited availability of labelled spectra can hamper the application of machine learning, especially deep learning algorithms, which require large amounts of labelled data.
Viruses
January 2025
Biological Sciences Department, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Six novel phages belonging to the family were isolated using as a host. Phages MuffinTheCat, Badulia, DesireeRose, Bee17, SCoupsA, and LuzDeMundo were purified from environmental samples by students participating in the Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) program at Alliance University, New York. The phages have linear dsDNA genomes of 15,438-15,636 bp with inverted terminal repeats of 112-120 bp.
Pathogens
January 2025
Research and Production Center for Microbiology and Virology, Almaty 050010, Kazakhstan.
While studying the prevalence and profile of antibiotic resistance among isolated from the feces of calves with signs of colibacillosis, a strain with broad-spectrum drug resistance was isolated. Whole-genome sequencing, followed by bioinformatic processing and gene annotation, showed that the genome of this strain has a total length of 4,803,482 bp and contains 4986 genes, including 122 RNA genes. Of these genes, 31% are functionally significant and fall into 26 functional groups.
Pathogens
January 2025
Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas, Universidad Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain.
Trypanosoma cruzi is the causative agent of Chagas disease, a neglected tropical disease and one of the most important parasitic diseases worldwide. The first T. cruzi genome was sequenced in 2005, and its complexity made assembly and annotation challenging. New sequencing methods have since improved the genome sequences and annotations of some strains, revealing the parasite's extensive genetic diversity and complexity.
Pathogens
January 2025
Department of Clinical Laboratory, Beijing Chest Hospital, Beijing Tuberculosis and Thoracic Tumor Institute, Capital Medical University, Beijing 101100, China.
The aim of this study was to reveal diagnostic biomarkers of considerable importance for common pathogenic , utilizing pan-genomic and comparative genome analysis to accurately characterize clinical infections. In this study, complete or assembled genome sequences of common pathogenic and closely related species were obtained from NCBI as discovery and validation sets, respectively. Genome annotation was performed using Prokka software, and pan-genomic analysis and extraction of core genes were performed using BPGA software.
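Pan-genome tools such as BPGA cluster genes into families and then classify each family by how many genomes carry it: core (present in all genomes), accessory (present in more than one but not all), and unique (present in exactly one). The Python sketch below assumes the clustering has already been done and starts from a presence table; the function names are hypothetical, and the marker rule at the end (core to the target genomes, absent from every closely related genome) is an illustrative assumption, not the selection criterion reported in the study.

```python
from collections import Counter


def partition_pangenome(presence):
    """Classify gene families as core, accessory, or unique.

    presence: dict mapping genome ID -> set of gene-family IDs it carries
    (hypothetical input; assumes genes were already clustered into families).
    """
    if not presence:
        raise ValueError("need at least one genome")
    counts = Counter()
    for families in presence.values():
        counts.update(families)        # count how many genomes carry each family
    n_genomes = len(presence)
    core = {f for f, c in counts.items() if c == n_genomes}
    # With a single genome every family is core, so "unique" stays empty.
    unique = {f for f, c in counts.items() if c == 1 and n_genomes > 1}
    accessory = set(counts) - core - unique
    return {"core": core, "accessory": accessory, "unique": unique}


def candidate_markers(target, related):
    """Hypothetical rule: families core to the target genomes but absent
    from every genome of the closely related species."""
    target_core = partition_pangenome(target)["core"]
    seen_in_related = set().union(*related.values())
    return target_core - seen_in_related


if __name__ == "__main__":
    print(partition_pangenome({"A": {"f1", "f2", "f3"}, "B": {"f1", "f2"}, "C": {"f1", "f4"}}))
```

In practice, candidates found this way on a discovery set would still need checking against an independent validation set, as the study describes.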