Many publicly available databases provide disease-related data that make it possible to link genomic data to medical data and metadata. The Cancer Genome Atlas (TCGA), for example, compiles tens of thousands of datasets covering a wide array of cancer types. Here we introduce an interactive and highly automated TCGA-based workflow that links and analyses epigenomic and transcriptomic data with treatment and survival data in order to identify possible biomarkers that indicate treatment success.
Mitochondrial tRNAs have acquired a diverse portfolio of aberrant structures throughout metazoan evolution. With the availability of more than 12,500 mitogenome sequences, it is essential to compile a comprehensive overview of the pattern changes with regard to mitochondrial tRNA repertoire and structural variations. This, of course, requires reanalysis of the sequence data of more than 250,000 mitochondrial tRNAs with a uniform workflow.
The Neuropeptide Y/RFamide-like receptors belong to the Rhodopsin-like G protein-coupled receptors (GPCRs) and are involved in functions such as locomotion, feeding and reproduction. With 41 described receptors they form the best-studied group of neuropeptide GPCRs in . In order to understand the expansion of the Neuropeptide Y/RFamide-like receptor family in nematodes, we started from the sequences of selected receptor paralogs in as query and surveyed the corresponding orthologous sequences in another 159 representative nematode target genomes.
Purpose: Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.
Hepatitis C virus (HCV) is a plus-stranded RNA virus that often chronically infects liver hepatocytes and causes liver cirrhosis and cancer. These viruses replicate their genomes employing error-prone replicases. Thereby, they routinely generate a large 'cloud' of RNA genomes (quasispecies) which, by trial and error, comprehensively explore the sequence space available for functional RNA genomes that maintain the ability for efficient replication and immune escape.
Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences.
G-protein-coupled receptors (GPCRs) activate heterotrimeric G proteins by promoting guanine nucleotide exchange. Here, we investigate the coupling of G proteins with GPCRs and describe the events that ultimately lead to the ejection of GDP from its binding pocket in the Gα subunit, the rate-limiting step during G-protein activation. Using molecular dynamics simulations, we investigate the temporal progression of structural rearrangements of GDP-bound G protein (G·GDP; hereafter G) upon coupling to the β-adrenergic receptor (βAR) in atomic detail.
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures.
Most genes are part of larger families of evolutionarily related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.
Structural changes in RNAs are an important contributor to controlling gene expression not only at the posttranscriptional stage but also during transcription. A subclass of riboswitches and RNA thermometers located in the 5' region of the primary transcript regulates the downstream functional unit (usually an ORF) through premature termination of transcription. Not only do such elements occur naturally, but they are also attractive devices in synthetic biology.
Graphs have become widely used to represent and study social, biological, and technological systems. Statistical methods to analyze empirical graphs were proposed based on the graph's spectral density. However, their running time is cubic in the number of vertices, precluding direct application to large instances.
The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification.
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness.
Proteinortho is a widely used tool to predict (co)-orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing.
Several computational frameworks and workflows exist that recover genomes of prokaryotes, eukaryotes and viruses from metagenomes. Yet, it is difficult for scientists with little bioinformatics experience to evaluate quality, annotate genes, dereplicate, assign taxonomy and calculate relative abundance and coverage of genomes belonging to different domains. MuDoGeR is a user-friendly tool tailored for those familiar with the Unix command-line environment that makes it easy to recover genomes of prokaryotes, eukaryotes and viruses from metagenomes, either alone or in combination.
Summary: RNA molecules play crucial roles in various biological processes. They mediate their function mainly by interacting with other RNAs or proteins. At present, information about these interactions is distributed over different resources, often providing the data in simple tab-delimited formats that differ between the databases.
Algorithms Mol Biol
November 2023
Background: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestor coincides with a corresponding speciation event.
Anim Microbiome
October 2023
Background: Metagenomic data can shed light on animal-microbiome relationships and the functional potential of these communities. Over the past years, the generation of metagenomics data has increased exponentially, and so has the availability and reusability of data present in public repositories. However, identifying which datasets and associated metadata are available is not straightforward.
J Integr Bioinform
September 2023
The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input.
The prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics, aiming in particular at the identification of properties of nucleotide sequences that are informative about their biological role in the cell. We present here a software framework for the alignment-based training, evaluation and application of machine learning models with user-defined parameters. Instead of focusing on the one-size-fits-all approach of pervasive annotation pipelines, we offer a framework for the structured generation and evaluation of models based on arbitrary features and input data, focusing on stable and explainable results.
Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set X of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets of descendant taxa of a vertex v.
Most of the functional RNA elements located within large transcripts are local. Local folding therefore serves as a practically useful approximation to global structure prediction. Due to the sensitivity of RNA secondary structure prediction to the exact definition of sequence ends, accuracy can be increased by averaging local structure predictions over multiple, overlapping sequence windows.
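The window-averaging idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `averaged_pair_probabilities` and `toy_predict` are hypothetical, and the per-window predictor is a deliberately crude stand-in for a real thermodynamic folding model. The assumption made here is that a pair's averaged probability is the mean over all windows that contain both of its positions, with windows that do not predict the pair contributing zero.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Pair = Tuple[int, int]
# A predictor maps a window's subsequence to pair probabilities,
# with 0-based positions relative to the window start (i < j).
WindowPredictor = Callable[[str], Dict[Pair, float]]


def averaged_pair_probabilities(
    seq: str, window: int, step: int, predict: WindowPredictor
) -> Dict[Pair, float]:
    """Average local pair probabilities over overlapping windows.

    Each pair (i, j) is averaged over every window that contains both
    positions; windows that do not predict the pair count as zero.
    """
    starts = list(range(0, max(len(seq) - window, 0) + 1, step))
    sums: Dict[Pair, float] = defaultdict(float)
    for s in starts:
        for (i, j), p in predict(seq[s:s + window]).items():
            sums[(s + i, s + j)] += p  # shift to global coordinates

    averaged = {}
    for (i, j), total in sums.items():
        covering = sum(1 for s in starts if s <= i and j < s + window)
        averaged[(i, j)] = total / covering
    return averaged


def toy_predict(sub: str) -> Dict[Pair, float]:
    """Hypothetical stand-in predictor (NOT an energy model).

    Pairs the first G with the last C if at least three bases separate
    them, but refuses a G at the window's first position to mimic the
    sensitivity to sequence ends mentioned above.
    """
    i, j = sub.find("G"), sub.rfind("C")
    if i > 0 and j - i >= 4:
        return {(i, j): 1.0}
    return {}


if __name__ == "__main__":
    print(averaged_pair_probabilities("AGAAACA", window=6, step=1,
                                      predict=toy_predict))
```

In this toy run both windows cover the G-C pair but only one predicts it, so the averaged value is lower than the single-window value; a real application would pass in a predictor backed by an actual folding library.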
Background: RNA features a highly negatively charged phosphate backbone that attracts a cloud of counter-ions that reduce the electrostatic repulsion in a concentration-dependent manner. Ion concentrations thus have a large influence on folding and stability of RNA structures. Despite their well-documented effects, salt effects are not handled consistently by currently available secondary structure prediction algorithms.