Motivation: Clustering of genetic sequences is one of the key parts of bioinformatics analyses. The resulting phylogenetic trees help answer many research questions, including tracing the history of species, studying past migration, or tracing the source of a virus outbreak. At the same time, biologists increasingly provide data in raw form, as reads, or assembled only to the contig level.
Background: Identification of non-trivial and meaningful patterns in omics data is one of the most important biological tasks. The patterns help to better understand biological systems and interpret experimental outcomes. A well-established method serving to explain such biological data is Gene Set Enrichment Analysis.
Background: One of the major challenges in the analysis of gene expression data is to identify local patterns composed of genes showing coherent expression across subsets of experimental conditions. Such patterns may provide an understanding of underlying biological processes related to these conditions. This understanding can further be improved by providing concise characterizations of the genes and situations delimiting the pattern.
BMC Bioinformatics
October 2015
Background: Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers.
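The conversion from gene-level to set-level features described above can be illustrated with a minimal sketch. It assumes that each gene set is summarized by the mean expression of its member genes; the abstract does not say which aggregation the authors used, and the function name `set_level_features` and the input layout are hypothetical.

```python
from statistics import mean


def set_level_features(sample, gene_sets):
    """Summarize one sample's gene-level expression as gene-set features.

    sample:    dict mapping gene name -> expression value
    gene_sets: dict mapping gene-set name -> list of member gene names

    Assumption: a set is summarized by the mean of its measured members.
    Sets with no measured member are skipped.
    """
    features = {}
    for set_name, members in gene_sets.items():
        # Keep only the member genes that were actually measured.
        values = [sample[gene] for gene in members if gene in sample]
        if values:
            features[set_name] = mean(values)
    return features


# Four gene-level features are reduced to two set-level features.
sample = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 5.0}
gene_sets = {"A": ["g1", "g2"], "B": ["g3", "g4", "gX"], "C": ["gY"]}
features = set_level_features(sample, gene_sets)
```

Averaging is only one possible choice; any per-set summary statistic would give the same kind of lower-dimensional, interpretable feature vector.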
Background: Delayed graft function (DGF) caused by ischemia/reperfusion injury (I/RI) negatively influences the outcome of kidney transplantation. This prospective single-center study characterized the intrarenal transcriptome during I/RI as a means of identifying genes associated with DGF development.
Methods: Characterization of the intrarenal transcription profile associated with I/RI was carried out on three sequential graft biopsies from respective allografts before and during transplantation.
Background: The process of protein-DNA binding has an essential role in the biological processing of genetic information. We use relational machine learning to predict DNA-binding propensity of proteins from their structures. Automatically discovered structural features are able to capture some characteristic spatial configurations of amino acids in proteins.
We contribute a novel, ball-histogram approach to DNA-binding propensity prediction of proteins. Unlike state-of-the-art methods based on constructing an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte-Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA binding propensity.
Background: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results.
Background: Induction therapy is associated with excellent short-term kidney graft outcome. The aim of this study was to evaluate differences in the intragraft transcriptome after successful induction therapy using two rabbit antithymocyte globulins.
Methods: The expression of 376 target genes involved in tolerance, inflammation, T- and B-cell immune response, and apoptosis was evaluated using the quantitative real-time reverse-transcriptase polymerase chain reaction (2^(-ΔΔCt)) method in kidney graft biopsies with normal histological findings and stable renal function, 3 months posttransplantation after induction therapy with Thymoglobulin, ATG-Fresenius S (ATG-F), and a control group without induction therapy.
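For readers unfamiliar with the ΔΔCt calculation named above, the following is a minimal sketch of the standard relative-quantification formula. It assumes roughly 100% amplification efficiency and a single reference (housekeeping) gene; the abstract does not state which reference gene or calibrator the study used, so the function name `fold_change` and its parameters are hypothetical.

```python
def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCt) method.

    Each argument is a threshold cycle (Ct) value:
      *_sample  : measured in the sample of interest
      *_control : measured in the calibrator (control) sample
      ct_ref_*  : Ct of the reference gene (an assumption of this sketch)
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized sample to the normalized control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# The target crosses the threshold two cycles earlier in the sample than
# in the control (after normalization), i.e. a fourfold higher expression.
example = fold_change(24.0, 18.0, 26.0, 18.0)
```

A value above 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression.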
Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due to the abundance of attributes (simultaneously measured gene expression values) and the shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and consequently the biological insights they can provide.