Optimizing proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: How much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric and on how one defines generalization.
The mining of genomes from non-cultivated microorganisms using metagenomics is a powerful tool to discover novel proteins and other valuable biomolecules. However, function-based metagenome searches are often limited by the time-consuming expression of the active proteins in various heterologous host systems. We here report the initial characterization of a novel single-subunit bacteriophage RNA polymerase, EM1 RNAP, identified from a metagenome data set obtained from an elephant dung microbiome.
Motivation: Solubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining the solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased.
Results: In this study, we focus on predicting the solubility and usability for purification of proteins expressed in Escherichia coli directly from the sequence.
A crucial process in the production of industrial enzymes is recombinant gene expression, which aims to induce enzyme overexpression of the genes in a host microbe. Current approaches for securing overexpression rely on molecular tools such as adjusting the recombinant expression vector, adjusting cultivation conditions, or performing codon optimizations. However, such strategies are time-consuming, and an alternative strategy would be to select genes for better compatibility with the recombinant host.
A phylogenetic and metagenomic study of elephant feces samples (derived from a three-week-old and a six-year-old Asian elephant) was conducted in order to describe the microbiota inhabiting this large land-living animal. The microbial diversity was examined via 16S rRNA gene analysis. We generated more than 44,000 GS-FLX+454 reads for each animal.
MicroRNAs are small regulatory RNAs that are currently emerging as new biomarkers for cancer and other diseases. In order for biomarkers to be useful in clinical settings, they should be accurately and reliably detected in clinical samples such as formalin-fixed paraffin-embedded (FFPE) sections and blood serum or plasma. These types of samples represent a challenge in terms of microRNA quantification.
Recently, next-generation sequencing has been introduced as a promising, new platform for assessing the copy number of transcripts, while the existing microarray technology is considered less reliable for absolute, quantitative expression measurements. Nonetheless, so far, results from the two technologies have only been compared based on biological data, leading to the conclusion that, although they are somewhat correlated, expression values differ significantly. Here, we use synthetic RNA samples, resembling human microRNA samples, to find that microarray expression measures actually correlate better with sample RNA content than expression measures obtained from sequencing data.
Over time, functional immunology has accumulated vast amounts of quantitative and qualitative data relevant to the design and discovery of vaccines. Such data includes, but is not limited to, components of the host and pathogen genome (including antigens and virulence factors), T- and B-cell epitopes and other components of the antigen presentation pathway, and allergens. In this review, the authors discuss a range of databases that archive such data.
BMC Bioinformatics
November 2006
Background: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply.