Biomolecular condensates are membraneless organelles that can concentrate hundreds of different proteins in cells to carry out essential biological functions. However, accurate identification of their components remains challenging and is biased towards proteins with high structural disorder content, with a focus on self-phase-separating (driver) proteins. Here, we present a machine learning algorithm, PICNIC (Proteins Involved in CoNdensates In Cells), to classify proteins that localize to biomolecular condensates regardless of their role in condensate formation.
Motivation: Errors in the processing of genetic information during protein synthesis can lead to phenotypic mutations, such as amino acid substitutions, e.g. by transcription or translation errors.
Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis.
Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.
Several thousand metagenomes have already been sequenced, and this number is set to grow rapidly in the forthcoming years as the uptake of high-throughput sequencing technologies continues. Hand-in-hand with this data bonanza comes the computationally overwhelming task of analysis. Herein, we describe some of the bioinformatic approaches currently used by metagenomics researchers to analyze their data, the issues they face and the steps that could be taken to help overcome these challenges.