Motivation: Increasing efforts are being made in the field of machine learning to advance the learning of robust and accurate models from experimentally measured data and enable more efficient drug discovery processes. The prediction of binding affinity is one of the most frequent tasks of compound bioactivity modelling. Learned models for binding affinity prediction are assessed by their average performance on unseen samples, but point predictions are typically not provided with a rigorous confidence assessment.
Because gorgonian corals are sedentary and live under harsh environmental conditions, their extracts are recognized as a rich source of novel compounds with various biological activities, of interest to the pharmaceutical and cosmetic industries. The presented study aimed to perform chemical screening of organic extracts and semi-purified fractions obtained from the common Adriatic gorgonian, sea fan, (Koch, 1887) and explore its abilities to exert different biological effects in vitro. Qualitative chemical evaluation revealed the presence of several classes of secondary metabolites, and was extended with mass spectrometry analysis and tentative dereplication using the Global Natural Product Social Molecular Networking online platform (GNPS).
In this work we introduce a novel filtering and molecular modeling pipeline based on a fingerprint and descriptor similarity procedure, coupled with molecular docking and molecular dynamics (MD), to select potential novel quinone outside inhibitors (QoI) of cytochrome bc1, with the aim of finding chromophores that are the same as or different from the usual ones. The study was carried out using the yeast cytochrome bc1 complex with its docked ligand (stigmatellin), using all the fungicides from FRAC code C3 mode of action, 8617 Drugbank compounds and 401,624 COCONUT compounds. The introduced drug repurposing pipeline consists of compound similarity with C3 fungicides and molecular docking and MD simulations with final QM/MM binding energy determination, while aiming for potential novel chromophores and preserving at least an amide (R1HN(C=O)R2) or ester functional group found in almost all C3 fungicides to date.
The limited number of medicinal products available to treat fungal infections makes control of fungal pathogens problematic, especially since the number of fungal resistance incidents is increasing. Given the high costs and slow development of new antifungal treatment options, repurposing of already known compounds is one of the proposed strategies. The objective of this study was to perform in vitro experimental tests of lead compounds identified in our previous in silico drug repurposing study, which had been conducted on the known Drugbank database using a seven-step procedure that includes machine learning and molecular docking.
Widespread use of herbicides results in a global increase in weed resistance. The rotational use of herbicides according to their modes of action (MoAs) and the discovery of novel phytotoxic molecules are the two strategies used against weed resistance. Herein, Random Forest modeling was used to build predictive models and establish comprehensive characterization of structure-activity relationships underlying herbicide classifications according to their MoAs and weed selectivity.
Machines usually employ a guess-and-check strategy to analyze data: they take the data, make a guess, check the answer, adjust it with regard to the correct one if necessary, and try again on a new data set. An active learning environment guarantees better performance while training on less, but carefully chosen, data which reduces the costs of both annotating and analyzing large data sets. This issue becomes even more critical for deep learning applications.
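The idea of training on fewer but carefully chosen samples can be illustrated with a minimal sketch of pool-based uncertainty sampling. This is not the method of the paper above; the one-dimensional threshold classifier, the `oracle` labelling function and all names below are assumptions made purely for illustration.

```python
def fit_threshold(labeled):
    """Fit a toy 1-D classifier: the midpoint between the largest point
    labeled 0 and the smallest point labeled 1.
    Assumes both classes are present in `labeled`."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def active_learn(pool, oracle, seed, budget):
    """Repeatedly query the unlabeled point closest to the current
    decision boundary (the most uncertain one) and refit.
    `oracle` is a hypothetical stand-in for a human annotator."""
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), len(labeled)


# Toy data: 101 integer points; the true class boundary lies between 36 and 37.
pool = list(range(101))
oracle = lambda x: int(x >= 37)
threshold, n_labeled = active_learn(pool, oracle, seed=[0, 100], budget=10)
```

In this toy setting the boundary is located after labeling 12 of the 101 points, because each query lands near the current decision boundary instead of being drawn at random.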
Novel machine learning and molecular modelling filtering procedures for drug repurposing have been carried out for the recognition of the novel fungicide targets of Cyp51 and Erg2. Classification and regression approaches on molecular descriptors have been performed using stepwise multilinear regression (FS-MLR), uninformative-variable elimination partial least squares regression, and a non-linear method called Forward Stepwise Limited Correlation Random Forest (FS-LM-RF). Altogether, 112 prediction models from two different approaches have been built for the descriptor recognition of fungicide hit compounds.
Genes with similar roles in the cell cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes.
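The basic guilt-by-association principle mentioned above, transferring annotations from genomic neighbors, can be sketched as a simple majority vote. This is only an illustration of the general principle, not the analysis performed in the study; the gene identifiers, function labels and the `window` parameter are hypothetical.

```python
from collections import Counter


def transfer_annotations(genes, window=2):
    """Predict a function for each unannotated gene by majority vote over
    annotated genes within `window` positions on either side.

    `genes` is a list of (gene_id, function_or_None) in chromosomal order.
    Returns a dict mapping unannotated gene ids to predicted functions.
    """
    predictions = {}
    for i, (gene_id, function) in enumerate(genes):
        if function is not None:
            continue  # already annotated, nothing to infer
        lo = max(0, i - window)
        hi = min(len(genes), i + window + 1)
        votes = Counter(
            neighbor_function
            for j, (_, neighbor_function) in enumerate(genes[lo:hi], start=lo)
            if j != i and neighbor_function is not None
        )
        if votes:
            predictions[gene_id] = votes.most_common(1)[0][0]
    return predictions


# Hypothetical toy chromosome: g3 and g7 lack annotations.
toy_genes = [
    ("g1", "transport"), ("g2", "transport"), ("g3", None),
    ("g4", "transport"), ("g5", "motility"), ("g6", "motility"),
    ("g7", None),
]
predicted = transfer_annotations(toy_genes, window=2)
```

Here `g3` receives "transport" (three of its four annotated neighbors) and `g7` receives "motility" (both neighbors within the window).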
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.
Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes.
Recent research in machine learning has pointed to a core problem of state-of-the-art models that impedes their widespread adoption in different domains. The models' inability to differentiate between noise and subtle, yet significant variation in data leads to their vulnerability to adversarial perturbations that cause wrong predictions with high confidence. The study is aimed at identifying whether algorithms inspired by biological evolution may achieve better results in cases where brittle robustness properties are highly sensitive to slight noise.
Background: The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner.
Results: We evaluated whether the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions.
Based on a set of subjects and a collection of attributes obtained from the Alzheimer's Disease Neuroimaging Initiative database, we used redescription mining to find interpretable rules revealing associations between those determinants that provide insights into Alzheimer's disease (AD). We extended the CLUS-RM redescription mining algorithm to a constraint-based redescription mining (CBRM) setting, which enables several modes of targeted exploration of specific, user-constrained associations. Redescription mining enabled finding specific constructs of clinical and biological attributes that describe many groups of subjects of different size, homogeneity and levels of cognitive impairment.
Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed ProTraits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species.
Motivation: The number of sequenced genomes rises steadily, but we still lack knowledge of the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions.
Detection of patient zero can give new insights to epidemiologists about the nature of first transmissions into a population. In this Letter, we study the statistical inference problem of detecting the source of epidemics from a snapshot of spreading on an arbitrary network structure. By using exact analytic calculations and Monte Carlo estimators, we demonstrate the detectability limits for the susceptible-infected-recovered model, which primarily depend on the spreading process characteristics.
Motivated by recent financial crises, significant research efforts have been put into studying contagion effects and herding behaviour in financial markets. Much less has been said regarding the influence of financial news on financial markets. We propose a novel measure of collective behaviour based on financial news on the Web, the News Cohesiveness Index (NCI), and we demonstrate that the index can be used as a financial market volatility indicator.
P-glycoprotein (P-gp, MDR1) is a promiscuous drug efflux pump of substantial pharmacological importance. Taking advantage of large-scale cytotoxicity screening data involving 60 cancer cell lines, we correlated the differential biological activities of ∼13,000 compounds against cellular P-gp levels. We created a large set of 934 high-confidence P-gp substrates or nonsubstrates by enforcing agreement with an orthogonal criterion involving P-gp overexpressing ADR-RES cells.
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high.
New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling.
Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures.
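Selecting a representative subset of redundant terms can be sketched with a greedy filter over pairwise similarities. This is an illustrative assumption, not REVIGO's actual algorithm: the term names, p-values, similarity values and the `cutoff` parameter below are all made up, and a real application would compute semantic similarity from the Gene Ontology graph.

```python
def reduce_terms(pvalues, similarity, cutoff=0.7):
    """Greedy redundancy reduction: visit terms from most to least
    significant and keep a term only if its similarity to every
    already-kept representative is below `cutoff`."""
    representatives = []
    for term in sorted(pvalues, key=pvalues.get):
        if all(similarity(term, rep) < cutoff for rep in representatives):
            representatives.append(term)
    return representatives


# Made-up toy inputs: four placeholder terms with enrichment p-values.
toy_pvalues = {"term_a": 0.001, "term_b": 0.01, "term_c": 0.02, "term_d": 0.03}

# Made-up pairwise similarities; unlisted pairs default to 0.1.
toy_similarities = {
    frozenset(("term_a", "term_b")): 0.9,
    frozenset(("term_c", "term_d")): 0.8,
}


def toy_similarity(first, second):
    """Look up a symmetric toy similarity score for two terms."""
    return toy_similarities.get(frozenset((first, second)), 0.1)


representatives = reduce_terms(toy_pvalues, toy_similarity, cutoff=0.7)
```

With these toy values, `term_b` is absorbed by the more significant `term_a` and `term_d` by `term_c`, leaving two representatives.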
18-crown-6 ethers are known to exert their biological activity by transporting K⁺ ions across cell membranes. Using non-linear Support Vector Machines regression, we searched for structural features that influence antiproliferative activity in a diverse set of 19 known oxa-, monoaza- and diaza-18-crown-6 ethers. Here, we show that the logP of the molecule is the most important molecular descriptor, among ∼1300 tested descriptors, in determining biological potency (R(2)(cv) = 0.
Background: Prokaryotic environmental adaptations occur at different levels within cells to ensure the preservation of genome integrity, proper protein folding and function as well as membrane fluidity. Although specific composition and structure of cellular components suitable for the variety of extreme conditions has already been postulated, a systematic study describing such adaptations has not yet been performed. We therefore explored whether the environmental niche of a prokaryote could be deduced from the sequence of its proteome.
Six recently synthesized cyano-substituted heteroaryles, which do not bind to DNA but are highly cytotoxic against the human tumor cell line HeLa, were analyzed for their antitumor mechanisms of action (MOA). They did not interfere with the expression of human papillomavirus oncogenes integrated in the HeLa cell genome, but they did induce strong G1 arrest and result in the activation of caspase-3 and apoptosis. A computational analysis was performed that compared the antiproliferative activities of our compounds in 13 different tumor cell lines with those of compounds listed in the National Cancer Institute database.
Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies.
A recent investigation concluded that codon bias did not affect expression of green fluorescent protein (GFP) variants in Escherichia coli, while stability of an mRNA secondary structure near the 5' end played a dominant role. We demonstrate that combining the two variables using regression trees or support vector regression yields a biologically plausible model with better support in the GFP data set and in other experimental data: codon usage is relevant for protein levels if the 5' mRNA structures are not strong. Natural E.