The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function.
Background: Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Previously, we had trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes which we postulated to be highly enriched in genes with a still undocumented synaptic function.
Summary: Genes sharing functions, expression patterns or quantitative traits are not randomly distributed along eukaryotic genomes. In order to study the distribution of genes that share a given feature, we present Cluster Locator, an online analysis and visualization tool. Cluster Locator determines the number, size and position of all the clusters formed by the protein-coding genes on a list according to a given maximum gap, the percentage of gene clustering of the list and its statistical significance.
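The idea of clustering a gene list by a maximum gap can be illustrated with a minimal Python sketch. This is not Cluster Locator's code: the function names `find_clusters` and `percent_clustered`, the `min_size` parameter, and the input format are hypothetical. The sketch assumes that each gene is given as a (chromosome, index) pair, where the index is the gene's rank among all protein-coding genes on that chromosome, and that the gap between two listed genes is the number of unlisted genes lying between them. The statistical significance step is not covered.

```python
from collections import defaultdict

def find_clusters(genes, max_gap=0, min_size=2):
    """Group listed genes into clusters (hypothetical helper).

    genes: iterable of (chromosome, index) pairs; index is the gene's
           assumed rank among all protein-coding genes on the chromosome.
    max_gap: largest number of unlisted genes allowed between two
             consecutive listed genes of the same cluster.
    min_size: smallest number of genes that counts as a cluster.
    """
    by_chrom = defaultdict(list)
    for chrom, index in genes:
        by_chrom[chrom].append(index)

    clusters = []
    for chrom in sorted(by_chrom):
        indices = sorted(set(by_chrom[chrom]))
        run = [indices[0]]
        for idx in indices[1:]:
            # Number of intervening genes between neighbours on the list.
            if idx - run[-1] - 1 <= max_gap:
                run.append(idx)
            else:
                if len(run) >= min_size:
                    clusters.append((chrom, run))
                run = [idx]
        if len(run) >= min_size:
            clusters.append((chrom, run))
    return clusters

def percent_clustered(genes, clusters):
    """Percentage of distinct listed genes that fall inside a cluster."""
    total = len(set(genes))
    clustered = sum(len(run) for _, run in clusters)
    return 100.0 * clustered / total if total else 0.0

# Toy input, invented for illustration only.
example = [("2L", 10), ("2L", 11), ("2L", 13), ("2L", 40), ("3R", 5), ("3R", 6)]
clusters = find_clusters(example, max_gap=1)
print(clusters)
print(percent_clustered(example, clusters))
```

Raising `max_gap` merges neighbouring clusters and raises the percentage of clustered genes, so in this sketch the parameter directly controls how strict the definition of a cluster is.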
Background: Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Although roughly a thousand genes are expected to be important for this function in Drosophila melanogaster, only a few hundred of them are known so far.
Results: In this work we trained three learning algorithms to predict a "synaptic function" for genes of Drosophila using data from a whole-body developmental transcriptome published by others.