Background: The ability to sequence the transcriptomes of single cells using single-cell RNA sequencing (RNA-seq) technologies presents a shift in the scientific paradigm: scientists are now able to investigate the complex biology of a heterogeneous population of cells concurrently, one cell at a time. However, to date, there has not been a suitable computational methodology for analyzing such an intricate deluge of data, in particular techniques that aid the identification of the transcriptomic profile differences between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)).
Protein-protein interactions (PPIs) are important for understanding the cellular mechanisms of biological functions, but the reliability of PPIs extracted by high-throughput assays is known to be low. To address this, many current methods use evidence from multiple sources of information to compute reliability scores for such PPIs. However, they often combine the evidence without taking into account the uncertainty of the evidence values, potential dependencies between the information sources used, and missing values from some information sources.
An increasing number of genes have been experimentally confirmed in recent years as causative genes for various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario.
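As a rough illustration of the PU learning setting described above, the following minimal sketch ranks unlabeled candidates with a two-step, centroid-based heuristic. It is not the method of the paper: the function name `rank_unlabeled`, the Euclidean distance, and the numeric feature vectors are all assumptions made for this example.

```python
import numpy as np

def rank_unlabeled(P, U):
    """Rank rows of U (unlabeled) by how closely they resemble rows of P (positives).

    Hypothetical two-step PU heuristic:
      1. Provisionally treat all of U as negative and keep as "reliable negatives"
         the rows that lie closer to the U centroid than to the P centroid.
      2. Rebuild the negative centroid from the reliable negatives only and score
         every row of U by (distance to negatives) - (distance to positives).
    Returns (order, scores); order lists U row indices, most positive-like first.
    """
    P = np.asarray(P, dtype=float)
    U = np.asarray(U, dtype=float)

    pos_centroid = P.mean(axis=0)
    provisional_neg = U.mean(axis=0)

    # Step 1: find reliable negatives.
    d_pos = np.linalg.norm(U - pos_centroid, axis=1)
    d_neg = np.linalg.norm(U - provisional_neg, axis=1)
    reliable = d_neg < d_pos

    # Step 2: refine the negative centroid (fall back if none are reliable).
    neg_centroid = U[reliable].mean(axis=0) if reliable.any() else provisional_neg

    scores = np.linalg.norm(U - neg_centroid, axis=1) - d_pos
    order = np.argsort(-scores)
    return order, scores
```

In this sketch, a higher score means that a candidate sits nearer the confirmed positives than the refined negatives. A real application would replace the centroid distance with a trained classifier and with biologically meaningful gene features.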
Background: Many biological processes are carried out by proteins interacting with each other in the form of protein complexes. However, large-scale detection of protein complexes has remained constrained by experimental limitations. As such, computational detection of protein complexes by applying clustering algorithms on the abundantly available protein-protein interaction (PPI) networks is an important alternative.
J Bioinform Comput Biol
December 2013
While high-throughput technologies are expected to play a critical role in clinical translational research for complex disease diagnosis, the ability to accurately and consistently discriminate disease phenotypes by determining the gene and protein expression patterns as signatures of different clinical conditions remains a challenge in translational bioinformatics. In this study, we propose a novel feature selection algorithm: Multi-Resolution-Test (MRT-test) that can produce significantly accurate and consistent phenotype discrimination across a series of omics data. Our algorithm can capture those features contributing to subtle data behaviors instead of selecting the features contributing to global data behaviors, which seems to be essential in achieving clinical level diagnosis for different expression data.
Methods Mol Biol
June 2013
Many important biological processes, such as signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed and disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch (3-10) of amino acid residues, which can be characterized by a sequence pattern, i.
Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (since a non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes.
J Bioinform Comput Biol
October 2012
Living cells are realized by complex gene expression programs that are moderated by regulatory proteins called transcription factors (TFs). The TFs control the differential expression of target genes in the context of transcriptional regulatory networks (TRNs), either individually or in groups. Deciphering the mechanisms of how the TFs control the differential expression of a target gene in a TRN is challenging, especially when multiple TFs collaboratively participate in the transcriptional regulation.
IEEE Trans Neural Netw Learn Syst
February 2012
Appetitive operant conditioning in Aplysia for feeding behavior via the electrical stimulation of the esophageal nerve contingently reinforces each spontaneous bite during the feeding process. This results in the acquisition of operant memory by the contingently reinforced animals. Analysis of the cellular and molecular mechanisms of the feeding motor circuitry revealed that activity-dependent neuronal modulation occurs at the interneurons that mediate feeding behaviors.
Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.
Background: Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the networks of phenome and interactome to elucidate gene-phenotype associations of diseases.
Many cellular functions involve protein complexes that are formed by multiple interacting proteins. Tandem Affinity Purification (TAP) is a popular experimental method for detecting such multi-protein interactions. However, current computational methods that predict protein complexes from TAP data require converting the co-complex relationships in TAP data into binary interactions.
Background: Protein-protein interactions (PPIs) play important roles in various cellular processes. However, the low quality of current PPI data detected from high-throughput screening techniques has diminished the potential usefulness of the data. We need to develop a method to address the high data noise and incompleteness of PPI data, namely, to filter out inaccurate protein interactions (false positives) and predict putative protein interactions (false negatives).
Background: Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially for complexes involving more than two protein partners, remain relatively limited even with current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse.
Motivation: An important class of protein interactions involves the binding of a protein's domain to a short linear motif (SLiM) on its interacting partner. Extracting such motifs, either experimentally or computationally, is challenging because of their weak binding and high degree of degeneracy. The recent rapid increase in available protein structures provides an excellent opportunity to study SLiMs directly from their 3D structures.
BMC Bioinformatics
June 2009
Background: Detecting protein complexes is an important and challenging task in the post-genomic era. As an increasing amount of protein-protein interaction (PPI) data becomes available, we are able to identify protein complexes from PPI networks. However, most current studies detect protein complexes based solely on the observation that dense regions in PPI networks may correspond to protein complexes, and fail to consider the inherent organization within protein complexes.
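To make the "dense region" notion concrete, the short sketch below computes the edge density of a candidate set of proteins within an undirected PPI network. The function name `subgraph_density` and the edge-list representation are assumptions for illustration; the abstract does not specify how density is measured.

```python
def subgraph_density(nodes, edges):
    """Edge density of the subgraph induced by `nodes` in an undirected graph.

    Density = 2 * |E_internal| / (n * (n - 1)), so a fully connected set
    scores 1.0 and a set with no internal edges scores 0.0.
    Self-loops and duplicate edges are ignored.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = {
        frozenset(e)
        for e in edges
        if len(set(e)) == 2 and set(e) <= nodes
    }
    return 2.0 * len(internal) / (n * (n - 1))
```

A density-only detector would report any candidate set scoring above some threshold as a complex. That is the baseline the abstract criticizes for ignoring the internal organization of complexes.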
Traumatic brain injury is a major socioeconomic burden, and the use of statistical models to predict outcomes after head injury can help to allocate limited health resources. Earlier prediction models analyzing admission data have been used to achieve prediction accuracies of up to 80%. Our aim was to design statistical models utilizing a combination of both physiological and biochemical variables obtained from multimodal monitoring in the neurocritical care setting as a complement to earlier models.
The protein-protein subnetwork prediction challenge presented at the 2nd Dialogue for Reverse Engineering Assessments and Methods (DREAM2) conference is an important computational problem essential to proteomic research. Given a set of proteins from the Saccharomyces cerevisiae (baker's yeast) genome, the task is to rank all possible interactions between the proteins from the most likely to the least likely. To tackle this task, we adopt a graph-based strategy to combine multiple sources of biological data and computational predictions.
Parkinson's disease (PD) is the second most common neurodegenerative disorder affecting millions of people. Both environmental and genetic factors play important roles in its causation and development. Genetic analysis has shown that over 100 genes are correlated with the etiology and pathology of PD.
J Bioinform Comput Biol
June 2008
The biological mechanisms through which proteins interact with one another are best revealed by studying the structural interfaces between interacting proteins. Protein-protein interfaces can be extracted from three-dimensional (3D) structural data of protein complexes and then clustered to derive biological insights. However, conventional protein interface clustering methods lack computational scalability and statistical support.
Int J Data Min Bioinform
May 2008
We propose a domain-based classification method to predict protein-protein interactions using probabilities of putative interacting domain pairs derived from both experimentally-determined interacting protein pairs and carefully-chosen non-interacting protein pairs. Multi-species comparative results for protein interaction prediction show that such careful generation of biologically meaningful negative training data can improve classification performance.
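One common way to turn domain-pair probabilities into a protein-pair score is a noisy-OR combination, sketched below. This is an assumption for illustration, not necessarily the classifier used in the paper; the function name `interaction_probability`, the `pair_prob` lookup table, and the domain labels in the usage note are hypothetical.

```python
from itertools import product

def interaction_probability(domains_a, domains_b, pair_prob):
    """Noisy-OR estimate that two proteins interact.

    domains_a, domains_b: iterables of domain identifiers for each protein.
    pair_prob: dict mapping frozenset({d1, d2}) to the estimated probability
               that domains d1 and d2 interact (missing pairs count as 0).
    The proteins are predicted to interact if at least one domain pair does:
        P = 1 - prod(1 - p(d1, d2)) over all cross-protein domain pairs.
    """
    no_interaction = 1.0
    for da, db in product(domains_a, domains_b):
        no_interaction *= 1.0 - pair_prob.get(frozenset((da, db)), 0.0)
    return 1.0 - no_interaction
```

For example, with hypothetical probabilities 0.5 for (D1, D2) and 0.2 for (D1, D3), a protein carrying D1 and a partner carrying D2 and D3 would score 1 - 0.5 × 0.8 = 0.6. Carefully chosen non-interacting pairs, as the abstract emphasizes, would matter when estimating `pair_prob` in the first place.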
Int J Comput Biol Drug Des
February 2010
Interactions between Transcription Factors (TFs) are necessary for deciphering the complex mechanisms of transcription regulation in eukaryotes. We proposed a novel HV-kernel based SVM classifier to classify TF-TF pairs based on their protein domains and GO annotations. Two types of pairwise kernels, namely, a horizontal kernel and a vertical kernel, were combined to evaluate the similarity between a pair of TFs, and a Genetic Algorithm was used to obtain kernel and feature weights to optimise the classifier's performance.
Comput Syst Bioinformatics Conf
December 2007
Multiprotein complexes play central roles in many cellular pathways. Although many high-throughput experimental techniques have already enabled systematic screening of pairwise protein-protein interactions en masse, the amount of experimentally determined protein complex data has remained relatively limited. As such, researchers have begun to exploit the vast amount of pairwise interaction data to help discover new protein complexes.