Colorectal cancer (CRC) is the third most common type of cancer. In recent decades, genomic analysis has played an increasingly important role in understanding the molecular mechanisms of CRC. However, its pathogenesis has not been fully uncovered.
Biochim Biophys Acta Mol Basis Dis
June 2018
Lung cancer is a serious, life-threatening disease. Its pathogenesis has yet to be fully described, which impedes the development of effective treatments and preventive measures. The "cancer driver" theory holds that tumor initiation can be associated with a number of specific mutations in genes called cancer driver genes.
Background: To address the challenging problem of selecting discriminative genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm first selects discriminative genes using the K-S test and then applies CFS to choose a subset from the genes retained by the K-S test.
Results: We adopted support vector machines (SVM) as the classifier and used classification accuracy as the criterion for evaluating the classifiers on the selected gene subsets.
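The two-stage K-S plus CFS idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names and the `ks_top` cutoff are hypothetical, class labels are assumed to be 0/1, CFS correlations use absolute Pearson correlation (Hall's original CFS uses symmetrical uncertainty on discretized data), and the CFS stage uses a simple greedy forward search.

```python
import math
from itertools import combinations

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def cfs_merit(subset, genes, y):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(genes[g], y)) for g in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(genes[f], genes[g])) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def ks_cfs_select(genes, y, ks_top=10):
    """genes: one list of expression values per gene; y: 0/1 class label per sample."""
    # Stage 1 (hypothetical cutoff): keep the ks_top genes whose class-wise distributions differ most.
    scores = []
    for g, col in enumerate(genes):
        a = [v for v, lab in zip(col, y) if lab == 0]
        b = [v for v, lab in zip(col, y) if lab == 1]
        scores.append((ks_statistic(a, b), g))
    candidates = [g for _, g in sorted(scores, reverse=True)[:ks_top]]
    # Stage 2: greedy forward search that stops when the CFS merit no longer improves.
    selected, best = [], 0.0
    while True:
        trial = [(cfs_merit(selected + [g], genes, y), g) for g in candidates if g not in selected]
        if not trial:
            break
        merit, g = max(trial)
        if merit <= best:
            break
        selected.append(g)
        best = merit
    return selected
```

The CFS merit rewards genes that correlate with the class but penalizes genes that correlate with each other, so a gene that merely duplicates an already selected one does not raise the merit.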
Background: Choroidal neovascularization (CNV) is a serious eye disease that may cause vision loss, especially in older people. Many factors, including age, gender, and obesity, have been shown to contribute to this disease. However, knowledge of the pathogenic mechanism of CNV remains limited.
How to correctly and efficiently map a small molecule to its possible metabolic pathway is a meaningful question in metabonomics research. In this work, a novel approach was introduced that encodes the physicochemical properties of small molecules. Based on this encoding, a two-stage feature selection method called mRMR-FFSAdaBoost was adopted to map small molecules to their possible metabolic pathways.
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors, and an accurate, robust, and rapid method for predicting cleavage sites in proteins is crucial in the search for such inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm.
Computational approaches complement experimental ones by analyzing protein-protein interactions (PPIs) from a different angle, and they can efficiently determine whether two proteins interact. In this paper, the K-nearest neighbors (KNN) algorithm is applied to predict PPIs by encoding each protein with the physicochemical properties of its residues, its predicted secondary structure, and its amino acid composition.
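A KNN classifier of the kind described can be sketched as below. This is a minimal sketch assuming each protein has already been encoded as a numeric feature vector; `pair_vector` is a hypothetical encoding that simply concatenates the two proteins' vectors (which makes the representation depend on pair order), and Euclidean distance with majority voting is an assumed choice.

```python
import math
from collections import Counter

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pair_vector(prot_a, prot_b):
    """Hypothetical pair encoding: concatenate two per-protein feature vectors."""
    return list(prot_a) + list(prot_b)

def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training pairs."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Here label 1 would mean "interacting" and 0 "non-interacting"; in practice the feature vectors would come from residue properties, predicted secondary structure, and amino acid composition as the abstract describes.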
Protein subcellular localization prediction aims to determine the location of a protein within a cell using computational methods. Knowledge of the subcellular localization of viral proteins in a host cell or virus-infected cell is important because it is closely related to their destructive tendencies and consequences. Predicting viral protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more subcellular locations.
Identifying and characterizing the interactions between enzymes and small molecules is of great value for understanding the molecular and cellular functions of organisms. In this study, we developed a novel machine-learning method for predicting enzyme-small molecule interactions. The biochemical and physicochemical descriptions of proteins and the functional group composition of small molecules are used to represent enzyme-small molecule pairs.
Because many diseases, such as high cholesterol, are related to lipid metabolism, studying lipid metabolic pathways helps reveal the interactions between different elements within highly complex living systems. Here, we employed a typical ensemble learning method, the Bagging learner, to predict the possible lipid metabolic sub-pathway of small molecules based on the physical and chemical features of the compounds. As a result, the jackknife cross-validation test and the independent set test on the model reached 89.
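Bagging (bootstrap aggregating) trains several base learners on bootstrap resamples of the training set and combines them by majority vote. The sketch below is illustrative only: the decision-stump base learner, `n_estimators`, and `seed` are assumptions, not details from the study, and the features would be the compounds' physicochemical descriptors.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-level decision tree (stump) minimizing training errors; works for any labels."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(l != l_lab for l in left) + sum(r != r_lab for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab

def bagging_fit(X, y, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote over the bagged base learners."""
    return Counter(m(row) for m in models).most_common(1)[0][0]
```

Because each stump sees a different resample, their individual errors partly cancel in the vote, which is the variance-reduction effect bagging relies on.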
Background: With the huge number of uncharacterized protein sequences generated in the post-genomic era, it is highly desirable to develop effective computational methods for predicting their functions quickly and accurately. The information obtained would be very useful for both basic research and timely drug development.
Methodology/principal Findings: Although many efforts have been made in this regard, most of them were based on either sequence similarity or protein-protein interaction (PPI) information.
It is important to identify which proteins can interact with nucleic acids for the purpose of protein annotation, since interactions between nucleic acids and proteins are involved in numerous cellular processes such as replication, transcription, splicing, and DNA repair. This research aims to identify proteins that can interact with DNA, RNA, and rRNA, respectively. mRMR (minimum redundancy maximum relevance), with its elegant mathematical formulation, has been widely applied to biological data processing and feature analysis since its introduction in 2005.
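The mRMR criterion in its common "mutual information difference" form picks, at each step, the feature x_j that maximizes I(x_j; c) minus the mean of I(x_j; x_i) over the already selected features x_i. A minimal greedy sketch follows, assuming discrete (already discretized) features and estimating mutual information from counts; `mutual_info` and `mrmr` are hypothetical helper names.

```python
import math
from collections import Counter

def mutual_info(x, y):
    """I(X;Y) in nats, estimated from counts of two equal-length discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b)).
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy with selected features."""
    relevance = [mutual_info(f, target) for f in features]
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(features)):
        def score(j):
            redundancy = sum(mutual_info(features[j], features[i]) for i in selected) / len(selected)
            return relevance[j] - redundancy
        candidates = [j for j in range(len(features)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

With two identical relevant features and a third, independent relevant one, the greedy step picks the independent feature second, since the duplicate's redundancy cancels its relevance.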
Protein Pept Lett
October 2009
Correctly and efficiently determining the biological functions of small molecules is challenging, and doing so benefits further metabonomics analysis. Here, we introduce a computational approach to this problem. The approach is based on the AdaBoost method and uses functional group composition for metabolic pathway analysis, so that it can quickly and automatically map small chemical molecules back to the metabolic pathways they likely belong to.
Protein sumoylation is one of the most important post-translational modifications, and accurate prediction of sumoylation sites is very useful for proteome analysis. Although the putative motif ΨKXE can be used, optimizing prediction models remains a challenge.
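Scanning a sequence for the ΨKXE consensus motif is the simplest baseline that prediction models try to improve on. The sketch below assumes Ψ is one of the hydrophobic residues A, I, L, M, F, V (definitions of Ψ vary between studies) and X is any residue; the function and constant names are hypothetical. A lookahead pattern is used so that overlapping motifs are all reported.

```python
import re

# Assumed hydrophobic set for Psi; adjust to the definition used by a given study.
HYDROPHOBIC = "AILMFV"
# Zero-width lookahead so overlapping occurrences are all found.
MOTIF = re.compile(rf"(?=[{HYDROPHOBIC}]K.E)")

def find_sumo_motifs(seq):
    """Return 1-based positions of the candidate lysine in each Psi-K-X-E match."""
    # m.start() is the 0-based index of Psi; the lysine follows it, hence +2 for 1-based.
    return [m.start() + 2 for m in MOTIF.finditer(seq.upper())]
```

For example, in `MVKLEAIKTE` the motif occurs twice (VKLE and IKTE), giving candidate lysines at positions 3 and 8.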
GalNAc-transferase catalyzes the biosynthesis of O-linked oligosaccharides. Its specificity is determined by nine amino acid residues, denoted R4, R3, R2, R1, R0, R1', R2', R3', R4'. To predict whether the reducing monosaccharide will be covalently linked to the central residue R0 (Ser or Thr), we propose a new method based on feature selection.
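The nine-residue window R4 ... R0 ... R4' around each candidate Ser or Thr can be extracted as shown below. This is a minimal sketch: the function name and the padding character `X` for windows that run past either end of the sequence are assumptions, and the feature encoding and selection steps that would follow are not shown.

```python
def windows_around_st(seq, half=4, pad="X"):
    """Return (1-based position, 9-residue window) for every Ser/Thr in seq."""
    # Pad both ends so residues near the termini still get a full-length window.
    padded = pad * half + seq + pad * half
    out = []
    for i, aa in enumerate(seq):
        if aa in "ST":
            # In padded coordinates the central residue R0 sits at index i + half,
            # so the slice [i, i + 2*half] covers R4..R1, R0, R1'..R4'.
            out.append((i + 1, padded[i:i + 2 * half + 1]))
    return out
```

For `AASTA`, this yields `XXAASTAXX` centred on the Ser at position 3 and `XAASTAXXX` centred on the Thr at position 4.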
Efficient in silico screening approaches may provide valuable hints about the biological functions of candidate compounds, which could help screen functional compounds in both basic research on metabolic pathways and drug discovery. Here, we apply a machine learning method, the nearest neighbor algorithm, based on the functional group composition of compounds, to the analysis of metabolic pathways. This method can quickly map small chemical molecules to the metabolic pathways they likely belong to.
The membrane protein type is an important feature in characterizing the overall topological folding type of a protein or its domains. Many investigators have worked on predicting membrane protein types. Here, we propose a new approach, the bootstrap aggregating (bagging) learner, to address this problem based on protein amino acid composition.
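Amino acid composition turns a protein sequence of any length into a fixed 20-dimensional vector of residue frequencies, which is what makes it usable as classifier input. A minimal sketch follows; the function name is hypothetical, and non-standard residue letters are simply ignored, which is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in alphabetical one-letter order

def amino_acid_composition(seq):
    """Return the fraction of each standard residue; non-standard letters are ignored."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]
```

Each resulting vector sums to 1, so proteins of different lengths become directly comparable feature vectors.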
Background: The activities of drug molecules can be predicted by QSAR (quantitative structure-activity relationship) models, which avoid the high cost and long cycle of traditional experimental methods. Because drug molecules with positive activity are far fewer than those with negative activity, it is important to take this imbalance into account when predicting molecular activities.
Results: Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities under this class imbalance.
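The core idea of asymmetric bagging is to keep every minority-class (active) sample in each bag and bootstrap only the majority class down to the same size, then combine the bags by vote. The sketch below illustrates that idea with a nearest-centroid base learner swapped in for the SVM so that it stays dependency-free. All function names, `n_bags`, and `seed` are hypothetical, and the feature selection step is omitted.

```python
import math
import random

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_nearest_centroid(pos, neg):
    """Stand-in base learner (not an SVM): assign to the closer class centroid."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    def predict(x):
        return 1 if math.dist(x, c_pos) <= math.dist(x, c_neg) else 0
    return predict

def as_bagging_fit(pos, neg, n_bags=11, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        # Keep every positive (minority) sample; bootstrap an equal number of negatives.
        neg_sample = [rng.choice(neg) for _ in range(len(pos))]
        models.append(fit_nearest_centroid(pos, neg_sample))
    return models

def as_bagging_predict(models, x):
    """Majority vote: 1 (active) only if more than half of the bags say so."""
    votes = sum(m(x) for m in models)
    return 1 if votes * 2 > len(models) else 0
```

Each bag is balanced, so no single base learner is swamped by the majority class, while bootstrapping different negatives per bag still lets the ensemble see most of the negative data.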
In this paper, the AdaBoost algorithm, a popular and effective prediction method, is applied to predict the subcellular locations of prokaryotic and eukaryotic proteins in a dataset derived from SWISSPROT 33.0. Its prediction ability was evaluated by the re-substitution test, leave-one-out cross-validation (LOOCV), and the jackknife test.
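A binary AdaBoost with decision stumps, evaluated by leave-one-out cross-validation, can be sketched as below. This is illustrative and rests on assumptions: labels are -1/+1, the weak learner is a threshold stump, and the function names and `rounds` are hypothetical. A multi-class task such as subcellular location would need an extension (for example one-vs-rest or a multi-class AdaBoost variant), which is not shown.

```python
import math

def fit_weighted_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            for sign in (1, -1):
                # The stump predicts `sign` when row[f] <= t, otherwise -sign.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[f] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best

def adaboost_fit(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, sign = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, sign))
        # Misclassified samples gain weight; correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * (sign if row[f] <= t else -sign))
             for row, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(a * (s if row[f] <= t else -s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1

def loocv_accuracy(X, y, rounds=10):
    """Leave-one-out: train on all samples but one, test on the held-out sample."""
    correct = 0
    for i in range(len(X)):
        model = adaboost_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], rounds)
        correct += adaboost_predict(model, X[i]) == y[i]
    return correct / len(X)
```

On the toy data `[[1], [2], [3], [7], [8], [9]]` with labels `[-1, -1, -1, 1, 1, 1]`, LOOCV gives 5/6: with x = 3 held out, the best stump thresholds at 2 and misclassifies it, a reminder that leave-one-out scores can be lower than re-substitution scores.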
Knowledge of the polyprotein cleavage sites of HIV protease will refine our understanding of its specificity, and the information thus acquired is useful for designing specific and efficient HIV protease inhibitors. Recently, several works have approached the HIV-1 protease specificity problem by applying a number of classifier creation and combination methods. The search for suitable HIV protease inhibitors would be greatly expedited by an accurate, robust, and rapid method for predicting the cleavage sites in proteins by HIV protease.
Protein subcellular localization, which indicates where a protein resides in a cell, is an important characteristic of a protein and is closely related to its function. Predicting subcellular localization plays an important role in protein function prediction, genome annotation, and drug design. Therefore, predicting subcellular localization with bioinformatics approaches is an important and challenging task.
Guang Pu Xue Yu Guang Pu Fen Xi
December 2007
Int J Comput Biol Drug Des
February 2010
Molecular activities can be predicted by Quantitative Structure Activity Relationship (QSAR) models. Because experiments are costly, drug molecules with known activity are far fewer than those with unknown activity, so predicting molecular activities by exploiting unlabelled instances is an interesting problem. Here, Semi-Supervised Learning (SSL) is introduced, and an SSL method, Co-Training, is investigated for predicting drug activities using unlabelled instances.
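Co-Training splits the features into two views, trains one classifier per view, and lets each classifier label the unlabelled instance it is most confident about, growing the shared labelled set. The sketch below is a simplified illustration, not the study's setup: it assumes binary 0/1 labels, uses a nearest-centroid classifier per view with the distance margin as its confidence, adds one instance per view per round, and requires both classes to be present in the initial labelled set. The function names and `rounds` are hypothetical.

```python
import math

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_view(rows, labels):
    """Nearest-centroid classifier on one view; returns (label, confidence margin)."""
    c = {lab: centroid([r for r, l in zip(rows, labels) if l == lab]) for lab in (0, 1)}
    def predict(x):
        d0, d1 = math.dist(x, c[0]), math.dist(x, c[1])
        return (1 if d1 < d0 else 0), abs(d0 - d1)
    return predict

def project(row, view):
    """Keep only the feature indices belonging to one view."""
    return [row[i] for i in view]

def co_train(labeled, labels, unlabeled, view_a, view_b, rounds=5):
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for view in (view_a, view_b):
            if not pool:
                break
            clf = fit_view([project(r, view) for r in labeled], labels)
            # This view's classifier labels the instance it is most confident about.
            best = max(range(len(pool)), key=lambda i: clf(project(pool[i], view))[1])
            row = pool.pop(best)
            labeled.append(row)
            labels.append(clf(project(row, view))[0])
    return labeled, labels
```

The returned, enlarged labelled set can then be used to train the final per-view classifiers; the intent is that each view supplies labels the other view could not infer as confidently on its own.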
Aim: To discriminate 32 phenethylamines as antagonists or agonists and to predict the activities of these compounds.
Methods: The support vector machine (SVM) is employed to investigate the structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) of phenethylamines based on molecular descriptors.
Results: Using the leave-one-out cross-validation (LOOCV) test, one optimal SAR model and two optimal QSAR models (for agonists and antagonists) were obtained.