Ying Yong Sheng Tai Xue Bao
January 2019
During the past decade, sustainability science has rapidly developed into a globally well-recognized and important new science of the 21st century. However, sustainability science has not received much attention from scientists and practitioners in China where sustainable development and ecological civilization have been prominent themes. To promote the development of sustainability science in China, Wu, et al.
Data sampling is a widely used technique in a broad range of machine learning problems. Traditional sampling approaches generally rely on random resampling from a given dataset. However, these approaches do not take into consideration additional information, such as sample quality and usefulness.
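As a point of reference for the "random resampling" baseline mentioned in this abstract, a minimal sketch follows. It assumes uniform sampling with replacement (a bootstrap-style draw); the function name `random_resample` and the toy data are hypothetical and are not taken from the paper.

```python
import random

def random_resample(dataset, k, seed=None):
    """Draw k items uniformly at random, with replacement.

    Every item has the same selection probability, so no information
    about sample quality or usefulness is used.
    """
    rng = random.Random(seed)
    return [rng.choice(dataset) for _ in range(k)]

# Toy dataset (illustrative only)
samples = ["s1", "s2", "s3", "s4", "s5"]
resampled = random_resample(samples, k=5, seed=0)
print(resampled)
```

Because each draw is uniform, a low-quality sample is as likely to be picked as a high-quality one, which is the limitation the abstract points to.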
IEEE/ACM Trans Comput Biol Bioinform
January 2013
A critical component in mass spectrometry (MS)-based proteomics is an accurate protein identification procedure. Database search algorithms commonly generate a list of peptide-spectrum matches (PSMs). The validity of these PSMs is critical for downstream analysis since proteins that are present in the sample are inferred from those PSMs.
Background: Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies provide the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical to reduce the search space prior to the computationally intensive gene-gene interaction identification step.
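To illustrate why filtering matters before an interaction search, the sketch below counts the candidate SNP pairs for a given panel size. It assumes an exhaustive pairwise search, which the abstract does not specify; the panel sizes and the function name `pairwise_tests` are hypothetical and are not figures from the study.

```python
from math import comb

def pairwise_tests(n_snps):
    """Number of unordered SNP pairs in an exhaustive two-way search."""
    return comb(n_snps, 2)

# Illustrative panel sizes (assumptions, not values from the paper)
for n in (1_000, 100_000, 500_000):
    print(f"{n:>7} SNPs -> {pairwise_tests(n):,} candidate pairs")
```

The pair count grows quadratically with the number of markers, so reducing the panel by a factor of 10 reduces the pairwise search space by roughly a factor of 100.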
BMC Bioinformatics
October 2010
Background: It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, the accurate identification of those genetic interactions has proven to be very challenging.
Methods: In this paper, we propose a new approach for identifying such gene-gene and gene-environment interactions underlying complex diseases.
Existing phylogenetic methods cannot realistically model the evolutionary process. This has become a serious issue for real-life applications that demand accurate phylogenetic results. It is desirable to have an integrative approach that can effectively incorporate multi-disciplinary analyses and synthesise results from various sources.
Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.
Background: Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion.
Background: In this paper, we introduce a novel inter-range interaction integrated approach for protein domain boundary prediction. It involves (1) the design of a modular kernel algorithm, which is able to effectively exploit the information of non-local interactions in amino acids, and (2) the development of a novel profile that can provide suitable information to the algorithm. One of the key features of this profiling technique is that it uses multiple structural alignments of remote homologues to create an extended sequence profile and combines the structural information with suitable chemical information that plays an important role in protein stability.
Background: Post-translational modifications have a substantial influence on the structure and functions of proteins. Post-translational phosphorylation is one of the most common modifications that occur in intracellular proteins. Accurate prediction of protein phosphorylation sites is of great importance for the understanding of diverse cellular signalling processes in both the human body and in animals.
BMC Bioinformatics
April 2008
Background: Protein domains present some of the most useful information that can be used to understand protein structure and functions. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques, such as Artificial Neural Networks and Support Vector Machines. In this study, we propose a new machine learning model (IGRN) that can achieve accurate and reliable classification, with significantly reduced computations.
Two hypotheses account for the evolution of the inner antenna light-harvesting proteins of oxygenic photosynthesis in cyanobacteria, algae, and plants: one in which the CP43 protein of photosystem II gave rise to the extrinsic CP43-like antennas of cyanobacteria (i.e. IsiA and Pcb proteins) as a late development, and the other in which CP43 and CP43-like proteins derive from an ancestral protein.