In machine learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimal with respect to a given cost function. One way to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs of chains in that lattice describe U-shaped curves.
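A minimal Python sketch of the chain idea follows, under stated assumptions: feature subsets are frozensets, a chain is built by adding features in a fixed order, and `toy_cost` is a hypothetical cost chosen so that it is U-shaped along every chain of the lattice over {a, b, c, d}. Because the cost along a U-shaped chain never falls again once it starts rising, the walk can stop at the first increase and skip the rest of the chain. This illustrates only the pruning principle, not a complete search over the lattice.

```python
from typing import Callable, FrozenSet, Sequence

Subset = FrozenSet[str]

def toy_cost(subset: Subset) -> float:
    """Hypothetical cost: smallest at size 2, plus a penalty when the (assumed)
    informative feature "a" is missing. Along every chain it is non-increasing,
    then non-decreasing, i.e. U-shaped."""
    return (len(subset) - 2) ** 2 + (0 if "a" in subset else 1)

def walk_chain(order: Sequence[str], cost: Callable[[Subset], float]) -> Subset:
    """Follow the chain {} < {order[0]} < {order[0], order[1]} < ... of the
    Boolean lattice and stop at the first cost increase."""
    current: Subset = frozenset()
    best = cost(current)
    for feature in order:
        candidate = current | {feature}
        c = cost(candidate)
        if c > best:  # the U-curve has started rising: prune the rest of the chain
            break
        current, best = candidate, c
    return current

print(sorted(walk_chain(["a", "b", "c", "d"], toy_cost)))  # ['a', 'b']
```

Different feature orders give different chains; a full search would combine many such walks while keeping track of which parts of the lattice have already been pruned.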
Non-coding RNAs (ncRNAs) play an essential role in the complex landscape of human genetic regulatory networks. One poorly explored area is the effect of genetic variations on the interaction between ncRNAs and their targets. By integrating a large amount of public data, the present study cataloged the vast landscape of the regulatory effects of microRNAs (miRNAs) and long intergenic noncoding RNAs (lincRNAs) in the human genome.
The aim of this study was to define a method for evaluating a player's decisions during a game, based on the success probability of his actions, and for analyzing the player's strategy as inferred from his game actions. Formal definitions were developed for (i) the stochastic process of player decisions in game situations and (ii) the process of inferring the player's strategy from his game decisions. The method was applied to soccer goalkeepers.
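One simple way to make these two ideas concrete is sketched below in Python. It assumes a table of success probabilities per game situation and action (the situation and action names are hypothetical, and in practice the probabilities would be estimated from recorded games), scores a decision by how far its success probability falls below that of the best available action, and estimates the strategy as the empirical distribution of actions in each situation. These are illustrative choices, not the study's formal definitions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical P(success | situation, action) for a goalkeeper; real values
# would be estimated from recorded game data.
SUCCESS: Dict[str, Dict[str, float]] = {
    "cross": {"catch": 0.7, "punch": 0.8, "stay_on_line": 0.5},
    "one_on_one": {"advance": 0.6, "stay_on_line": 0.4},
}

def decision_score(table: Dict[str, Dict[str, float]], situation: str, action: str) -> float:
    """Success probability of the chosen action minus that of the best
    available action in the same situation (0.0 means an optimal choice)."""
    options = table[situation]
    return options[action] - max(options.values())

def infer_strategy(decisions: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(action | situation) from observed (situation, action) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for situation, action in decisions:
        counts[situation][action] += 1
    return {
        situation: {a: n / sum(c.values()) for a, n in c.items()}
        for situation, c in counts.items()
    }

observed = [("cross", "catch"), ("cross", "punch"), ("cross", "punch"), ("one_on_one", "advance")]
print([round(decision_score(SUCCESS, s, a), 2) for s, a in observed])
print(infer_strategy(observed))
```

With the sample table, choosing `catch` on a cross scores about -0.1, while the other observed choices are optimal; the inferred strategy for crosses is `punch` two times out of three.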
This paper uses a classical approach to feature selection: minimization of a cost function applied to estimated joint distributions. However, in this new formulation, the optimization search space is extended. The original search space is the Boolean lattice of feature sets (BLFS), while the extended one is a collection of Boolean lattices of ordered pairs (CBLOP), that is, pairs of the form (features, associated value), indexed by the elements of the BLFS.
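The classical part of this formulation, minimizing a cost estimated from sample joint distributions over the BLFS, can be sketched in Python as follows. The cost is an assumption for illustration: the empirical conditional entropy H(Y | X_S) of the class given the selected features, with feature configurations seen only once treated as maximally uncertain so that large subsets are not rewarded merely for fragmenting the sample. The search is exhaustive, so it only suits a handful of features, and the extended CBLOP search space is not modeled.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def penalized_conditional_entropy(
    X: Sequence[Sequence[int]], y: Sequence[int], subset: Tuple[int, ...]
) -> float:
    """Estimate H(Y | X_subset) from samples, in bits. Feature configurations
    observed only once get the maximum entropy log2(#classes), an assumed
    penalty for configurations with too little data."""
    n_classes = len(set(y))
    max_h = math.log2(n_classes) if n_classes > 1 else 0.0
    # Empirical joint distribution: class counts per configuration of the subset.
    table: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for row, label in zip(X, y):
        table[tuple(row[i] for i in subset)][label] += 1
    n = len(y)
    cost = 0.0
    for counts in table.values():
        m = sum(counts.values())
        if m == 1:
            h = max_h
        else:
            h = -sum((c / m) * math.log2(c / m) for c in counts.values())
        cost += (m / n) * h  # weight by the estimated P(configuration)
    return cost

def exhaustive_search(
    X: Sequence[Sequence[int]], y: Sequence[int], n_features: int
) -> Tuple[Tuple[int, ...], float]:
    """Visit every element of the Boolean lattice of feature sets, level by level,
    and keep the cheapest one (on ties, the subset found first, hence no larger, wins)."""
    best: Tuple[Tuple[int, ...], float] = ((), penalized_conditional_entropy(X, y, ()))
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            cost = penalized_conditional_entropy(X, y, subset)
            if cost < best[1]:
                best = (subset, cost)
    return best

# Toy data (hypothetical): feature 0 determines the class, features 1 and 2 are noise.
X: List[List[int]] = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1],
                      [0, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y = [row[0] for row in X]
print(exhaustive_search(X, y, n_features=3))
```

On this toy data the full feature set sees every configuration only once, so the penalty gives it the maximum cost of 1 bit, while the single informative feature reaches a cost of 0.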
In this article, we present a methodology for designing kinetic models of molecular signaling networks and illustrate it by modeling one of the Ras/MAPK signaling pathways in the mouse Y1 adrenocortical cell line. The methodology is interdisciplinary: it was developed so that dry-lab and wet-lab teams worked together throughout the modeling process.
Cancer cells show anomalous development and proliferation due to disturbances in their control systems. Studying the behavior of cellular control systems requires high-throughput dynamical data. Unfortunately, this type of data is not widely available.
Patterns have been widely used in Computer Science. A pattern describes a generic solution to an existing problem in a more readable and accessible form. A pattern-oriented process specification consists of a generic and abstract description of a process.
The cell division cycle comprises a sequence of phenomena controlled by a stable and robust genetic network. We applied a probabilistic genetic network (PGN) to construct a hypothetical model whose dynamical behavior displays the degree of robustness typical of the biological cell cycle. The structure of our PGN model was inspired by well-established biological facts, such as the existence of integrator subsystems, negative and positive feedback loops, and redundant signaling pathways.
J Bioinform Comput Biol
August 2007
The last 10 years have seen the rise of many technologies that produce an unprecedented amount of genome-scale data from many organisms. Although the research community has been successful in exploring these data, many challenges remain. One of them is the effective integration of such data sets directly into approaches based on mathematical modeling of biological systems.
Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using the microarray technique. Because the probabilistic models of microarray and SAGE technologies differ in important ways, suitable techniques are needed to select specific genes from SAGE measurements.
We propose a new algorithm for optimal MAE stack filter design. It is based on three main ingredients. First, we show that the dual of the integer programming formulation of the filter design problem is a minimum cost network flow problem.
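To fix ideas about the objects being optimized, the Python sketch below applies a stack filter by threshold decomposition and measures its mean absolute error (MAE) against an ideal signal. It assumes a 3-sample window, edge replication at the borders, and the majority function as the positive Boolean function, which yields the 3-point median filter; the signals are made up. It does not implement the network-flow design algorithm the paper proposes.

```python
from typing import Callable, List, Sequence

BoolFn = Callable[[Sequence[int]], int]

def majority3(bits: Sequence[int]) -> int:
    """Positive Boolean function on a 3-sample binary window: majority vote.
    Its stack filter is the 3-point median filter."""
    return 1 if sum(bits) >= 2 else 0

def stack_filter(signal: Sequence[int], f: BoolFn, levels: int, width: int = 3) -> List[int]:
    """Threshold decomposition: for each window, threshold at t = 1..levels-1,
    apply f to every binary window, and sum the binary outputs. Signal values
    must lie in 0..levels-1; borders are padded by repeating edge samples."""
    half = width // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    out = []
    for i in range(len(signal)):
        window = padded[i:i + width]
        out.append(sum(f([1 if x >= t else 0 for x in window]) for t in range(1, levels)))
    return out

def mae(estimate: Sequence[int], ideal: Sequence[int]) -> float:
    """Mean absolute error between a filtered signal and the ideal one."""
    return sum(abs(a - b) for a, b in zip(estimate, ideal)) / len(ideal)

ideal = [1, 1, 1, 1, 6, 6, 6, 6]
noisy = [1, 7, 1, 1, 6, 0, 6, 6]  # two impulsive errors, values in 0..7
out = stack_filter(noisy, majority3, levels=8)
print(out)                                 # [1, 1, 1, 1, 1, 6, 6, 6]
print(mae(noisy, ideal), mae(out, ideal))  # 1.5 0.625
```

Here the median removes both impulses but moves the step edge by one sample, so the MAE drops from 1.5 to 0.625 rather than to zero; an MAE-optimal design searches over all positive Boolean functions for the best such trade-off.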
This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to input primary databases of biomedical information organized in a warehouse architecture. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results suitably stored in a complex object database for knowledge discovery.
For small samples, classifier design algorithms typically suffer from overfitting. Given a set of features, a classifier must be designed and its error estimated. For small samples, an error estimator may be unbiased but, owing to a large variance, often gives very optimistic estimates.
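The variance problem can be seen in a small simulation. The Python sketch below assumes two hypothetical one-dimensional Gaussian classes with equal priors, designs a nearest-mean classifier on 5 samples per class, estimates its error by leave-one-out, and compares each estimate with the designed classifier's true error, which is available in closed form for this model. The spread of the deviations (estimate minus true error) across repeated samples shows how far a single small-sample estimate can stray; the specific numbers depend on the assumed model and seed.

```python
import math
import random
import statistics
from typing import List, Tuple

# Hypothetical model: class 0 ~ N(0, 1), class 1 ~ N(1.5, 1), equal priors.
MU0, MU1, SIGMA = 0.0, 1.5, 1.0
Rule = Tuple[float, bool]  # (threshold, True if class 1 lies above the threshold)

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def design(x0: List[float], x1: List[float]) -> Rule:
    """Nearest-mean rule in 1-D: threshold at the midpoint of the sample means."""
    m0, m1 = statistics.fmean(x0), statistics.fmean(x1)
    return (m0 + m1) / 2.0, m1 > m0

def predict(rule: Rule, x: float) -> int:
    t, class1_above = rule
    return int((x > t) == class1_above)

def true_error(rule: Rule) -> float:
    """Exact error of a designed rule under the assumed Gaussian model."""
    t, class1_above = rule
    p0_above = 1.0 - phi((t - MU0) / SIGMA)  # P(X > t | class 0)
    p1_above = 1.0 - phi((t - MU1) / SIGMA)  # P(X > t | class 1)
    if class1_above:
        return 0.5 * p0_above + 0.5 * (1.0 - p1_above)
    return 0.5 * (1.0 - p0_above) + 0.5 * p1_above

def loo_error(x0: List[float], x1: List[float]) -> float:
    """Leave-one-out estimate: redesign without each sample, then classify it."""
    mistakes = 0
    for i in range(len(x0)):
        mistakes += predict(design(x0[:i] + x0[i + 1:], x1), x0[i]) != 0
    for i in range(len(x1)):
        mistakes += predict(design(x0, x1[:i] + x1[i + 1:]), x1[i]) != 1
    return mistakes / (len(x0) + len(x1))

def simulate(n_per_class: int = 5, trials: int = 200, seed: int = 0) -> List[float]:
    """Deviation (LOO estimate minus true error) over repeated small samples."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(trials):
        x0 = [rng.gauss(MU0, SIGMA) for _ in range(n_per_class)]
        x1 = [rng.gauss(MU1, SIGMA) for _ in range(n_per_class)]
        deviations.append(loo_error(x0, x1) - true_error(design(x0, x1)))
    return deviations

devs = simulate()
print(f"mean deviation {statistics.fmean(devs):+.3f}, std {statistics.stdev(devs):.3f}")
```

Leave-one-out is only one choice of estimator; replacing `loo_error` with another estimator allows the same comparison of bias (mean deviation) against variance (standard deviation).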
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes.
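As one concrete nearness-based algorithm, the Python sketch below implements Lloyd's k-means with squared Euclidean distance and a farthest-first initialization (both choices are assumptions, not part of the text). The demo draws points from two hypothetical random processes, Gaussian clouds around (0, 0) and (10, 10). In this well-separated case the clusters coincide with the generating processes, matching the implication described above; with overlapping processes that correspondence need not hold.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; returns a cluster label in range(k) for each point."""
    # Farthest-first initialization: start at the first point, then repeatedly
    # add the point whose nearest chosen centroid is farthest away.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels

rng = random.Random(42)
process_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
process_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(20)]
labels = kmeans(process_a + process_b, k=2)
print(labels)
```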