Phylogenetic comparative methods use random processes, such as Brownian motion, to model the evolution of continuous traits on phylogenetic trees. Growing evidence for non-gradual evolution has motivated the development of more complex models, often based on Lévy processes. However, their statistical inference is computationally intensive and currently relies on approximations, high-dimensional sampling, or numerical integration.
Because they act as confounding factors, directional trends are likely to make two quantitative traits appear spuriously correlated. By determining the probability distributions of independent contrasts when traits evolve following Brownian motions with linear trends, we show that the standard independent contrasts cannot be used to test for correlation in this situation. We propose a multiple regression approach that corrects the bias caused by directional evolution.
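For readers unfamiliar with the standard independent contrasts mentioned above, the following is a minimal sketch of the classical computation on a small binary tree. It illustrates only the standard, trend-free contrasts, not the multiple regression correction proposed in the abstract. The tree encoding, the function name `contrasts`, and the trait values are assumptions made for illustration.

```python
from math import sqrt

def contrasts(node, traits):
    """Return (value, extra_length, contrasts) for the subtree rooted at node.

    Hypothetical encoding: a tip is a string (its name); an internal node is
    a tuple (left_child, left_length, right_child, right_length).
    """
    if isinstance(node, str):
        return traits[node], 0.0, []
    left, len_left, right, len_right = node
    x_l, extra_l, c_l = contrasts(left, traits)
    x_r, extra_r, c_r = contrasts(right, traits)
    # Branch lengths are lengthened to account for the uncertainty of the
    # values estimated at internal nodes.
    v_l = len_left + extra_l
    v_r = len_right + extra_r
    # Standardized contrast between the two child values.
    contrast = (x_l - x_r) / sqrt(v_l + v_r)
    # Weighted average of the child values, used as the value of this node.
    value = (x_l * v_r + x_r * v_l) / (v_l + v_r)
    extra = v_l * v_r / (v_l + v_r)
    return value, extra, c_l + c_r + [contrast]

# Tree ((A:1,B:1):1,C:2) with illustrative trait values.
tree = (("A", 1.0, "B", 1.0), 1.0, "C", 2.0)
traits = {"A": 1.0, "B": 3.0, "C": 5.0}
root_value, root_extra, tree_contrasts = contrasts(tree, traits)
print(tree_contrasts)
```

A tree with n tips yields n - 1 contrasts. Under Brownian motion without trend these are independent and identically distributed, which is the property the abstract shows to fail when a linear trend is present.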
The identification of communities, or modules, is a common operation in the analysis of large biological networks. A framework has been established to evaluate clustering approaches in a biomedical context by testing the association of communities with GWAS-derived common trait and disease genes. Here we implemented several extensions of the MolTi software, which detects communities by optimizing multiplex (and monoplex) network modularity.
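As a point of reference for the modularity criterion mentioned above, here is a minimal sketch of the standard (Newman) modularity of a partition of a single unweighted, undirected network. It covers the monoplex case only and is not MolTi's implementation. The function name `modularity` and the toy graph are assumptions made for illustration.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each edge listed once.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = {}  # number of edges with both ends in the community
    degree = {}    # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Observed fraction of internal edges minus the fraction expected if
    # edges were placed at random while preserving degrees.
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}
q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
print(q_two, q_one)
```

Splitting the graph into its two triangles scores higher than keeping all nodes together, which is the kind of comparison a modularity-optimizing method relies on.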
Bull Math Biol
October 2017
The time-dependent-asymmetric-linear parsimony is an ancestral state reconstruction method which extends the standard linear parsimony (a.k.a.
Choosing an ancestral state reconstruction method among the alternatives available for quantitative characters may be puzzling. We present here a comparison of seven of them, namely the maximum likelihood, restricted maximum likelihood, generalized least squares under Brownian, Brownian-with-trend and Ornstein-Uhlenbeck models, phylogenetic independent contrasts and squared parsimony methods. A review of the relations between these methods shows that the maximum likelihood, the restricted maximum likelihood and the generalized least squares under Brownian model infer the same ancestral states and can only be distinguished by the distributions accounting for the reconstruction uncertainty which they provide.
Various biological networks can be constructed, each featuring gene/protein relationships of different meanings (e.g., protein interactions or gene co-expression).
Despite its intrinsic difficulty, ancestral character state reconstruction is an essential tool for testing evolutionary hypotheses. Two major classes of approaches to this question can be distinguished: parsimony-based and likelihood-based approaches. We focus here on the second class of methods, more specifically on approaches based on continuous-time Markov modeling of character evolution.
Using the fossil record yields more detailed reconstructions of the evolutionary process than those obtained from contemporary lineages only. In this work, we present a stochastic process modeling not only speciation and extinction, but also fossil finds. Next, we derive an explicit formula for the likelihood of a reconstructed phylogeny with fossils, which can be used to estimate the speciation and extinction rates.
We give a formal study of the relationships between the transition cost parameters and the generalized maximum parsimonious reconstructions of unknown (ancestral) binary character states {0,1} over a phylogenetic tree. As a main result, we show that there are two thresholds λ¹n and λ⁰n, generally confounded, associated with each node n of the phylogenetic tree and such that there exists a maximum parsimonious reconstruction associating state 1 to n (resp. state 0 to n) if the ratio "10-cost"/"01-cost" is smaller than λ¹n (resp.
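To make the role of the cost ratio concrete, the sketch below uses a standard Sankoff-style dynamic program for a binary character with asymmetric transition costs and reports which states are optimal at the root. It is an illustration under assumptions, not the formal construction of the abstract: the tree encoding, the function names `subtree_costs` and `root_states`, and the example tree are hypothetical, and only the root node is examined.

```python
INF = float("inf")

def subtree_costs(node, cost01, cost10):
    """Return (min cost if node is in state 0, min cost if node is in state 1).

    Hypothetical encoding: a tip is its observed state (the int 0 or 1);
    an internal node is a tuple of child nodes.
    """
    if isinstance(node, int):
        return (0.0, INF) if node == 0 else (INF, 0.0)
    total0 = total1 = 0.0
    for child in node:
        c0, c1 = subtree_costs(child, cost01, cost10)
        # Parent in state 0: child stays 0, or changes 0 -> 1 at cost01.
        total0 += min(c0, cost01 + c1)
        # Parent in state 1: child stays 1, or changes 1 -> 0 at cost10.
        total1 += min(cost10 + c0, c1)
    return total0, total1

def root_states(tree, cost01, cost10):
    """States taken by the root in at least one most parsimonious reconstruction."""
    c0, c1 = subtree_costs(tree, cost01, cost10)
    best = min(c0, c1)
    return {state for state, cost in ((0, c0), (1, c1)) if cost == best}

tree = ((1, 1), (0, 0))
print(root_states(tree, cost01=1, cost10=1))  # equal costs
print(root_states(tree, cost01=1, cost10=2))  # 1 -> 0 changes are expensive
print(root_states(tree, cost01=2, cost10=1))  # 0 -> 1 changes are expensive
```

On this toy tree both root states are optimal when the two costs are equal, and the optimum switches from one state to the other as the ratio of the 10-cost to the 01-cost crosses 1, which mirrors the threshold behavior described above.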
Background: While multiple alignment is the first step of usual classification schemes for biological sequences, alignment-free methods are being increasingly used as alternatives when multiple alignments fail. Subword-based combinatorial methods are popular for their low algorithmic complexity (suffix trees ..
Background: As public microarray repositories are constantly growing, we are facing the challenge of designing strategies to provide productive access to the available data.
Methodology: We used a modified version of the Markov clustering algorithm to systematically extract clusters of co-regulated genes from hundreds of microarray datasets stored in the Gene Expression Omnibus database (n = 1,484). This approach led to the definition of 18,250 transcriptional signatures (TS) that were tested for functional enrichment using the DAVID knowledgebase.
Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores.
Results: We introduce an algorithm computing an optimal N-map with time complexity O(|s| × |t| × N) using O(|s| × |t| × N) memory space.
Background: In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment.
Subword composition plays an important role in many sequence analyses. Here we define and study the "local decoding of order N of sequences," an alternative that avoids some drawbacks of "subwords of length N" approaches while keeping information about environments of length N in the sequences ("decoding" is taken here in the sense of hidden Markov modeling, i.e.
The number of statistical tools used to analyze transcriptome data is continuously increasing, and no single definitive method has so far emerged. There is a need for comparison, and a number of different approaches have been taken to evaluate the effectiveness of the statistical tools available for microarray analyses. In this paper, we describe a simple and efficient protocol to compare the reliability of these tools.