As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time by using powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales that range from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)) primarily attributed to the alternation of coding and noncoding regions, interspersed or tandem repeats densities, etc.
Detrended Fluctuation Analysis (DFA) has become a standard method to quantify the correlations and scaling properties of real-world complex time series. For a given scale of observation ℓ, DFA provides the function F(ℓ), which quantifies the fluctuations of the time series around the local trend, which is subtracted (detrended). If the time series exhibits scaling properties, then F(ℓ) ∼ ℓ^α asymptotically, and the scaling exponent α is typically estimated as the slope of a linear fitting in the log F(ℓ) vs.
Progressive evolution, or the tendency towards increasing complexity, is a controversial issue in biology, whose resolution entails a proper measurement of complexity. Genomes are the best entities to address this challenge, as they encode the historical information of a species' biotic and environmental interactions. As a case study, we have measured genome sequence complexity in the ancient phylum Cyanobacteria.
The observable outputs of many complex dynamical systems consist of time series whose autocorrelation functions exhibit a great diversity of behaviors, including long-range power-law autocorrelation functions, as a signature of interactions operating at many temporal or spatial scales. Often, numerical algorithms able to generate correlated noises reproducing the properties of real time series are used to study and characterize such systems. Typically, many of those algorithms produce a Gaussian time series.
Despite the widespread diffusion of nonlinear methods for heart rate variability (HRV) analysis, the presence and the extent to which nonlinear dynamics contribute to short-term HRV are still controversial. This work aims at testing the hypothesis that different types of nonlinearity can be observed in HRV depending on the method adopted and on the physiopathological state. Two entropy-based measures of time series complexity (normalized complexity index, NCI) and regularity (information storage, IS), and a measure quantifying deviations from linear correlations in a time series (Gaussian linear contrast, GLC), are applied to short HRV recordings obtained in young (Y) and old (O) healthy subjects and in myocardial infarction (MI) patients monitored in the resting supine position and in the upright position reached through head-up tilt.
J Neurosci
January 2020
Origin and functions of intermittent transitions among sleep stages, including brief awakenings and arousals, constitute a challenge to the current homeostatic framework for sleep regulation, focusing on factors modulating sleep over large time scales. Here we propose that the complex micro-architecture characterizing sleep on scales of seconds and minutes results from intrinsic non-equilibrium critical dynamics. We investigate θ- and δ-wave dynamics in control rats and in rats where the sleep-promoting ventrolateral preoptic nucleus (VLPO) is lesioned (male Sprague-Dawley rats).
Objective: In this work we want to analyze differences in nonlinear properties between rest and exercise and also to study the permanent effects of physical exercise on heart rate dynamics.
Approach: It has been shown that physical exercise alters heart dynamics by increasing heart rate and decreasing variability, modifying spectral power and linear correlations, etc. We hypothesize that physical exercise should also reduce nonlinearity in the heartbeat time series.
The correlation properties of the magnitude of a time series are associated with nonlinear and multifractal properties and have been applied in a great variety of fields. Here we have obtained the analytical expression of the autocorrelation of the magnitude series (C_{|x|}) of a linear Gaussian noise as a function of its autocorrelation (C_{x}). For both models and natural signals, the deviation of C_{|x|} from its expectation in linear Gaussian noises can be used as an index of nonlinearity that can be applied to relatively short records and does not require the presence of scaling in the time series under study.
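The relation between C_{|x|} and C_{x} can be illustrated with the standard closed form for a zero-mean bivariate Gaussian pair. The sketch below is an illustration under that assumption, not a transcription of the paper: the function name `expected_magnitude_autocorr` is hypothetical, and the abstract does not state the expression itself.

```python
import numpy as np

def expected_magnitude_autocorr(c_x):
    """Autocorrelation of |x| for a zero-mean linear Gaussian series.

    Uses the textbook result for two jointly Gaussian, unit-variance
    variables with correlation c: E[|x||y|] = (2/pi) * (sqrt(1 - c^2) + c*arcsin(c)).
    Subtracting E[|x|]^2 = 2/pi and dividing by Var(|x|) = 1 - 2/pi gives
    the normalized correlation returned here.
    """
    c = np.clip(np.asarray(c_x, dtype=float), -1.0, 1.0)
    return 2.0 / (np.pi - 2.0) * (np.sqrt(1.0 - c**2) + c * np.arcsin(c) - 1.0)
```

Comparing the empirical autocorrelation of |x| against this value at each lag is one way to quantify the deviation from linear Gaussian behavior mentioned above.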
Symbolic sequences have been extensively investigated in the past few years within the framework of statistical physics. Paradigmatic examples of such sequences are written texts, and deoxyribonucleic acid (DNA) and protein sequences. In these examples, the spatial distribution of a given symbol (a word, a DNA motif, an amino acid) is a key property usually related to the symbol's importance in the sequence: The more uneven and far from random the symbol distribution, the higher the relevance of the symbol to the sequence.
The 2017 update of NGSmethDB stores whole genome methylomes generated from short-read data sets obtained by bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, adding also a two-step mapping process to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses.
We systematically study the scaling properties of the magnitude and sign of the fluctuations in correlated time series, which is a simple and useful approach to distinguish between systems with different dynamical properties but the same linear correlations. First, we decompose artificial long-range power-law linearly correlated time series into magnitude and sign series derived from the consecutive increments in the original series, and we study their correlation properties. We find analytical expressions for the correlation exponent of the sign series as a function of the exponent of the original series.
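The magnitude and sign decomposition described above can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `magnitude_sign` is hypothetical and does not come from the paper.

```python
import numpy as np

def magnitude_sign(x):
    """Split a series into magnitude and sign series of its increments."""
    # Consecutive increments of the original series: x[i+1] - x[i]
    increments = np.diff(np.asarray(x, dtype=float))
    magnitude = np.abs(increments)   # size of each fluctuation
    sign = np.sign(increments)       # direction: -1, 0, or +1
    return magnitude, sign
```

By construction, the element-wise product of the two returned series recovers the increments, so the correlation properties of each component can be studied separately.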
Segmentation is a standard method of data analysis to identify change-points dividing a nonstationary time series into homogeneous segments. However, for long-range fractal correlated series, most of the segmentation techniques detect spurious change-points which are simply due to the heterogeneities induced by the correlations and not to real nonstationarities. To avoid this oversegmentation, we present a segmentation algorithm which takes as a reference for homogeneity, instead of a random i.
Phys Rev E Stat Nonlin Soft Matter Phys
January 2012
A key quantity describing the dynamics of complex systems is the first-passage time (FPT). The statistical properties of FPT depend on the specifics of the underlying system dynamics. We present a unified approach to account for the diversity of statistical behaviors of FPT observed in real-world systems.
Relevant words in literary texts (key words) are known to be clustered, while common words are randomly distributed. Given the clustered distribution of many functional genome elements, we hypothesize that the biological text par excellence, the DNA sequence, might behave in the same way: k-length words (k-mers) with a clear function may be spatially clustered along the one-dimensional chromosome sequence, while less-important, non-functional words may be randomly distributed. To explore this linguistic analogy, we calculate a clustering coefficient for each k-mer (k = 2-9 bp) in human and mouse chromosome sequences, and then check whether clustered words are enriched in the functional part of the genome.
We investigate how various coarse-graining (signal quantization) methods affect the scaling properties of long-range power-law correlated and anti-correlated signals, quantified by the detrended fluctuation analysis. Specifically, for coarse-graining in the magnitude of a signal, we consider (i) the Floor, (ii) the Symmetry and (iii) the Centro-Symmetry coarse-graining methods. We find that for anti-correlated signals coarse-graining in the magnitude leads to a crossover to random behavior at large scales, and that as the width of the coarse-graining partition interval Δ increases, this crossover moves to intermediate and small scales.
Phys Rev E Stat Nonlin Soft Matter Phys
March 2011
Human DNA shows a complex structure with compositional features at many scales; the isochores--long DNA segments (~10⁵ bp) of relatively homogeneous guanine-cytosine (G + C) content--are the largest well-documented and well-analyzed compositional structures. However, we report here on the existence of a high-level compositional organization of isochores in the human genome. By using a segmentation algorithm incorporating the long-range correlations existing in human DNA, we find that every chromosome is composed of a few huge segments (~ 10⁷ bp) of relatively homogeneous G + C content, which become the largest compositional organization of the genome.
Background: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way.
Phys Rev E Stat Nonlin Soft Matter Phys
March 2010
Detrended fluctuation analysis (DFA) is an improved method of classical fluctuation analysis for nonstationary signals where embedded polynomial trends mask the intrinsic correlation properties of the fluctuations. To better identify the intrinsic correlation properties of real-world signals where a large amount of data is missing or removed due to artifacts, we investigate how extreme data loss affects the scaling behavior of long-range power-law correlated and anticorrelated signals. We introduce a segmentation approach to generate surrogate signals by randomly removing data segments from stationary signals with different types of long-range correlations.
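For readers unfamiliar with the method, the basic DFA procedure can be sketched as follows. This is a minimal illustration assuming NumPy, non-overlapping windows, and a polynomial detrending order chosen by the caller; the names `dfa` and `scaling_exponent` are hypothetical, and the sketch does not reproduce the data-loss surrogate procedure of the paper.

```python
import numpy as np

def dfa(x, scales, order=1):
    """Return the DFA fluctuation function F(l) for each window size l."""
    x = np.asarray(x, dtype=float)
    # Integrated (profile) series after removing the global mean
    profile = np.cumsum(x - x.mean())
    fluct = []
    for l in scales:
        n_win = len(profile) // l
        # Non-overlapping windows of length l, one window per column
        segments = profile[:n_win * l].reshape(n_win, l).T
        t = np.arange(l)
        # Least-squares polynomial trend fitted independently in each window
        coeffs = np.polyfit(t, segments, order)
        trend = np.vander(t, order + 1) @ coeffs
        # Root-mean-square fluctuation around the local trends
        fluct.append(np.sqrt(np.mean((segments - trend) ** 2)))
    return np.array(fluct)

def scaling_exponent(x, scales, order=1):
    """Estimate alpha as the slope of log F(l) versus log l."""
    f = dfa(x, scales, order)
    slope, _ = np.polyfit(np.log(scales), np.log(f), 1)
    return slope
```

Under these assumptions, uncorrelated white noise should yield an exponent close to 0.5 and its running sum (a random walk) an exponent close to 1.5; the range of scales used in the fit is a judgment call that depends on the series length.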
Phys Rev E Stat Nonlin Soft Matter Phys
March 2009
Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to extract automatically keywords in literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e.
Background: The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA.
Phys Rev E Stat Nonlin Soft Matter Phys
March 2007
The scale-free, long-range correlations detected in DNA sequences contrast with characteristic lengths of genomic elements, being particularly incompatible with the isochores (long, homogeneous DNA segments). By computing the local behavior of the scaling exponent alpha of detrended fluctuation analysis (DFA), we discriminate between sequences with and without true scaling, and we find that no single scaling exists in the human genome. Instead, human chromosomes show a common compositional structure with two characteristic scales, the large one corresponding to the isochores and the other to small and medium scale genomic elements.
Alu retrotransposons do not show a homogeneous distribution over the human genome but have a higher density in GC-rich (H) than in AT-rich (L) isochores. However, since they preferentially insert into the L isochores, the question arises: What is the evolutionary mechanism that shifts the Alu density maximum from L to H isochores? To disclose the role played by each of the potential mechanisms involved in such biased distribution, we carried out a genome-wide analysis of the density of the Alus as a function of their evolutionary age, isochore membership, and intron vs. intergene location.
Phys Rev E Stat Nonlin Soft Matter Phys
January 2005
When investigating the dynamical properties of complex multiple-component physical and physiological systems, it is often the case that the measurable system's output does not directly represent the quantity we want to probe in order to understand the underlying mechanisms. Instead, the output signal is often a linear or nonlinear function of the quantity of interest. Here, we investigate how various linear and nonlinear transformations affect the correlation and scaling properties of a signal, using the detrended fluctuation analysis (DFA) which has been shown to accurately quantify power-law correlations in nonstationary signals.
We study the properties of the level statistics of 1D disordered systems with long-range spatial correlations. We find a threshold value in the degree of correlations below which in the limit of large system size the level statistics follows a Poisson distribution (as expected for 1D uncorrelated-disordered systems), and above which the level statistics is described by a new class of distribution functions. At the threshold, we find that with increasing system size, the standard deviation of the function describing the level statistics converges to the standard deviation of the Poissonian distribution as a power law.
Isochores are long genome segments homogeneous in G+C. Here, we describe an algorithm (IsoFinder) running on the web (http://bioinfo2.ugr.