Tree shape statistics based on peripheral structures have been utilized to study evolutionary mechanisms and inference methods. Partially motivated by a recent study by Pouryahya and Sankoff on modeling the accumulation of subgenomes in the evolution of polyploids, we present the distribution of subtree patterns with four or fewer leaves for the unrooted Proportional to Distinguishable Arrangements (PDA) model. We derive a recursive formula for computing the joint distributions, as well as a Strong Law of Large Numbers and a Central Limit Theorem for the joint distributions.
View Article and Find Full Text PDFNature-inspired swarm-based algorithms are increasingly applied to tackle high-dimensional and complex optimization problems across disciplines. They are general purpose optimization algorithms, easy to implement and assumption-free. Some common drawbacks of these algorithms are their premature convergence and the solution found may not be a global optimum.
View Article and Find Full Text PDFDistributional properties of tree shape statistics under random phylogenetic tree models play an important role in investigating the evolutionary forces underlying the observed phylogenies. In this paper, we study two subtree counting statistics, the number of cherries and that of pitchforks for the Ford model, the alpha model introduced by Daniel Ford. It is a one-parameter family of random phylogenetic tree models which includes the proportional to distinguishable arrangement (PDA) and the Yule models, two tree models commonly used in phylogenetics.
View Article and Find Full Text PDFTree shape statistics provide valuable quantitative insights into evolutionary mechanisms underpinning phylogenetic trees, a commonly used graph representation of evolutionary relationships among taxonomic units ranging from viruses to species. We study two subtree counting statistics, the number of cherries and the number of pitchforks, for random phylogenetic trees generated by two widely used null tree models: the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. By developing limit theorems for a version of extended Pólya urn models in which negative entries are permitted for their replacement matrices, we deduce the strong laws of large numbers and the central limit theorems for the joint distributions of these two counting statistics for the PDA and the YHK models.
View Article and Find Full Text PDFWe conducted untargeted metabolomics analysis of plasma samples from a cross-sectional case-control study with 30 healthy controls, 30 patients with diabetes mellitus and normal renal function (DM-N), and 30 early diabetic nephropathy (DKD) patients using liquid chromatography-mass spectrometry (LC-MS). We employed two different modes of MS acquisition on a high-resolution MS instrument for identification and semi-quantification, and analyzed data using an advanced multivariate method for prioritizing differentially abundant metabolites. We obtained semi-quantification data for 1088 unique compounds (~55% lipids), excluding compounds that may be either exogenous compounds or treated as medication.
View Article and Find Full Text PDFBMC Med Genomics
October 2020
Background: Understanding the mechanisms underlying the malignant progression of cancer cells is crucial for early diagnosis and therapeutic treatment for cancer. Mutational heterogeneity of breast cancer suggests that about a dozen of cancer genes consistently mutate, together with many other genes mutating occasionally, in patients.
Methods: Using the whole-exome sequences and clinical information of 468 patients in the TCGA project data portal, we analyzed mutated protein domains and signaling pathway alterations in order to understand how infrequent mutations contribute aggregately to tumor progression in different stages.
Tree shape statistics are important for investigating evolutionary mechanisms mediating phylogenetic trees. As a step towards bridging shape statistics between rooted and unrooted trees, we present a comparison study on two subtree statistics known as numbers of cherries and pitchforks for the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. Based on recursive formulas on the joint distribution of the number of cherries and that of pitchforks, it is shown that cherry distributions are log-concave for both rooted and unrooted trees under these two models.
View Article and Find Full Text PDFComputational tools for multiomics data integration have usually been designed for unsupervised detection of multiomics features explaining large phenotypic variations. To achieve this, some approaches extract latent signals in heterogeneous data sets from a joint statistical error model, while others use biological networks to propagate differential expression signals and find consensus signatures. However, few approaches directly consider molecular interaction as a data feature, the essential linker between different omics data sets.
View Article and Find Full Text PDFIn population and evolutionary biology, hypotheses about micro-evolutionary and macro-evolutionary processes are commonly tested by comparing the shape indices of empirical evolutionary trees with those predicted by neutral models. A key ingredient in this approach is the ability to compute and quantify distributions of various tree shape indices under random models of interest. As a step to meet this challenge, in this paper we investigate the joint distribution of cherries and pitchforks (that is, subtrees with two and three leaves) under two widely used null models: the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model.
View Article and Find Full Text PDFTranscript-level quantification is often measured across two groups of patients to aid the discovery of biomarkers and detection of biological mechanisms involving these biomarkers. Statistical tests lack power and false discovery rate is high when sample size is small. Yet, many experiments have very few samples (≤ 5).
View Article and Find Full Text PDFBackground. The use of spot urine protein to creatinine ratios in estimating 24 hr urine protein excretion rates for diagnosing and managing chronic kidney disease (CKD) predated the standardization of creatinine assays. The comparative predictive performance of spot urine ratios and 24 hr urine collections (of albumin or protein) for the clinical outcomes of CKD progression, end-stage renal disease (ESRD), and mortality in Asians is unclear.
View Article and Find Full Text PDFNucleic Acids Res
November 2014
Neph et al. (2012) (Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150: 1274-1286) reported the transcription factor (TF) regulatory networks of 41 human cell types using the DNaseI footprinting technique.
View Article and Find Full Text PDFIEEE/ACM Trans Comput Biol Bioinform
September 2014
Evolutionary history of protein-protein interaction (PPI) networks provides valuable insight into molecular mechanisms of network growth. In this paper, we study how to infer the evolutionary history of a PPI network from its protein duplication relationship. We show that for a plausible evolutionary history of a PPI network, its relative quality, measured by the so-called loss number, is independent of the growth parameters of the network and can be computed efficiently.
View Article and Find Full Text PDFComplex networks abound in physical, biological and social sciences. Quantifying a network's topological structure facilitates network exploration and analysis, and network comparison, clustering and classification. A number of Wiener type indices have recently been incorporated as distance-based descriptors of complex networks, such as the R package QuACN.
View Article and Find Full Text PDFSmall over-represented motifs in biological networks often form essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Here we develop an accurate method to estimate the occurrences of a motif in the entire network from noisy and incomplete data, and apply it to eukaryotic interactomes and cell-specific transcription factor regulatory networks.
View Article and Find Full Text PDFBackground: Most stroke research has studied rehabilitation effectiveness and rehabilitation efficiency separately and not investigated the potential trade-offs between these two indices of rehabilitation.
Aims: To determine whether there is a trade-off between independent factors of rehabilitation effectiveness and rehabilitation efficiency.
Methods: Using a retrospective cohort study design, we studied all stroke patients (n = 2810) from two sub-acute rehabilitation hospitals from 1996 to 2005, representing 87·5% of national bed-years during the same period.
Background: False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the estimation of p-values from each test and validity of the underlying assumptions of the distribution. However, in many practical testing problems such as in genomics, the p-values could be under-estimated or over-estimated for many known or unknown reasons.
View Article and Find Full Text PDFReplication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses.
View Article and Find Full Text PDFBackground: Current miRNA target prediction tools have the common problem that their false positive rate is high. This renders identification of co-regulating groups of miRNAs and target genes unreliable. In this study, we describe a procedure to identify highly probable co-regulating miRNAs and the corresponding co-regulated gene groups.
View Article and Find Full Text PDFA novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice.
View Article and Find Full Text PDFBackground: Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind of genomes might not work well for another.
View Article and Find Full Text PDFWe identified a set of transcriptional elements that are conserved and overrepresented within the promoters of human, mouse, and rat GRIAs by comparing these promoters against a collection of 10,741 gene promoters. Cells regulate functional groups of genes by coordinating the transcriptional and/or posttranscriptional mRNA levels of interacting genes. As such, it is expected that functional groups of genes share the same transcriptional features within their promoters.
View Article and Find Full Text PDFIt has been observed that in homology search gapped seeds have better sensitivity than ungapped ones for the same cost (weight). In this paper, we propose a probability leakage model (a dissipative Markov system) to elucidate the mechanism that confers power to spaced seeds. Based on this model, we identify desirable features of gapped search seeds and formulate an extremely efficient procedure for seed design: it samples from the set of spaced seed exhibiting those features, evaluates their sensitivity, and then selects the best.
View Article and Find Full Text PDFNucleic Acids Res
September 2005
The broad applicability of gene expression profiling to genomic analyses has generated huge demand for mass production of microarrays and hence for improving the cost effectiveness of microarray fabrication. We developed a post-processing method for deriving a good synthesis strategy. In this paper, we assessed all the known efficient methods and our post-processing method for reducing the number of synthesis cycles for manufacturing a DNA-chip of a given set of oligos.
View Article and Find Full Text PDF