Sequential Importance Sampling for Multiresolution Kingman-Tajima Coalescent Counting.

Ann Appl Stat

Published: June 2020

Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. A strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, resulting in smaller hidden state spaces. An estimate of the cardinality of the state space of genealogical trees at different resolutions is essential to decide the best modeling strategy for a given data set. To our knowledge, there is neither an exact nor an approximate method to determine these cardinalities. We propose a sequential importance sampling algorithm to estimate the cardinality of the sample space of genealogical trees under different coalescent resolutions. Our sampling scheme proceeds sequentially across the set of combinatorial constraints imposed by the data, which in this work are completely linked DNA sequences from a nonrecombining segment. We analyze the cardinality of different genealogical tree spaces on simulations to study the settings that favor coarser resolutions. We apply our method to estimate the cardinality of genealogical tree spaces from mtDNA data from the 1000 Genomes Project and from a sample from a Melanesian population at the β-globin locus.
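
The general idea of estimating a cardinality by sequential importance sampling can be illustrated with a toy problem. The sketch below is not the authors' algorithm and involves no genealogical trees. It assumes a much simpler constrained space, binary strings of length n with no two adjacent 1s, and the names `count_exact` and `sis_estimate` are hypothetical. Each string is built one position at a time, choosing uniformly among the values the constraint still allows. The product of the number of allowed choices at each step equals the inverse proposal probability of the sampled string, so its average over many samples is an unbiased estimate of the size of the space.

```python
import random


def count_exact(n):
    """Exact number of length-n binary strings with no two adjacent 1s."""
    end0, end1 = 1, 1  # length-1 strings ending in 0 and ending in 1
    for _ in range(n - 1):
        # A 0 may follow anything; a 1 may only follow a 0.
        end0, end1 = end0 + end1, end0
    return end0 + end1


def sis_estimate(n, num_samples, rng):
    """Estimate the same count by sequential importance sampling."""
    total = 0.0
    for _sample in range(num_samples):
        weight = 1.0  # running value of 1 / q(x) for the string being built
        prev = 0
        for _pos in range(n):
            # The constraint removes the choice 1 right after a 1.
            choices = (0, 1) if prev == 0 else (0,)
            weight *= len(choices)
            prev = rng.choice(choices)
        total += weight
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:", count_exact(10))
    print("SIS estimate:", sis_estimate(10, 20000, rng))
```

In this toy case the exact count is available from a simple recurrence, which makes it possible to check the estimator. In the setting described in the abstract no such exact count is known, which is why a sampling-based estimate is of interest.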

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8118586
DOI: http://dx.doi.org/10.1214/19-AOAS1313
