Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. One strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, which result in smaller hidden state spaces. An estimate of the cardinality of the state space of genealogical trees at different resolutions is essential for deciding the best modeling strategy for a given data set. To our knowledge, there is neither an exact nor an approximate method to determine these cardinalities. We propose a sequential importance sampling algorithm to estimate the cardinality of the sample space of genealogical trees under different coalescent resolutions. Our sampling scheme proceeds sequentially across the set of combinatorial constraints imposed by the data, which in this work are completely linked DNA sequences at a non-recombining segment. We analyze the cardinality of different genealogical tree spaces on simulations to study the settings that favor coarser resolutions. We apply our method to estimate the cardinality of genealogical tree spaces from mtDNA data from the 1000 Genomes Project and from a sample from a Melanesian population at the β-globin locus.
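To make the idea of estimating a cardinality by sequential importance sampling concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes a uniform proposal, handles only ranked labeled trees (the Kingman resolution), and stands in for the data constraints with a hypothetical list of `clades`: sets of samples that must form a subtree, as a shared mutation would require under a perfect phylogeny. The clades are assumed to be nested or disjoint. All function names are illustrative.

```python
import random
from itertools import combinations


def allowed_pairs(lineages, clades):
    """Return the pairs of lineages whose merger keeps every clade intact."""
    pairs = []
    for a, b in combinations(lineages, 2):
        merged = a | b
        # The merged lineage must be inside, disjoint from, or contain each clade.
        if all(merged <= c or merged.isdisjoint(c) or merged >= c for c in clades):
            pairs.append((a, b))
    return pairs


def sis_weight(n, clades, rng):
    """One SIS draw: build a tree from the leaves to the root, return its weight."""
    lineages = [frozenset([i]) for i in range(n)]
    weight = 1.0
    while len(lineages) > 1:
        pairs = allowed_pairs(lineages, clades)
        if not pairs:
            return 0.0  # dead end: this draw contributes zero weight
        weight *= len(pairs)  # reciprocal of the uniform proposal probability
        a, b = rng.choice(pairs)
        lineages = [x for x in lineages if x != a and x != b] + [a | b]
    return weight


def estimate_cardinality(n, clades=(), n_samples=10_000, seed=0):
    """Average the importance weights to estimate the number of compatible trees."""
    rng = random.Random(seed)
    clades = [frozenset(c) for c in clades]
    total = sum(sis_weight(n, clades, rng) for _ in range(n_samples))
    return total / n_samples


if __name__ == "__main__":
    print(estimate_cardinality(4))                   # no constraints
    print(estimate_cardinality(4, clades=[{0, 1}]))  # samples 0 and 1 share a mutation
```

With no constraints every draw has the same weight, so for four samples the estimate equals the exact count 6 × 3 × 1 = 18. With the single clade {0, 1}, enumerating by hand gives 4 compatible trees, and the estimate fluctuates around that value. A proposal that better matches the constraints would reduce this variance.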

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8118586
DOI: http://dx.doi.org/10.1214/19-AOAS1313
