Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. A strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, resulting in smaller hidden state spaces. An estimate of the cardinality of the state-space of genealogical trees at different resolutions is essential to decide the best modeling strategy for a given dataset. To our knowledge, there is neither an exact nor approximate method to determine these cardinalities. We propose a sequential importance sampling algorithm to estimate the cardinality of the sample space of genealogical trees under different coalescent resolutions. Our sampling scheme proceeds sequentially across the set of combinatorial constraints imposed by the data, which in this work are completely linked sequences of DNA at a non recombining segment. We analyze the cardinality of different genealogical tree spaces on simulations to study the settings that favor coarser resolutions. We apply our method to estimate the cardinality of genealogical tree spaces from mtDNA data from the 1000 genomes and a sample from a Melanesian population at the -globin locus.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8118586 | PMC |
http://dx.doi.org/10.1214/19-AOAS1313 | DOI Listing |
PLoS One
December 2024
Institute of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan.
Sparse estimation of a Gaussian graphical model (GGM) is an important technique for making relationships between observed variables more interpretable. Various methods have been proposed for sparse GGM estimation, including the graphical lasso that uses the ℓ1 norm regularization term, and other methods that use nonconvex regularization terms. Most of these methods approximate the ℓ0 (pseudo) norm by more tractable functions; however, to estimate more accurate solutions, it is preferable to directly use the ℓ0 norm for counting the number of nonzero elements.
View Article and Find Full Text PDFBMC Med Res Methodol
November 2024
Pharmaco- and Device Epidemiology Group, Health Data Sciences, Botnar Research Centre, NDORMS, University of Oxford, Windmill Road, Oxford, OX3 7LD, UK.
Background: Rapid innovation and new regulations lead to an increased need for post-marketing surveillance of implantable devices. However, complex multi-level confounding related not only to patient-level but also to surgeon or hospital covariates hampers observational studies of risks and benefits. We conducted parametric and plasmode simulations to compare the performance of cardinality matching (CM) vs propensity score matching (PSM) to reduce confounding bias in the presence of cluster-level confounding.
View Article and Find Full Text PDFUse of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment.
View Article and Find Full Text PDFDent Mater
January 2025
School of Design, University of Leeds, Leeds, UK.
Objectives: This study aimed to estimate the number of distinct tooth colors using a large dataset of in-vivo CIELAB measurements. It further assessed the coverage error (CE) and coverage error percentage (CEP) of commonly used shade guides and determined the number of shades needed for an ideal guide, using the Euclidean distance (ΔEab) and thresholds for clinical perceptibility (PT) and acceptability (AT) as evaluation criteria.
Methods: A total of 8153 untreated maxillary and mandibular anterior teeth were measured in vivo using calibrated dental photography.
BMC Bioinformatics
September 2024
University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK.
We consider a problem of inferring contact network from nodal states observed during an epidemiological process. In a black-box Bayesian optimisation framework this problem reduces to a discrete likelihood optimisation over the set of possible networks. The cardinality of this set grows combinatorially with the number of network nodes, which makes this optimisation computationally challenging.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!