Background: Alignment-free methods are a popular approach for comparing biological sequences, including complete genomes. The methods range from probability distributions of sequence composition to first- and higher-order Markov chains, where a k-th order Markov chain over DNA has [Formula: see text] formal parameters. To circumvent this exponential growth in parameters, variable-length Markov chains (VLMCs) have gained popularity for applications in molecular biology and other areas. VLMCs adapt their depth to the sequence context and thus curb the number of parameters. The scarcity of fast, let alone parallel, software tools prompted the development of a parallel implementation using lazy suffix trees and a hash-based alternative.
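The exponential growth mentioned above can be made concrete with a small sketch. The formula in the abstract is not reproduced in this text, so the counting convention below is an assumption: a k-th order chain over the DNA alphabet has 4^k contexts, and each context carries a distribution over 4 symbols, of which 3 probabilities are free. The function name `markov_chain_sizes` is hypothetical.

```python
from typing import Tuple


def markov_chain_sizes(order: int, alphabet_size: int = 4) -> Tuple[int, int]:
    """Return (number of contexts, number of free probabilities).

    Assumed convention: one next-symbol distribution per context,
    with alphabet_size - 1 free probabilities in each distribution.
    """
    contexts = alphabet_size ** order
    free_parameters = contexts * (alphabet_size - 1)
    return contexts, free_parameters


# Print how quickly a fixed-order model grows with the order k.
for k in (1, 2, 5, 10, 15):
    contexts, free_parameters = markov_chain_sizes(k)
    print(f"k={k:2d}  contexts={contexts:>13,d}  free parameters={free_parameters:>13,d}")
```

Under this convention, each increase of the order by one multiplies the model size by four, which is the growth a VLMC avoids by extending only those contexts where a deeper context is warranted.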

Results: An extensive evaluation was performed on genomes ranging from 12 Mbp to 22 Gbp. Relevant learning parameters were chosen using the Bayesian Information Criterion (BIC) to avoid over-fitting. Our implementation greatly improves upon the state of the art, even in serial execution. It also exhibits very good parallel scaling: for long sequences, speed-ups are close to the optimum indicated by Amdahl's law, which is 3 for 4 threads and about 6 for 16 threads.
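The two Amdahl optima quoted above are mutually consistent, which the following sketch checks. Amdahl's law gives the speed-up S(n) = 1 / ((1 - p) + p / n) for a parallel fraction p on n threads. The value of p is not stated in the abstract; here it is inferred from the 4-thread figure, so it should be read as an assumption rather than a reported measurement.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speed-up for a given parallel fraction and thread count."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def parallel_fraction_from_speedup(speedup: float, threads: int) -> float:
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


# Infer p from the quoted optimum of 3 on 4 threads (assumed, not reported).
p = parallel_fraction_from_speedup(3.0, 4)

# Predict the optimum on 16 threads and the limit for unbounded threads.
print(f"inferred parallel fraction: {p:.4f}")
print(f"predicted optimum on 16 threads: {amdahl_speedup(p, 16):.2f}")
print(f"limit for unbounded threads: {1.0 / (1.0 - p):.2f}")
```

A parallel fraction of 8/9 reproduces both quoted values, so under this reading roughly one ninth of the run time is serial.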

Conclusions: Our parallel implementation, released as open source under the GPLv3 license, provides a practically useful alternative to the state of the art and allows the construction of VLMCs, even for very large genomes, significantly faster than previously possible. Additionally, our parameter selection based on BIC gives guidance to end-users comparing genomes.
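As a rough illustration of BIC-guided selection, the sketch below scores fixed-order Markov chains on a toy sequence with BIC = m ln(n) - 2 ln(L), where m is the number of free parameters, n the number of observations and L the maximised likelihood. This is a stand-in under stated assumptions: the abstract does not say which VLMC learning parameters were tuned or how, so the fixed-order model, the toy sequence and the names `log_likelihood` and `bic` are all hypothetical and not the authors' procedure.

```python
import math
from collections import Counter


def log_likelihood(sequence: str, order: int) -> float:
    """Maximised log-likelihood of a fixed-order Markov chain fitted by counting."""
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(order, len(sequence)):
        context = sequence[i - order:i]
        context_counts[context] += 1
        transition_counts[(context, sequence[i])] += 1
    # Sum count * log(estimated transition probability) over observed transitions.
    return sum(
        count * math.log(count / context_counts[context])
        for (context, _symbol), count in transition_counts.items()
    )


def bic(sequence: str, order: int, alphabet_size: int = 4) -> float:
    """BIC = m * ln(n) - 2 * ln(L); lower is better."""
    # Note: n differs slightly between orders because the first `order`
    # symbols have no full context; acceptable for a sketch.
    n_observations = len(sequence) - order
    n_parameters = (alphabet_size - 1) * alphabet_size ** order
    return n_parameters * math.log(n_observations) - 2.0 * log_likelihood(sequence, order)


# Toy input (illustrative only): a strictly periodic sequence.
toy_sequence = "ACGT" * 50
scores = {order: bic(toy_sequence, order) for order in range(4)}
best_order = min(scores, key=scores.get)
print(scores)
print("order with the lowest BIC:", best_order)
```

On this toy input the previous symbol fully determines the next one, so deeper contexts add parameters without improving the likelihood and the penalty term steers the choice away from them.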

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8501649
DOI: http://dx.doi.org/10.1186/s12859-021-04387-y


