Publications by authors named "Jotun Hein"

Motivation: A main challenge in molecular evolution is to find computationally efficient mutation models with flexible assumptions that properly reflect genetic variation. The infinite sites model assumes that each mutation event occurs at a site never previously mutant, i.e.


Modern phylogeography aims at reconstructing the geographic movement of organisms based on their genomic sequences and spatial information. Phylogeographic approaches are often applied to pathogen sequences and therefore tend to neglect the possibility of recombination, which decouples the evolutionary and geographic histories of different parts of the genome. Genomic regions of recombining or reassorting pathogens often originate and evolve at different times and locations, which characterize their unique spatial histories.


Recombination is a powerful evolutionary process that shapes the genetic diversity observed in the populations of many species. Reconstructing genealogies in the presence of recombination from sequencing data is a very challenging problem, as this relies on mutations having occurred on the correct lineages in order to detect the recombination and resolve the ordering of coalescence events in the local trees. We investigate the probability of reconstructing the true topology of ancestral recombination graphs (ARGs) under the coalescent with recombination and gene conversion.


Background: Mandatory COVID-19 certification, which requires proof of vaccination, a negative test, or recent infection to access public venues, was introduced at different times in the four countries of the UK. We aim to study its effects on the incidence of cases and hospital admissions.

Methods: We performed negative binomial segmented regression and ARIMA analyses for the four countries (England, Northern Ireland, Scotland and Wales), and fitted difference-in-differences models to compare the latter three with England, which served as a negative control group because it was the last country to introduce COVID-19 certification.
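
The abstract does not give model details, so the following is only a minimal sketch of the basic two-group, two-period difference-in-differences contrast that such models build on. It is not the negative binomial or ARIMA analysis from the article. The function name `did_estimate` and its arguments are illustrative assumptions.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic difference-in-differences contrast (hypothetical helper).

    Each argument is a sequence of outcome values (e.g. weekly case counts)
    for one group in one period. The estimate is the change in the treated
    group minus the change in the control group.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

The contrast attributes to the intervention only the part of the treated group's change that is not shared with the control group, which is valid only under a parallel-trends assumption.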


Gene expression is controlled by pathways of regulatory factors often involving the activity of protein kinases on transcription factor proteins. Despite this well-established mechanism, well-described pathways that include the regulatory role of protein kinases on transcription factors are surprisingly scarce in eukaryotes. To address this, PhosTF was developed to infer functional regulatory interactions and pathways in both simulated and real biological networks, based on linear cyclic causal models with latent variables.


The evolution of cooperation in cellular groups is threatened by lineages of cheaters that proliferate at the expense of the group. These cell lineages occur within microbial communities and, in the form of tumours and cancer, within multicellular organisms. In contrast to an earlier study, here we show how the evolution of pleiotropic genetic architectures, which link the expression of cooperative and private traits, can protect against cheater lineages and allow cooperation to evolve.


The evolutionary process of genetic recombination has the potential to rapidly change the properties of a viral pathogen, and its presence is a crucial factor to consider in the development of treatments and vaccines. It can also significantly affect the results of phylogenetic analyses and the inference of evolutionary rates. The detection of recombination from samples of sequencing data is a very challenging problem and is further complicated for SARS-CoV-2 by its relatively slow accumulation of genetic diversity.


Motivation: The reconstruction of possible histories given a sample of genetic data in the presence of recombination and recurrent mutation is a challenging problem, but can provide key insights into the evolution of a population. We present KwARG, which implements a parsimony-based greedy heuristic algorithm for finding plausible genealogical histories (ancestral recombination graphs) that are minimal or near-minimal in the number of posited recombination and mutation events.

Results: Given an input dataset of aligned sequences, KwARG outputs a list of possible candidate solutions, each comprising a list of mutation and recombination events that could have generated the dataset; the relative proportion of recombinations and recurrent mutations in a solution can be controlled via specifying a set of 'cost' parameters.


A main challenge in the enumeration of small-molecule chemical spaces for drug design is to quickly and accurately differentiate between possible and impossible molecules. Current approaches for screening enumerated molecules (e.g.


The dynamics of a population exhibiting exponential growth can be modelled as a birth-death process, which naturally captures the stochastic variation in population size over time. In this article, we consider a supercritical birth-death process, started at a random time in the past, and conditioned to have n sampled individuals at the present. The genealogy of individuals sampled at the present time is then described by the reversed reconstructed process (RRP), which traces the ancestry of the sample backwards from the present.
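
To make the forward-time model concrete, the sketch below simulates the size of a linear birth-death population with a standard Gillespie-style scheme. It does not implement the reversed reconstructed process or the conditioning on n sampled individuals described above. The function name, parameters, and return format are illustrative assumptions.

```python
import random


def simulate_birth_death(birth_rate, death_rate, t_max, n0=1, seed=None):
    """Simulate population size over time (hypothetical helper).

    Each individual gives birth at rate `birth_rate` and dies at rate
    `death_rate`; the process is supercritical when birth_rate > death_rate.
    Assumes birth_rate + death_rate > 0.
    Returns a list of (time, size) pairs, starting at (0.0, n0).
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    path = [(t, n)]
    while n > 0:
        # With n individuals, the waiting time to the next event is
        # exponential with rate n * (birth_rate + death_rate).
        t += rng.expovariate(n * (birth_rate + death_rate))
        if t > t_max:
            break
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
        else:
            n -= 1
        path.append((t, n))
    return path
```

Calling `simulate_birth_death(1.0, 0.5, 3.0, seed=1)` returns one random trajectory. Repeated runs with different seeds show the stochastic variation in population size, including early extinction in some runs even when the process is supercritical.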


A key step in the origin of life is the emergence of a primitive metabolism. This requires the formation of a subset of chemical reactions that is both self-sustaining and collectively autocatalytic. A generic approach to study such processes ('RAF theory') has provided a precise and computationally effective way to address these questions, both on simulated data and in laboratory studies.


Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here, we introduce a sequence evolution model, MESSI (Modeling the Evolution of Secondary Structure Interactions), that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution while accounting for an unknown secondary structure.


HERV-H endogenous retroviruses are thought to be essential to stem cell identity in humans. We embrace several decades of HERV-H research in order to relate the transcription of HERV-H loci to their genomic structure. We find that highly transcribed HERV-H loci are younger, more fragmented, and less likely to be present in other primate genomes.


Classic alignment algorithms utilize scoring functions which maximize similarity or minimize edit distances. These scoring functions account for both insertion-deletion (indel) and substitution events. In contrast, alignments based on stochastic models aim to explicitly describe the evolutionary dynamics of sequences by inferring relevant probabilistic parameters from input sequences.
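
As a minimal illustration of the classic score-based approach mentioned above, the sketch below computes an edit distance by dynamic programming, with separate costs for indels and substitutions. The function name and the default unit costs are illustrative assumptions and are not taken from the article.

```python
def edit_distance(a: str, b: str, indel_cost: int = 1, sub_cost: int = 1) -> int:
    """Minimum total cost of indels and substitutions that turn `a` into `b`."""
    # prev[j] is the cost of aligning the processed prefix of `a`
    # with the first j characters of `b`.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]  # aligning a[:i] with the empty string
        for j, cb in enumerate(b, start=1):
            match_or_sub = prev[j - 1] + (0 if ca == cb else sub_cost)
            delete = prev[j] + indel_cost      # drop ca from `a`
            insert = curr[j - 1] + indel_cost  # insert cb into `a`
            curr.append(min(match_or_sub, delete, insert))
        prev = curr
    return prev[-1]
```

The costs here are fixed by hand, which is the key difference from the stochastic models in the entry above: those infer the corresponding indel and substitution parameters from the input sequences.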


Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins.


To understand the population genetics of structural variants and their effects on phenotypes, we developed an approach to mapping structural variants that segregate in a population sequenced at low coverage. We avoid calling structural variants directly. Instead, the evidence for a potential structural variant at a locus is indicated by variation in the counts of short reads that map anomalously to that locus.


Finding causal relationships between genotypic and phenotypic variation is a key focus of evolutionary biology, human genetics and plant breeding. To identify genome-wide patterns underlying trait diversity, we assembled a high-quality reference genome of Cardamine hirsuta, a close relative of the model plant Arabidopsis thaliana. We combined comparative genome and transcriptome analyses with the experimental tools available in C.


About 8% of the human genome is made up of endogenous retroviruses (ERVs). Though most human endogenous retroviruses (HERVs) are thought to be irrelevant to our biology, notable exceptions include members of the HERV-H family that are necessary for the correct functioning of stem cells. ERVs are commonly found in two forms: the full-length proviral form and the more numerous solo-LTR form, thought to result from homologous recombination events.


Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput "deep" sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different time points during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intrahost viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures.


Background: Endogenous retroviruses (ERVs) are often viewed as selfish DNA that does not contribute to host phenotype. Yet ERVs have also been co-opted to play important roles in the maintenance of stem cell identity and placentation, amongst other things. This has led to debate over whether the typical ERV confers a cost or benefit upon the host.


Background: A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum.


For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence-structure model.


Background: We wish to understand how sex and recombination affect endogenous retroviral insertion and deletion. While theory suggests that the risk of ectopic recombination will limit the accumulation of repetitive DNA in areas of high meiotic recombination, the experimental evidence so far has been inconsistent. Under the assumption of neutrality, we examine the genomes of eighteen species of animal in order to compute the ratio of solo-LTRs that derive from insertions occurring down the male germ line as opposed to the female one (male bias).


Background: With the advancement of next-generation sequencing and transcriptomics technologies, regulatory effects involving RNA, in particular RNA structural changes, are being detected. These results often rely on RNA secondary structure predictions. However, current approaches to RNA secondary structure modelling produce predictions with a high variance in predictive accuracy, and we have little quantifiable knowledge about the reasons for this variance.
