Publications by authors named "Nick Goldman"

Protein sequencing is a rapidly evolving field with much progress towards the realization of a new generation of protein sequencers. The early devices, however, may not be able to reliably discriminate all 20 amino acids, resulting in a partial, noisy and possibly error-prone signature of a protein. Rather than achieving sequencing, these devices may aim to identify target proteins by comparing such signatures to databases of known proteins.

Motivation: Tracking SARS-CoV-2 variants through genomic sequencing has been an important part of the global response to the pandemic and remains a useful tool for surveillance of the virus. As well as whole-genome sequencing of clinical samples, this surveillance effort has been aided by amplicon sequencing of wastewater samples, which proved effective in real case studies. Because of its relevance to public healthcare decisions, testing and benchmarking wastewater sequencing analysis methods is also crucial, which necessitates a simulator.

Cyanobacteria are globally occurring photosynthetic bacteria notable for their contribution to primary production and for producing toxins that have detrimental ecosystem impacts. Furthermore, cyanobacteria can form mutualistic symbiotic relationships with a diverse set of eukaryotes, including land plants, aquatic plankton and fungi. Nevertheless, not all cyanobacteria are found in symbiotic associations, suggesting that symbiotic cyanobacteria have evolved specializations that facilitate host interactions.

Article Synopsis
  • Real-time genomics using nanopore sequencing can quickly predict antibiotic resistance in clinical settings, which is crucial for timely treatment.
  • Despite some accuracy concerns compared to traditional methods, this approach can accurately identify low-abundance resistance factors often missed by conventional diagnostics.
  • The study highlights that real-time genomic analysis can greatly enhance clinical decision-making by revealing hidden resistance profiles, ultimately improving patient outcomes.

Myelin, the insulating sheath that surrounds neuronal axons, is produced by oligodendrocytes in the central nervous system (CNS). This evolutionary innovation, which first appears in jawed vertebrates, enabled rapid transmission of nerve impulses, more complex brains, and greater morphological diversity. Here, we report that RNA-level expression of RNLTR12-int, a retrotransposon of retroviral origin, is essential for myelination.

Article Synopsis
  • Traditional methods of inferring deep phylogenies have favored protein sequences over DNA, believing proteins are more reliable due to lower rates of homoplasy and saturation.
  • This study challenges that view, showing through simulations that DNA sequences can be just as effective, if not more so, for reconstructing accurate phylogenetic trees.
  • Additionally, DNA analysis offers a significant computational advantage, enabling the use of more advanced models that account for variations in substitution processes, making it a promising method for phylogenetic inference.

Phylogenetics has a crucial role in genomic epidemiology. Enabled by unparalleled volumes of genome sequence data generated to study and help contain the COVID-19 pandemic, phylogenetic analyses of SARS-CoV-2 genomes have shed light on the virus's origins, spread, and the emergence and reproductive success of new variants. However, most phylogenetic approaches, including maximum likelihood and Bayesian methods, cannot scale to the size of the datasets from the current pandemic.

Nanopore sequencers can select which DNA molecules to sequence, rejecting a molecule after analysis of a small initial part. Currently, selection is based on predetermined regions of interest that remain constant throughout an experiment. Sequencing efforts, thus, cannot be re-focused on molecules likely contributing most to experimental success.

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult.

Phylogenetics plays a crucial role in the interpretation of genomic data. Phylogenetic analyses of SARS-CoV-2 genomes have allowed the detailed study of the virus's origins, of its international and local spread, and of the emergence and reproductive success of new variants, among many applications. These analyses have been enabled by the unparalleled volumes of genome sequence data generated and employed to study and help contain the pandemic.

The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June 2021. This analysis reveals a series of subepidemics that peaked in early autumn 2020, followed by a jump in transmissibility of the B.

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots.
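The idea of a tree whose branches carry mutation labels can be illustrated with a small in-memory sketch. This is not the actual MAT file format or any tool's API: the `Node` class, the `mutations_to` helper, and the mutation and clade strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    mutations: list = field(default_factory=list)  # mutations on the branch leading to this node
    children: list = field(default_factory=list)
    clade: str = ""  # optional clade or lineage label placed at a clade root


def mutations_to(node_name, root):
    """Return the mutations accumulated from the root down to `node_name`.

    Returns None if no node with that name exists. The list is the raw
    concatenation of branch labels; reversions are not collapsed.
    """
    def walk(node, path):
        path = path + node.mutations
        if node.name == node_name:
            return path
        for child in node.children:
            found = walk(child, path)
            if found is not None:
                return found
        return None

    return walk(root, [])


# Placeholder tree: two samples share the branch carrying "A1G".
root = Node("root", [], [
    Node("n1", ["A1G"], [Node("s1", ["C2T"]), Node("s2", [])], clade="X.1"),
    Node("s3", ["G3A"]),
])
print(mutations_to("s1", root))
```

Because shared mutations are stored once on the branch where they are inferred to have occurred, closely related samples do not each repeat them. The recursive walk is only suitable for small trees; a tree with millions of tips would need an iterative traversal.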

In the absence of effective treatment, COVID-19 is likely to remain a global disease burden. Compounding this threat is the near certainty that novel coronaviruses with pandemic potential will emerge in years to come. Pan-coronavirus drugs, agents active against both SARS-CoV-2 and other coronaviruses, would address both threats.

Background: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are 'novel' compared to the others in the same dataset, and low weights to sequences that are over-represented.

Results: We formalise this principle by rigorously defining the evolutionary 'novelty' of a sequence within an alignment.
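As a point of reference for what a sequence weighting scheme does, the sketch below implements the classic position-based weights of Henikoff and Henikoff. It is not the novelty measure formalised in this work; it is only a well-known heuristic with the same goal of down-weighting over-represented sequences. The function name is hypothetical, and treating gap characters as ordinary residues is a simplifying assumption.

```python
from collections import Counter


def position_based_weights(alignment):
    """Position-based sequence weights, normalised to sum to 1.

    In each column, a sequence receives 1 / (k * n), where k is the number
    of distinct characters in the column and n is the number of sequences
    sharing this sequence's character. Gaps are treated like any other
    character (a simplifying assumption).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        k = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])

    total = sum(weights)
    return [w / total for w in weights]


# The third sequence differs in the last column, so it gets the largest weight.
print(position_based_weights(["ACGT", "ACGT", "ACGA"]))
```

Two identical sequences split the credit for every column between them, while a sequence carrying a rare character collects more of that column's credit, which is the behaviour the Background paragraph describes.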

The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread.

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots.

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult.

Many complex genomic rearrangements arise through template switch errors, which occur in DNA replication when there is a transient polymerase switch to an alternate template nearby in three-dimensional space. While typically investigated at kilobase-to-megabase scales, the genomic and evolutionary consequences of this mutational process are not well characterised at smaller scales, where they are often interpreted as clusters of independent substitutions, insertions and deletions. Here we present an improved statistical approach using pair hidden Markov models, and use it to detect and describe short-range template switches underlying clusters of mutations in the multi-way alignment of hominid genomes.

Phylogeographic inference allows reconstruction of past geographical spread of pathogens or living organisms by integrating genetic and geographic data. A popular model in continuous phylogeography (with location data provided in the form of latitude and longitude coordinates) describes spread as a Brownian motion (Brownian Motion Phylogeography, BMP) in continuous space and time, akin to similar models of continuous trait evolution. Here, we show that reconstructions using this model can be strongly affected by sampling biases, such as the lack of sampling from certain areas.
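To make the Brownian motion model concrete, the following minimal sketch simulates locations down a fixed tree: along a branch of length t, each coordinate changes by a Gaussian step with mean 0 and variance sigma² t. The function name, the dictionary-based tree encoding, and the treatment of coordinates as a flat plane are assumptions made for illustration; this is a forward simulation only, not the inference procedure studied in the paper.

```python
import math
import random


def simulate_bm(tree, root_location, sigma, rng):
    """Simulate 2D Brownian motion along a rooted tree.

    tree: dict mapping a node name to a list of (child, branch_length)
          pairs; the root must be named "root" (hypothetical encoding).
    root_location: (x, y) of the root, treated as planar coordinates.
    sigma: diffusion rate; per-coordinate variance on a branch is sigma**2 * t.
    rng: a random.Random instance, passed in for reproducibility.
    """
    locations = {"root": root_location}
    stack = ["root"]
    while stack:
        parent = stack.pop()
        px, py = locations[parent]
        for child, t in tree.get(parent, []):
            sd = sigma * math.sqrt(t)
            # Independent Gaussian displacement in each coordinate.
            locations[child] = (px + rng.gauss(0.0, sd), py + rng.gauss(0.0, sd))
            stack.append(child)
    return locations


tree = {"root": [("A", 1.0), ("B", 2.0)], "B": [("C", 0.5)]}
print(simulate_bm(tree, (0.0, 0.0), 1.0, random.Random(7)))
```

Under this model, a sample's location depends only on its ancestor's location and the elapsed time, so which tips are observed says nothing about where unsampled lineages went. That independence is one way to see why uneven sampling across areas can mislead reconstructions.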

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques.

How can we best learn the history of a protein's evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be the one that combines the convenience of easily available input data with the ability to return useful parameter estimates. For example, we might be interested in a measure of the strength of selection (typically obtained using a codon model) or an ancestral structure (obtained using structural modeling based on inferred amino acid sequence and side chain configuration).

Understanding the molecular basis of adaptation to the environment is a central question in evolutionary biology, yet linking detected signatures of positive selection to molecular mechanisms remains challenging. Here we demonstrate that combining sequence-based phylogenetic methods with structural information assists in making such mechanistic interpretations on a genomic scale. Our integrative analysis shows that positively selected sites tend to colocalize on protein structures and that positively selected clusters are found in functionally important regions of proteins, indicating that positive selection can contravene the well-known principle of evolutionary conservation of functionally important regions.
