Publications by authors named "Natasha Glover"

The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life.

The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers to analyze genomic data, including de novo genome assembly, metagenomics, read alignment, read correction, and pangenomes.
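As a rough illustration of the idea, the sketch below selects one representative k-mer (the minimizer) from each window of `w` consecutive k-mers. It assumes plain lexicographic ordering with leftmost tie-breaking; practical tools generally order k-mers by a hash instead, and the function name `minimizers` is purely illustrative rather than taken from the article.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs for the (w, k)-minimizers of seq.

    Each window of w consecutive k-mers contributes its lexicographically
    smallest k-mer (leftmost on ties). A k-mer picked by several
    consecutive windows is reported only once.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        # Smallest k-mer in this window; ties go to the leftmost position.
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        if not picked or picked[-1][0] != best:
            picked.append((best, kmers[best]))
    return picked


if __name__ == "__main__":
    # Six 3-mers are reduced to two sampled positions with w = 4.
    print(minimizers("ACGTACGT", k=3, w=4))
```

Because overlapping windows tend to share their smallest k-mer, the sampled set is much smaller than the full k-mer set, while two sequences that share a long enough substring are still guaranteed to share a minimizer.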

Background: Comparative genomic analyses to delineate gene evolutionary histories inform the understanding of organismal biology by characterising gene and gene family origins, trajectories, and dynamics, as well as enabling the tracing of speciation, duplication, and loss events, and facilitating the transfer of gene functional information across species. Genomic data are available for an increasing number of species from the genus Drosophila; however, a dedicated resource exploiting these data to provide the research community with browsable results from genus-wide orthology delineation has been lacking.

Methods: Using the OMA Orthologous Matrix orthology inference approach and browser deployment framework, we catalogued orthologues across a selected set of Drosophila species with high-quality annotated genomes.

In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life.

Evolution stands as a foundational pillar within modern biology, shaping our understanding of life. Studies related to evolution, for example constructing phylogenetic trees, are often carried out using DNA or protein sequences. These data, readily accessible from public databases, represent a treasure trove of resources that can be harnessed to create engaging activities with the public.

In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes.

Background: In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied.

Results: Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes).

PHYTOCHROME KINASE SUBSTRATE (PKS) proteins are involved in light-modulated changes in growth orientation. They act downstream of phytochromes to control hypocotyl gravitropism in the light and act early in phototropin signaling. Despite their importance for plant development, little is known about their molecular mode of action, except that they belong to a protein complex comprising phototropins at the plasma membrane (PM).

Social bees harbor conserved gut microbiotas that may have been acquired in a common ancestor of social bees and subsequently codiversified with their hosts. However, most of this knowledge is based on studies on the gut microbiotas of honey bees and bumblebees. Much less is known about the gut microbiotas of the third and most diverse group of social bees, the stingless bees.

Homoeologs are pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization. Bioinformatic methods for accurate homoeology inference are crucial for studying the evolutionary consequences of polyploidization, and homoeology is typically inferred on the basis of bidirectional best hit (BBH) and/or positional conservation (synteny). However, these methods neglect the fact that genes can duplicate and move, both prior to and after the allopolyploidization event.
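To make the bidirectional best hit (BBH) criterion concrete, here is a minimal sketch. It assumes similarity scores have already been computed in both directions and stored as nested dictionaries; the function name, the gene identifiers, and the scores are hypothetical and do not come from the article.

```python
def bidirectional_best_hits(
    scores_ab: dict[str, dict[str, float]],
    scores_ba: dict[str, dict[str, float]],
) -> list[tuple[str, str]]:
    """Return gene pairs (a, b) that are each other's top-scoring hit.

    scores_ab[a][b] is the score of query gene a (subgenome A) against
    gene b (subgenome B); scores_ba holds the reverse searches.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores: a1 and a2 are duplicates that both match b1 best.
    ab = {"a1": {"b1": 95.0, "b2": 40.0}, "a2": {"b1": 90.0}, "a3": {"b2": 80.0}}
    ba = {"b1": {"a1": 95.0, "a2": 90.0}, "b2": {"a3": 80.0, "a1": 40.0}}
    print(bidirectional_best_hits(ab, ba))
```

In this toy input the duplicate `a2` is silently dropped, because `b1` can reciprocate only one best hit. This is the kind of case the abstract points to when it notes that BBH-based inference neglects gene duplication.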

Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, determining whether two genes directly descended from a common ancestor by a speciation event (orthologs) or a duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene.

Gene duplications and novel genes have been shown to play a major role in helminth adaptation to a parasitic lifestyle because they provide the novelty necessary for adaptation to a changing environment, such as living in multiple hosts. Here we present the de novo sequenced and annotated genome of the parasitic trematode Atriophallophorus winterbourni and its comparative genomic analysis to other major parasitic trematodes. First, we reconstructed the species phylogeny, and dated the split of A.

OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages.

The OMA Collection is a resource for users of Orthologous Matrix. In this collection, we provide tutorials and protocols on how to leverage the tools provided by OMA to analyse your data. Here, I explain the motivation for this collection and its published works thus far.

Two low-phytate soybean (Glycine max (L.) Merr.) mutant lines, V99-5089 (mips mutation on chromosome 11) and CX-1834 (mrp-l and mrp-n mutations on chromosomes 19 and 3, respectively), have proven to be valuable resources for breeding low-phytate, high-sucrose, and low-raffinosaccharide soybeans, traits that are highly desirable from a nutritional and environmental standpoint.

Knowledge of species phylogeny is critical to many fields of biology. In an era of genome data availability, the most common way to make a phylogenetic species tree is by using multiple protein-coding genes, conserved in multiple species. This methodology is composed of several steps: orthology inference, multiple sequence alignment and inference of the phylogeny with dedicated tools.

The identification of orthologs, genes in different species that descended from the same gene in their last common ancestor, is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.

The Orthologous Matrix (OMA) is a method and database that allows users to identify orthologs among many genomes. OMA provides three different types of orthologs: pairwise orthologs, OMA Groups and Hierarchical Orthologous Groups (HOGs). This Primer is organized in two parts.

The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation.

Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families.

Article Synopsis
  • Genomes and transcriptomes are often difficult to analyze; identifying orthologs (corresponding genes across species) is a critical but challenging step in this process.
  • The Orthologous MAtrix (OMA) database is a key resource for finding orthologs, and the OMA pipeline can be run as a standalone program on Linux and Mac, supporting various job schedulers and scaling up for large data processing.
  • OMA standalone allows users to integrate their own data with public genomic data and offers applications like phylogenetic analysis and identifying gene family changes or potential drug targets, and is available as open-source software.

Bacteria that engage in long-standing associations with particular hosts are expected to evolve host-specific adaptations that limit their capacity to thrive in other environments. Consistent with this, many gut symbionts seem to have a limited host range, based on community profiling and phylogenomics. However, few studies have experimentally investigated host specialization of gut symbionts and the underlying mechanisms have largely remained elusive.

In polyploid genomes, homoeologs are a specific subtype of homologs, and can be thought of as orthologs between subgenomes. In Orthologous MAtrix, we infer homoeologs in three polyploid plant species: upland cotton (Gossypium hirsutum), rapeseed (Brassica napus), and bread wheat (Triticum aestivum). While we can typically recognize the features of a "good" homoeolog prediction (a consistent evolutionary distance, high synteny, and a one-to-one relationship), none of them is a hard-and-fast criterion.

The Orthologous Matrix (OMA) is a leading resource to relate genes across many species from all of life. In this update paper, we review the recent algorithmic improvements in the OMA pipeline, describe increases in species coverage (particularly in plants and early-branching eukaryotes) and introduce several new features in the OMA web browser. Notable improvements include: (i) a scalable, interactive viewer for hierarchical orthologous groups; (ii) protein domain annotations and domain-based links between orthologous groups; (iii) functionality to retrieve phylogenetic marker genes for a subset of species of interest; (iv) a new synteny dot plot viewer; and (v) an overhaul of the programmatic access (REST API and semantic web), which will facilitate incorporation of OMA analyses in computational pipelines and integration with other bioinformatic resources.

Motivation: Accurate orthology inference is a fundamental step in many phylogenetic and comparative analyses. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets.
