Publications by authors named "Salzberg S"

As the number and variety of assembled genomes continues to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species.

Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which might create an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these upstream ORFs without the need to create novel genes or bicistronic transcripts. We examined 2,199 upstream ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes.

Article Synopsis
  • Major instances of Guillain-Barré Syndrome (GBS) were noted during the Zika virus outbreaks from 2014 to 2016, raising questions about why some individuals become more susceptible to GBS following Zika infection.
  • The study focused on analyzing Zika virus (ZIKV) genotypes from urine samples of GBS patients and controls during the 2016 outbreak in Colombia, using advanced genome sequencing techniques.
  • Results showed no significant genetic differences between ZIKV strains in GBS cases and those without neurological issues, indicating that GBS may be linked more to patient-specific factors than to specific ZIKV mutations.
Article Synopsis
  • The analysis aimed to compare patient outcomes from two large cohorts in Europe and the USA who underwent coronary artery bypass grafting (CABG) to assess the effectiveness of knowledge exchange among cardiovascular surgery societies.
  • Data was collected from the European DuraGraft Registry (2,522 patients) and the US STS database (294,725 patients), with both groups undergoing CABG between 2016 and 2019, and factors were matched using propensity score models to ensure fair comparison of outcomes.
  • Key findings revealed different patient profiles, with European patients more likely to have left main disease and receive arterial grafts, while US patients tended to have more saphenous vein grafts; however, these differences in treatment approaches were
Article Synopsis
  • Metagenomic next generation sequencing (mNGS) effectively identified novel and rare pathogens in patients with unexplained acute febrile illness in Uganda, surpassing traditional clinical microbiology methods.
  • The study involved 42 participants, aged around 28, who exhibited symptoms suggestive of viral infections, with 10 of them (23.8%) showing significant viral, bacterial, or fungal signals.
  • This research confirmed the presence of Rickettsia conorii, the cause of Mediterranean spotted fever, marking the first documented case in sub-Saharan Africa and highlighting the potential of mNGS for future disease surveillance.

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals within this window.
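
As a rough illustration of what a fixed window flanking a splice site means, the sketch below cuts such a window out of a DNA string. It assumes the 400 bp are split evenly (200 bp on each side of the site) and that positions past the sequence ends are padded with `N`; the function name, the even split, and the padding choice are illustrative assumptions and are not taken from Splam itself.

```python
def splice_window(seq: str, site: int, flank: int = 200, pad: str = "N") -> str:
    """Return the 2*flank bases around 0-based position `site`.

    Hypothetical helper: the window covers seq[site - flank : site + flank],
    padded with `pad` wherever it would run past either end of `seq`.
    """
    start, end = site - flank, site + flank
    left_pad = pad * max(0, -start)            # bases missing before the sequence start
    right_pad = pad * max(0, end - len(seq))   # bases missing after the sequence end
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad
```

With the default `flank=200`, every call returns exactly 400 characters, so windows taken near the start or end of a contig still have a uniform length.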

In recent years, a growing number of publications have reported the presence of microbial species in human tumors and of mixtures of microbes that appear to be highly specific to different cancer types. Our recent re-analysis of data from three cancer types revealed that technical errors have caused erroneous reports of numerous microbial species found in sequencing data from The Cancer Genome Atlas (TCGA) project. Here we have expanded our analysis to cover all 5,734 whole-genome sequencing (WGS) data sets currently available from TCGA, covering 25 distinct types of cancer.

Article Synopsis
  • The study investigates pathogenic microorganisms present in the corneal epithelial layer of patients with keratoconus, comparing samples from ten keratoconus eyes and three healthy controls.
  • DNA was extracted and analyzed using metagenomic next-generation sequencing (mNGS), which showed low microbial counts in both groups, with no significant differences between them.
  • The predominant microbial group found was Proteobacteria, and the results suggest that a chronic infection is unlikely to contribute to the development of keratoconus, though an acute infection may still play a role in its onset.

Stony coral tissue loss disease (SCTLD) has devastated coral reefs off the coast of Florida and continues to spread throughout the Caribbean. Although a number of bacterial taxa have consistently been associated with SCTLD, no pathogen has been definitively implicated in the etiology of SCTLD. Previous studies have predominantly focused on the prokaryotic community through 16S rRNA sequencing of healthy and affected tissues.

In 2020 we published Liftoff, which was the first standalone tool specifically designed for transferring gene annotations between genome assemblies of the same or closely related species. While the gene content is expected to be very similar in closely related genomes, the differences may be biologically consequential, and a computational method to extract all gene-related differences should prove useful in the analysis of such genomes. Here we present LiftoffTools, a toolkit to automate the detection and analysis of gene sequence variants, synteny, and gene copy number changes.

Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.

The rapid growth in the number of sequenced genomes makes it possible to search for the appearance of entirely new introns in the human lineage. In this study, we compared the genomic sequences for 19,120 human protein-coding genes to a collection of 3493 vertebrate genomes, mapping the patterns of intron alignments onto a phylogenetic tree. This mapping allowed us to trace many intron gain events to precise locations in the tree, corresponding to distinct points in evolutionary history.

Article Synopsis
  • Recent studies suggest that thousands of short open reading frames (ORFs) near existing genes should be added to the human gene catalogue, implying they could act as new genes or bicistronic transcripts.
  • We investigated whether these proposed ORFs could instead represent protein-coding exons connected to already existing genes rather than creating new genes.
  • Our analysis of 2,199 upstream ORFs showed that 582 have strong evidence of forming protein-coding exons, which could result in proteins with structural quality equal to or better than the currently recognized versions.

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality.

Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and developmental stages, contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function and underpin disease pathogenesis. Analyzing DTU via RNA sequencing (RNA-seq) data is vital, but the genetic heterogeneity in populations with complex diseases presents an intricate challenge due to diverse causal events and undetermined subtypes.

Objectives: Patients with diabetes mellitus (DM) undergoing coronary artery bypass grafting (CABG) have been repeatedly demonstrated to have worse clinical outcomes compared to patients without DM. The objective of this study was to evaluate the impact of DM on 1-year clinical outcomes after isolated CABG.

Methods: The European DuraGraft registry included 1130 patients (44.

ORFanage is a system designed to assign open reading frames (ORFs) to known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the human annotation databases.
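
To make the notion of assigning an ORF to a transcript concrete, here is a minimal sketch of the simplest possible baseline: scanning the three forward reading frames for the longest ATG-to-stop stretch. This is not ORFanage's algorithm, which the abstract describes as maximizing similarity to annotated proteins; the function name and the longest-ORF criterion are assumptions chosen only for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def longest_orf(transcript: str) -> str:
    """Longest ATG-to-stop ORF on the forward strand, stop codon included.

    Hypothetical baseline: returns '' when no complete ORF exists.
    """
    seq = transcript.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in the same frame
    return best
```

A length-based choice like this ignores the reference annotation entirely, which is the gap a similarity-guided approach is meant to fill.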

Despite many improvements over the years, the annotation of the human genome remains imperfect, and different annotations of the human reference genome sometimes contradict one another. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation that is more likely to be related to biological functions, and the rapidly growing number of genomes from other species increases its power. Using the latest whole genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across more than 400 species.

Introduction: Salivary duct carcinoma (SDC) is an aggressive and rare subtype of salivary gland carcinoma. Surgical excision and radiotherapy are standard of care for early cancer. Chemotherapies with taxanes and platinum show overall response rates between 39% and 50%.

Article Synopsis
  • Whitebark pine (WBP) is under threat from diseases such as white pine blister rust, as well as pests, wildfires, and climate change, leading to severe mortality across its range in the western US and Canada.
  • Genomic technologies have been utilized to effectively identify disease-resistant and climate-adapted seed sources for restoring WBP, including advanced sequencing techniques that produced a detailed genome assembly.
  • The study identified a significant number of candidate genes for disease resistance, particularly focusing on nucleotide-binding leucine-rich-repeat receptors (NLRs), enhancing the ability to understand and improve WBP’s resilience compared to earlier methods.

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs.
