Publications by Haft D

InterPro: the protein sequence classification resource in 2025.

Matthias Blum Antonina Andreeva Laise Cavalcanti Florentino Sara Rocio Chuguransky Tiago Grego

Nucleic Acids Res

January 2025

InterPro (https://www.ebi.ac.

Discovery of the myxosortases that process MYXO-CTERM and three novel prokaryotic C-terminal protein-sorting signals that share invariant Cys residues.

Daniel H Haft

J Bacteriol

January 2024

The LPXTG protein-sorting signal, found in surface proteins of various Gram-positive pathogens, was the founding member of a growing panel of prokaryotic small C-terminal sorting domains. Sortase A cleaves LPXTG, exosortases (XrtA and XrtB) cleave the PEP-CTERM sorting signal, archaeosortase A cleaves PGF-CTERM, and rhombosortase cleaves GlyGly-CTERM domains. Four sorting signal domains without previously known processing proteases are the MYXO-CTERM, JDVT-CTERM, Synerg-CTERM, and CGP-CTERM domains.

RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes.

Daniel H Haft Azat Badretdin George Coulouris Michael DiCuccio A Scott Durkin

Nucleic Acids Res

January 2024

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq.

Folding the unfoldable: using AlphaFold to explore spurious proteins.

Vivian Monzon Daniel H Haft Alex Bateman

Bioinform Adv

January 2022

Motivation: The release of AlphaFold 2.0 has revolutionized our ability to determine protein structures from sequences. This tool also inadvertently opens up many unanticipated opportunities.

Eight Unexpected Selenoprotein Families in Organometallic Biochemistry in Clostridium difficile, in ABC Transport, and in Methylmercury Biosynthesis.

Daniel H Haft Marc Gwadz

J Bacteriol

January 2023

The bioinformatics of a nine-gene locus, designated selenocysteine-assisted organometallic (SAO), was investigated after identifying six new selenoprotein families and constructing hidden Markov models (HMMs) that find and annotate members of those families. Four are selenoproteins in most SAO loci, including Clostridium difficile. They include two ABC transporter subunits, namely, permease SaoP, with selenocysteine (U) at the channel-gating position, and substrate-binding subunit SaoB.

InterPro in 2022.

Typhaine Paysan-Lafosse Matthias Blum Sara Chuguransky Tiago Grego Beatriz Lázaro Pinto

Nucleic Acids Res

January 2023

The InterPro database (https://www.ebi.ac.

Curation of the AMRFinderPlus databases: applications, functionality and impact.

Michael Feldgarden Vyacheslav Brover Boris Fedorov Daniel H Haft Arjun B Prasad

Microb Genom

June 2022

Antimicrobial resistance (AMR) is a significant public health threat. Low-cost whole-genome sequencing, which is often used in surveillance programmes, provides an opportunity to assess AMR gene content in these genomes using in silico approaches. A variety of bioinformatic tools have been developed to identify these genomic elements.

Consensus on β-Lactamase Nomenclature.

Patricia A Bradford Robert A Bonomo Karen Bush Alessandra Carattoli Michael Feldgarden

Antimicrob Agents Chemother

April 2022

Assigning names to β-lactamase variants has been inconsistent and has led to confusion in the published literature. The common availability of whole genome sequencing has resulted in an exponential growth in the number of new β-lactamase genes. In November 2021 an international group of β-lactamase experts met virtually to develop a consensus for the way naturally-occurring β-lactamase genes should be named.

AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence.

Michael Feldgarden Vyacheslav Brover Narjol Gonzalez-Escalona Jonathan G Frye Julie Haendiges

Sci Rep

June 2021

Antimicrobial resistance (AMR) is a significant public health threat. With the rise of affordable whole genome sequencing, in silico approaches to assessing AMR gene content can be used to detect known resistance mechanisms and potentially identify novel mechanisms. To enable accurate assessment of AMR gene content, as part of a multi-agency collaboration, NCBI developed a comprehensive AMR gene database, the Bacterial Antimicrobial Resistance Reference Gene Database and the AMR gene detection tool AMRFinder.

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Wenjun Li Kathleen R O'Neill Daniel H Haft Michael DiCuccio Vyacheslav Chetvernin

Nucleic Acids Res

January 2021

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures.

The InterPro protein families and domains database: 20 years on.

Matthias Blum Hsin-Yu Chang Sara Chuguransky Tiago Grego Swaathi Kandasaamy

Nucleic Acids Res

January 2021

The InterPro database (https://www.ebi.ac.

Erratum for Feldgarden et al., "Validating the AMRFinder Tool and Resistance Gene Database by Using Antimicrobial Resistance Genotype-Phenotype Correlations in a Collection of Isolates".

Michael Feldgarden Vyacheslav Brover Daniel H Haft Arjun B Prasad Douglas J Slotta

Antimicrob Agents Chemother

March 2020

Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants.

Kira S Makarova Yuri I Wolf Jaime Iranzo Sergey A Shmakov Omer S Alkhnbashi

Nat Rev Microbiol

February 2020

The number and diversity of known CRISPR-Cas systems have substantially increased in recent years. Here, we provide an updated evolutionary classification of CRISPR-Cas systems and cas genes, with an emphasis on the major developments that have occurred since the publication of the latest classification, in 2015. The new classification includes 2 classes, 6 types and 33 subtypes, compared with 5 types and 16 subtypes in 2015.

A Standard Numbering Scheme for Class C β-Lactamases.

Andrew R Mack Melissa D Barnes Magdalena A Taracila Andrea M Hujer Kristine M Hujer

Antimicrob Agents Chemother

February 2020

Unlike for classes A and B, a standardized amino acid numbering scheme has not been proposed for the class C (AmpC) β-lactamases, which complicates communication in the field. Here, we propose a scheme developed through a collaborative approach that considers both sequence and structure, preserves traditional numbering of catalytically important residues (Ser64, Lys67, Tyr150, and Lys315), is adaptable to new variants or enzymes yet to be discovered and includes a variation for genetic and epidemiological applications.

Validating the AMRFinder Tool and Resistance Gene Database by Using Antimicrobial Resistance Genotype-Phenotype Correlations in a Collection of Isolates.

Michael Feldgarden Vyacheslav Brover Daniel H Haft Arjun B Prasad Douglas J Slotta

Antimicrob Agents Chemother

November 2019

Antimicrobial resistance (AMR) is a major public health problem that requires publicly available tools for rapid analysis. To identify AMR genes in whole-genome sequences, the National Center for Biotechnology Information (NCBI) has produced AMRFinder, a tool that identifies AMR genes using a high-quality curated AMR gene reference database. The Bacterial Antimicrobial Resistance Reference Gene Database consists of up-to-date gene nomenclature, a set of hidden Markov models (HMMs), and a curated protein family hierarchy.

Uneven distribution of cobamide biosynthesis and dependence in bacteria predicted by comparative genomics.

Amanda N Shelton Erica C Seth Kenny C Mok Andrew W Han Samantha N Jackson

ISME J

March 2019

The vitamin B12 family of cofactors known as cobamides are essential for a variety of microbial metabolisms. We used comparative genomics of 11,000 bacterial species to analyze the extent and distribution of cobamide production and use across bacteria. We find that 86% of bacteria in this data set have at least one of 15 cobamide-dependent enzyme families, but only 37% are predicted to synthesize cobamides de novo.

InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Alex L Mitchell Teresa K Attwood Patricia C Babbitt Matthias Blum Peer Bork

Nucleic Acids Res

January 2019

The InterPro database (http://www.ebi.ac.

Genome properties in 2019: a new companion database to InterPro for the inference of complete functional attributes.

Lorna J Richardson Neil D Rawlings Gustavo A Salazar Alexandre Almeida David R Haft

Nucleic Acids Res

January 2019

Automatic annotation of protein function is routinely applied to newly sequenced genomes. While this provides a fine-grained view of an organism's functional protein repertoire, proteins more commonly function in a coordinated manner, such as in pathways or multimeric complexes. Genome Properties (GPs) define such functional entities as a series of steps, originally described by either TIGRFAMs or Pfam entries.

Proposal for assignment of allele numbers for mobile colistin resistance (mcr) genes.

Sally R Partridge Vincenzo Di Pilato Yohei Doi Michael Feldgarden Daniel H Haft

J Antimicrob Chemother

October 2018

The initial report of the mcr-1 (mobile colistin resistance) gene has led to many reports of mcr-1 variants and other mcr genes from different bacterial species originating from human, animal and environmental samples in different geographical locations. Resistance gene nomenclature is complex and unfortunately problems such as different names being used for the same gene/protein or the same name being used for different genes/proteins are not uncommon. Registries exist for some families, such as bla (β-lactamase) genes, but there is as yet no agreed nomenclature scheme for mcr genes.

Both widespread PEP-CTERM proteins and exopolysaccharides are required for floc formation of Zoogloea resiniphila and other activated sludge bacteria.

Na Gao Ming Xia Jingcheng Dai Dianzhen Yu Weixing An

Environ Microbiol

May 2018

Bacterial floc formation plays a central role in the activated sludge (AS) process, which has been widely utilized for sewage and wastewater treatment. The formation of AS flocs has long been known to require exopolysaccharide biosynthesis. This study demonstrates an additional requirement for a PEP-CTERM protein in Zoogloea resiniphila, a dominant AS bacterium harboring a large exopolysaccharide biosynthesis gene cluster.

RefSeq: an update on prokaryotic genome annotation and curation.

Daniel H Haft Michael DiCuccio Azat Badretdin Vyacheslav Brover Vyacheslav Chetvernin

Nucleic Acids Res

January 2018

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes.

A comprehensive software suite for protein family construction and functional site prediction.

David Renfrew Haft Daniel H Haft

PLoS One

September 2017

In functionally diverse protein families, conservation in short signature regions may outperform full-length sequence comparisons for identifying proteins that belong to a subgroup within which one specific aspect of their function is conserved. The SIMBAL workflow (Sites Inferred by Metabolic Background Assertion Labeling) is a data-mining procedure for finding such signature regions. It begins by using clues from genomic context, such as co-occurrence or conserved gene neighborhoods, to build a useful training set from a large number of uncharacterized but mutually homologous proteins.

Mycofactocin-associated mycobacterial dehydrogenases with non-exchangeable NAD cofactors.

Daniel H Haft Phillip G Pierce Stephen J Mayclin Amy Sullivan Anna S Gardberg

Sci Rep

January 2017

Article Synopsis

Mycobacterium tuberculosis (Mtb) survives in the acidic, reactive environment of macrophage phagosomes by utilizing dehydrogenases encoded in its genome, which may help it resist host defenses.
Mycobacterial short chain dehydrogenases/reductases (SDRs) possess a unique insertion at their NAD binding sites that prevents the typical exchange of NAD/NADH, suggesting a different mechanism for their function.
Experiments indicate these SDRs rely on external redox partners instead of cofactor exchange for their catalytic processes, and they are associated with the mftA gene and its corresponding product, which may play a role in this external redox partnership.

InterPro in 2017-beyond protein family and domain annotations.

Robert D Finn Teresa K Attwood Patricia C Babbitt Alex Bateman Peer Bork

Nucleic Acids Res

January 2017

InterPro (http://www.ebi.ac.

Whole-Genome Sequencing of a Haarlem Extensively Drug-Resistant Mycobacterium tuberculosis Clinical Isolate from Medellín, Colombia.

N Alvarez D Haft U A Hurtado J Robledo F Rouzaud

Genome Announc

June 2016

Colombia is one of the 105 countries that has reported at least one case of extensively drug-resistant tuberculosis (XDR-TB). The Mycobacterium tuberculosis Haarlem genotype is ubiquitous worldwide. Here, we report the high-quality draft genome sequence of a Colombian Haarlem XDR-TB clinical isolate composed of 4,329,127 bp with 4,386 genes.
