Publications by Marc Gwadz

Publications by authors named "Marc Gwadz"

Eight Unexpected Selenoprotein Families in Organometallic Biochemistry in Clostridium difficile, in ABC Transport, and in Methylmercury Biosynthesis.

J Bacteriol

January 2023

The bioinformatics of a nine-gene locus, designated selenocysteine-assisted organometallic (SAO), was investigated after identifying six new selenoprotein families and constructing hidden Markov models (HMMs) that find and annotate members of those families. Four are selenoproteins in most SAO loci, including Clostridium difficile. They include two ABC transporter subunits, namely, permease SaoP, with selenocysteine (U) at the channel-gating position, and substrate-binding subunit SaoB.

The conserved domain database in 2023.

Jiyao Wang, Farideh Chitsaz, Myra K Derbyshire, Noreen R Gonzales, Marc Gwadz

Nucleic Acids Res

January 2023

NLM's conserved domain database (CDD) is a collection of protein domain and protein family models constructed as multiple sequence alignments. Its main purpose is to provide annotation for protein and translated nucleotide sequences with the location of domain footprints and associated functional sites, and to define protein domain architecture as a basis for assigning gene product names and putative/predicted function. CDD has been available publicly for over 20 years and has grown substantially during that time.

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Wenjun Li, Kathleen R O'Neill, Daniel H Haft, Michael DiCuccio, Vyacheslav Chetvernin, Marc Gwadz

Nucleic Acids Res

January 2021

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures.

CDD/SPARCLE: the conserved domain database in 2020.

Shennan Lu, Jiyao Wang, Farideh Chitsaz, Myra K Derbyshire, Renata C Geer, Marc Gwadz

Nucleic Acids Res

January 2020

As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers both an archive of pre-computed domain annotations and live search services for single protein or nucleotide queries as well as for larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq.

RefSeq: an update on prokaryotic genome annotation and curation.

Daniel H Haft, Michael DiCuccio, Azat Badretdin, Vyacheslav Brover, Vyacheslav Chetvernin, Marc Gwadz

Nucleic Acids Res

January 2018

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes.

CDD/SPARCLE: functional classification of proteins via subfamily domain architectures.

Aron Marchler-Bauer, Yu Bo, Lianyi Han, Jane He, Christopher J Lanczycki, Marc Gwadz

Nucleic Acids Res

January 2017

NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families.

CDD: NCBI's conserved domain database.

Aron Marchler-Bauer, Myra K Derbyshire, Noreen R Gonzales, Shennan Lu, Farideh Chitsaz, Marc Gwadz

Nucleic Acids Res

January 2015

NCBI's CDD, the Conserved Domain Database, enters its 15th year as a public resource for the annotation of proteins with the location of conserved domain footprints. Going forward, we strive to improve the coverage and consistency of domain annotation provided by CDD. We maintain a live search system as well as an archive of pre-computed domain annotation for sequences tracked in NCBI's Entrez protein database, which can be retrieved for single sequences or in bulk.

CDD: conserved domains and protein three-dimensional structure.

Aron Marchler-Bauer, Chanjuan Zheng, Farideh Chitsaz, Myra K Derbyshire, Lewis Y Geer, Marc Gwadz

Nucleic Acids Res

January 2013

CDD, the Conserved Domain Database, is part of NCBI's Entrez query and retrieval system and is also accessible via http://www.ncbi.nlm.

CDD: a Conserved Domain Database for the functional annotation of proteins.

Aron Marchler-Bauer, Shennan Lu, John B Anderson, Farideh Chitsaz, Myra K Derbyshire, Marc Gwadz

Nucleic Acids Res

January 2011

NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent.

CDD: specific functional annotation with the Conserved Domain Database.

Aron Marchler-Bauer, John B Anderson, Farideh Chitsaz, Myra K Derbyshire, Carol DeWeese-Scott, Marc Gwadz

Nucleic Acids Res

January 2009

NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.

CDD: a conserved domain database for interactive domain family analysis.

Aron Marchler-Bauer, John B Anderson, Myra K Derbyshire, Carol DeWeese-Scott, Noreen R Gonzales, Marc Gwadz

Nucleic Acids Res

January 2007

The conserved domain database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. Entrez's global query interface can be accessed at http://www.ncbi.

CDD: a Conserved Domain Database for protein classification.

Aron Marchler-Bauer, John B Anderson, Praveen F Cherukuri, Carol DeWeese-Scott, Lewis Y Geer, Marc Gwadz

Nucleic Acids Res

January 2005

The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed, and can be accessed at http://www.ncbi.