Publications by Aron Marchler-Bauer

InterPro: the protein sequence classification resource in 2025.

Matthias Blum Antonina Andreeva Laise Cavalcanti Florentino Sara Rocio Chuguransky Tiago Grego Aron Marchler-Bauer

Nucleic Acids Res

November 2024

InterPro (https://www.ebi.ac.

NCBI RefSeq: reference sequence standards through 25 years of curation and annotation.

Tamara Goldfarb Vamsi K Kodali Shashikant Pujar Vyacheslav Brover Barbara Robbertse Aron Marchler-Bauer

Nucleic Acids Res

November 2024

Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpretation of data including transcriptomics, proteomics, sequence variation and comparative analyses based on reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) leverages both automatic processes and expert curation to create a robust set of reference sequences of genomic, transcript and protein data spanning the tree of life.

Database resources of the National Center for Biotechnology Information in 2025.

Eric W Sayers Jeffrey Beck Evan E Bolton J Rodney Brister Jessica Chan Aron Marchler-Bauer

Nucleic Acids Res

November 2024

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence repository and the PubMed® repository of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 31 distinct repositories and knowledgebases. The E-utilities serve as the programming interface for most of these.

Database resources of the National Center for Biotechnology Information.

Eric W Sayers Jeff Beck Evan E Bolton J Rodney Brister Jessica Chan Aron Marchler-Bauer

Nucleic Acids Res

January 2024

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.

The NIH Comparative Genomics Resource: addressing the promises and challenges of comparative genomics on human health.

Kristin Bornstein Gary Gryan E Sally Chang Aron Marchler-Bauer Valerie A Schneider

BMC Genomics

September 2023

Comparative genomics is the comparison of genetic information within and across organisms to understand the evolution, structure, and function of genes, proteins, and non-coding regions (Sivashankari and Shanmughavel, Bioinformation 1:376-8, 2007). Advances in sequencing technology and assembly algorithms have resulted in the ability to sequence large genomes and provided a wealth of data that are being used in comparative genomic analyses. Comparative analysis can be leveraged to systematically explore and evaluate the biological relationships and evolution between species, aid in understanding the structure and function of genes, and gain a better understanding of disease and potential drug targets.

The conserved domain database in 2023.

Jiyao Wang Farideh Chitsaz Myra K Derbyshire Noreen R Gonzales Marc Gwadz Aron Marchler-Bauer

Nucleic Acids Res

January 2023

NLM's conserved domain database (CDD) is a collection of protein domain and protein family models constructed as multiple sequence alignments. Its main purpose is to provide annotation for protein and translated nucleotide sequences with the location of domain footprints and associated functional sites, and to define protein domain architecture as a basis for assigning gene product names and putative/predicted function. CDD has been available publicly for over 20 years and has grown substantially during that time.

Database resources of the National Center for Biotechnology Information in 2023.

Eric W Sayers Evan E Bolton J Rodney Brister Kathi Canese Jessica Chan Aron Marchler-Bauer

Nucleic Acids Res

January 2023

InterPro in 2022.

Typhaine Paysan-Lafosse Matthias Blum Sara Chuguransky Tiago Grego Beatriz Lázaro Pinto Aron Marchler-Bauer

Nucleic Acids Res

January 2023

The InterPro database (https://www.ebi.ac.

A roadmap for the functional annotation of protein families: a community perspective.

Valérie de Crécy-Lagard Rocio Amorin de Hegedus Cecilia Arighi Jill Babor Alex Bateman Aron Marchler-Bauer

Database (Oxford)

August 2022

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages.

Quantifying the immunological distinctiveness of emerging SARS-CoV-2 variants in the context of prior regional herd exposure.

Michiel J M Niesen Karthik Murugadoss Patrick J Lenehan Aron Marchler-Bauer Jiyao Wang

PNAS Nexus

July 2022

The COVID-19 pandemic has seen the persistent emergence of immune-evasive SARS-CoV-2 variants under the selection pressure of natural and vaccination-acquired immunity. However, it is currently challenging to quantify how immunologically distinct a new variant is compared to all the prior variants to which a population has been exposed. Here, we define "Distinctiveness" of SARS-CoV-2 sequences based on a proteome-wide comparison with all prior sequences from the same geographical region.

iCn3D: From Web-Based 3D Viewer to Structural Analysis Tool in Batch Mode.

Jiyao Wang Philippe Youkharibache Aron Marchler-Bauer Christopher Lanczycki Dachuan Zhang

Front Mol Biosci

February 2022

iCn3D was initially developed as a web-based 3D molecular viewer. It then evolved from visualization into a full-featured interactive structural analysis software. It became a collaborative research instrument through the sharing of permanent, shortened URLs that encapsulate not only annotated visual molecular scenes, but also all underlying data and analysis scripts in a FAIR manner.

Database resources of the National Center for Biotechnology Information.

Eric W Sayers Evan E Bolton J Rodney Brister Kathi Canese Jessica Chan Aron Marchler-Bauer

Nucleic Acids Res

January 2022

The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Wenjun Li Kathleen R O'Neill Daniel H Haft Michael DiCuccio Vyacheslav Chetvernin Aron Marchler-Bauer

Nucleic Acids Res

January 2021

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures.

The InterPro protein families and domains database: 20 years on.

Matthias Blum Hsin-Yu Chang Sara Chuguransky Tiago Grego Swaathi Kandasaamy Aron Marchler-Bauer

Nucleic Acids Res

January 2021

The InterPro database (https://www.ebi.ac.

Database resources of the National Center for Biotechnology Information.

Eric W Sayers Jeffrey Beck Evan E Bolton Devon Bourexis James R Brister Aron Marchler-Bauer

Nucleic Acids Res

January 2021

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 34 distinct databases. The E-utilities serve as the programming interface for the Entrez system.

Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments.

Andrew F Neuwald Christopher J Lanczycki Theresa K Hodges Aron Marchler-Bauer

Database (Oxford)

January 2020

For optimal performance, machine learning methods for protein sequence/structural analysis typically require as input a large multiple sequence alignment (MSA), which is often created using query-based iterative programs, such as PSI-BLAST or JackHMMER. However, because these programs align database sequences using a query sequence as a template, they may fail to detect or may tend to misalign sequences distantly related to the query. More generally, automated MSA programs often fail to align sequences correctly due to the unpredictable nature of protein evolution.

Biological Assembly Comparison with VAST.

Thomas Madej Aron Marchler-Bauer Christopher Lanczycki Dachuan Zhang Stephen H Bryant

Methods Mol Biol

January 2021

The VAST+ algorithm is an efficient, simple, and elegant solution to the problem of comparing the atomic structures of biological assemblies. Given two protein assemblies, it takes as input all the pairwise structural alignments of the component proteins. It then clusters the rotation matrices from the pairwise superpositions, with the clusters corresponding to subsets of the two assemblies that may be aligned and well superposed.

NCBI's Conserved Domain Database and Tools for Protein Domain Analysis.

Mingzhang Yang Myra K Derbyshire Roxanne A Yamashita Aron Marchler-Bauer

Curr Protoc Bioinformatics

March 2020

The Conserved Domain Database (CDD) is a freely available resource for the annotation of sequences with the locations of conserved protein domain footprints, as well as functional sites and motifs inferred from these footprints. It includes protein domain and protein family models curated in house by CDD staff, as well as imported from a variety of other sources. The latest CDD release (v3.

CDD/SPARCLE: the conserved domain database in 2020.

Shennan Lu Jiyao Wang Farideh Chitsaz Myra K Derbyshire Renata C Geer Aron Marchler-Bauer

Nucleic Acids Res

January 2020

As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers both an archive of pre-computed domain annotations as well as live search services for both single protein or nucleotide queries and larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq.

PubMed Text Similarity Model and its application to curation efforts in the Conserved Domain Database.

Rezarta Islamaj W John Wilbur Natalie Xie Noreen R Gonzales Narmada Thanki Aron Marchler-Bauer

Database (Oxford)

January 2019

This study proposes a text similarity model to help biocuration efforts of the Conserved Domain Database (CDD). CDD is a curated resource that catalogs annotated multiple sequence alignment models for ancient domains and full-length proteins. These models allow for fast searching and quick identification of conserved motifs in protein sequences via Reverse PSI-BLAST.

iCn3D, a web-based 3D viewer for sharing 1D/2D/3D representations of biomolecular structures.

Jiyao Wang Philippe Youkharibache Dachuan Zhang Christopher J Lanczycki Renata C Geer Aron Marchler-Bauer

Bioinformatics

January 2020

Motivation: Build a web-based 3D molecular structure viewer focusing on interactive structural analysis.

Results: iCn3D (I-see-in-3D) can simultaneously show 3D structure, 2D molecular contacts and 1D protein and nucleotide sequences through an integrated sequence/annotation browser. Pre-defined and arbitrary molecular features can be selected in any of the 1D/2D/3D windows as sets of residues and these selections are synchronized dynamically in all displays.

InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Alex L Mitchell Teresa K Attwood Patricia C Babbitt Matthias Blum Peer Bork Aron Marchler-Bauer

Nucleic Acids Res

January 2019

The InterPro database (http://www.ebi.ac.

Database resources of the National Center for Biotechnology Information.

Eric W Sayers Richa Agarwala Evan E Bolton J Rodney Brister Kathi Canese Aron Marchler-Bauer

Nucleic Acids Res

January 2019

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system.

RefSeq: an update on prokaryotic genome annotation and curation.

Daniel H Haft Michael DiCuccio Azat Badretdin Vyacheslav Brover Vyacheslav Chetvernin Aron Marchler-Bauer

Nucleic Acids Res

January 2018

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes.

CDD/SPARCLE: functional classification of proteins via subfamily domain architectures.

Aron Marchler-Bauer Yu Bo Lianyi Han Jane He Christopher J Lanczycki

Nucleic Acids Res

January 2017

NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families.
