Publications by authors named "Siqian He"

Article Synopsis
  • PubChem is an extensive chemical database managed by the NIH, featuring over 119 million compounds and more than 295 million bioactivities.
  • Recently, it underwent major updates, adding data from 130 new sources and introducing new user-friendly interfaces like the consolidated literature and patent knowledge panels for easier access to related information.
  • Enhancements also include support for non-traditional chemical structures with dedicated web pages, as well as expanded capabilities for exploring relationships between entities using semantic web technologies.
Article Synopsis
  • PubChem is a well-known chemical information resource that has undergone significant updates in the last two years, expanding its data offerings from over 120 sources.
  • Key enhancements include the addition of Google Patents data for improved patent coverage, the development of new collections for Cell Line and Taxonomy data, and an updated bioassay data model.
  • New features for programmatic access, like target-centric data downloads and a standardization option for chemical structures in PUG-REST, have also been introduced, along with updates to PubChemRDF.
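As a rough illustration of what programmatic access through PUG-REST looks like, the sketch below only builds a request URL for a compound property lookup by name. The helper name `compound_property_url` and the example compound are assumptions made for illustration. It does not cover the target-centric downloads or the standardization option mentioned above, whose request syntax is not given here.

```python
from urllib.parse import quote

# Base address of the PUG-REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties, fmt="JSON"):
    """Build a PUG-REST URL that requests properties for a compound name.

    Hypothetical helper: it only assembles the URL and sends no request.
    """
    props = ",".join(properties)
    # Percent-encode the name so spaces and slashes are safe in the path.
    return f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}/property/{props}/{fmt}"


url = compound_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
```

The resulting URL can be fetched with any HTTP client; keeping URL construction separate from the network call makes the request easy to inspect before sending it.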

The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling.
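The co-occurrence idea can be sketched in a few lines. The abstract does not specify which statistic is used, so this example assumes pointwise mutual information (PMI) as a stand-in, and it omits the relevance-based sampling step entirely. The function name `cooccurrence_scores` and the toy annotations are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(records):
    """Score entity pairs by co-occurrence across annotated abstracts.

    records: list of sets, each holding the entity names found in one abstract.
    Returns {(a, b): (joint_count, pmi)} with a < b in sort order.
    PMI is an assumed statistic, not necessarily the one PubChem uses.
    """
    n = len(records)
    single = Counter()  # number of abstracts mentioning each entity
    pair = Counter()    # number of abstracts mentioning both entities
    for entities in records:
        single.update(entities)
        pair.update(combinations(sorted(entities), 2))
    scores = {}
    for (a, b), joint in pair.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        pmi = math.log2((joint * n) / (single[a] * single[b]))
        scores[(a, b)] = (joint, pmi)
    return scores


# Toy annotations, invented for this example only.
records = [
    {"aspirin", "PTGS2", "inflammation"},
    {"aspirin", "PTGS2"},
    {"aspirin", "inflammation"},
    {"metformin", "diabetes"},
]
scores = cooccurrence_scores(records)

# List pairs from most to least strongly associated.
for pair_key, (joint, pmi) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
    print(pair_key, joint, round(pmi, 3))
```

With very small counts PMI overstates rare pairs, which is one reason a production system would combine such a score with additional filtering or sampling.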


PubChem's BioAssay database (https://pubchem.ncbi.nlm.


Background: PubChem is an open repository for small molecules and their experimental biological activity. PubChem integrates and provides search, retrieval, visualization, analysis, and programmatic access tools in an effort to maximize the utility of contributed information. There are many diverse chemical structures with similar biological efficacies against targets available in PubChem that are difficult to interrelate using traditional 2-D similarity methods.

Article Synopsis
  • The NCBI BioSystems database centralizes various biological systems databases, enhancing their usefulness and accessibility.
  • It integrates pathways and systems into NCBI resources, making it easier for users to navigate biological data.
  • Users can categorize proteins, genes, and small molecules by various criteria like metabolic pathways or disease states without needing extensive background research.

NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.


The conserved domain database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. Entrez's global query interface can be accessed at http://www.ncbi.


Three-dimensional (3D) structure is now known for a large fraction of all protein families. Thus, it has become rather likely that one will find a homolog with known 3D structure when searching a sequence database with an arbitrary query sequence. Depending on the extent of similarity, such neighbor relationships may allow one to infer biological function and to identify functional sites such as binding motifs or catalytic centers.


The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed, and can be accessed at http://www.ncbi.


Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization.


The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE®. This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.


Three-dimensional structures are now known within many protein families, and it is quite likely, when searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez's 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end, Entrez's search engine provides three powerful features.
