SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets.

Bioinformatics

Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States.

Published: August 2023

AI Article Synopsis

  • High-throughput experiments and computational predictions have greatly increased the number of protein sequence annotations, but managing and analyzing this complex data is challenging and creates a barrier to entry for integrative bioinformatics.
  • SHEPHARD is a Python framework designed to simplify large-scale protein bioinformatics, using an object-oriented structure and database-like features for efficient data annotation and analysis.
  • SHEPHARD is available as standalone software and as a Google Colab notebook, making it user-friendly and accessible for exploring large protein datasets and answering key biological questions.

Article Abstract

Motivation: The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics.

Results: To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology.
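As a hedged illustration of the general idea only (an object-oriented hierarchy combined with database-like features), the minimal Python sketch below models a proteome that contains proteins. Each protein carries region (domain), single-residue (site), and per-residue (track) annotations. Proteins are indexed by a unique ID, and annotations are bounds-checked on entry. All class and method names (`Proteome`, `Protein`, `add_domain`, `add_site`, `add_track`, `sites_in_domain`) and the example ID and sequence are assumptions chosen for illustration. They are not SHEPHARD's actual API; the repository linked below documents the real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

# Hypothetical sketch: class and method names are illustrative, NOT SHEPHARD's API.


@dataclass
class Domain:
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive
    domain_type: str


@dataclass
class Site:
    position: int  # 1-based residue index
    site_type: str


@dataclass
class Protein:
    unique_id: str
    sequence: str
    domains: List[Domain] = field(default_factory=list)
    sites: List[Site] = field(default_factory=list)
    tracks: Dict[str, List[float]] = field(default_factory=dict)

    def add_domain(self, start: int, end: int, domain_type: str) -> Domain:
        # Sanity check: reject region annotations that fall outside the sequence.
        if not 1 <= start <= end <= len(self.sequence):
            raise ValueError(f"{self.unique_id}: invalid domain {start}-{end}")
        domain = Domain(start, end, domain_type)
        self.domains.append(domain)
        return domain

    def add_site(self, position: int, site_type: str) -> Site:
        if not 1 <= position <= len(self.sequence):
            raise ValueError(f"{self.unique_id}: invalid site {position}")
        site = Site(position, site_type)
        self.sites.append(site)
        return site

    def add_track(self, name: str, values: List[float]) -> None:
        # A track must hold exactly one value per residue.
        if len(values) != len(self.sequence):
            raise ValueError(f"{self.unique_id}: track '{name}' length mismatch")
        self.tracks[name] = list(values)

    def domain_sequence(self, domain: Domain) -> str:
        # Convert 1-based inclusive coordinates to a Python slice.
        return self.sequence[domain.start - 1:domain.end]

    def sites_in_domain(self, domain: Domain) -> List[Site]:
        return [s for s in self.sites if domain.start <= s.position <= domain.end]


class Proteome:
    """Top-level container; proteins are keyed by unique ID, like a primary key."""

    def __init__(self) -> None:
        self._proteins: Dict[str, Protein] = {}

    def add_protein(self, unique_id: str, sequence: str) -> Protein:
        if unique_id in self._proteins:
            raise KeyError(f"duplicate protein ID: {unique_id}")
        protein = Protein(unique_id, sequence)
        self._proteins[unique_id] = protein
        return protein

    def __getitem__(self, unique_id: str) -> Protein:
        return self._proteins[unique_id]

    def __iter__(self) -> Iterator[Protein]:
        return iter(self._proteins.values())


if __name__ == "__main__":
    proteome = Proteome()
    p = proteome.add_protein("EXAMPLE_1", "MSSGSSGRSSSK")  # made-up ID and sequence
    p.add_domain(2, 7, "IDR")
    p.add_site(4, "phosphorylation")
    p.add_site(11, "phosphorylation")
    # Proteome-wide query: report each domain and how many sites fall inside it.
    for protein in proteome:
        for d in protein.domains:
            print(protein.unique_id, d.domain_type,
                  protein.domain_sequence(d), len(protein.sites_in_domain(d)))
```

Keying proteins by unique ID gives primary-key-style lookup and duplicate rejection. The bounds checks are one simple way to catch malformed annotations at load time. The real package may handle both concerns differently.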

Availability And Implementation: We provide SHEPHARD both as a stand-alone software package (https://github.com/holehouse-lab/shephard) and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab).

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10423030
DOI: http://dx.doi.org/10.1093/bioinformatics/btad488
