Optimizing high performance computing workflow for protein functional annotation.

Concurr Comput

Bioinformatics & High-throughput Analysis Laboratory, SCRI, High-throughput Analysis Core, SCRI, Predictive Analytics, Seattle Children's Hospital, Departments of Pediatrics and Biomedical Informatics & Medical Education, University of Washington, DELSA Global.

Published: September 2014

Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low-complexity classification algorithm to assign proteins to existing clusters of orthologous groups of proteins. Based on the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
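The 80% sensitivity and specificity criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Confusion` container, the function names, and the example counts are hypothetical, and the abstract does not say how the workflow evaluates these metrics. The sketch only shows the standard definitions applied to a set of classification outcomes.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical tally of classification outcomes for one orthologous group."""
    tp: int  # proteins correctly assigned to the group
    fp: int  # proteins wrongly assigned to the group
    tn: int  # proteins correctly left out of the group
    fn: int  # proteins that belong to the group but were missed


def sensitivity(c: Confusion) -> float:
    """True positive rate: TP / (TP + FN)."""
    if c.tp + c.fn == 0:
        raise ValueError("no positive examples")
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """True negative rate: TN / (TN + FP)."""
    if c.tn + c.fp == 0:
        raise ValueError("no negative examples")
    return c.tn / (c.tn + c.fp)


def meets_threshold(c: Confusion, minimum: float = 0.80) -> bool:
    """Check that both metrics reach the minimum (0.80 mirrors the abstract's 80%)."""
    return sensitivity(c) >= minimum and specificity(c) >= minimum


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = Confusion(tp=90, fp=15, tn=85, fn=10)
    print(sensitivity(example), specificity(example), meets_threshold(example))
```

Requiring both metrics to clear the same minimum reflects the abstract's wording ("at least 80% specificity and sensitivity"); a real evaluation might apply the check per group or across the whole data set.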

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4194055
DOI: http://dx.doi.org/10.1002/cpe.3264

Similar Publications

Objectives: We investigated the prevalence of drug resistance mutations (DRMs) in individuals newly diagnosed with HIV-1 in Estonia in 2020 and 2022, and in Ukrainian war refugees living with HIV who arrived in Estonia in 2022.

Methods: HIV-1 genomic RNA was sequenced in protease-reverse transcriptase and integrase regions. DRMs were determined separately by Stanford University CPR Tool and HIVdb Program.

Introduction: POT1 tumor predisposition (POT1-TPD) is an autosomal dominant disorder characterized by increased lifetime malignancy risk. Melanoma, angiosarcoma, and chronic lymphocytic leukemia are the most frequently reported malignancies [1]. Protection of telomeres protein 1 (POT1) is part of the shelterin protein complex to maintain/protect telomeres [2].

P3 site-directed mutagenesis: An efficient method based on primer pairs with 3'-overhangs.

J Biol Chem

January 2025

Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, Quebec H3A 1A3, Canada; Department of Medicine, McGill University, Montreal, Quebec H3A 1A3, Canada; Department of Biochemistry, McGill University, Montreal, Quebec H3A 1A3, Canada; McGill University Health Center, Montreal, Quebec H3A 1A3, Canada. Electronic address:

Site-directed mutagenesis is a fundamental tool indispensable for protein and plasmid engineering. An important technological question is how to achieve the efficiency at the ideal level of 100%. Based on complementary primer pairs, the QuickChange method has been widely used, but it requires significant improvements due to its low efficiency and frequent unwanted mutations.

Unveiling triclosan biodegradation: Novel metabolic pathways, genomic insights, and global environmental adaptability of Pseudomonas sp. strain W03.

J Hazard Mater

January 2025

Marine Synthetic Ecology Research Center, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), School of Marine Science, Sun Yat-sen University, Zhuhai 519080, China. Electronic address:

The polychlorinated aromatic antimicrobial agent triclosan (TCS) is widely used to indiscriminately and rapidly kill microorganisms. The global use of TCS has led to widespread environmental contamination, posing significant threats to ecosystem and human health. Here we reported a newly isolated Pseudomonas sp.

Genomic sources from China are underrepresented in the population-specific reference database. We performed whole-genome sequencing or genome-wide genotyping on 1,207 individuals from four linguistically diverse groups (1,081 Sinitic, 56 Mongolic, 40 Turkic, and 30 Tibeto-Burman people) living in North China included in the 10K Chinese People Genomic Diversity Project (10K_CPGDP) to characterize the genetic architecture and adaptative history of ethnic groups in the Silk Road Region of China. We observed a population split between Northwest Chinese minorities (NWCMs) and Han Chinese since the Upper Paleolithic and later Neolithic genetic differentiation within NWCMs.
