Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding: newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high-performance computing architectures and a low-complexity classification algorithm to assign proteins to existing clusters of orthologous groups of proteins (COGs). Based on the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment (XSEDE) supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
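As a rough illustration of the classification step described in the abstract, the sketch below assigns each protein to the cluster of orthologous groups (COG) of its best-scoring search hit, provided the hit clears a per-COG score cutoff, and then measures sensitivity and specificity for one COG against known labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Hit` record, the function names, the per-COG cutoffs, and the toy scores are all hypothetical, and comparing raw scores across COGs is a simplification.

```python
from collections import namedtuple

# Hypothetical record for one profile-search hit; field names are illustrative.
Hit = namedtuple("Hit", ["protein", "cog", "score"])


def assign_to_cogs(hits, thresholds):
    """Assign each protein to the COG of its best hit that clears that COG's cutoff.

    `thresholds` maps COG id -> minimum score (assumed to be precomputed per COG).
    Proteins with no qualifying hit are left out of the result.
    """
    best = {}
    for hit in hits:
        cutoff = thresholds.get(hit.cog)
        if cutoff is None or hit.score < cutoff:
            continue  # unknown COG or score below the cutoff
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    return {protein: hit.cog for protein, hit in best.items()}


def sensitivity_specificity(predicted, truth, cog, all_proteins):
    """Return (sensitivity, specificity) of the assignments for a single COG."""
    tp = fp = tn = fn = 0
    for protein in all_proteins:
        is_member = truth.get(protein) == cog
        called = predicted.get(protein) == cog
        if is_member and called:
            tp += 1
        elif is_member:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
hits = [
    Hit("p1", "COG0001", 120.0),
    Hit("p1", "COG0002", 90.0),
    Hit("p2", "COG0001", 40.0),
    Hit("p3", "COG0002", 75.0),
    Hit("p4", "COG0002", 65.0),
]
thresholds = {"COG0001": 50.0, "COG0002": 60.0}
truth = {"p1": "COG0001", "p2": "COG0001", "p3": "COG0002", "p4": "COG0002"}

assignments = assign_to_cogs(hits, thresholds)
sens, spec = sensitivity_specificity(
    assignments, truth, "COG0001", ["p1", "p2", "p3", "p4"]
)
```

In a setup like this, a cutoff for each COG could be tuned on proteins with known membership until both measures reach the desired level, such as the 80% stated in the abstract.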
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4194055 | PMC |
http://dx.doi.org/10.1002/cpe.3264 | DOI Listing |
J Glob Antimicrob Resist
January 2025
Faculty of Medicine, Department of Microbiology, University of Tartu, Tartu, Estonia.
Objectives: We investigated the prevalence of drug resistance mutations (DRMs) in individuals newly diagnosed with HIV-1 in Estonia in 2020 and 2022, and in Ukrainian war refugees living with HIV who arrived in Estonia in 2022.
Methods: HIV-1 genomic RNA was sequenced in protease-reverse transcriptase and integrase regions. DRMs were determined separately by Stanford University CPR Tool and HIVdb Program.
Cancer Genet
January 2025
Cincinnati Children's Hospital Medical Center, Division of Oncology, Cincinnati, OH, USA; University of Cincinnati College of Medicine, Cincinnati, OH, USA.
Introduction: POT1 tumor predisposition (POT1-TPD) is an autosomal dominant disorder characterized by increased lifetime malignancy risk. Melanoma, angiosarcoma, and chronic lymphocytic leukemia are the most frequently reported malignancies [1]. Protection of telomeres protein 1 (POT1) is part of the shelterin protein complex to maintain/protect telomeres [2].
J Biol Chem
January 2025
Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, Quebec H3A 1A3, Canada; Department of Medicine, McGill University, Montreal, Quebec H3A 1A3, Canada; Department of Biochemistry, McGill University, Montreal, Quebec H3A 1A3, Canada; McGill University Health Center, Montreal, Quebec H3A 1A3, Canada.
Site-directed mutagenesis is a fundamental tool indispensable for protein and plasmid engineering. An important technological question is how to achieve the efficiency at the ideal level of 100%. Based on complementary primer pairs, the QuickChange method has been widely used, but it requires significant improvements due to its low efficiency and frequent unwanted mutations.
J Hazard Mater
January 2025
Marine Synthetic Ecology Research Center, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), School of Marine Science, Sun Yat-sen University, Zhuhai 519080, China.
The polychlorinated aromatic antimicrobial agent triclosan (TCS) is widely used to indiscriminately and rapidly kill microorganisms. The global use of TCS has led to widespread environmental contamination, posing significant threats to ecosystem and human health. Here we reported a newly isolated Pseudomonas sp.
Sci China Life Sci
January 2025
Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China.
Genomic sources from China are underrepresented in the population-specific reference database. We performed whole-genome sequencing or genome-wide genotyping on 1,207 individuals from four linguistically diverse groups (1,081 Sinitic, 56 Mongolic, 40 Turkic, and 30 Tibeto-Burman people) living in North China included in the 10K Chinese People Genomic Diversity Project (10K_CPGDP) to characterize the genetic architecture and adaptative history of ethnic groups in the Silk Road Region of China. We observed a population split between Northwest Chinese minorities (NWCMs) and Han Chinese since the Upper Paleolithic and later Neolithic genetic differentiation within NWCMs.