PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes.

Plant Methods

Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, 132 Lanhei Road, Kunming, 650204 Yunnan, China.

Published: May 2019

Background: Plastome (plastid genome) sequences provide valuable information for understanding the phylogenetic relationships and evolutionary history of plants. Although the rapid development of high-throughput sequencing technology has led to an explosion of plastome sequences, annotation remains a significant bottleneck. User-friendly batch annotation of multiple plastomes is urgently needed.

Results: We introduce Plastid Genome Annotator (PGA), a standalone command line tool that can perform rapid, accurate, and flexible batch annotation of newly generated target plastomes based on well-annotated reference plastomes. In contrast to existing tools, PGA uses reference plastomes as the query and unannotated target plastomes as the subject to locate genes, which we refer to as the reverse query-subject BLAST search approach. PGA accurately identifies gene and intron boundaries as well as intron loss. The program outputs GenBank-formatted files as well as a log file to assist users in verifying annotations. Comparisons against other available plastome annotation tools demonstrated the high annotation accuracy of PGA, with little or no post-annotation verification necessary. Likewise, we demonstrated the flexibility of reference plastomes within PGA by annotating the plastome of one species using that of another as a reference. The program, user manual and example data sets are freely available at https://github.com/quxiaojian/PGA.
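The reverse query-subject idea can be illustrated with a toy sketch. This is not PGA's code and does not call BLAST: it assumes a simple ungapped sliding-window identity search as a stand-in, and the names `revcomp`, `best_hit`, `annotate`, and the `min_identity` threshold are hypothetical. Each annotated reference gene acts as the query, the unannotated target acts as the subject, and the best hit on either strand becomes the gene's coordinates in the target. Intron handling, which PGA performs, is left out.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_hit(query: str, subject: str) -> Tuple[int, float]:
    """Best ungapped placement of query on subject as (start, identity).

    A naive stand-in for a BLAST search; returns (-1, 0.0) if nothing matches.
    """
    best_start, best_ident = -1, 0.0
    for start in range(len(subject) - len(query) + 1):
        window = subject[start:start + len(query)]
        ident = sum(a == b for a, b in zip(query, window)) / len(query)
        if ident > best_ident:
            best_start, best_ident = start, ident
    return best_start, best_ident


def annotate(reference_genes: Dict[str, str], target: str,
             min_identity: float = 0.8) -> Dict[str, Tuple[int, int, str]]:
    """Map each reference gene (query) onto the target (subject).

    Returns {gene: (start, end, strand)} with 0-based, end-exclusive
    coordinates on the forward strand of the target.
    """
    annotations = {}
    for name, gene_seq in reference_genes.items():
        hits = []
        for strand, subject in (("+", target), ("-", revcomp(target))):
            start, ident = best_hit(gene_seq, subject)
            if start >= 0 and ident >= min_identity:
                if strand == "-":
                    # Convert minus-strand coordinates back to the forward strand.
                    start = len(target) - (start + len(gene_seq))
                hits.append((ident, start, start + len(gene_seq), strand))
        if hits:
            _, start, end, strand = max(hits)  # keep the highest-identity hit
            annotations[name] = (start, end, strand)
    return annotations


# Toy data: geneA sits on the forward strand of the target with one
# substitution relative to the reference; geneB sits on the reverse strand.
reference = {"geneA": "ATGGCTAAGTAA", "geneB": "ATGAAACCCGGGTGA"}
target = "TTTT" + "ATGGCCAAGTAA" + "CCCC" + revcomp("ATGAAACCCGGGTGA") + "GG"
print(annotate(reference, target))
```

Because the loop runs over reference genes rather than over regions of the target, a gene with no acceptable hit is simply absent from the result, which makes it easy to flag for manual review.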

Conclusions: PGA facilitates rapid, accurate, and flexible batch annotation of plastomes across plants. For projects in which multiple plastomes are generated, the time savings for high-quality plastome annotation are especially significant.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6528300
DOI: http://dx.doi.org/10.1186/s13007-019-0435-7


