Misannotation Awareness: A Tale of Two Gene-Groups.

Front Plant Sci

EU Marie Curie Chair, Instituto de Ciências Agrárias e Ambientais Mediterrânicas, Universidade de Évora, Évora, Portugal.

Published: July 2016

The amount of incorrectly annotated or simply unannotated data is growing rapidly in most public databases, undoubtedly driven by the rise in sequence data and the more recent boom in genomic projects. Molecular biologists and bioinformaticists should join efforts to tackle this issue. Practical challenges encountered while studying the alternative oxidase (AOX) gene family motivated the present work. Commonly used databases were screened for their capacity to distinguish AOX from the plastid terminal oxidase (also called plastoquinol terminal oxidase; PTOX), and we put forward a simple approach, based on amino acid signatures, that unequivocally distinguishes these gene families. Further, available sequence data on the AOX family in plants were carefully revised to (1) confirm their classification as AOX and (2) identify the AOX family member to which they belong. We highlight the urgent need for misannotation awareness and for re-annotation of public AOX sequences by documenting different types of misclassification and the large underestimation of data availability.
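To illustrate the general idea of a signature-based check, here is a minimal sketch in Python. It is not the authors' method: the motifs in SIGNATURES are hypothetical placeholders (the abstract does not give the actual AOX or PTOX signatures), and the function names classify and audit are likewise assumptions made for this example. The sketch assigns a sequence to a family only when every signature of exactly one family matches, and then lists records whose database annotation disagrees.

```python
import re

# HYPOTHETICAL placeholder signatures, for illustration only.
# These are NOT the real AOX/PTOX amino acid signatures.
SIGNATURES = {
    "AOX": [re.compile(r"AAAW"), re.compile(r"CCH")],
    "PTOX": [re.compile(r"DDDY"), re.compile(r"EEK")],
}


def classify(seq, signatures=SIGNATURES):
    """Return the one family whose signatures all match the sequence.

    Returns 'ambiguous' if more than one family matches completely,
    and 'unclassified' if none does.
    """
    seq = seq.upper()
    hits = [
        family
        for family, motifs in signatures.items()
        if all(motif.search(seq) for motif in motifs)
    ]
    if len(hits) == 1:
        return hits[0]
    return "ambiguous" if hits else "unclassified"


def audit(records):
    """Return (id, annotated, predicted) for records whose annotation disagrees.

    `records` is an iterable of (seq_id, annotated_family, sequence) tuples.
    """
    mismatches = []
    for seq_id, annotated, seq in records:
        predicted = classify(seq)
        if predicted != annotated:
            mismatches.append((seq_id, annotated, predicted))
    return mismatches


# Toy records; the sequences are made up to exercise the placeholder motifs.
records = [
    ("s1", "AOX", "MMAAAWGGCCHLL"),
    ("s2", "AOX", "MMDDDYGGEEKLL"),
    ("s3", "PTOX", "MMDDDYGGEEKLL"),
]
flagged = audit(records)
```

With real signatures substituted for the placeholders, the same loop could be run over sequences retrieved from a public database to flag candidate misannotations for manual review.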

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4909761
DOI: http://dx.doi.org/10.3389/fpls.2016.00868

Similar Publications

Genome-aware annotation of CRISPR guides validates targets in variant cell lines and enhances discovery in screens.

Genome Med

November 2024

Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.

Article Synopsis
  • CRISPR-Cas9 technology has transformed genetic research, but discrepancies between reference genomes and cell lines, especially in variant cancer lines, can introduce biases and affect results.
  • The Exorcise algorithm was developed to detect and correct mis-annotations in CRISPR libraries, improving the accuracy of gene-targeting guides based on the specific genomes being studied.
  • Application of Exorcise has shown enhanced discovery power in CRISPR screens and can be used for both library design and analysis stages, making it a valuable tool for researchers.
Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely-used genome annotation databases.

The human genome encodes approximately 20,000 proteins, many still uncharacterised. It has become clear that scientific research tends to focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed a publicly available and customisable "Unknome database" that ranks proteins based on how little is known about them.

Widespread occurrence of in-source fragmentation in the analysis of natural compounds by liquid chromatography-electrospray ionization mass spectrometry.

Rapid Commun Mass Spectrom

June 2023

Key Laboratory of Basic Pharmacology of Ministry of Education & Joint International Research Laboratory of Ethnomedicine of Ministry of Education, Zunyi Medical University, Zunyi, China.

Rationale: The in-source fragmentation (ISF) of analyte or co-eluting substances produces unintentional fragment ions, which hampers identification and quantification by liquid chromatography-mass spectrometry (LC/MS). Natural compounds derived from plants also contain fragile moieties that may undergo ISF. However, the characteristics of ISF of natural compounds in LC/MS are still unclear.

Most single-nucleotide polymorphisms (SNPs) are located in non-coding regions, but the fraction usually studied is harbored in protein-coding regions because potential impacts on proteins are relatively easy to predict by popular tools such as the Variant Effect Predictor. These tools annotate variants independently without considering the potential effect of grouped or haplotypic variations, often called "multi-nucleotide variants" (MNVs). Here, we used a large RNA-seq dataset to survey MNVs, comprising 382 chicken samples originating from 11 populations analyzed in the companion paper in which 9.
