AGOUTI: improving genome assembly and annotation using transcriptome data.

GigaScience

School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA.

Published: July 2016

Background: Genomes sequenced using short-read, next-generation sequencing technologies can have many errors and may be fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number and identity of genes within species, as well as gene gain and loss between species.

Results: We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA sequencing data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on both simulated and real datasets, we show that it is highly accurate and that it achieves greater accuracy and contiguity when compared with other existing methods.
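The idea of using RNA sequencing read pairs as evidence for joining contigs can be illustrated with a minimal sketch. This is not AGOUTI's implementation: the function names, the `min_support` threshold, and the input format (one tuple of contig names per mapped read pair) are assumptions made for illustration only. The sketch groups contigs linked by enough read pairs whose mates map to different contigs; it does not order or orient contigs within a group, and it does not update gene models.

```python
from collections import Counter


def count_joining_pairs(pairs):
    """Count read pairs whose two mates map to different contigs.

    `pairs` is an iterable of (contig_of_mate1, contig_of_mate2) tuples
    (a hypothetical, simplified stand-in for parsed alignments).
    """
    support = Counter()
    for a, b in pairs:
        if a != b:
            # Treat (a, b) and (b, a) as the same contig connection.
            support[tuple(sorted((a, b)))] += 1
    return support


def scaffold(contigs, pairs, min_support=2):
    """Group contigs connected by at least `min_support` joining pairs.

    Assumes every contig named in `pairs` also appears in `contigs`.
    Returns a sorted list of sorted contig groups (candidate scaffolds).
    """
    support = count_joining_pairs(pairs)

    # Union-find over contig names.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    for (a, b), n in support.items():
        if n >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    contigs = ["c1", "c2", "c3", "c4"]
    pairs = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c4", "c4")]
    print(scaffold(contigs, pairs))
```

In this toy input, two read pairs connect `c1` and `c2`, so they are grouped; the single pair connecting `c2` and `c3` falls below the assumed threshold and is ignored. Requiring more than one supporting pair is one simple way to guard against spurious joins from mismapped reads.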

Conclusion: AGOUTI is a powerful and effective scaffolder and, unlike most scaffolders, is expected to be more effective in larger genomes because of the commensurate increase in intron length. AGOUTI is able to scaffold thousands of contigs while simultaneously reducing the number of gene models by hundreds or thousands. The software is available free of charge under the MIT license.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4952227
DOI: http://dx.doi.org/10.1186/s13742-016-0136-3
