Background: The continuing increase in the volume and quality of raw short-read data has substantially improved the assemblies produced by bioinformatics tools. However, building a reference genome sequence remains a significant challenge for most plant species, because their many repeated sequences hinder high-quality whole-genome de novo assembly. Furthermore, most SNP identification approaches in plant genetics and breeding consider only the "Gene-space" regions, which comprise the promoter, exon and intron sequences.

Results: We developed the iPea protocol to produce a de novo Gene-space assembly by iteratively reconstructing the non-coding sequences flanking each Unigene cDNA sequence through the addition of next-generation DNA-seq data. The approach was developed on the large diploid genome of pea (Pisum sativum L.), which is rich in repetitive sequences. The final Gene-space assembly comprised 35,400 contigs (97 Mb), covering 88% of the 40,227 contigs (53.1 Mb) of the PsCam_low-copy Unigene set. Its accuracy was validated by the results of the GenoPea 13.2 K SNP Array.
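The iterative idea can be illustrated with a toy sketch. This is not the iPea implementation: the abstract does not name the mapping or assembly tools used, so the function names (`extend_right`, `extend_left`, `iterative_gene_space`), the exact-overlap rule and the `min_overlap` and `max_rounds` parameters are all assumptions made for illustration. A seed (standing in for a Unigene cDNA sequence) is extended at both ends, round after round, with reads that overlap its current ends, until no read adds new sequence.

```python
def extend_right(contig, reads, min_overlap):
    """Extend the 3' end with the read whose prefix has the longest exact
    overlap with the contig's suffix. Toy rule, assumed for illustration."""
    best_ext, best_olap = "", 0
    for read in reads:
        # The read must stick out past the contig end, so the overlap < len(read).
        max_olap = min(len(contig), len(read) - 1)
        for olap in range(max_olap, min_overlap - 1, -1):
            if olap <= best_olap:
                break  # cannot beat the best overlap found so far
            if contig.endswith(read[:olap]):
                best_ext, best_olap = read[olap:], olap
                break
    return contig + best_ext


def extend_left(contig, reads, min_overlap):
    """Extend the 5' end by reusing extend_right on reversed strings."""
    reversed_reads = [read[::-1] for read in reads]
    return extend_right(contig[::-1], reversed_reads, min_overlap)[::-1]


def iterative_gene_space(seed, reads, min_overlap=5, max_rounds=10):
    """Grow the flanking sequence around a seed until no read extends it
    or the (hypothetical) round limit is reached."""
    contig = seed
    for _ in range(max_rounds):
        extended = extend_right(contig, reads, min_overlap)
        extended = extend_left(extended, reads, min_overlap)
        if extended == contig:
            break  # converged: no read adds new flanking sequence
        contig = extended
    return contig


if __name__ == "__main__":
    # Made-up 24-base "genome"; the seed is an internal fragment of it.
    genome = "AACCGGTTACGTAGCTTGCATCGA"
    seed = genome[8:16]
    reads = [genome[10:20], genome[14:24], genome[3:13], genome[0:9]]
    print(iterative_gene_space(seed, reads))
```

In this made-up example the seed grows outwards over two rounds until it spans the whole toy sequence. A real Gene-space reconstruction would also have to handle sequencing errors, both strands and repeats, none of which this sketch addresses.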

Conclusion: The iPea protocol allows the reconstruction of a Gene-space from RNA-Seq and DNA-seq data with limited computing resources.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4750290
DOI: http://dx.doi.org/10.1186/s13104-016-1903-z
