Unmapped reads are often discarded in analyses of whole-genome re-sequencing data, yet they can yield new biological information and insights. In this paper, we investigate the unmapped reads from re-sequencing data of 33 pea aphid genomes from individuals specialized on different host plants. The unmapped reads of each individual were retrieved after mapping to the Acyrthosiphon pisum reference genome and to its mitochondrial and symbiont genomes. These sets of unmapped reads were then cross-compared, revealing that a significant number of the unmapped sequences were conserved across individuals. Interestingly, sequences were most commonly shared between individuals adapted to the same host plant, suggesting that they may contribute to the divergence between host-plant-specialized biotypes. Analysis of the contigs obtained by assembling the unmapped reads pooled by biotype allowed us to recover some divergent genomic regions previously excluded from analysis and to discover putative novel sequences of A. pisum and its symbionts. In conclusion, this study highlights the value of the unmapped component of re-sequencing data sets and the potential loss of important information when it is discarded. We propose strategies to aid the capture and interpretation of this information.
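The cross-comparison step described above can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify; it assumes that sharing between individuals is approximated by the Jaccard similarity of their k-mer sets. The function names, the sample labels, and the toy reads are all hypothetical, and reverse-complement strands are ignored for brevity.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of one read."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_profile(reads, k):
    """Union of the k-mers of every unmapped read of one individual."""
    profile = set()
    for read in reads:
        profile |= kmers(read, k)
    return profile


def jaccard(a, b):
    """Shared k-mers divided by all k-mers; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_sharing(unmapped, k=5):
    """Jaccard similarity for every pair of individuals (names sorted)."""
    profiles = {name: kmer_profile(reads, k) for name, reads in unmapped.items()}
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical toy data: two individuals from one host plant, one from another.
unmapped = {
    "pea_1": ["ACGTACGTGA", "TTGACCGTAA"],
    "pea_2": ["ACGTACGTGA", "TTGACCGTAC"],
    "clover_1": ["GGGCCCATAT", "CATATGGGCC"],
}

sharing = pairwise_sharing(unmapped, k=5)
for pair, score in sorted(sharing.items()):
    print(pair, round(score, 3))
```

In this toy input the two "pea" individuals share most of their k-mers while the "clover" individual shares none with either, mirroring the pattern of higher sharing within a host-plant biotype. Real data would need a much larger k and a memory-efficient k-mer counter.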
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4815510 | PMC |
http://dx.doi.org/10.1038/hdy.2014.85 | DOI Listing |
Front Genet
November 2024
Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, Catanzaro, Italy.
Background And Aims: The rapid and accurate detection of viruses and the discovery of single nucleotide polymorphisms (SNPs) are critical for disease management and understanding viral evolution. This study presents a pipeline for virus detection, validation, and SNP discovery from next-generation sequencing (NGS) data. The pipeline processes raw sequencing data to identify viral sequences with high accuracy and sensitivity by integrating state-of-the-art bioinformatics tools with artificial intelligence.
PLoS One
November 2024
Departamento de Biologia Estrutural, Molecular e Genética, Programa de Pós-Graduação em Biologia Evolutiva, Universidade Estadual de Ponta Grossa, Ponta Grossa, Paraná, Brazil.
BMC Genomics
November 2024
CSIR-Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh, 160036, India.
Background: Microbes produce diverse bioactive natural products with applications in fields such as medicine and agriculture. In their genomes, these natural products are encoded by physically clustered genes known as biosynthetic gene clusters (BGCs). Genome and metagenome sequencing advances have enabled high-throughput identification of BGCs as a promising avenue for natural product discovery.
Curr Issues Mol Biol
September 2024
Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada.
RNA molecules within ejaculated sperm can be characterized through whole-transcriptome sequencing, enabling the identification of pivotal transcripts that may influence reproductive success. However, the profiling of sperm transcriptomes through next-generation sequencing has several limitations impairing the identification of functional transcripts. In this study, we explored the nature of the RNA sequences present in the sperm transcriptome of two livestock species, cattle and horses, using RNA sequencing (RNA-seq) technology.
Methods Mol Biol
September 2024
H.U. Group Research Institute, G.K./SRL Inc., Akiruno, Tokyo, Japan.
Hi-C is a popular ligation-based technique to detect 3D physical chromosome structure within the nucleus using cross-linking and next-generation sequencing. As an unbiased genome-wide assay based on chromosome conformation capture, it provides rich insights into chromosome structure, dynamic chromosome folding and interactions, and the regulatory state of a cell. Bioinformatics analyses of Hi-C data require dedicated protocols as most genome alignment tools assume that both paired-end reads will map to the same chromosome, resulting in large two-dimensional matrices as processed data.