Correcting Illumina data.

Brief Bioinform

Published: July 2015

Next-generation sequencing technologies have revolutionized the ways in which genetic information is obtained and have opened the door for many essential applications in biomedical sciences. Hundreds of gigabytes of data are being produced, and all applications are affected by the errors in the data. Many programs have been designed to correct these errors, most of them targeting the data produced by the dominant technology of Illumina. We present a thorough comparison of these programs. Both HiSeq and MiSeq types of Illumina data are analyzed, and correcting performance is evaluated as the gain in depth and breadth of coverage, as given by correct reads and k-mers. Time and memory requirements, scalability and parallelism are considered as well. Practical guidelines are provided for the effective use of these tools. We also evaluate the efficiency of the current state-of-the-art programs for correcting Illumina data and provide research directions for further improvement.

Source
http://dx.doi.org/10.1093/bib/bbu029

Similar Publications

Molecular subtypes, such as defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology, bringing hope to inform a patient's prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts to build models based upon small numbers of features that can classify new samples into previously defined TCGA molecular subtypes, a step toward molecular subtype application in the clinic.

Characterizing the feeding ecology of threatened species is essential to establish appropriate conservation strategies. We focused our study on the proboscis monkey (Nasalis larvatus), an endangered primate species which is endemic to the island of Borneo. Our survey was conducted in the Lower Kinabatangan Wildlife Sanctuary (LKWS), a riverine protected area that is surrounded by oil palm plantations.

Small RNA sequencing analysis in two chickpea genotypes, JG 62 (Fusarium wilt-susceptible) and WR 315 (Fusarium wilt-resistant), under Fusarium wilt stress led to identification of 544 miRNAs which included 406 known and 138 novel miRNAs. A total of 115 miRNAs showed differential expression in both the genotypes across different combinations. A miRNA, Car-miR398 targeted copper chaperone for superoxide dismutase (CCS) that, in turn, regulated superoxide dismutase (SOD) activity during chickpea-Foc interaction.

Background: Alcohol Use Disorder (AUD) affects over 15 million individuals in the United States, contributing to oxidative stress, neuroinflammation, and elevating the risk of neurodegeneration. Despite this, the connection between AUD and aging conditions, particularly Alzheimer's disease (AD), remains unclear. AD, with a heritability of 60-80%, is genetically linked, necessitating an exploration of the molecular implications of AUD and genetic susceptibility to AD.

Background: Epigenetic mechanisms as a potential underlying pathogenic mechanism of neurodegenerative diseases have been the scope of several studies performed so far. However, there is a gap in analyzing different forms of early-onset dementia to minimize the effect of aging and the use of Lymphoblastoid cell lines (LCLs) as a possible disease model for earlier clinical phases.

Method: We performed a genome-wide DNA methylation analysis in 64 samples (from prefrontal cortex and lymphoblastoid cell lines) from Alzheimer's Disease (AD) and Frontotemporal dementia (FTD) using the Illumina Infinium MethylationEPIC V2.
