Background: Genomic read alignment maps short reads from a particular individual, exactly or approximately, onto a pre-sequenced reference genome of the same species. Because individuals of the same species share most of their genomes, short-read alignment is a much more efficient way to sequence an individual's genome than direct sequencing. Among the many strategies proposed for this alignment process, indexing the reference genome and searching the short reads over that index is a dominant technique. Our goal is to design a space-efficient index structure with fast search capability that can keep up with the massive volume of short reads produced by next-generation high-throughput DNA sequencing technology.

Results: We concentrate on indexing DNA sequences via sparse suffix arrays (SSAs) and propose a new short read aligner named Ψ-RA (PSI-RA: parallel sparse index read aligner). The motivation for using SSAs is their ability to trade memory against time: the space consumption of the index can be fine-tuned to the available memory of the machine and the minimum length of the incoming pattern queries. Although SSAs have been studied before for exact matching of short reads, an elegant method for approximate matching was missing. We provide one by defining the rightmost mismatch criterion, which prioritizes errors towards the end of the reads, where errors are more probable. Ψ-RA supports any number of mismatches when aligning reads. We compare Ψ-RA with several well-known short read aligners and show that indexing a genome with an SSA is a good alternative to Burrows-Wheeler transform or seed-based solutions.
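
To illustrate the memory-versus-time trade-off, here is a minimal Python sketch of exact matching over a sparse suffix array. It is not the Ψ-RA implementation: the names `build_ssa` and `find_exact`, the sampling scheme (keep every k-th suffix), and the parameter `k` are assumptions chosen for illustration. The index stores only about 1/k of the suffixes, at the cost of up to k binary searches per query, and it requires the pattern to be at least k characters long.

```python
def build_ssa(text, k):
    """Keep only the suffixes starting at multiples of k, sorted lexicographically."""
    return sorted(range(0, len(text), k), key=lambda i: text[i:])


def _bound(text, ssa, tail, upper):
    """Binary search for the first (upper=False) or one-past-last (upper=True)
    sampled suffix whose prefix of length len(tail) equals tail."""
    lo, hi = 0, len(ssa)
    n = len(tail)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[ssa[mid]:ssa[mid] + n]
        if prefix < tail or (upper and prefix == tail):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_exact(text, ssa, k, pattern):
    """Return the sorted start positions of all exact occurrences of pattern."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k characters long")
    hits = set()
    # Any occurrence covers a sampled position p = start + j for some j < k,
    # so search for each tail pattern[j:] and verify the skipped head.
    for j in range(k):
        tail = pattern[j:]
        lo = _bound(text, ssa, tail, upper=False)
        hi = _bound(text, ssa, tail, upper=True)
        for p in ssa[lo:hi]:
            start = p - j
            if start >= 0 and text[start:p] == pattern[:j]:
                hits.add(start)
    return sorted(hits)


if __name__ == "__main__":
    reference = "ACGTACGTTACG"
    k = 3  # sparseness factor (hypothetical value)
    ssa = build_ssa(reference, k)
    print(find_exact(reference, ssa, k, "ACG"))
```

Raising `k` shrinks the index but increases the number of searches and verifications per read, which is why the minimum expected read length bounds how sparse the index can be in this sketch.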

Conclusions: Ψ-RA is expected to serve as a valuable tool for aligning short reads generated by next-generation high-throughput sequencing technology. Ψ-RA is very fast in exact matching and also supports rightmost approximate matching. The SSA structure on which Ψ-RA is built lends itself naturally to modern multicore architectures, so further speed-up can be gained. All the information, including the source code of Ψ-RA, can be downloaded at: http://www.busillis.com/o_kulekci/PSIRA.zip.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194238
DOI: http://dx.doi.org/10.1186/1471-2164-12-S2-S7

