Publications by authors named "Noor Pratap Singh"

Ultrafast mapping of short reads to transcriptomic and metagenomic references via lightweight mapping techniques such as pseudoalignment has demonstrated success in substantially accelerating several types of analyses without much loss in accuracy compared to alignment-based approaches. The application of pseudoalignment to large reference sequences - like the genome - is, however, not trivial, due to the large size of the references or "targets" (i.e.


The problem of sequence identification or matching - determining the subset of reference sequences from a given collection that are likely to contain a short, queried nucleotide sequence - is relevant for many important tasks in computational biology, such as metagenomics and pangenome analysis. Due to the complex nature of such analyses and the large scale of the reference collections, a resource-efficient solution to this problem is of utmost importance. This poses the threefold challenge of representing the reference collection with a data structure that is efficient to query, has light memory usage, and scales well to large collections.
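As a rough illustration of the problem statement only (not the data structure proposed in the article), the sketch below builds a naive k-mer inverted index and reports the references that share k-mers with a query. The function names, the `min_fraction` threshold, and the toy sequences are all assumptions introduced for this example. A plain hash map like this is easy to query but makes no attempt at the light memory usage or scalability the abstract calls for.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def candidate_references(index, query, k, min_fraction=1.0):
    """Return references sharing at least min_fraction of the query's distinct k-mers."""
    query_kmers = set(kmers(query, k))
    if not query_kmers:
        # Query shorter than k: nothing to look up.
        return set()
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids inserting empty entries into the defaultdict.
        for name in index.get(kmer, ()):
            hits[name] += 1
    needed = min_fraction * len(query_kmers)
    return {name for name, count in hits.items() if count >= needed}


# Toy reference collection (hypothetical sequences).
references = {
    "ref_a": "ACGTACGTGA",
    "ref_b": "TTGACGTGAC",
    "ref_c": "GGGGCCCCAA",
}
index = build_index(references, k=4)
result = candidate_references(index, "ACGTGA", k=4)
```

Here `result` holds the names of the references that contain every 4-mer of the query. Sharing all k-mers only makes a reference a likely match, not a guaranteed one, which mirrors the "likely to contain" wording of the problem.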


Identifying differentially expressed transcripts poses a crucial yet challenging problem in transcriptomics. Substantial uncertainty is associated with the abundance estimates of certain transcripts; ignoring it can inflate the number of false positives, while accounting for it may reduce power. For a given set of RNA-Seq samples, TreeTerminus arranges transcripts in a hierarchical tree structure that encodes different layers of resolution for interpretation of the abundance of transcriptional groups, with uncertainty generally decreasing as one ascends the tree from the leaves.


A certain degree of uncertainty is always associated with transcript abundance estimates. For certain transcripts, this uncertainty can make many downstream analyses, such as differential testing, difficult. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained.


The problem of sequence identification or matching - determining the subset of references from a given collection that are likely to contain a query nucleotide sequence - is relevant for many important tasks in computational biology, such as metagenomics and pan-genome analysis. Due to the complex nature of such analyses and the large scale of the reference collections, a resource-efficient solution to this problem is of utmost importance. The reference collection should therefore be pre-processed into an index for fast queries.


Patterns of DNA methylation are significantly altered in cancers. Interpreting the functional consequences of DNA methylation requires the integration of multiple forms of data. Recent advances in next-generation sequencing can help decode this relationship and aid biomarker discovery.


Papillary Renal Cell Carcinoma (PRCC) is a heterogeneous disease with variations in disease progression and clinical outcomes. The advent of next-generation sequencing (NGS) techniques has generated patient data that can be analysed to develop a predictive model. In this study, we have adopted a machine learning approach to identify biomarkers and build classifiers to discriminate between early and late stages of PRCC from gene expression profiles.
