Microbiome Res Rep
April 2024
This study introduces MetaBIDx, a computational method designed to enhance species prediction in metagenomic environments. The method addresses the challenge of accurate species identification in complex microbiomes, which is due to the large number of generated reads and the ever-expanding number of bacterial genomes. Bacterial identification is essential for disease diagnosis and tracing outbreaks associated with microbial infections.
Classifying or identifying bacteria in metagenomic samples is an important problem in the analysis of metagenomic data. This task can be computationally expensive since microbial communities usually consist of hundreds to thousands of environmental microbial species. We proposed a new method for representing bacteria in a microbial community using genomic signatures of those bacteria.
Most current approaches to metagenomic classification employ short next-generation sequencing (NGS) reads that are present in metagenomic samples to identify unique genomic regions. NGS reads, however, might not be long enough to differentiate similar genomes. This suggests a potential for using longer reads to improve classification performance.
Summary: Although heteroplasmy has been studied extensively in animal systems, there is a lack of tools for analyzing, exploring and visualizing heteroplasmy at the genome-wide level in other taxonomic systems. We introduce icHET, which is a computational workflow that produces an interactive visualization that facilitates the exploration, analysis and discovery of heteroplasmy across multiple genomic samples. icHET works on short reads from multiple samples from any organism with an organellar reference genome (mitochondrial or plastid) and a nuclear reference genome.
Bioinformatics
September 2018
Motivation: The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to be able to detect such types of variants accurately.
BMC Bioinformatics
December 2017
Background: Quantification and identification of microbial genomes based on next-generation sequencing data is a challenging problem in metagenomics. Although current methods have mostly focused on analyzing bacteria whose genomes have been sequenced, such analyses are complicated by the presence of unknown bacteria or bacteria whose genomes have not been sequenced.
Results: We propose a method for detecting unknown bacteria in environmental samples.
J Bioinform Comput Biol
June 2017
Determining abundances of microbial genomes in metagenomic samples is an important problem in analyzing metagenomic data. Although homology-based methods are popular, they have been shown to be computationally expensive due to the alignment of tens of millions of reads from metagenomic samples to reference genomes of hundreds to thousands of environmental microbial species. We introduce an efficient alignment-free approach to estimate abundances of microbial genomes in metagenomic samples.
BMC Bioinformatics
October 2016
Efforts such as the International HapMap Project and the 1000 Genomes Project resulted in a catalog of millions of single-nucleotide and insertion/deletion (INDEL) variants of the human population. Viewed as a reference of existing variants, this resource commonly serves as a gold standard for studying and developing methods to detect genetic variants. Our analysis revealed that this reference contained thousands of INDELs that were constructed in a biased manner.
BMC Bioinformatics
August 2016
Background: Although it is frequently observed that aligning short reads to genomes becomes harder if they contain complex repeat patterns, there has not been much effort to quantify the relationship between complexity of genomes and difficulty of short-read alignment. Existing measures of sequence complexity seem unsuitable for the understanding and quantification of this relationship.
Results: We investigated several measures of complexity and found that length-sensitive measures of complexity had the highest correlation to accuracy of alignment.
BMC Bioinformatics
January 2015
Background: The analysis of gene expression has played an important role in medical and bioinformatics research. Although it is known that a large number of samples is needed to determine the patterns of gene expression accurately, practical designs of gene expression studies occasionally have insufficient numbers of samples, making it difficult to ascertain true response patterns of variantly expressed genes.
Results: We describe an approach to cope with the challenge of predicting true orders of gene response to treatments.
Background: The alignment of short reads generated by next-generation sequencers to genomes is an important problem in many biomedical and bioinformatics applications. Although many proposed methods work very well on narrow ranges of read lengths, they tend to suffer in performance and alignment quality for reads outside of these ranges.
Results: We introduce RandAL, a novel method that aligns DNA sequences to reference genomes.
Background: Identification of transcription factors (TFs) responsible for modulation of differentially expressed genes is a key step in deducing gene regulatory pathways. Most current methods identify TFs by searching for presence of DNA binding motifs in the promoter regions of co-regulated genes. However, this strategy may not always be useful as presence of a motif does not necessarily imply a regulatory role.
Hidden stops are nucleotide triples TAA, TAG and TGA that appear on the second and third reading frames of a protein-coding gene. Recent studies suggested the important role of hidden stops in preventing misreading of mRNA. We study the problem of designing protein-encoding genes with a large number of hidden stops under several biological constraints.
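The definition of hidden stops given in this abstract can be illustrated with a minimal sketch. It only counts stop triplets in the second and third reading frames of a coding sequence; it is not the gene-design method the paper studies, and the function name `count_hidden_stops` and the example sequence are hypothetical, chosen for illustration.

```python
# Stop triplets named in the abstract.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_hidden_stops(cds: str) -> int:
    """Count stop triplets in the second and third reading frames.

    Assumes `cds` is a DNA coding sequence whose first reading frame
    starts at index 0. This is an illustrative helper, not the
    authors' algorithm.
    """
    seq = cds.upper()
    count = 0
    # Offsets 1 and 2 correspond to the second and third reading frames.
    for offset in (1, 2):
        # Step through complete, non-overlapping triplets in this frame.
        for i in range(offset, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                count += 1
    return count


if __name__ == "__main__":
    # Frame 1 reads ATG ATA GCT TAA; frame 2 contains TGA and TAG.
    print(count_hidden_stops("ATGATAGCTTAA"))
```

A design procedure of the kind the abstract describes would presumably use a count like this as an objective while keeping the encoded protein fixed, but the constraints themselves are not given here.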
We propose a novel method to estimate editing efficiency by adenosine deaminases that act on RNA (ADARs). The method employs the notion of stability of secondary structure in the vicinity of edited sites during transcription. Such an analysis of 'dynamic' structural motifs of RNA is important because as a pre-spliced RNA is being transcribed and elongated, its entire structure, and thus its local structures, may change drastically.
J Bioinform Comput Biol
February 2009
Post hoc assignment of patterns determined by all pairwise comparisons in microarray experiments with multiple treatments has been proven to be useful in assessing treatment effects. We propose the usage of transitive directed acyclic graphs (tDAG) as the representation of these patterns and show that such representation can be useful in clustering treatment effects, annotating existing clustering methods, and analyzing sample sizes. Advantages of this approach include: (1) unique and descriptive meaning of each cluster in terms of how genes respond to all pairs of treatments; (2) insensitivity of the observed patterns to the number of genes analyzed; and (3) a combinatorial perspective to address the sample size problem by observing the rate of contractible tDAG as the number of replicates increases.
3H-1,2-dithiole-3-thione (D3T) and its analogues 4-methyl-5-pyrazinyl-3H-1,2-dithiole-3-thione (OLT) and 5-tert-butyl-3H-1,2-dithiole-3-thione (TBD) are chemopreventive agents that block or diminish early stages of carcinogenesis by inducing activities of detoxication enzymes. While OLT has been used in clinical trials, TBD has been shown to be more efficacious and possibly less toxic than OLT in animals. Here, we utilize a robust and high-resolution chemical genomics procedure to examine the pharmacological structure-activity relationships of these compounds in livers of male rats by microarray analyses.
Motivation: Motif Tool Manager is a web-based framework for comparing and combining different approaches to discover novel DNA motifs. It comes with a set of five well-known approaches to motif discovery. It provides an easy mechanism for adding new motif finding tools to the framework through a web-interface and a minimal setup of the tools on the server.
Int J Comput Biol Drug Des
January 2010
Proper management of bioinformatics data and tools is crucial because the amount of data is enormous, the type of data varies, and there are often different approaches (and consequently tools) for solving a particular problem. While specialised systems exist to serve specific needs, such systems are difficult to adapt and require large resource commitments for development and maintenance. We propose a system called Bioinformatics Tools and Data Management System (BioTDMS) that uses open-source technologies to provide a platform for managing both data and tools.