BMC Bioinformatics
December 2017
Background: In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count.
Physiological responses to stress are controlled by expression of a large number of genes, many of which are regulated by microRNAs. Since most banana cultivars are salt-sensitive, improved understanding of genetic regulation of salt-induced stress responses in banana can support future crop management and improvement in the face of increasing soil salinity related to irrigation and climate change. In this study we focused on determining miRNAs and their targets that respond to NaCl exposure and used transcriptome sequencing of RNA and small RNA from control and NaCl-treated banana roots to assemble a cultivar-specific reference transcriptome and identify orthologous and Musa-specific miRNAs responding to salinity.
Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly, creating the need for allergenicity assessment before they are introduced into the human food chain. While existing bioinformatic methods have achieved good accuracies for highly conserved sequences, the discrimination of allergens and non-allergens from allergen-like non-allergen sequences remains difficult.
Low target discovery rate has been linked to inadequate consideration of multiple factors that collectively contribute to druggability. These factors include sequence, structural, physicochemical, and systems profiles. Methods individually exploring each of these profiles for target identification have been developed, but they have not been collectively used.
BMC Bioinformatics
March 2009
Background: DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations.
Results: Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq.
Summary: A variety of specialist databases have been developed to facilitate the study of allergens. However, these databases either contain different subsets of allergen data or are deficient in tools for assessing potential allergenicity of proteins. Here, we describe Allergen Atlas, a comprehensive repository of experimentally validated allergen sequences collected from in-house laboratory, online data submission, literature reports and all existing general-purpose and specialist databases.
BMC Bioinformatics
December 2008
Background: Bioinformatics tools are commonly used for assessing potential protein allergenicity. While these methods have achieved good accuracies for highly conserved sequences, they are less effective when the overall similarity is low. In this study, we assessed the feasibility of using position-specific scoring matrices as a basis for predicting potential allergenicity in proteins.
The constant increase in atopic allergy and other hypersensitivity reactions has intensified the need for successful therapeutic approaches. Existing bioinformatic tools for predicting allergenic potential are primarily based on sequence similarity searches along the entire protein sequence and do not address the dual issues of conformational and overlapping B-cell epitope recognition sites. In this study, we report AllerPred, a computational system that is capable of capturing multiple overlapping continuous and discontinuous B-cell epitope binding patterns in allergenic proteins using SVM as its prediction engine.
Allergy is a prevalent health problem in developed countries. With advances in genomic and proteomic technologies, there is a rapid increase in allergy-related data, including allergen sequences, allergic cross-reactivity, molecular structures, clinical measurements, and atmospheric concentrations. The increasingly complex allergy data are fueling the need for advanced approaches to information management and analysis.
Background: Repeats are present in all genomes, and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats.
Modern alignment methods designed to work rapidly and efficiently with large datasets often do so at the cost of method sensitivity. To overcome this, we have developed a novel alignment program, GRAT, built to accurately align short, highly similar DNA sequences. The program runs rapidly and requires no more memory and CPU power than a desktop computer.
Background: The accurate prediction of a comprehensive set of messenger RNAs (targets) regulated by animal microRNAs (miRNAs) remains an open problem. In particular, the prediction of targets that do not possess evolutionarily conserved complementarity to their miRNA regulators is not adequately addressed by current tools.
Results: We have developed MicroTar, an animal miRNA target prediction tool based on miRNA-target complementarity and thermodynamic data.
Assessment of potential allergenicity and patterns of cross-reactivity is necessary whenever novel proteins are introduced into the human food chain. Current bioinformatic methods in allergology focus mainly on the prediction of allergenic proteins, with no information on cross-reactivity patterns among known allergens. In this study, we present AllerTool, a web server with essential tools for the assessment of predicted as well as published cross-reactivity patterns of allergens.
Background: Many genome projects are left unfinished due to complex, repeated regions. Finishing is the most time-consuming step in sequencing, and current finishing tools are not designed with particular attention to the repeat problem.
Results: We have developed DNPTrapper, a shotgun sequence finishing tool, specifically designed to address the problems posed by the presence of repeated regions in the target sequence.
Proc Natl Acad Sci U S A
September 2005
The identification of new virus species is a key issue for the study of infectious disease but is technically very difficult. We developed a system for large-scale molecular virus screening of clinical samples based on host DNA depletion, random PCR amplification, large-scale sequencing, and bioinformatics. The technology was applied to pooled human respiratory tract samples.
Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T.
We describe a genetic variation map for the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs). This map is based on a comparison of the sequences of three domestic chicken breeds (a broiler, a layer and a Chinese silkie) with that of their wild ancestor, red jungle fowl.
Finishing, i.e. gap closure and editing, is the most time-consuming part of genome sequencing.
Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions (DNPs) is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads.
Comput Methods Programs Biomed
January 2003
The software commonly used for assembly of shotgun sequence data has several limitations. One such limitation becomes obvious when repetitive sequences are encountered. Shotgun assembly is a difficult task, even for non-repetitive regions, but the use of quality assessments of the data and efficient matching algorithms has made it possible to assemble most sequences efficiently.
An increasingly important problem in genome sequencing is the failure of the commonly used shotgun assembly programs to correctly assemble repetitive sequences. The assembly of non-repetitive regions or regions containing repeats considerably shorter than the average read length is in practice easy to solve, while longer repeats have been a difficult problem. We here present a statistical method to separate arbitrarily long, almost identical repeats, which makes it possible to correctly assemble complex repetitive sequence regions.