Cellular regulation mechanisms in which proteins and other active molecules interact with specific targets often depend on the recognition of sequence patterns. Short sequence elements in DNA, RNA and proteins play a central role in mediating such molecular recognition events. Studies that measure and investigate sequence-based recognition processes rely on statistical and computational tools that support the identification and understanding of sequence motifs. We present a new web application, DRIMust, which provides de novo motif discovery services and is freely accessible at http://drimust.technion.ac.il. The DRIMust algorithm is based on the minimum hypergeometric statistical framework and uses suffix trees for efficient enumeration of motif candidates. DRIMust takes as input ranked lists of sequences in FASTA format and returns motifs that are over-represented at the top of the list, where the threshold that defines the top is determined in a data-driven manner. The resulting motifs are presented individually, each with an accurate P-value, and as a Position Specific Scoring Matrix. A comparison of DRIMust with other state-of-the-art tools demonstrated a significant advantage for DRIMust, both in the accuracy of results and in running time. Overall, DRIMust is unique in combining efficient search on large ranked lists with rigorous P-value assessment for the detected motifs.
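The minimum hypergeometric (mHG) idea named in the abstract can be illustrated with a small sketch. This is not DRIMust's implementation: the function names `hypergeom_tail` and `mhg_score` are hypothetical, and the input is assumed to be a 0/1 vector marking which ranked sequences contain one candidate motif. The sketch scans every cutoff n, computes the hypergeometric tail probability of seeing at least the observed number of motif occurrences in the top n, and keeps the minimum. Suffix-tree enumeration of candidates and the correction needed to turn this minimum into a P-value (it is taken over many cutoffs) are deliberately left out.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B marked, n drawn)."""
    total = comb(N, n)
    upper = min(n, B)
    # math.comb returns 0 when k > n, so impossible terms vanish on their own.
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return hits / total


def mhg_score(labels):
    """Minimum hypergeometric score over all top-n cutoffs of a ranked list.

    labels[i] is 1 if the i-th ranked sequence contains the candidate motif
    (an assumption of this sketch), otherwise 0.
    Returns (score, cutoff), where cutoff is the n that attains the minimum.
    """
    N = len(labels)
    B = sum(labels)
    best_score, best_n = 1.0, 0
    b = 0  # motif occurrences seen so far in the top n
    for n in range(1, N + 1):
        b += labels[n - 1]
        tail = hypergeom_tail(N, B, n, b)
        if tail < best_score:
            best_score, best_n = tail, n
    return best_score, best_n


if __name__ == "__main__":
    # Toy ranked list: the motif occurs only in the three top-ranked sequences.
    ranked = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    score, cutoff = mhg_score(ranked)
    print(f"mHG score = {score:.6f} at cutoff n = {cutoff}")
```

Because the cutoff is chosen by the scan rather than fixed in advance, the position that defines the "top" of the list comes from the data, which is the property the abstract describes.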
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692051 | PMC |
| http://dx.doi.org/10.1093/nar/gkt407 | DOI Listing |
Future Cardiol
December 2024
Section of Cardiology, Department of Medicine, University of Arizona Sarver Heart, Tucson, AZ, USA.
The Medina classification separates true bifurcation lesions into three unnecessary groups: 1.1.1, 1.
Biochem Genet
October 2024
Department of Computational Biology and Bioinformatics, University of Kerala, Karyavattom, Trivandrum, Kerala, India.
Repetitive DNA sequences cause genomic instability and are important genetic markers. Identification of repeats is a critical step in genome annotation and analysis. On the other hand, repeats also pose a technical challenge for genome assembly and alignment programs using NGS data.
Entropy (Basel)
March 2023
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China.
Source code summarization focuses on generating qualified natural language descriptions of a code snippet (e.g., functionality, usage and version).
Proc Annu ACM SIAM Symp Discret Algorithms
January 2023
The suffix array, describing the lexicographical order of suffixes of a given text, and the suffix tree, a path-compressed trie of all suffixes, are the two most fundamental data structures for string processing, with a plethora of applications in data compression, bioinformatics, and information retrieval. For a length-n text, however, they use Θ(n log n) bits of space, which is often too costly. To address this, Grossi and Vitter [STOC 2000] and, independently, Ferragina and Manzini [FOCS 2000] introduced space-efficient versions of the suffix array, known as the compressed suffix array (CSA) and the FM-index.
Health Informatics J
April 2022
Department of Neurology, SRM Institute of Science and Technology, Kattankulathur, India.
Alzheimer's disease (AD) is one of the most common forms of dementia, contributing to more than 70% of cases. The factors accounting for the cause and progression of neurodegenerative diseases like AD are primarily genetic, in addition to lifestyle and environmental factors. Early and accurate diagnosis of AD empowers practitioners to make timely clinical decisions and take preventive actions.