Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models.
Results: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds that of all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, which are often initially recognized by their lack of protein-coding potential rather than by conserved RNA secondary structures.
Availability and Implementation: The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF
Contact: mlin@mit.edu; manoli@mit.edu
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117341 | PMC |
| http://dx.doi.org/10.1093/bioinformatics/btr209 | DOI Listing |
Mol Neurobiol
January 2025
Ruikang Hospital Affiliated to Guangxi University of Chinese Medicine, Nanning, Guangxi, China.
Dysregulation of long non-coding RNAs (lncRNAs) is implicated in the pathophysiology of ischemic stroke (IS). However, the molecular mechanism of the lncRNA SERPINB9P1 in IS remains unclear. Our study aimed to explore the role and molecular mechanism of the lncRNA SERPINB9P1 in IS.
Database (Oxford)
January 2025
School of Computer Science and Technology, Xidian University, 266 Xinglong Section of Xifeng Road, Xi'an, Shaanxi 710126, China.
The pathogenesis of complex diseases is intricately linked to various genes, and network medicine has enhanced the understanding of diseases. However, most network-based approaches ignore interactions mediated by noncoding RNAs (ncRNAs), and most databases focus only on the association between genes and diseases. To address these gaps, we have developed DisGeNet, a database that focuses not only on disease-associated genes but also on the interactions among genes.
Int J Mol Sci
January 2025
Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia.
A pseudogene is a non-functional copy of a protein-coding gene. Processed pseudogenes, a major pseudogene class created by the reverse transcription of mRNA and subsequent integration of the resulting cDNA into the genome, represent a significant challenge in genome analysis because of their high sequence similarity to the parent genes and their frequent absence from the reference genome. This homology can lead to errors in variant identification, as sequences derived from processed pseudogenes can be incorrectly assigned to parental genes, complicating correct variant calling.
Int J Mol Sci
January 2025
Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA.
A couple presented to the office with an apparently healthy infant for a thorough clinical assessment, as they had previously lost two male children to a neurodegenerative disorder. They also reported the death of a male cousin abroad with a comparable condition. We aimed to evaluate a novel coding pathogenic variant c.
Int J Mol Sci
January 2025
Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA.
Systemic lupus erythematosus (SLE) is a complex autoimmune disorder characterized by widespread inflammation and autoantibody production. Its development and progression involve genetic, epigenetic, and environmental factors. Although genome-wide association studies (GWAS) have repeatedly identified a susceptibility signal at 16p13, its fine-scale source and its functional and mechanistic role in SLE remain unclear.