Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means of identifying genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and poor splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes the problems of earlier tools by incorporating three new features: universal spaced seeds, which increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily aware alignment techniques, which improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity than existing alignment programs, exceeding the closest competitor by more than 10% for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64,000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.
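To illustrate why spaced seeds can raise sensitivity in cross-species comparisons, the following is a minimal sketch of spaced-seed hit finding. It is not sim4cc's implementation: the pattern `1101`, the function names `seed_key` and `spaced_seed_hits`, and the toy sequences are all assumptions chosen for illustration. A `1` in the pattern marks a position that must match and a `0` marks a position that is ignored.

```python
from collections import defaultdict


def seed_key(seq, pos, pattern):
    """Return the bases of seq at the '1' positions of pattern, starting at pos."""
    return "".join(seq[pos + i] for i, c in enumerate(pattern) if c == "1")


def spaced_seed_hits(cdna, genome, pattern="1101"):
    """Return (cdna_pos, genome_pos) pairs whose windows agree at every '1' position.

    The pattern is a hypothetical example, not the seed used by sim4cc.
    """
    span = len(pattern)

    # Index every genomic window by the bases at the "care" positions.
    index = defaultdict(list)
    for g in range(len(genome) - span + 1):
        index[seed_key(genome, g, pattern)].append(g)

    # Look up every cDNA window in the index.
    hits = []
    for c in range(len(cdna) - span + 1):
        for g in index.get(seed_key(cdna, c, pattern), []):
            hits.append((c, g))
    return hits
```

With the pattern `1101`, the words `ACGT` and `ACTT` produce a hit even though they differ at the third base, whereas a contiguous four-base seed (`1111`) would miss them. Tolerating substitutions at selected positions is the general idea behind using spaced seeds for diverged sequences.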
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2699533 | PMC |
| http://dx.doi.org/10.1093/nar/gkp319 | DOI Listing |