Publications by authors named "Jonathan Mudge"

Background: A nucleotide sequence can be translated in three reading frames in the 5' to 3' direction, each producing a distinct protein product. Many examples of RNA translation in two reading frames (dual coding) have been identified so far.

Results: We report simultaneous translation of mRNA transcripts derived from locus in all three reading frames that result in the synthesis of long proteins.
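
The idea of three reading frames can be made concrete with a short sketch. It assumes the standard genetic code (NCBI translation table 1) and a DNA or RNA string read 5' to 3'. The function names are hypothetical, and any codon outside the table is rendered as `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first base varying slowest, matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate seq (5'->3') starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper().replace("U", "T")  # accept mRNA as well as DNA
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> dict[int, str]:
    """Return the translation of all three forward reading frames."""
    return {frame: translate(seq, frame) for frame in range(3)}


print(three_frame_translation("AUGGCCUAA"))  # {0: 'MA*', 1: 'WP', 2: 'GL'}
```

The same nine nucleotides give three unrelated peptides, one per frame. A locus translated in all three frames at once, as reported above, therefore encodes three different protein sequences over the shared region.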

Ensembl (www.ensembl.org) is an open platform integrating publicly available genomics data across the tree of life with a focus on eukaryotic species related to human health, agriculture and biodiversity.

GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result.

Article Synopsis
  • Accurate gene annotations are essential for interpreting how genomes function. The GENCODE consortium has spent twenty years creating reference annotations for the human and mouse genomes, which serve as a vital resource for researchers worldwide.
  • Previous annotations of long non-coding RNAs (lncRNAs) were incomplete and poorly organized, which hindered research. GENCODE therefore launched a comprehensive effort that added nearly 18,000 novel human genes and over 22,000 novel mouse genes, substantially enlarging the transcript catalog.
  • The new annotations reveal evolutionary patterns, connect to genetic variants associated with traits, and clarify previously obscure genomic functions, advancing research into human and mouse genetic disease.
Article Synopsis
  • The Human Proteome Project (HPP) aims to credibly detect at least one protein isoform from every protein-coding gene and to integrate proteomics into studies of human health and disease.
  • Major updates include the retirement of neXtProt as the knowledge base, with UniProtKB now serving as the reference proteome and GENCODE providing the target protein list.
  • Credible protein-level evidence now exists for 93% of protein-coding genes, leaving 1,273 proteins still undetected; the report also introduces a new scoring system for the functional annotation of proteins.

Programmed ribosomal frameshifting is a translational recoding phenomenon in which a proportion of ribosomes are stimulated to slip backwards or forwards on an mRNA, rephasing the ribosome relative to the mRNA. While frameshifting is often employed by viruses, very few phylogenetically conserved examples are known in vertebrate genes, and the evidence for some of these is controversial. Here we report a +1 frameshifting signal in the coding sequence of the human gene encoding the ARL8-dependent, lysosome-kinesin-1 adaptor protein PLEKHM2.
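
The consequence of a +1 frameshift can be illustrated with a toy model. This sketch is hypothetical. It assumes the standard genetic code and a shift that happens on every ribosome at a fixed codon. Programmed frameshifting, by contrast, affects only a proportion of ribosomes and depends on stimulatory signals in the mRNA. The sequence is invented and is not the PLEKHM2 site.

```python
from itertools import product
from typing import Optional

# Standard genetic code (NCBI translation table 1), TCAG order, first base slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_plus1_shift(seq: str, shift_after_codon: Optional[int] = None) -> str:
    """Translate from offset 0 until a stop codon.

    If shift_after_codon is given (1-based codon count), the simulated ribosome
    skips one nucleotide after that codon (a +1 frameshift) and keeps decoding
    in the new frame. Without it, the 0 frame is read throughout.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    pos = 0
    codons_read = 0
    while pos + 3 <= len(seq):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break  # stop codon terminates translation
        protein.append(aa)
        pos += 3
        codons_read += 1
        if codons_read == shift_after_codon:
            pos += 1  # +1 frameshift: move one nucleotide forward
    return "".join(protein)


mrna = "AUGAAAUGACUAA"  # toy sequence, not a real frameshift site
print(translate_with_plus1_shift(mrna))                       # MK  (0 frame meets UGA)
print(translate_with_plus1_shift(mrna, shift_after_codon=2))  # MKD (+1 frame bypasses it)
```

Because only some ribosomes shift at a real site, the cell makes a mixture of the 0-frame product and the frameshifted fusion. The two calls above stand in for those two outcomes.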

Article Synopsis
  • Researchers seek a better understanding of the protein-coding genome because of its importance to human health, and ask what previous genomic studies may have overlooked among non-canonical open reading frames (ncORFs).
  • Over the last ten years, ncORFs have shown potential relevance in human cell types and diseases, but their contribution to the human proteome remained unclear, prompting a collaborative effort to analyze protein-level evidence for them.
  • The study found that 25% of the analyzed ncORFs give rise to translated protein products, yielding over 3,000 peptides detected in extensive mass spectrometry data, and it established an annotation framework and public tools to support ongoing research in this area.

Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have enabled high-confidence annotation of small open reading frames (smORFs) that encode functional proteins, known as smORF-encoded proteins (SEPs).
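
A smORF can be pictured as an AUG-initiated reading frame that ends at an in-frame stop codon and falls under a length cutoff. This sketch scans the three forward frames for such ORFs. The 100-codon ceiling is a widely used convention, assumed here rather than taken from the abstract. The function name is hypothetical. High-confidence annotation relies on translation evidence such as Ribo-seq or mass spectrometry, not sequence alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq: str, min_codons: int = 10,
                max_codons: int = 100) -> list[tuple[int, int, int]]:
    """Find AUG-initiated ORFs on the forward strand whose length (excluding
    the stop codon) lies between min_codons and max_codons.

    Returns (start, end, frame) tuples with 0-based, end-exclusive coordinates
    that include the stop codon. In each frame only the first AUG after the
    previous stop is used, so nested downstream AUGs are not reported, and
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # sense codons only
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next AUG in this frame
    return orfs


print(find_smorfs("CCAUGAAAUAGCC", min_codons=1))  # [(2, 11, 2)]
```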

Article Synopsis
  • The researchers generated over 427 million long-read sequences and found that longer, more accurate sequences yield better transcript detection, while greater read depth improves quantification.
  • The study suggests that reference-based tools work best for well-annotated genomes and recommends incorporating additional data to better identify rare transcripts, providing a benchmark for improving future transcriptome analysis methods.

The application of ribosome profiling has revealed an unexpected abundance of translation beyond that needed to synthesize the proteins encoded by previously annotated protein-coding regions. Multiple short sequences have been found to be translated within single RNA molecules, in both annotated protein-coding and non-coding regions. The biological significance of this translation is under intensive investigation.
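
Ribosome profiling can identify translated sequences largely because ribosome footprints show three-nucleotide periodicity. In a translated ORF, footprint P-sites pile up in that ORF's own frame. This sketch computes the frame distribution for one candidate ORF. It assumes read-length-specific P-site offsets have already been applied. The function name and inputs are hypothetical, and this is not the specific method of the work summarized here.

```python
from collections import Counter


def frame_fractions(p_sites: list[int], orf_start: int, orf_end: int) -> list[float]:
    """Fraction of ribosome P-site positions falling in each frame of an ORF.

    p_sites are 0-based transcript coordinates of footprint P-sites. Frame 0
    is the ORF's own frame; positions outside [orf_start, orf_end) are ignored.
    Returns [frame0, frame1, frame2]; all zeros if no P-site falls in the ORF.
    """
    counts = Counter(
        (pos - orf_start) % 3 for pos in p_sites if orf_start <= pos < orf_end
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]


# Hypothetical P-sites, mostly in frame 0 as expected for an actively translated ORF.
print(frame_fractions([10, 13, 16, 19, 22, 11, 40], orf_start=10, orf_end=25))
```

In this toy input, five of the six P-sites inside the ORF (about 83%) fall in frame 0. A strong frame-0 bias like this is the kind of signal that lets short ORFs be called as translated, including several ORFs on one RNA.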

Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades.

Article Synopsis
  • Ribosome profiling (Ribo-Seq) has revealed thousands of noncanonical open reading frames (ORFs) that might expand the number of human protein-coding sequences (CDSs) by up to 30%, increasing the count from approximately 19,500 to over 26,000.
  • However, there is considerable uncertainty about how many of these noncanonical ORFs actually produce functional proteins, with estimates varying widely from a few thousand to several hundred thousand.
  • This knowledge gap has left the genomics and proteomics communities enthusiastic but in need of guidance on how to evaluate the coding potential of noncanonical ORFs.
Article Synopsis
  • The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium aimed to evaluate long-read sequencing for analyzing transcripts by generating over 427 million sequences from various species.
  • The findings showed that longer, more accurate sequences yield better transcript identification, while greater read depth improves quantification accuracy, particularly in well-annotated genomes.
  • The study serves as a benchmark for transcriptome analysis strategies and suggests using additional data for detecting rare transcripts or employing reference-free methods.
Article Synopsis
  • Ribosome profiling (Ribo-seq) has revealed that there may be at least 7,000 non-canonical open reading frames (ORFs) in the human genome that could expand the number of recognized protein-coding sequences by 30%, from around 19,500 to over 26,000.
  • Despite the exciting possibilities for new coding regions, the scientific community faces challenges in verifying how many of these ORFs actually produce proteins, as estimates of their quantity range widely from a few thousand to several hundred thousand.
  • The article discusses ongoing research on non-canonical ORFs, the use of ribosome profiling and immunopeptidomics to study them, and the need to understand the evidence required to classify
Article Synopsis
  • A deep-learning model can predict allele-specific activity using only local nucleotide sequences, emphasizing key transcription-factor-binding motifs affected by genetic variants.
  • Combining EN-TEx with previous genome annotations shows significant connections between allele-specific loci and GWAS loci, and aids in transferring known eQTLs to challenging tissue types, improving personal functional genomics research.
Article Synopsis
  • This study examines the evolutionary origins of over 7,000 newly identified short open reading frames (sORFs) in humans, finding that many are evolutionarily young and arose de novo.
  • Researchers discovered 221 previously overlooked sORFs that can generate tiny peptides, smaller than any known human microprotein.
  • Through mass spectrometry and cellular assays, the study links these small peptides to important biological processes like mRNA splicing and translational regulation, shedding light on the role of young proteins in the human proteome.

Pathogenic variations in the sodium voltage-gated channel alpha subunit 1 (SCN1A) gene are responsible for multiple epilepsy phenotypes, including Dravet syndrome, febrile seizures (FS) and genetic epilepsy with febrile seizures plus (GEFS+). Phenotypic heterogeneity is a hallmark of SCN1A-related epilepsies, and its causes have yet to be clarified. Genetic variation in the non-coding regulatory regions of SCN1A could be one potential causal factor.

The synthesis of most proteins begins at AUG codons, yet a small number of non-AUG-initiated proteoforms are also known. Here we analyse a large number of publicly available Ribo-seq datasets to identify novel, previously uncharacterised non-AUG proteoforms, using the Trips-Viz implementation of a novel algorithm for detecting translated ORFs. In parallel, we analyse genomic alignments of 120 mammals to identify evidence of protein-coding evolution in sequences encoding potential extensions.
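
The sequence step behind an N-terminally extended non-AUG proteoform can be sketched as follows. Upstream of the annotated AUG, in the same frame and before the first in-frame stop, look for near-cognate codons, which differ from AUG at one position (for example CUG or GUG). This is only a hypothetical candidate enumerator with invented function names. It is not the Trips-Viz algorithm described above, and each candidate would still need the Ribo-seq and cross-species conservation evidence the study describes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognate_codons() -> set[str]:
    """Codons that differ from ATG at exactly one position (9 in total)."""
    variants = set()
    for i in range(3):
        for base in "ACGT":
            if base != "ATG"[i]:
                variants.add("ATG"[:i] + base + "ATG"[i + 1:])
    return variants


def upstream_near_cognate_starts(seq: str, aug_pos: int) -> list[int]:
    """Positions of in-frame near-cognate codons upstream of an annotated AUG.

    Scans 5'-ward in the AUG's frame and stops at the first in-frame stop
    codon: initiation further upstream would give a peptide that terminates
    at that stop rather than an N-terminally extended protein.
    Results are ordered from closest to farthest from the AUG.
    """
    seq = seq.upper().replace("U", "T")
    if seq[aug_pos:aug_pos + 3] != "ATG":
        raise ValueError("aug_pos must point at an AUG (ATG) codon")
    near_cognates = near_cognate_codons()
    hits = []
    pos = aug_pos - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in near_cognates:
            hits.append(pos)
        pos -= 3
    return hits


mrna = "UAGCUGAAAAUGCCC"  # toy 5' leader followed by the start of a CDS
for start in upstream_near_cognate_starts(mrna, aug_pos=9):
    print(start, "extension of", (9 - start) // 3, "codons")
```

In the toy mRNA, the in-frame CUG at position 3 would add two codons to the N terminus. The UAG further upstream bounds the search.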

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function.

Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years.

Mesial temporal lobe epilepsy with hippocampal sclerosis and a history of febrile seizures is associated with common variation at rs7587026, located in the promoter region of SCN1A. We sought to explore possible underlying mechanisms. SCN1A expression was analysed in hippocampal biopsy specimens of individuals with mesial temporal lobe epilepsy with hippocampal sclerosis who underwent surgical treatment, and hippocampal neuronal cell loss was quantitatively assessed using immunohistochemistry.
