1,484 results match your criteria: "Cambridge CB10 1SD UK ; Wellcome Trust Sanger Institute[Affiliation]"

Guidelines for Gene and Genome Assembly Nomenclature.

Genetics

January 2025

EMBL-EBI - Non-Vertebrate Genomics Team, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.

The rapid increase in the number of reference-quality genome assemblies presents significant new opportunities for genomic research. However, the absence of standardized naming conventions for genome assemblies and annotations across datasets creates substantial challenges. Inconsistent naming hinders the identification of correct assemblies, complicates the integration of bioinformatics pipelines, and makes it difficult to link assemblies across multiple resources.

View Article and Find Full Text PDF

The scientific community has long benefited from the opportunities provided by data reuse. Recognizing the need to identify the challenges and bottlenecks to reuse in the agricultural research community and propose solutions for them, the data reuse working group was started within the AgBioData consortium framework. Here, we identify the limitations of data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity issues, with a specific focus on agricultural genomics research.

View Article and Find Full Text PDF

RNA secondary (2D) structure visualisation is an essential tool for understanding RNA function. R2DT is a software package designed to visualise RNA 2D structures in consistent, recognisable, and reproducible layouts. The latest release, R2DT 2.

View Article and Find Full Text PDF

Specialized metabolites are molecules involved in plants' interaction with their environment. Elucidating their biosynthetic pathways is a challenging but rewarding task, leading to societal applications and ecological insights. Furanocoumarins emerged multiple times in Angiosperms, raising the question of how different enzymes evolved into catalyzing identical reactions.

View Article and Find Full Text PDF

Background: Bioinformatics is fundamental to biomedical sciences, but its mastery presents a steep learning curve for bench biologists and clinicians. Learning to code while analyzing data is difficult. The curve may be flattened by separating these two aspects and providing intermediate steps for budding bioinformaticians.

View Article and Find Full Text PDF

The PRIDE database is the largest public data repository of mass spectrometry-based proteomics data and currently stores more than 40,000 data sets covering a wide range of organisms, experimental techniques, and biological conditions. During the past few years, PRIDE has seen a significant increase in the amount of submitted data-independent acquisition (DIA) proteomics data sets. This provides an excellent opportunity for large-scale data reanalysis and reuse.

View Article and Find Full Text PDF
Article Synopsis
  • The use of well-structured ontologies and ontology-aware tools helps make data and analyses FAIR (Findable, Accessible, Interoperable, Reusable), supporting effective lexical searches and biologically meaningful annotation grouping.
  • Researchers face challenges in adopting ontologies, primarily due to their complexity and the tendency to create simplified hierarchies that may misuse relationship types, leading to ineffective organization.
  • A suite of validation tools is introduced to help users align their hierarchies with established ontology structures, providing graphical reports and tailored views for various atlases like the HuBMAP Human Reference Atlas and the Human Developmental Cell Atlas.

Alleviating batch effects in cell type deconvolution with SCCAF-D.

Nat Commun

December 2024

GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macao Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China.

Cell type deconvolution methods can impute cell proportions from bulk transcriptomics data, revealing changes in disease progression or organ development. However, benchmarking studies often use simulated bulk data from the same source as the reference, which limits their application scenarios. This study examines batch effects in deconvolution and introduces SCCAF-D, a computational workflow that ensures a Pearson Correlation Coefficient above 0.

View Article and Find Full Text PDF

Motivation: Developing competency in the broad area of bioinformatics is challenging globally, owing to the breadth of the field and the diversity of its audiences for education and training. Course design can be facilitated by the use of a competency framework: a set of competency requirements that define the knowledge, skills and attitudes needed by individuals in (or aspiring to be in) a particular profession or role. These competency requirements can help to define curricula as they can inform both the content and level to which competency needs to be developed.

Ensembl 2025.

Nucleic Acids Res

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (www.ensembl.org) is an open platform integrating publicly available genomics data across the tree of life with a focus on eukaryotic species related to human health, agriculture and biodiversity.

New developments for the Quest for Orthologs benchmark service.

NAR Genom Bioinform

December 2024

Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden.

The Quest for Orthologs (QfO) orthology benchmark service (https://orthology.benchmarkservice.org) hosts a wide range of standardized benchmarks for orthology inference evaluation.

View Article and Find Full Text PDF

Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The Data Optimization Model Evaluation (DOME) recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. The recommendations help to ensure that key details are reported transparently by providing a structured set of questions.

View Article and Find Full Text PDF

Motivation: Genome-wide association studies (GWAS) have been remarkably successful in identifying associations between genetic variants and imaging-derived phenotypes. To date, the main focus of these analyses has been on established, clinically-used imaging features. We sought to investigate if deep learning approaches can detect more nuanced patterns of image variability.

PHI-base - the multi-species pathogen-host interaction database in 2025.

Nucleic Acids Res

January 2025

Protecting Crops and the Environment, Rothamsted Research, Harpenden AL5 2JQ, UK.

Article Synopsis
  • The Pathogen-Host Interactions Database (PHI-base) has been curating genes related to various pathogens since 2005, focusing on their roles in pathogenicity and interactions with different hosts, including humans and plants.
  • The latest update, version 4.17, shows significant growth with a 19% increase in genes and a 23% increase in interactions since the previous version.
  • The upcoming version 5.0 introduces a new curation workflow, unifies existing data, and enhances data-sharing capabilities, making it a more comprehensive resource for researchers, available at specific online portals.
View Article and Find Full Text PDF

CATH (https://www.cathdb.info) is a structural classification database that assigns domains to the structures in the Protein Data Bank (PDB) and AlphaFold Protein Structure Database (AFDB) and adds layers of biological information, including homology and functional annotation.

GENCODE 2025: reference gene annotation for human and mouse.

Nucleic Acids Res

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result.

View Article and Find Full Text PDF
Article Synopsis
  • Accurate gene annotations are essential for interpreting how genomes function, and the GENCODE consortium has spent twenty years creating reference annotations for human and mouse genomes, serving as a vital resource for researchers globally.
  • Previous annotations of long non-coding RNAs (lncRNAs) were incomplete and poorly organized, hindering research, prompting GENCODE to launch a comprehensive effort that resulted in adding nearly 18,000 novel human genes and over 22,000 novel mouse genes, significantly increasing the catalog of transcripts.
  • The new annotations not only show evolutionary patterns and link to genetic variants associated with traits but also improve understanding of previously unclear genomic functions, greatly advancing research into both human and mouse genetic diseases.

The Pfam protein families database: embracing AI/ML.

Nucleic Acids Res

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.

The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.

The international nucleotide sequence database collaboration (INSDC): enhancing global participation.

Nucleic Acids Res

January 2025

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.

The members of the International Nucleotide Sequence Database Collaboration (INSDC; https://insdc.org) have built systems to collect, archive and disseminate sequence data for more than four decades. The three collaborating organizations, the National Library of Medicine, National Center for Biotechnology Information (NLM-NCBI) in the United States; the Research Organization of Information and Systems, National Institute of Genetics (ROIS-NIG) in Japan; and the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), formalized their relationship through the adoption of an arrangement which documents their commitment to free and open access to genomic sequences.

The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity.

Nucleic Acids Res

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The NHGRI-EBI GWAS Catalog serves as a vital resource for the genetic research community, providing access to the most comprehensive database of human GWAS results. Currently, it contains close to 7 000 publications for >15 000 traits, from which more than 625 000 lead associations have been curated. Additionally, 85 000 full genome-wide summary statistics datasets (containing association data for all variants in the analysis) are available for downstream analyses such as meta-analysis, fine-mapping, Mendelian randomisation or development of polygenic risk scores.

Rfam 15: RNA families database in 2025.

Nucleic Acids Res

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Rfam database, a widely used repository of non-coding RNA families, has undergone significant updates in release 15.0. This paper introduces major improvements, including the expansion of Rfamseq to 26 106 genomes, a 76% increase, incorporating the latest UniProt reference proteomes and additional viral genomes.

View Article and Find Full Text PDF
Article Synopsis
  • The Human Proteome Project (HPP) aims to identify every protein-coding gene’s isoform and integrate proteomics into studies of human health and disease.
  • Major updates include the retirement of neXtProt as the knowledge base, with UniProtKB now serving as the reference proteome, and GENCODE providing the target protein list.
  • Recent data shows that 93% of protein-coding genes have been expressed, leaving 1,273 non-expressed proteins, along with the introduction of a new scoring system for functional annotation of proteins.

The PRIDE database at 20 years: 2025 update.

Nucleic Acids Res

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Article Synopsis
  • The PRIDE database is a premier repository for mass spectrometry-based proteomics data and plays a key role in the ProteomeXchange consortium, facilitating research sharing.
  • Over the past three years, PRIDE has made significant advancements, including a new file transfer protocol and an automatic dataset validation process, resulting in approximately 534 datasets submitted monthly.
  • Recent innovations include the introduction of a PRIDE chatbot for user support and enhanced efforts to integrate high-quality data with resources like UniProt and Ensembl.