SampleExplorer: using language models to discover relevant transcriptome data.

Bioinformatics

The Kids Research Institute Australia, North Entrance, Perth Children's Hospital, 15 Hospital Ave, Nedlands, WA 6009, Australia.

Published: December 2024

Motivation: Over the last two decades, transcriptomics has become a standard technique in biomedical research. We now have large databases of RNA-seq data, accompanied by valuable metadata detailing scientific objectives and the experimental procedures used. The metadata is crucial in understanding and replicating published studies, but so far has been underutilized in helping researchers to discover existing datasets.

Results: We present SampleExplorer, a tool allowing researchers to search for relevant data using both text and gene set queries. SampleExplorer embeds sample metadata and uses a transformer-based language model to retrieve similar datasets. Extensive benchmarking (see Supplementary Materials and Methods) using the ARCHS4 database demonstrates that SampleExplorer provides an effective approach for retrieving biologically relevant samples from large-scale transcriptomic data. It improves sample and dataset identification in large public repositories across diverse experimental contexts, helping researchers leverage existing transcriptomic data for potential replication or verification studies.
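The general idea of embedding metadata and ranking samples by similarity to a text query can be illustrated with a minimal sketch. This is not SampleExplorer's implementation or API: the function names, the sample identifiers, and the metadata strings are hypothetical, and a simple bag-of-words vector stands in for the transformer-based embedding that the tool actually uses.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a transformer encoder: token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, metadata, top_k=2):
    """Rank samples by similarity between the query and their metadata."""
    query_vec = toy_embed(query)
    scored = [(sample_id, cosine(query_vec, toy_embed(text)))
              for sample_id, text in metadata.items()]
    # Highest similarity first; keep only the top_k samples.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented metadata records, for illustration only.
samples = {
    "sample_A": "lung tumour biopsy treated with checkpoint blockade",
    "sample_B": "liver hepatocyte culture under hypoxia",
    "sample_C": "lung fibroblast culture untreated control",
}

hits = retrieve("lung tumour checkpoint blockade", samples)
for sample_id, score in hits:
    print(sample_id, round(score, 3))
```

In a real pipeline the count vectors would be replaced by dense embeddings from a language model, which can match metadata that is semantically related to the query even when no words are shared.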

Availability and implementation: SampleExplorer is available as a Python package compatible with Python versions 3.9 to 3.11 and can be installed from the Python Package Index (PyPI). The codebase and documentation are accessible at https://github.com/wlchin/SampleExplorer. Supplementary data (Supplementary Materials and Methods) provide detailed methodological information, including an algorithmic description of the retrieval process and data preparation steps.


Source
http://dx.doi.org/10.1093/bioinformatics/btae759


Similar Publications

Imaging Biomarker Studies of Antipsychotic-Naïve First-Episode Schizophrenia in China: Progress and Future Directions.

Schizophr Bull

January 2025

Department of Radiology, and Functional and Molecular Imaging Key Laboratory of Sichuan Province, West China Hospital, Sichuan University, Chengdu 610041, China.

Background And Hypothesis: Identifying biomarkers at onset and specifying the progression over the early course of schizophrenia is critical for better understanding of illness pathophysiology and providing novel information relevant to illness prognosis and treatment selection. Studies of antipsychotic-naïve first-episode schizophrenia in China are making contributions to this goal.

Study Design: A review was conducted for how antipsychotic-naïve first-episode patients were identified and studied, the investigated biological measures, with a focus on neuroimaging, and how they extend the understanding of schizophrenia regarding the illness-related brain abnormality, treatment effect characterization and outcome prediction, and subtype discovery and patient stratification, in comparison to findings from western populations.


Immunomodulatory drug (IMiD) resistance is a key clinical challenge in myeloma treatment. Previous data suggest that almost one third of myeloma patients acquire mutations in the key IMiD effector cereblon by the time they are pomalidomide refractory. Some events, including stop codons/frameshift mutations and copy loss, have clearly explicable effects on cereblon function.


Volatile oils (VOs), synonymously termed essential oils (EOs), are highly hydrophobic liquids obtained from aromatic plants, containing diverse organic compounds such as terpenes and terpenoids. These oils exhibit significant neuroprotective properties, including antioxidant, anti-inflammatory, anti-apoptotic, glutamate activation, cholinesterase inhibitory action, and anti-protein aggregatory action, making them potential therapeutic agents in managing neurodegenerative diseases (NDs). VOs regulate glutamate activation, enhance synaptic plasticity, and inhibit oxidative stress through the stimulation of antioxidant enzymes.


The immune system has emerged as a major factor in the pathogenesis of Alzheimer's disease (AD). PANoptosis is a newly defined programmed cell death mechanism related to many inflammatory diseases. This study aimed to identify the differentially expressed (DE) PANoptosis-related genes with characteristics of immune dysregulation (PRGIDs) in AD using bioinformatics analysis of bulk RNA-seq and single-nuclei RNA sequencing (snRNA-seq) data.


Microspore culture is an efficient and rapid method that produces doubled haploid (DH) lines for hybrid breeding in crops and vegetables. However, the low frequency of microspore embryogenesis and spontaneous diploidization in Chinese kale still require improvement. In the present work, an efficient microspore culture protocol was constructed and used for DH production in Chinese kale breeding.

