SampleExplorer: Using language models to discover relevant transcriptome data.

Bioinformatics

The Kids Research Institute Australia, University of Western Australia, Nedlands, WA 6009, Australia.

Published: December 2024

Motivation: Over the last two decades, transcriptomics has become a standard technique in biomedical research. We now have large databases of RNA-seq data, accompanied by valuable metadata detailing the scientific objectives and experimental procedures employed. This metadata is crucial for understanding and replicating published studies, but has so far been underutilised in helping researchers discover existing datasets.

Results: We present SampleExplorer, a tool allowing researchers to search for relevant data using both text and gene set queries. SampleExplorer embeds sample metadata and uses a transformer-based language model (LM) to retrieve similar datasets. Extensive benchmarking (see Materials and Methods) using the ARCHS4 database demonstrates that SampleExplorer provides an effective approach for retrieving biologically relevant samples from large-scale transcriptomic data.
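The retrieval idea described above (embed metadata text, then rank samples by similarity to a query) can be illustrated with a minimal sketch. This is not SampleExplorer's implementation or API: the `embed` function below is a toy bag-of-words stand-in for a transformer-based language model, the sample identifiers and metadata strings are hypothetical, and gene set queries are not covered.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a language-model embedding (assumption):
    a sparse bag-of-words count vector keyed by lowercase token."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, metadata, top_k=2):
    """Rank samples by similarity between the query and their metadata text."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), sample_id)
              for sample_id, text in metadata.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(sample_id, score) for score, sample_id in scored[:top_k]]


# Hypothetical sample metadata, invented for illustration only.
SAMPLES = {
    "sample_A": "human lung tissue rna-seq asthma patients",
    "sample_B": "mouse liver rna-seq high fat diet",
    "sample_C": "human airway epithelial cells asthma treatment",
}

results = search("asthma human lung", SAMPLES)
for sample_id, score in results:
    print(sample_id, round(score, 3))
```

In a real system the toy `embed` function would be replaced by a learned text encoder, so that metadata with related meaning but different wording still scores as similar; the ranking step stays the same.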

Conclusions: SampleExplorer provides an efficient approach for discovering relevant gene expression datasets in large public repositories. It improves sample and dataset identification across diverse experimental contexts, helping researchers leverage existing transcriptomic data for potential replication or verification studies.

Supplementary Information: Supplementary data (Materials and Methods) are available at Bioinformatics online.

Source: http://dx.doi.org/10.1093/bioinformatics/btae759
