AI Article Synopsis

  • Publicly available genomic data are a valuable resource for studying normal human variation and disease, but they are often poorly labeled and lack phenotype information, which limits their usefulness.
  • The researchers developed an in silico phenotyping method that predicts missing phenotypes directly from genomic measurements. It uses well-annotated data from consortia such as TCGA and GTEx as training data, and they applied it to 70,000 RNA-seq samples processed in the recount2 project.
  • The predictions make it possible to study cross-sample properties of public genomic data, select projects with specific biological traits or experimental conditions, and run downstream analyses. The methods are available in the phenopredict R package, and the predictions for recount2 are available in the recount R package.

Article Abstract

Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions. We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements using well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project. We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package, and the predictions for recount2 are available from the recount R package. With data and phenotype information available for 70,000 human samples, expression data can now be used on a scale that was not previously feasible.
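The abstract describes training predictors on well-annotated samples and then applying them to samples whose phenotypes are missing. The Python sketch below illustrates only that train-then-predict pattern, under clearly stated assumptions: it uses a simple nearest-centroid classifier on log2-transformed counts, a stand-in chosen for brevity that is not the algorithm implemented in phenopredict. The two-feature count matrix and the "female"/"male" labels are invented toy values, and the helper names (`train_centroids`, `predict`, `accuracy`) are hypothetical.

```python
# Minimal sketch of in silico phenotyping: learn from labeled samples,
# then predict labels for unlabeled ones. ASSUMPTION: a nearest-centroid
# classifier is used as a stand-in; this is NOT the phenopredict algorithm.
import math
from statistics import mean


def log_expr(counts):
    """log2(count + 1) transform, a common scale for RNA-seq counts."""
    return [math.log2(c + 1) for c in counts]


def train_centroids(samples, labels):
    """Return {label: mean log-expression vector} from labeled training samples."""
    groups = {}
    for counts, label in zip(samples, labels):
        groups.setdefault(label, []).append(log_expr(counts))
    # zip(*rows) walks feature by feature across all samples in a group
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}


def predict(centroids, counts):
    """Assign the label whose centroid is closest in Euclidean distance."""
    x = log_expr(counts)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted label matches the known label."""
    hits = sum(predict(centroids, s) == y for s, y in zip(samples, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Hypothetical toy counts for two features (think: an X-inactivation gene
    # such as XIST and a Y-linked gene); values are invented for illustration.
    train = [[900, 2], [1200, 5], [3, 800], [1, 1100]]
    labels = ["female", "female", "male", "male"]
    centroids = train_centroids(train, labels)

    held_out = [[1000, 4], [2, 950]]
    print([predict(centroids, s) for s in held_out])  # ['female', 'male']
    print(accuracy(centroids, held_out, ["female", "male"]))  # 1.0
```

In a realistic setting the training matrix would come from a labeled resource such as GTEx or TCGA, accuracy would be measured on held-out samples before predictions are trusted on unlabeled data, and the feature set would be far larger than two.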

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961118
DOI: http://dx.doi.org/10.1093/nar/gky102

Publication Analysis

Top Keywords (number of occurrences)

expression data: 12
genomic data: 12
data: 10
data phenotype: 8
phenotype prediction: 8
public genomic: 8
silico phenotyping: 8
genomic: 6
improving public: 4
public rna-seq: 4

Similar Publications

External delay and dispersion correction of automatically sampled arterial blood with dual flow rates.

Biomed Phys Eng Express

January 2025

Brain Health Imaging Centre, Centre for Addiction and Mental Health, B68-250 College St, Toronto, Ontario, M5T 1R8, CANADA.

Objective: Arterial sampling for PET imaging often involves continuously measuring the radiotracer activity concentration in blood using an automatic blood sampling system (ABSS). We proposed and validated an external delay and dispersion correction procedure needed when a change in flow rate occurs during data acquisition. We also measured the external dispersion constant of [11C]CURB, [18F]FDG, [18F]FEPPA, and [18F]SynVesT-1.

Purpose: Fibroblast growth factor receptor 2 isoform IIIb (FGFR2b) protein overexpression is an emerging biomarker in gastric cancer and gastroesophageal junction cancer (GC). We assessed FGFR2b protein overexpression prevalence in nearly 3,800 tumor samples as part of the prescreening process for a global phase III study in patients with newly diagnosed advanced or metastatic GC.

Methods: As of June 28, 2024, 3,782 tumor samples from prescreened patients from 37 countries for the phase III FORTITUDE-101 trial (ClinicalTrials.

Purpose: To investigate whether hormone receptor-positive, human epidermal growth factor receptor 2-low (HR+HER2-low) versus HR+HER2-zero early breast cancers have distinct genomic and clinical characteristics.

Methods: This study included HR+, HER2-negative early breast cancers from patients enrolled in the phase III, randomized BIG 1-98 and SOFT clinical trials that had undergone tumor genomic sequencing. Tumors were classified HR+HER2-low if they had a centrally reviewed HER2 immunohistochemistry (IHC) score of 1+ or 2+ with negative in situ hybridization and HR+HER2-zero if they had an HER2 IHC score of 0.

Pollen germination and pollen tube (PT) growth are extremely sensitive to high temperatures. During heat stress (HS), global translation shuts down and favors the maintenance of the essential cellular proteome for cell viability and protection against protein misfolding. Here, we demonstrate that under normal conditions, the Arabidopsis (Arabidopsis thaliana) eukaryotic translation initiation factor subunit eif3m1/eif3m2 double mutant exhibits poor pollen germination, loss of PT integrity and an increased rate of aborted seeds.

Background: Transgender and gender diverse (TGD) people seek gender-affirming care at any age to manage gender identities or expressions that differ from their birth gender. Gender-affirming hormone treatment (GAHT) and gender-affirming surgery may alter reproductive function and/or anatomy, limiting future reproductive options to varying degrees, if individuals desire to either give birth or become a biological parent.

Objective And Rationale: TGD people increasingly pursue help for their reproductive questions, including fertility, fertility preservation, active desire for children, and future options.
