Transcriptomic data are often more expensive and difficult to generate in large cohorts than genomic data. It is therefore often important to integrate multiple microarray- and next-generation sequencing (NGS)-based transcriptomic datasets from similar experiments or clinical trials to improve analytical power and the discovery of novel transcripts and genes. However, transcriptomic data integration presents several challenges, including reannotation and batch effect removal. We developed the Gene Expression Data Integration (GEDI) R package to enable transcriptomic data integration by combining existing R packages. With just four functions, the GEDI R package makes constructing a transcriptomic data integration pipeline straightforward. Together, the functions overcome the complications of transcriptomic data integration by automatically reannotating the data and removing the batch effect. Batch effect removal is verified with principal component analysis, and data integration is verified with a logistic regression model using forward stepwise feature selection. To demonstrate the functionality of the GEDI package, we integrated five bovine endometrial transcriptomic datasets from the NCBI Gene Expression Omnibus. These datasets came from multiple high-throughput platforms: the array-based Affymetrix and Agilent platforms and the NGS-based Illumina paired-end RNA-seq platform. Furthermore, we compared the GEDI package to existing tools and found that GEDI is the only tool that provides a full transcriptomic data integration pipeline, including verification of both batch effect removal and data integration, for downstream genomic and bioinformatics applications.
© 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: ReadGE, a function to import gene expression datasets
Basic Protocol 2: GEDI, a function to reannotate and merge gene expression datasets
Basic Protocol 3: BatchCorrection, a function to remove batch effects from gene expression data
Basic Protocol 4: VerifyGEDI, a function to confirm successful integration of gene expression data
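The merge, correct, and verify steps named in the protocols can be illustrated with a toy sketch. This is not the GEDI package or its API: the abstract does not give the signatures of the R functions, so the example below uses plain Python with hypothetical helper names (`merge_on_shared_genes`, `center_by_batch`, `batch_gap`) and made-up expression values. It also swaps in simpler techniques, assuming per-batch mean-centering as a stand-in for batch correction and a centroid distance as a stand-in for the principal component analysis check.

```python
import math
from statistics import mean

# Hypothetical toy input: batch name -> gene ID -> expression value per sample.
# All values are invented for illustration only.
datasets = {
    "array": {"GENE_A": [5.0, 6.0], "GENE_B": [2.0, 3.0], "GENE_C": [1.0, 1.5]},
    "rnaseq": {"GENE_A": [9.0, 10.0], "GENE_B": [7.0, 8.0]},
}

def merge_on_shared_genes(datasets):
    """Keep only genes present in every dataset and stack the samples."""
    shared = sorted(set.intersection(*(set(table) for table in datasets.values())))
    samples, batches = [], []
    for batch_name, table in datasets.items():
        n_samples = len(next(iter(table.values())))
        for i in range(n_samples):
            samples.append([table[gene][i] for gene in shared])
            batches.append(batch_name)
    return shared, samples, batches

def center_by_batch(samples, batches):
    """Subtract each batch's per-gene mean (a simplistic stand-in for batch correction)."""
    corrected = [row[:] for row in samples]
    for batch_name in set(batches):
        idx = [i for i, name in enumerate(batches) if name == batch_name]
        for g in range(len(samples[0])):
            batch_mean = mean(samples[i][g] for i in idx)
            for i in idx:
                corrected[i][g] = samples[i][g] - batch_mean
    return corrected

def batch_gap(samples, batches):
    """Largest Euclidean distance between any two batch centroids."""
    names = sorted(set(batches))
    centroids = {}
    for batch_name in names:
        rows = [s for s, name in zip(samples, batches) if name == batch_name]
        centroids[batch_name] = [mean(column) for column in zip(*rows)]
    gaps = [math.dist(centroids[a], centroids[b])
            for a in names for b in names if a < b]
    return max(gaps, default=0.0)

genes, samples, batches = merge_on_shared_genes(datasets)
corrected = center_by_batch(samples, batches)
gap_before = batch_gap(samples, batches)
gap_after = batch_gap(corrected, batches)
print(genes, round(gap_before, 3), round(gap_after, 3))
```

In this sketch the gap between batch centroids shrinks to zero after centering, which mimics what a principal component plot would show when batches no longer separate. Mean-centering also removes any biological difference that is confounded with batch, so it is only a teaching device and not a substitute for the correction methods the package wraps.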
DOI: http://dx.doi.org/10.1002/cpz1.70046
Brief Bioinform
November 2024
Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, St Lucia, Brisbane, QLD 4072, Australia.
Regulatory genes are critical determinants of cellular responses in development and disease, but standard RNA sequencing (RNA-seq) analysis workflows, such as differential expression analysis, have significant limitations in revealing the regulatory basis of cell identity and function. To address this challenge, we present the TRIAGE R package, a toolkit specifically designed to analyze regulatory elements in both bulk and single-cell RNA-seq datasets. The package is built upon TRIAGE methods, which leverage consortium-level H3K27me3 data to enrich for cell-type-specific regulatory regions.
Brief Bioinform
November 2024
State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 2 Sipailou, Xuanwu District, Nanjing 210096, China.
Spatial transcriptomics technologies have been extensively applied in biological research, enabling the study of transcriptome while preserving the spatial context of tissues. Paired with spatial transcriptomics data, platforms often provide histology and (or) chromatin images, which capture cellular morphology and chromatin organization. Additionally, single-cell RNA sequencing (scRNA-seq) data from matching tissues often accompany spatial data, offering a transcriptome-wide gene expression profile of individual cells.
Nat Commun
January 2025
Bioinformatics and computational systems biology of cancer, Institut Curie, Inserm U900, PSL Research University, Paris, France.
Immunotherapy is improving the survival of patients with metastatic non-small cell lung cancer (NSCLC), yet reliable biomarkers are needed to identify responders prospectively and optimize patient care. In this study, we explore the benefits of multimodal approaches to predict immunotherapy outcome using multiple machine learning algorithms and integration strategies. We analyze baseline multimodal data from a cohort of 317 metastatic NSCLC patients treated with first-line immunotherapy, including positron emission tomography images, digitized pathological slides, bulk transcriptomic profiles, and clinical information.
Breast Cancer Res
January 2025
School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK.
Recent evidence indicates that endocrine resistance in estrogen receptor-positive (ER+) breast cancer is closely correlated with phenotypic characteristics of epithelial-to-mesenchymal transition (EMT). Nonetheless, identifying tumor tissues with a mesenchymal phenotype remains challenging in clinical practice. In this study, we validated the correlation between EMT status and resistance to endocrine therapy in ER+ breast cancer from a transcriptomic perspective.
BMC Plant Biol
January 2025
Guangdong Provincial Key Laboratory of Postharvest Science of Fruits and Vegetables/Key Laboratory of Biology and Genetic Improvement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangzhou, 510642, China.
Background: Flowering is a complex, finely regulated process involving multiple phytohormones and transcription factors. However, flowering regulation in pitaya (Hylocereus polyrhizus) remains largely unexamined. This study addresses this gap by investigating gibberellin-3 (GA3) effects on flower bud (FB) development in pitaya.