Background: The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data are currently over 1.2 petabytes in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteomic, and clinical datasets. Publicly accessible TCGA data are released through public portals, but many challenges exist in navigating and using data obtained from these sites. We developed TCGA Expedition to support the research community focused on computational methods for cancer research. Data obtained, versioned, and archived using TCGA Expedition support command-line access at high-performance computing facilities as well as some functionality with third-party tools. For a subset of TCGA data collected at the University of Pittsburgh, we also re-associate TCGA data with de-identified data from the electronic health records. Here we describe the software as well as the architecture of our repository, methods for loading TCGA data onto multiple platforms, and security and regulatory controls that conform to federal best practices.
Results: TCGA Expedition software consists of a set of scripts written in Bash, Python, and Java that download, extract, harmonize, version, and store all TCGA data and metadata. The software generates a versioned, participant- and sample-centered, local TCGA data directory with metadata structures that directly reference the local data files as well as the original data files. The software supports flexible searches of the data via a web portal, user-centric data tracking tools, and data provenance tools. Using this software, we created a collaborative repository, the Pittsburgh Genome Resource Repository (PGRR), which enabled investigators at our institution to work with all TCGA data formats and to interrogate these data with analysis pipelines and associated tools. WGS data are especially challenging for individual investigators to use because of difficulties with downloading, storage, and processing; having locally accessible WGS BAM files has proven invaluable.
Conclusion: Our open-source, freely available TCGA Expedition software can be used to create a local collaborative infrastructure for acquiring, managing, and analyzing TCGA data and other large public datasets.
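As a rough illustration of the "versioned, participant- and sample-centered" local directory mentioned in the Results, the sketch below derives participant and sample identifiers from a standard TCGA barcode and builds a file path from them. The directory layout, the function names `parse_barcode` and `local_path`, and the example root, disease code, and file name are all hypothetical; the abstract does not specify how TCGA Expedition actually arranges its files.

```python
from pathlib import PurePosixPath


def parse_barcode(barcode: str) -> dict:
    """Split a TCGA barcode into participant- and sample-level identifiers.

    Standard barcodes look like TCGA-02-0001-01C-01D-0182-01:
    project, tissue source site, participant, sample type + vial, and so on.
    """
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA" or len(parts[3]) < 2:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    sample_type = parts[3][:2]                  # two-digit sample type code
    participant = "-".join(parts[:3])           # e.g. TCGA-02-0001
    sample = f"{participant}-{sample_type}"     # e.g. TCGA-02-0001-01
    return {
        "participant": participant,
        "sample": sample,
        "sample_type": sample_type,
    }


def local_path(root, disease, data_type, version, barcode, filename):
    """Build a hypothetical participant/sample-centered, versioned path."""
    ids = parse_barcode(barcode)
    return PurePosixPath(
        root,
        disease,
        ids["participant"],
        ids["sample"],
        data_type,
        f"v{version}",   # assumed convention: one folder per archive version
        filename,
    )


if __name__ == "__main__":
    print(local_path("/data/tcga", "GBM", "rnaseq", 2,
                     "TCGA-02-0001-01C-01D-0182-01", "expr.txt"))
```

Keying the tree on participant and sample rather than on the upstream archive name means every data type for one case sits under one folder, which is the property the Results paragraph emphasizes; the version folder is one simple way to keep older releases alongside newer ones.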
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5082933 | PMC |
| http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0165395 | PLOS |
BMC Cancer
January 2025
Department of Otorhinolaryngology, Shenzhen Key Laboratory of Otorhinolaryngology, Longgang Otorhinolaryngology Hospital, Shenzhen Institute of Otorhinolaryngology, No. 3004 Longgang Avenue, Shenzhen, Guangdong, China.
Background: To investigate the role of the translocase of the outer mitochondrial membrane 40 (TOM40) in oral squamous cell carcinoma (OSCC) with the aim of identifying new biomarkers or potential therapeutic targets.
Methods: TOM40 expression level in OSCC was evaluated using datasets downloaded from The Cancer Genome Atlas (TCGA), as well as clinical data. The correlations between TOM40 expression level and clinicopathological parameters and survival were analyzed in TCGA.
Sci Rep
January 2025
Department of Emergency, the Eighth Affiliated Hospital of Sun Yat-sen University, Shenzhen, Guangdong, China.
Hepatocellular carcinoma (HCC) is a predominant cause of cancer-related mortality globally, noted for its propensity towards late-stage diagnosis and scarcity of effective treatment modalities. The process of metabolic reprogramming, with a specific emphasis on lipid metabolism, is instrumental in the progression of HCC. Nevertheless, the precise mechanisms through which lipid metabolism impacts HCC and its viability as a therapeutic target have yet to be fully elucidated.
J Biochem
January 2025
Department of Cellular Biochemistry, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan.
Glutamate-rich WD40 repeat containing 1 (GRWD1) is a novel oncogene/oncoprotein that downregulates the p53 tumor suppressor protein through several mechanisms. One important mechanism involves binding of GRWD1 to RPL11, which competitively inhibits the RPL11-MDM2 interaction and releases RPL11-mediated suppression of MDM2 ubiquitin ligase activity toward p53. Here, we mined the TCGA (The Cancer Genome Atlas) database to gain in-depth insight into the clinical relevance of GRWD1.
BMC Med Genomics
January 2025
Department of Oncology, The First People's Hospital of Yibin, No.65, Wenxing Street, Cuiping District, Yibin, 644000, China.
Background: Advanced gastric cancer (GC) exhibits a high recurrence rate and a dismal prognosis. Myocyte enhancer factor 2c (MEF2C) was found to contribute to the development of various types of cancer. Therefore, our aim is to develop a prognostic model that predicts the prognosis of GC patients and initially explore the role of MEF2C in immunotherapy for GC.
Sci Rep
January 2025
Department of Pathology, The Second Xiangya Hospital of Central South University, Changsha, China.
MicroRNA (miRNA) dysregulation has been identified in several carcinomas, including non-small cell lung cancer (NSCLC), and is known to play a role in the development and progression of this disease. We initially conducted a miRNA microarray analysis, which revealed that the MNK inhibitor CGP57380 increased the expression of miR-150-3p. A similar analysis was performed using data from The Cancer Genome Atlas (TCGA).