Querying massive functional genomic and annotation data collections, and linking and summarizing the query results across data sources and data types, are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated, integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) the ability to analyze and annotate users' experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10^9 hg19 FILER records shows that FILER is highly scalable, with a sub-linear 32-fold increase in querying time when increasing the number of queries 1000-fold from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
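
The "sub-linear" benchmark claim can be made concrete by converting the two reported fold changes into an empirical scaling exponent. The sketch below assumes query time follows a power law in the number of query intervals, which the abstract itself does not state; the function name is illustrative and is not part of FILER.

```python
import math


def scaling_exponent(query_fold: float, time_fold: float) -> float:
    """Exponent k such that time_fold == query_fold ** k.

    Assumes (hypothetically) a power-law relation between the number of
    query intervals and total query time.
    """
    return math.log(time_fold) / math.log(query_fold)


# Figures reported in the abstract: 1000-fold more queries, 32-fold more time.
exponent = scaling_exponent(query_fold=1000, time_fold=32)
print(f"empirical scaling exponent: {exponent:.2f}")
```

Under that assumption the exponent comes out near 0.5, so query time grows roughly with the square root of the number of intervals; an exponent of 1 would correspond to linear scaling.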


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8759563
DOI: http://dx.doi.org/10.1093/nargab/lqab123
