Various types of analyses performed over multi-omics data are driven today by next-generation sequencing (NGS) techniques that produce large volumes of DNA/RNA sequences. Although many tools allow for parallel processing of NGS data in a Big Data distributed environment, they do not facilitate improving the quality of NGS data at large scale in a simple declarative manner. Meanwhile, large sequencing projects and routine DNA/RNA sequencing associated with molecular profiling of diseases for personalized treatment require both good-quality data and appropriate infrastructure for efficient storage and processing of the data. To solve these problems, we adapt the concept of the Data Lake for storing and processing big NGS data. We also propose a dedicated library for cleaning DNA/RNA sequences obtained with single-read and paired-end sequencing techniques. To accommodate the growth of NGS data, our solution is highly scalable in the Cloud and can rapidly and flexibly adjust to the amount of data to be processed. Moreover, to simplify the use of the data cleaning methods and the implementation of other phases of data analysis workflows, our library extends the declarative U-SQL query language, providing a set of capabilities for data extraction, processing, and storage. The results of our experiments show that the whole solution meets the requirements for ample storage and highly parallel, scalable processing that accompany NGS-based multi-omics data analyses.
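To make the notion of cleaning sequencing reads concrete, the following is a minimal Python sketch of one common cleaning step: trimming the low-quality tail of each read and discarding reads that become too short. It is not the library described in the abstract, which extends U-SQL; the names `FastqRecord`, `parse_fastq`, `trim_low_quality_tail`, and `clean_reads`, the Phred+33 encoding, and the default thresholds are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    """One read in FASTQ format (hypothetical helper type)."""
    header: str
    sequence: str
    quality: str  # assumed to be Phred+33 encoded


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text given as a list of lines (4 lines per read)."""
    for i in range(0, len(lines) - 3, 4):
        header, sequence, _plus, quality = (line.strip() for line in lines[i:i + 4])
        yield FastqRecord(header, sequence, quality)


def phred_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_low_quality_tail(record: FastqRecord, min_quality: int = 20) -> FastqRecord:
    """Cut bases from the 3' end while their quality is below min_quality."""
    scores = phred_scores(record.quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return record._replace(sequence=record.sequence[:end], quality=record.quality[:end])


def clean_reads(
    records: Iterable[FastqRecord], min_quality: int = 20, min_length: int = 4
) -> Iterator[FastqRecord]:
    """Trim each read and keep only those still at least min_length bases long."""
    for record in records:
        trimmed = trim_low_quality_tail(record, min_quality)
        if len(trimmed.sequence) >= min_length:
            yield trimmed
```

The sketch handles single reads only. For paired-end data, a cleaning step would typically also need to keep the two mates of a pair in sync, for example by dropping both when either one fails the filter; that logic is omitted here.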


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8314304 (PMC)
http://dx.doi.org/10.3389/fgene.2021.699280 (DOI)


Similar Publications

Hepatocellular carcinoma (HCC) is the most prevalent form of liver cancer, and ranks among the most lethal malignancies globally, primarily due to its high rates of recurrence and metastasis. Despite the urgency, no reliable biomarkers currently exist for predicting tumor recurrence in HCC. Telomerase reverse transcriptase (TERT) promoter mutations (TERTpm) and cellular tumor antigen p53 mutations (TP53m) have been frequently documented in HCC, but their combined clinical significance remains undefined.

Article Synopsis
  • The emergence of Next Generation Sequencing (NGS) technology has transformed clinical diagnostics, providing extensive microbiome data for personalized medicine.
  • Despite its potential, microbiome data's complexity and variability pose challenges for traditional statistical and machine learning approaches, including deep learning.
  • The paper presents a novel feature engineering technique that combines two data feature sets, significantly improving the Deep Neural Network's performance in colorectal cancer detection, raising the Area Under the Curve (AUC) from 0.800 to 0.923, thus enhancing microbiome data analysis and disease detection capabilities.

A deep intronic variant associated with X-linked hypophosphatemia in a Finnish family.

JBMR Plus

February 2025

Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki 00014, Finland.

Hypophosphatemic rickets is a rare bone disease characterized by short stature, bone deformities, impaired bone mineralization, and dental problems. Most commonly, hypophosphatemic rickets is caused by pathogenic variants in the X-chromosomal gene, but autosomal dominant and recessive forms also exist. We investigated a Finnish family in which the son (index, 29 yr) and mother (56 yr) had hypophosphatemia since childhood.


Introduction: Limited information exists on next-generation sequencing (NGS) success for lung tumors of 30 mm or less. We aimed to compare NGS success rates across biopsy techniques for these tumors, assess DNA sequencing quality, and verify reliability against surgical resection results.

Methods: We used data from the Initiative for Early Lung Cancer Research on Treatment study, including patients with lung tumors measuring 30 mm or less who had surgery and NGS on biopsies since 2016.


Aplastic anemia (AA) is a life-threatening bone marrow failure syndrome. The advent of next-generation sequencing (NGS) has shed light on the link between somatic mutations (SM) and the efficacy of immunosuppressive therapy (IST) in AA patients. However, the relationship between SM and hematopoietic stem cell transplantation (HSCT) has not been extensively explored.

