Automating data analysis pipelines is a key requirement for ensuring the reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of high-throughput sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and germline variant calling experiments. We implemented these workflows in the Common Workflow Language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on chronic lymphocytic leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective gold standard truth sets. We demonstrated that the CWL-implemented workflows achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they make it possible to overcome software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses.
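
As an illustration of what comparing detected variants against a truth set involves, the following is a minimal sketch that scores a call set by exact matching on chromosome, position, reference allele and alternate allele. It is not the benchmarking procedure used in the study, which the abstract does not specify. The `Variant` type, the `benchmark` function and the example variants are all hypothetical, and a real comparison would also need to handle variant normalization, genotypes and confident-region filtering.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (no genotype or quality fields)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(called: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Score a call set against a truth set using exact-match comparison."""
    tp = len(called & truth)   # variants present in both sets
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up toy data, for illustration only.
truth = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 50, "G", "GA"),
}
called = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 50, "G", "GA"),
    Variant("chr3", 10, "T", "C"),
}

metrics = benchmark(called, truth)
print(metrics)
```

In this toy example two calls match the truth set, one call is a false positive and one truth variant is missed, so precision and recall are both 2/3.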

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662043
DOI: http://dx.doi.org/10.3389/fbinf.2023.1275593
