OpenStats: A robust and scalable software package for reproducible analysis of high-throughput phenotypic data.

Hamed Haselimashhadi, Jeremy C Mason, Ann-Marie Mallon, Damian Smedley, Terrence F Meehan, Helen Parkinson

PLoS One

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

Published: January 2021

Reproducibility in the statistical analysis of data from high-throughput phenotyping screens requires a robust and reliable analysis foundation that can model the different statistical scenarios that arise. Scalability and extensibility of the analysis software are recurring challenges. In this manuscript, we describe OpenStats, a freely available software package that addresses these challenges. We show the performance of the software in a high-throughput phenomic pipeline in the International Mouse Phenotyping Consortium (IMPC) and compare the agreement of its results with the most similar implementation in the literature. OpenStats offers significant improvements in speed and scalability over existing software packages, including a 13-fold improvement in computational time over the current production analysis pipeline in the IMPC. Reduced complexity also promotes FAIR data analysis by providing transparency and by helping other groups reproduce and reuse the statistical methods and results. OpenStats is freely available under a Creative Commons license at www.bioconductor.org/packages/OpenStats.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773254
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242933

Similar Publications

Elevated CA19-9 within the normal range suggests poorer prognosis in stage II CRC: A retrospective analysis of a large sample in a single center.

J Cancer Res Ther

December 2024

Department of Colorectal Surgery, Shanghai Cancer Center, Fudan University, Xuhui District, Shanghai, China.

Ruoxin Zhang, Fan Chen, Junyong Weng, Zilan Ye, Xinxiang Li

Objective: Carbohydrate antigen 19-9 (CA19-9) and carcinoembryonic antigen (CEA) serve as pivotal tumor markers in colorectal cancer (CRC). However, uncertainty persists regarding the prognostic significance of the two tumor markers when falling within the normal range. We attempt to compare the prognostic differences of tumor markers at different levels within the reference range.

ModeHunter: A Package for the Reductionist Analysis, Animation, and Application of Elastic Biomolecular Motion.

J Phys Chem B

January 2025

Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, United States.

Willy Wriggers, Maytha Alshammari, Joseph N Stember, Sebastian Stolzenberg, Essam Metwally

ModeHunter is a modular Python software package for the simulation of 3D biophysical motion across spatial resolution scales using modal analysis of elastic networks. It has been curated from our in-house Python scripts over the last 15 years, with a focus on detecting similarities of elastic motion between atomic structures, coarse-grained graphs, and volumetric data obtained from biophysical or biomedical imaging origins, such as electron microscopy or tomography. With ModeHunter, normal modes of biophysical motion can be analyzed with various static visualization techniques or brought to life by dynamics animation in terms of single or multimode trajectories or decoy ensembles.

A Novel Signature Combing Cuproptosis- and Ferroptosis-Related Genes in Nonalcoholic Fatty Liver Disease.

Chin Med Sci J

December 2024

School of Public Health.

Rou-Rou Fang, Qi-Fan Yang, Jing Zhao, Shou-Zhu Xu

Objectives: To identify cuproptosis- and ferroptosis-related genes involved in nonalcoholic fatty liver disease and to determine the diagnostic value of hub genes.

Methods: The gene expression dataset GSE89632 was retrieved from the Gene Expression Omnibus database to identify differentially expressed genes (DEGs) between the non-alcoholic steatohepatitis (NASH) group and the healthy group using the 'limma' package in R software and weighted gene co-expression network analysis. Gene Ontology, Kyoto Encyclopedia of Genes and Genomes pathway, and single-sample gene set enrichment analyses were performed to identify functional enrichment of DEGs.

Improving the Depth and Reliability of Glycopeptide Identification Using Protein Prospector.

Mol Cell Proteomics

January 2025

Department of Pharmaceutical Chemistry, University of California, San Francisco.

Robert J Chalkley, Peter R Baker

Glycosylation is the most common and diverse modification of proteins. It can affect protein function and stability and is associated with many diseases. While proteomic methods to study most post-translational modifications are now quite mature, glycopeptide analysis is still a challenge, particularly from the aspect of data analysis.

Proteoform Identification and Quantification Based on Alignment Graphs.

Bioinformatics

January 2025

Department of Computer Science, City University of Hong Kong, Hong Kong, China.

Zhaohui Zhan, Lusheng Wang

Motivation: Proteoforms are the different forms of a protein generated from the genome with various sequence variations, splice isoforms, and post-translational modifications. Proteoforms regulate protein structures and functions. A single protein can have multiple proteoforms due to different modification sites.
