Publications by Michael R Crusoe

Recording provenance of workflow runs with RO-Crate.

Simone Leo, Michael R Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz

PLoS One

September 2024

Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems.

Implementation of FAIR Practices in Computational Metabolomics Workflows-A Case Study.

Mahnoor Zulfiqar, Michael R Crusoe, Birgitta König-Ries, Christoph Steinbeck, Kristian Peters

Metabolites

February 2024

Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the Metabolome Annotation Workflow (MAW) as a case study.

Perspectives on automated composition of workflows in the life sciences.

Anna-Lena Lamprecht, Magnus Palmblad, Jon Ison, Veit Schwämmle, Mohammad Sadnan Al Manir, Michael R Crusoe

F1000Res

December 2021

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus.

Organizing and running bioinformatics hackathons within Africa: The H3ABioNet cloud computing experience.

Azza E Ahmed, Phelelani T Mpangase, Sumir Panji, Shakuntala Baichoo, Yassine Souilmi, Michael R Crusoe

AAS Open Res

August 2019

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous computing environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. Building on the network's Standard Operating Procedures (SOPs) for common genomic analyses, H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon in 2016, with the purpose of translating those SOPs into analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects.

MGnify: the microbiome analysis resource in 2020.

Alex L Mitchell, Alexandre Almeida, Martin Beracochea, Miguel Boland, Josephine Burgin, Michael R Crusoe

Nucleic Acids Res

January 2020

MGnify (http://www.ebi.ac.

Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv.

Farah Zaib Khan, Stian Soiland-Reyes, Richard O Sinnott, Andrew Lonie, Carole Goble, Michael R Crusoe

Gigascience

November 2019

Background: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms.

Scalable Workflows and Reproducible Data Analysis for Genomics.

Francesco Strozzi, Roel Janssen, Ricardo Wurmus, Michael R Crusoe, George Githinji

Methods Mol Biol

January 2020

Biological, clinical, and pharmacological research now often involves analyses of genomes, transcriptomes, proteomes, and interactomes, within and between individuals and across species. Due to large volumes, the analysis and integration of data generated by such high-throughput technologies have become computationally intensive, and analysis can no longer happen on a typical desktop computer. In this chapter we show how to describe and execute the same analysis using a number of workflow systems and how these follow different approaches to tackle execution and reproducibility issues.

Enabling precision medicine via standard communication of HTS provenance, analysis, and results.

Gil Alterovitz, Dennis Dean, Carole Goble, Michael R Crusoe, Stian Soiland-Reyes

PLoS Biol

December 2018

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods.

Recommendations for the packaging and containerizing of bioinformatics software.

Bjorn Gruening, Olivier Sallou, Pablo Moreno, Felipe da Veiga Leprevost, Hervé Ménager, Michael R Crusoe

F1000Res

November 2019

Software containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows.

Comparative Genomics Reveals Accelerated Evolution in Conserved Pathways during the Diversification of Anole Lizards.

Marc Tollis, Elizabeth D Hutchins, Jessica Stapley, Shawn M Rupp, Walter L Eckalbar, Michael R Crusoe

Genome Biol Evol

February 2018

Squamates include all lizards and snakes, and display some of the most diverse and extreme morphological adaptations among vertebrates. However, compared with birds and mammals, relatively few resources exist for comparative genomic analyses of squamates, hampering efforts to understand the molecular bases of phenotypic diversification in such a speciose clade. In particular, the ∼400 species of anole lizard represent an extensive squamate radiation.

Walking the Talk: Adopting and Adapting Sustainable Scientific Software Development processes in a Small Biology Lab.

Michael R Crusoe, C Titus Brown

J Open Res Softw

November 2016

The khmer software project provides both research and production functionality for large-scale nucleic-acid sequence analysis. The software implements several novel data structures and algorithms that perform data pre-filtering for common bioinformatics tasks, including sequence mapping and assembly. Development is driven by a small lab with one full-time developer (MRC), as well as several graduate students and a professor (CTB) who contribute regularly to research features.

Channeling Community Contributions to Scientific Software: A Sprint Experience.

Michael R Crusoe, C Titus Brown

J Open Res Softw

July 2016

In 2014, the khmer software project participated in a two-day global sprint coordinated by the Mozilla Science Lab. We offered a mentored experience in contributing to a scientific software project for anyone who was interested. We provided entry-level tasks and worked with contributors as they worked through our development process.

The khmer software package: enabling efficient nucleotide sequence analysis.

Michael R Crusoe, Hussien F Alameldin, Sherine Awad, Elmar Boucher, Adam Caldwell

F1000Res

November 2015

The khmer package is a freely available software library for working efficiently with fixed-length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.
