Bioinformatics workflows are increasingly used to share analyses, serving as a cornerstone for reproducibility in bioinformatics. In particular, Nextflow is a widely used workflow system that supports the creation of large workflows while offering substantial flexibility. A growing number of Nextflow workflows are shared on repositories such as GitHub.
Motivation: The reproducibility crisis has highlighted the importance of improving the way bioinformatics data analyses are implemented, executed, and shared. To address this, various tools such as content versioning systems, workflow management systems, and software environment management systems have been developed. While these tools are becoming more widely used, there is still much work to be done to increase their adoption.
Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them.
Objectives: To visualize the evolution of all registered COVID-19 vaccine trials.
Study Design And Setting: As part of the living mapping of the COVID-NMA initiative, we identify all COVID-19 vaccine trials every 2 weeks and automatically extract data from the EU Clinical Trials Register, ClinicalTrials.gov, IRCT, and the World Health Organization International Clinical Trials Registry Platform.
Objectives: Researchers worldwide are actively engaging in research activities to search for preventive and therapeutic interventions against coronavirus disease 2019 (COVID-19). Our aim was to describe the planning of randomized controlled trials (RCTs) in terms of timing relative to the course of the COVID-19 epidemic and the research questions evaluated.
Study Design And Setting: We performed a living mapping of RCTs registered in the WHO International Clinical Trials Registry Platform.
Phylogeny.fr, created in 2008, was designed to facilitate the execution of phylogenetic workflows and is still widely used today. Since its development, however, user needs have evolved, new tools and workflows have been published, and the number of submitted jobs has increased dramatically, promoting new practices and motivating its refactoring.
Analysing models of biological networks typically relies on workflows in which different software tools with sensitive parameters are chained together, often with additional manual steps. The accessibility and reproducibility of such workflows are challenging, as publications often overlook analysis details, and because some of these tools may be difficult to install and/or have a steep learning curve. The CoLoMoTo Interactive Notebook provides a unified environment to edit, execute, share, and reproduce analyses of qualitative models of biological networks.
Background: Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who can use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, in which nodes represent bioinformatics tasks and links represent the dataflow.
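The graph model described above can be sketched minimally in Python. This is an illustrative example only, not the API of any particular workflow system; all names (Task, execute, trim, align) are hypothetical:

```python
# Minimal sketch of a workflow as a directed graph:
# nodes are bioinformatics tasks, edges carry the dataflow.
# All names here are illustrative, not from any real system.

class Task:
    def __init__(self, name, run):
        self.name = name
        self.run = run          # callable: upstream outputs -> output
        self.upstream = []      # tasks whose outputs feed this one

def execute(task, cache=None):
    """Run a task after its upstream dependencies (post-order walk),
    memoizing results so shared dependencies run once."""
    cache = {} if cache is None else cache
    if task.name not in cache:
        inputs = [execute(t, cache) for t in task.upstream]
        cache[task.name] = task.run(*inputs)
    return cache[task.name]

# Example two-step pipeline: trim reads, then align them.
trim = Task("trim", lambda: "trimmed-reads")
align = Task("align", lambda reads: f"aligned({reads})")
align.upstream.append(trim)

print(execute(align))  # aligned(trimmed-reads)
```

Real systems add scheduling, caching to disk, and cluster execution on top of essentially this dependency-graph traversal.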
Motivation: High-throughput technologies provide fundamental information on thousands of genes. Many research laboratories use one or more of these technologies daily and end up with lists of genes. Assessing the originality of the results obtained requires being aware of the number of publications available concerning individual or multiple genes and accessing information about these publications.
Unlabelled: Biologists are frequently faced with the problem of integrating information from multiple heterogeneous sources with their own experimental data. Given the large number of public sources, it is difficult to choose which sources to integrate without assistance. When doing this manually, biologists differ in their preferences concerning the sources to be queried as well as the strategies used.
J Bioinform Comput Biol, October 2006
Fueled by novel technologies capable of producing massive amounts of data for a single experiment, scientists are faced with an explosion of information which must be rapidly analyzed and combined with other data to form hypotheses and create knowledge. Today, numerous biological questions can be answered without entering a wet lab. Scientific protocols designed to answer these questions can be run entirely on a computer.
Pac Symp Biocomput, January 2007
As the number, richness, and diversity of biological sources grow, scientists are increasingly confronted with the problem of selecting appropriate sources and tools. To address this problem, we have designed BioGuide, a user-centric framework that helps scientists choose sources and tools according to their preferences and strategies, by specifying queries through a user-friendly visual interface. In this paper, we provide a complete RDF representation of BioGuide and introduce XPR (eXtensible Path language for RDF), an extension of FSL that is expressive enough to model all BioGuide queries.
The present review attempts to cover the most recent initiatives directed towards representing, storing, displaying, and processing protein-related data suited to "comparative proteomics" studies. Data interpretation is brought into focus: efforts invested in analysing and interpreting experimental data increasingly express the need for adding meaning.
Motivation: Biologists are now faced with the problem of integrating information from multiple heterogeneous public sources with their own experimental data contained in individual sources. The selection of the sources to be considered is thus critically important.
Results: Our aim is to support biologists with a module, based on an algorithm, that presents a selection of sources relevant to their query and matched to their own preferences.