Publications by authors named "Luca Pireddu"

Background: There is much value to be gained by linking clinical studies and the (biosample) collections that have been generated in the context of a clinical study. However, the linking problem is hard because direct references between a clinical study and an associated collection are usually not available.

Methods: The BBMRI-ERIC Directory and the ECRIN Metadata Repository (MDR) already include much of the information required to link clinical studies and related sample collections.

Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems.

Introduction: Prostate cancer (PCa) is the most frequent tumor among men in Europe and has both indolent and aggressive forms. There are several treatment options, the choice of which depends on multiple factors. To further improve current prognostication models, we established the Turin Prostate Cancer Prognostication (TPCP) cohort, an Italian retrospective biopsy cohort of patients with PCa and long-term follow-up.

Access to large volumes of so-called whole-slide images (high-resolution scans of complete pathological slides) has become a cornerstone of the development of novel artificial intelligence methods in pathology for diagnostic use, education/training of pathologists, and research. Nevertheless, a methodology based on risk analysis for evaluating the privacy risks associated with sharing such imaging data and applying the principle "as open as possible and as closed as necessary" is still lacking. In this article, we develop a model for privacy risk analysis for whole-slide images which focuses primarily on identity disclosure attacks, as these are the most important from a regulatory perspective.

Background: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed.

Current high-throughput sequencing technologies allow us to acquire entire genomes in a very short time and at a relatively sustainable cost, resulting in an increasing diffusion of genetic testing capabilities in specialized clinical laboratories and research centers. In contrast, the impact of genomic information on clinical decisions is still limited, as effective interpretation is a challenging task. From the technological point of view, genomic data are big in size, have a complex granular nature, and depend strongly on the computational steps of the generation and processing workflows.

Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and the development of scalable, interoperable workflows for performing metabolomics data analysis.
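
The core idea of connecting encapsulated tools into an executable workflow can be illustrated independently of Docker and Kubernetes. The sketch below is a hypothetical miniature, not the VRE described in the abstract: each "tool" is an isolated callable standing in for a container, and a tiny engine wires outputs to inputs in dependency order, as an orchestrator would.

```python
# Minimal workflow-engine sketch: tools as isolated callables (stand-ins
# for containers), connected into a DAG and run in dependency order.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_workflow(steps, deps):
    """steps: name -> callable(inputs_dict) -> result
       deps:  name -> list of upstream step names"""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(name, [])}
        results[name] = steps[name](inputs)
    return results

# Toy metabolomics-flavoured pipeline: normalize -> [peak_pick, qc] -> report
steps = {
    "normalize": lambda ins: [x / 10 for x in [10, 20, 30]],
    "peak_pick": lambda ins: max(ins["normalize"]),
    "qc":        lambda ins: len(ins["normalize"]),
    "report":    lambda ins: f"peak={ins['peak_pick']}, n={ins['qc']}",
}
deps = {"peak_pick": ["normalize"], "qc": ["normalize"],
        "report": ["peak_pick", "qc"]}

print(run_workflow(steps, deps)["report"])  # peak=3.0, n=3
```

In the real system each step would be a container scheduled by Kubernetes rather than an in-process lambda, but the composition logic is the same.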

Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological, and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories, and data analysis tools.

Motivation: Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. Being compositions of simpler tools, workflows are effectively specialized computer programs implementing often very complex analysis procedures.

With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field.

Article Synopsis
- High-throughput technologies like next-generation sequencing have increased the data intensity in molecular biology, requiring bioinformaticians to leverage high-performance computing to manage and analyze large datasets.
- Workflow systems could simplify the creation of bioinformatics pipelines, automating tasks and enhancing reproducibility, but their complexity often leads to many pipelines being developed without them.
- The EU COST Action SeqAhead hackathons revealed that different organizations tackling similar bioinformatics challenges often use varied and incompatible approaches, prompting recommendations for more efficient and user-friendly workflow systems.

Summary: BioBlend.objects is a new component of the BioBlend package, adding an object-oriented interface for the Galaxy REST-based application programming interface. It improves support for metacomputing on Galaxy entities by providing higher-level functionality and allowing users to more easily create programs to explore, query and create Galaxy datasets and workflows.
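
The benefit of an object-oriented layer over a REST API can be sketched generically. The classes below are hypothetical stand-ins, not BioBlend's actual implementation: a fake client returns raw JSON-like dicts (as a low-level REST binding would), and a thin object layer turns them into entities with attributes, which is the ergonomic gain BioBlend.objects provides for Galaxy.

```python
# Sketch of wrapping a dict-returning REST client in an object layer.
# FakeRestClient, Workflow and WorkflowCollection are illustrative names,
# not BioBlend classes.

class FakeRestClient:
    """Pretend low-level client: returns raw JSON-like dicts."""
    def get(self, path):
        if path == "/api/workflows":
            return [{"id": "w1", "name": "variant-calling"},
                    {"id": "w2", "name": "rna-seq"}]
        raise KeyError(path)

class Workflow:
    """Object wrapper: attribute access instead of dict-key lookups."""
    def __init__(self, client, data):
        self._client = client  # kept so entity methods could call back
        self.id = data["id"]
        self.name = data["name"]

class WorkflowCollection:
    """Higher-level entry point for listing/querying workflow entities."""
    def __init__(self, client):
        self._client = client
    def list(self):
        return [Workflow(self._client, d)
                for d in self._client.get("/api/workflows")]

client = FakeRestClient()
workflows = WorkflowCollection(client).list()
print([w.name for w in workflows])  # ['variant-calling', 'rna-seq']
```

Against a real server, the client would issue authenticated HTTP requests, but the object layer's role, hiding raw dicts behind typed entities, is the same.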

Summary: Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner.
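
The MapReduce model that SeqPig's Pig scripts compile down to can be sketched in plain Python, with no Hadoop required: a map phase emits key-value pairs per record, a shuffle groups them by key, and a reduce phase aggregates each group. The toy read records below are invented for illustration.

```python
# Plain-Python sketch of the MapReduce pattern used by Hadoop/SeqPig:
# map emits (key, value) pairs, shuffle groups by key, reduce aggregates.
from collections import defaultdict

reads = [  # toy aligned reads: (read_id, chromosome)
    ("r1", "chr1"), ("r2", "chr2"), ("r3", "chr1"), ("r4", "chr1"),
]

def map_phase(records):
    for _, chrom in records:
        yield (chrom, 1)           # emit one count per aligned read

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # group emitted values by key
    return groups

def reduce_phase(groups):
    return {key: sum(vals) for key, vals in groups.items()}

counts = reduce_phase(shuffle(map_phase(reads)))
print(counts)  # {'chr1': 3, 'chr2': 1}
```

Hadoop distributes each phase across a cluster; tools like SeqPig exist precisely so that bioinformaticians can express such per-key aggregations without writing the Java plumbing by hand.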

The complex network of specialized cells and molecules in the immune system has evolved to defend against pathogens, but inadvertent immune system attacks on "self" result in autoimmune disease. Both genetic regulation of immune cell levels and their relationships with autoimmunity are largely undetermined. Here, we report genetic contributions to quantitative levels of 95 cell types encompassing 272 immune traits, in a cohort of 1,629 individuals from four clustered Sardinian villages.

Summary: SEAL is a scalable tool for short read pair mapping and duplicate removal. It computes mappings that are consistent with those produced by BWA and removes duplicates according to the same criteria employed by Picard MarkDuplicates. On a 16-node Hadoop cluster, it is capable of processing about 13 GB per hour in map+rmdup mode, while reaching a throughput of 19 GB per hour in mapping-only mode.
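
The reported figures allow a quick back-of-the-envelope estimate. The 500 GB dataset size below is an arbitrary example, not a number from the paper.

```python
# Back-of-the-envelope arithmetic with SEAL's reported throughput.
nodes = 16
map_rmdup_gbph = 13.0   # GB/hour, mapping + duplicate removal
map_only_gbph = 19.0    # GB/hour, mapping only

per_node = map_rmdup_gbph / nodes    # per-node throughput in map+rmdup mode
dataset_gb = 500                     # hypothetical dataset size
hours = dataset_gb / map_rmdup_gbph  # rough wall-clock estimate

print(f"{per_node} GB/h per node; ~{hours:.0f} h for {dataset_gb} GB")
```

Such linear extrapolation assumes the near-linear scaling Hadoop aims for; in practice throughput also depends on cluster I/O and network bandwidth.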

Quantitative structure-activity relationship (QSAR) analysis has been frequently utilized as a computational tool for the prediction of several eco-toxicological parameters including the acute aquatic toxicity. In the present study, we describe a novel integrated strategy to describe the acute aquatic toxicity through the combination of both toxicokinetic and toxicodynamic behaviors of chemicals. In particular, a robust classification model (TOXclass) has been derived by combining Support Vector Machine (SVM) analysis with three classes of toxicokinetic-like molecular descriptors: the autocorrelation molecular electrostatic potential (autoMEP) vectors, Sterimol topological descriptors and logP(o/w) property values.
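
The kind of SVM classification underlying TOXclass can be sketched from scratch. The 2-D "descriptor" vectors and labels below are invented toy data, the trainer is a minimal hinge-loss subgradient method, and real work would use an SVM library and the actual autoMEP/Sterimol/logP(o/w) descriptors rather than this sketch.

```python
# Minimal linear SVM trained by subgradient descent on the hinge loss,
# as a from-scratch stand-in for the SVM classifier in a TOXclass-like
# model. Toy 2-D descriptor vectors; +1 = toxic, -1 = non-toxic.
X = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0),   # class +1
     (0.0, 0.5), (0.5, 0.0), (1.0, 0.5)]   # class -1
y = [1, 1, 1, -1, -1, -1]

def train_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside margin: hinge-loss subgradient step
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # outside margin: only regularization shrinks w
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

w, b = train_svm(X, y)
print([predict(w, b, x) for x in X])  # should recover the training labels
```

Swapping the linear score for a kernel function would give the non-linear decision boundaries commonly used with molecular descriptors.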

Pathway Analyst (Path-A) is a publicly available web server (http://path-a.cs.ualberta.
