Publications by authors named "Umberto Ferraro Petrillo"

Context: The utility of thyroglobulin (Tg) in the follow-up of differentiated thyroid cancer (DTC) patients is well documented. Although third-generation immunoassays have improved accuracy, limitations persist (interfering anti-Tg antibodies and measurement variability). Evolving treatment strategies require a reevaluation of Tg thresholds for optimal patient management.


Some scientific studies involve huge amounts of bioinformatics data that cannot be analyzed on the personal computers researchers typically use for day-to-day activities and instead require computational infrastructures that can work in a distributed way. For this purpose, distributed computing systems have become useful tools for analyzing large amounts of bioinformatics data and generating relevant results in virtual environments, where software can run for hours or even days without affecting a researcher's personal computer or laptop. Although distributed computing resources have become pivotal in many bioinformatics laboratories, researchers and students often use them incorrectly, making mistakes that can cause the distributed computers to underperform or even produce wrong results.


Background: Huge amounts of molecular interaction data are continuously produced and stored in public databases. Although many bioinformatics tools have been proposed in the literature for analyzing these data by modeling them as different types of biological networks, several problems remain unsolved when the analysis is carried out at large scale.

Results: We propose DIAMIN, a high-level software library to facilitate the development of applications for the efficient analysis of large-scale molecular interaction networks.


Motivation: Alignment-free (AF) distance/similarity functions are a key tool for sequence analysis. Experimental studies on real datasets abound and, to some extent, there are also studies regarding their control of false positive rate (Type I error). However, assessment of their power, i.


The role of minimal extrathyroidal extension (mETE) as a risk factor for persistent papillary thyroid carcinoma (PTC) is still debated. The aims of this study were to assess the clinical impact of mETE as a predictor of worse initial treatment response in PTC patients and to verify the impact of radioiodine therapy after surgery in patients with mETE. We reviewed all records in the Italian Thyroid Cancer Observatory database and selected 2237 consecutive patients with PTC who satisfied the inclusion criteria (PTC with no lymph node metastases and at least 1 year of follow-up).


Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. Because of the same abundance of data, Big Data technologies are seen as the future of genomic data storage and processing, with MapReduce-Hadoop in the lead. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop.


Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity.


We discuss the challenge of comparing three gene prioritization methods: network propagation, integer linear programming rank aggregation (RA), and statistical RA. These methods are based on different biological categories and estimate disease-gene association. Previously proposed comparison schemes are based on three measures of performance: receiver operating characteristic curve, area under the curve, and median rank ratio.

Article Synopsis
  • The study assesses the effectiveness of the 2015 American Thyroid Association (ATA) risk stratification system in predicting outcomes for patients with differentiated thyroid cancer (DTC) one year post-treatment.
  • It involved a review of data from 2,071 patients across 40 treatment centers, classifying risk levels as low, intermediate, or high based on the ATA guidelines.
  • Findings indicated that the initial ATA risk classification was a strong predictor of persistent disease, with the center where treatment occurred having little impact on these predictions.

Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g.


Motivation: Information theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are also becoming very valuable for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e.
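
As a rough illustration of what collecting k-mer statistics can look like, the sketch below counts every length-k substring of a single in-memory sequence. It is a minimal single-machine example, not the method of the article; the function name `count_kmers` and the example sequence are assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in `sequence`.

    Hypothetical helper for illustration; it is not taken from the article.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence, one position at a time.
    # If the sequence is shorter than k, the range is empty and so is the result.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Example usage on a toy sequence (assumed data).
counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

On genome-scale inputs the same counting step is typically what gets distributed, since neither the sequences nor the table of counts may fit on one machine.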


Summary: MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and because of the need to efficiently manage sequence datasets that may contain billions of characters.
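
To make the input problem concrete, the following is a minimal sketch of a streaming FASTA reader in plain Python. It does not use the Hadoop API and is not the routine described in the article; `read_fasta` and the sample data are hypothetical. It only shows the record structure (a `>` header line followed by one or more sequence lines) that an input routine has to reassemble.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Hypothetical helper for illustration; reads line by line so the whole
    file never has to be held in memory.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Example usage on an in-memory sample (assumed data).
sample = io.StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nTTGA\n")
records = list(read_fasta(sample))
print(records)
```

In a distributed setting the file is cut into blocks at arbitrary byte offsets, so a record like `seq1` above can straddle two blocks; handling that case is part of what makes such routines hard to write.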
