Publications by Marco Mesiti | LitMetric

Publications by authors named "Marco Mesiti"

Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing.

Jessica Gliozzo, Mauricio Soto-Gomez, Valentina Guarino, Arturo Bonometti, Alberto Cabri, Marco Mesiti

Artif Intell Med

December 2024

Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, their multimodal nature requires effective data-integration pipelines.

An ontology-based knowledge graph for representing interactions involving RNA molecules.

Emanuele Cavalleri, Alberto Cabri, Mauricio Soto-Gomez, Sara Bonfitto, Paolo Perlasca, Marco Mesiti

Sci Data

August 2024

The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts.

An open source knowledge graph ecosystem for the life sciences.

Tiffany J Callahan, Ignacio J Tripodi, Adrianne L Stefanski, Luca Cappelletti, Sanya B Taneja, Marco Mesiti

Sci Data

April 2024

Article Synopsis

Translational research needs data from different levels of biological systems, but combining that data is tough for scientists.
New technologies help gather more data, but researchers struggle to organize all the information effectively.
PheKnowLator is a tool that helps scientists create customizable knowledge graphs easily, making it better for managing complex health information without slowing down their work.

The promises of large language models for protein design and modeling.

Giorgio Valentini, Dario Malchiodi, Jessica Gliozzo, Marco Mesiti, Mauricio Soto-Gomez

Front Bioinform

November 2023

The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the "language of proteins" invite the application and adaptation of LLMs to protein modelling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins.

Heterogeneous data integration methods for patient similarity networks.

Jessica Gliozzo, Marco Mesiti, Marco Notaro, Alessandro Petrini, Alex Patak

Brief Bioinform

July 2022

Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms.
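
The node-and-weighted-edge structure described above can be illustrated with a minimal sketch. It is not the method of the paper: the cosine similarity measure, the threshold, the `build_psn` helper, and the toy patient profiles are all assumptions chosen only to show the data structure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def build_psn(profiles, threshold=0.0):
    """Build a patient similarity network (hypothetical helper).

    profiles: dict mapping patient id -> list of feature values.
    Returns a dict mapping (patient_a, patient_b) -> edge weight,
    keeping only pairs whose similarity exceeds `threshold`.
    """
    ids = sorted(profiles)
    edges = {}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            weight = cosine_similarity(profiles[p], profiles[q])
            if weight > threshold:
                edges[(p, q)] = weight
    return edges


# Toy data (invented for illustration): P1 and P2 have proportional profiles.
profiles = {
    "P1": [1.0, 0.0, 2.0],
    "P2": [2.0, 0.0, 4.0],
    "P3": [0.0, 3.0, 0.0],
}
psn = build_psn(profiles, threshold=0.5)
```

With these toy profiles only the P1/P2 pair survives the threshold, since P3 is orthogonal to both. A real pipeline would choose the similarity measure and sparsification strategy to suit each data type.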

Multi-resolution visualization and analysis of biomolecular networks through hierarchical community detection and web-based graphical tools.

Paolo Perlasca, Marco Frasca, Cheick Tidiane Ba, Jessica Gliozzo, Marco Notaro, Marco Mesiti

PLoS One

January 2021

The visual exploration and analysis of biomolecular networks is of paramount importance for identifying hidden and complex interaction patterns among proteins. Although many tools have been proposed for this task, they are mainly focused on the query and visualization of a single protein with its neighborhood. The global exploration of the entire network and the interpretation of its underlying structure remain difficult, mainly due to the excessively large size of biomolecular networks.

parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.

Alessandro Petrini, Marco Mesiti, Max Schubach, Marco Frasca, Daniel Danis

Gigascience

May 2020

Background: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between the examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge number of neutral variants or undergo severe restrictions in managing big genomic data.

Results: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation.

Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction.

Jessica Gliozzo, Paolo Perlasca, Marco Mesiti, Elena Casiraghi, Viviana Vallacchi

Sci Rep

February 2020

Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients.

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

Naihui Zhou, Yuxiang Jiang, Timothy R Bergquist, Alexandra J Lee, Balint Z Kacsoh, Marco Mesiti

Genome Biol

November 2019

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes.

UNIPred-Web: a web tool for the integration and visualization of biomolecular networks for protein function prediction.

Paolo Perlasca, Marco Frasca, Cheick Tidiane Ba, Marco Notaro, Alessandro Petrini, Marco Mesiti

BMC Bioinformatics

August 2019

Background: One of the main issues in the automated protein function prediction (AFP) problem is the integration of multiple networked data sources. The UNIPred algorithm was proposed to efficiently integrate the protein networks in a function-specific fashion, taking into account the imbalance that characterizes protein annotations, and to subsequently predict novel hypotheses about unannotated proteins. UNIPred is publicly available as R code, which may limit its use by non-expert users.

A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks.

Marco Frasca, Giuliano Grossi, Jessica Gliozzo, Marco Mesiti, Marco Notaro

BMC Bioinformatics

October 2018

Background: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is to infer the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labellings are often highly unbalanced, that is, one class is largely under-represented: for instance, in automated protein function prediction (AFP) only a few proteins are annotated for most Gene Ontology terms, and in the disease-gene prioritization problem only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are therefore required.

Authorised access web portal for Italian data bank on sudden unexpected perinatal and infant death.

Giulia Ottaviani, Paolo Perlasca, Marco Mesiti, Luca Ferrari, Anna M Lavezzi

Acta Paediatr

July 2017

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea, Marco Mesiti

Genome Biol

September 2016

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

RANKS: a flexible tool for node label ranking and classification in biological networks.

Giorgio Valentini, Giuliano Armano, Marco Frasca, Jianyi Lin, Marco Mesiti

Bioinformatics

September 2016

RANKS is a flexible software package that can be easily applied to any bioinformatics task formalizable as ranking of nodes with respect to a property given as a label, such as automated protein function prediction, gene disease prioritization and drug repositioning. To this end RANKS provides an efficient and easy-to-use implementation of kernelized score functions, a semi-supervised algorithmic scheme embedding both local and global learning strategies for the analysis of biomolecular networks. To facilitate comparative assessment, baseline network-based methods, e.

Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction.

Marco Mesiti, Matteo Re, Giorgio Valentini

Gigascience

May 2014

Background: Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence their application to model organisms is often restricted to well characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem might consist in the construction of big networks including multiple species, but this in turn poses challenging computational problems, due to the scalability limitations of existing algorithms and the main memory requirements induced by the construction of big networks.

GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

Horacio Caniza, Alfonso E Romero, Samuel Heron, Haixuan Yang, Alessandra Devoto, Marco Mesiti

Bioinformatics

August 2014

Summary: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures.

A fast ranking algorithm for predicting gene functions in biomolecular networks.

Matteo Re, Marco Mesiti, Giorgio Valentini

IEEE/ACM Trans Comput Biol Bioinform

July 2013

Ranking genes in functional networks according to a specific biological function is a challenging task raising relevant performance and computational complexity problems. To cope with both these problems we developed a transductive gene ranking method based on kernelized score functions able to fully exploit the topology and the graph structure of biomolecular networks and to capture significant functional relationships between genes. We ran the method on a network constructed by integrating multiple biomolecular data sources in the yeast model organism, achieving significantly better results than the compared state-of-the-art network-based algorithms for gene function prediction, and with relevant savings in computational time.

XML-based approaches for the integration of heterogeneous bio-molecular data.

Marco Mesiti, Ernesto Jiménez-Ruiz, Ismael Sanz, Rafael Berlanga-Llavori, Paolo Perlasca

BMC Bioinformatics

October 2009

Background: Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research, but also raising new problems for their integration and computational processing.

Results: In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new cutting-edge approaches for the appropriate management of heterogeneous biological data represented through XML.
