Publications by authors named "Giorgio Valentini"

Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, their multimodal nature requires effective data-integration pipelines.


Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, manual biocuration is used to assign MAxO terms to rare diseases, enabling clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery.


The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts.

Article Synopsis
  • Large language models (LLMs) are being tested for their ability to help diagnose genetic diseases, but their evaluation is complicated due to how they generate unstructured responses.
  • Researchers benchmarked LLMs against 5,213 case reports using established phenotypic criteria and compared their performance to a traditional diagnostic tool, Exomiser.
  • The best-performing LLM correctly diagnosed cases 23.6% of the time, while Exomiser achieved 35.5%, indicating that while LLMs are improving, they still lag behind conventional bioinformatics methods and need further research for effective integration into diagnostic processes.
Article Synopsis
  • The study explores how using non-biomedical synonyms can enhance the quality of concept embeddings for biomedical terms, like Myocardial Infarction.
  • By replacing synonyms with their most common representatives via WordNet, researchers found an average 8% reduction in intra-cluster distance among 1055 concept sets, indicating improved embedding similarity.
  • The findings suggest that this method is effective in refining biomedical concept embeddings using the Word2Vec algorithm and the approach is accessible through a Python package online.
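
The synopsis above reports a reduction in intra-cluster distance but does not say how that distance is computed. As a rough illustration only, the sketch below assumes it is the mean pairwise Euclidean distance between the embedding vectors of one concept set; the function name `mean_intra_cluster_distance` is hypothetical and is not taken from the authors' package.

```python
import math
from itertools import combinations


def mean_intra_cluster_distance(vectors):
    """Mean pairwise Euclidean distance between the vectors of one concept set.

    Assumption: this is only one plausible definition of "intra-cluster
    distance"; the study may use a different metric.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        # A set with fewer than two vectors has no pairwise distances.
        return 0.0
    return sum(math.dist(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings for the terms of a single concept set (made-up numbers).
before = [[0.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
after = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
reduction = 1 - mean_intra_cluster_distance(after) / mean_intra_cluster_distance(before)
```

Under this definition, a lower value after synonym replacement means the terms of a concept sit closer together in embedding space.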

Acute COVID-19 infection can be followed by diverse clinical manifestations referred to as Post Acute Sequelae of SARS-CoV2 Infection (PASC). Studies have shown an increased risk of being diagnosed with new-onset psychiatric disease following a diagnosis of acute COVID-19. However, it was unclear whether non-psychiatric PASC-associated manifestations (PASC-AMs) are associated with an increased risk of new-onset psychiatric disease following COVID-19.


Objective: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (for example, endometriosis, ovarian cyst, and uterine fibroids).

Article Synopsis
  • Translational research needs data from different levels of biological systems, but combining that data is tough for scientists.
  • New technologies help gather more data, but researchers struggle to organize all the information effectively.
  • PheKnowLator is a tool that helps scientists create customizable knowledge graphs easily, making it better for managing complex health information without slowing down their work.

Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes.
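
To make the negative-edge problem concrete, the sketch below shows one common workaround: treating unobserved node pairs as presumed negatives and sampling them uniformly at random. This is an assumption for illustration, not the approach proposed in the paper, and the function name `sample_negative_edges` is hypothetical. It assumes an undirected graph with no self-loops or duplicate edges.

```python
import random


def sample_negative_edges(nodes, positive_edges, k, seed=0):
    """Sample k distinct node pairs that are absent from the known positive edges.

    Assumption: an unobserved pair is treated as a negative example, which
    may mislabel true relationships that are simply not yet known.
    """
    rng = random.Random(seed)
    known = {frozenset(edge) for edge in positive_edges}
    max_negatives = len(nodes) * (len(nodes) - 1) // 2 - len(known)
    if k > max_negatives:
        raise ValueError("not enough unobserved pairs to sample from")
    negatives = set()
    while len(negatives) < k:
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        pair = frozenset((u, v))
        if pair not in known:
            negatives.add(pair)
    # Sort for a stable, readable result.
    return sorted(tuple(sorted(pair)) for pair in negatives)


nodes = ["a", "b", "c", "d"]
positives = [("a", "b"), ("b", "c")]
negatives = sample_negative_edges(nodes, positives, k=3)
```

The caveat in the docstring is the point of the abstract: because explicit negative knowledge is rare, sampled "negatives" are only presumed, so downstream edge-prediction results depend on how they are chosen.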


Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods.
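
For readers unfamiliar with random-walk-based methods, the following is a minimal plain-Python sketch of uniform random-walk sampling over an adjacency list. It does not use or reproduce GRAPE's API or its optimized data structures; the function name `random_walks` and the toy graph are hypothetical.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node.

    adjacency maps each node to a list of its neighbours. A walk stops
    early if it reaches a node with no neighbours.
    """
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```

In embedding methods of this family, the sampled walks are typically treated like sentences and passed to a skip-gram-style model, so that nodes that co-occur on walks receive similar vectors.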


The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the "language of proteins" invite the application and adaptation of LLMs to protein modelling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins.


Background: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future.

Methods: We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID.

Article Synopsis
  • Current large language models like GPT-4 struggle with accurately diagnosing medical conditions from structured data extracted from clinical texts, achieving correct diagnoses only 5.3-17.6% of the time.
  • The study highlighted that the performance of the prompts generated from structured data was significantly worse than from original narrative texts, indicating the complexity of clinical language.
  • There is a need for further research to improve prompt creation techniques using common clinical data to enhance the effectiveness of AI in supporting medical diagnostics.
Article Synopsis
  • The study aimed to explore how diet and environmental factors may relate to common female reproductive disorders (FRDs) using a knowledge graph (KG) method to identify associated variables like endometriosis and ovarian cysts.
  • Researchers utilized data from the Personalized Environment and Genes Study, merging it with nutrient and agricultural chemical data to create a KG, leading to 8535 significant predicted links between FRDs and various external factors based on analysis techniques like random forest and logistic regression.
  • The findings highlight the potential for future research to investigate these links further, underscoring that while no causal relationships were concluded, the study offers a basis for generating hypotheses related to FRDs and their environmental and dietary influences.

Motivation: Advances in RNA sequencing technologies have achieved an unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific gene ontology annotations.

Results: We present isoform interpretation, a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity.

Article Synopsis
  • Healthcare datasets from Electronic Health Records (EHRs) are valuable for studying patient outcomes, but often have missing data that can lead to bias if not handled properly.
  • Multiple imputation algorithms aim to fill in missing values, but there's no clear consensus on the best one, and selecting parameters for these algorithms can be challenging.
  • This paper presents a new framework for evaluating methods to handle missing data, demonstrating its effectiveness using a large dataset of type-2 diabetes patients and providing insights into how various imputation techniques perform.
Article Synopsis
  • A personalized treatment approach for Multiple Sclerosis is essential due to the variety of available medications, and machine learning is being used to enhance precision medicine.
  • Researchers used machine learning to create models that predict how well patients will respond to the drug fingolimod based on clinical and genetic data, using two patient cohorts from Italy and France.
  • The findings showed that a combined clinical-genetic model improved prediction accuracy for fingolimod response, achieving an AUROC of 0.71, but more research is needed to apply this method in clinical settings.

Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.


Background: Cis-regulatory regions (CRRs) are non-coding regions of the DNA that finely control the spatio-temporal pattern of transcription; they are involved in a wide range of pivotal processes such as the development of specific cell lines and tissues and the dynamic cell response to physiological stimuli. Recent studies showed that genetic variants occurring in CRRs are strongly correlated with pathogenicity or deleteriousness. Considering the central role of CRRs in the regulation of physiological and pathological conditions, the correct identification of CRRs and of their tissue-specific activity status through Machine Learning methods plays a major role in dissecting the impact of genetic variants on human diseases.

Article Synopsis
  • Researchers analyzed electronic health records from 52 hospitals and found that metformin significantly reduced the incidence of severe COVID-19 compared to other treatments in those with prediabetes.
  • While metformin showed some benefits for COVID-19 severity in patients with PCOS, the results were not as strong as those in the prediabetes group, highlighting metformin's potential benefits for different conditions.


Background: With the continuing COVID-19 pandemic, identifying medications that improve COVID-19 outcomes is crucial. Studies suggest that use of metformin, an oral antihyperglycemic, is associated with reduced COVID-19 severity in individuals with diabetes compared to other antihyperglycemic medications. Some patients without diabetes, including those with polycystic ovary syndrome (PCOS) and prediabetes, are prescribed metformin for off-label use, which provides an opportunity to further investigate the effect of metformin on COVID-19.


Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms.
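
The definition above (patients as nodes, similarities as weighted edges) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the construction used in the paper: it assumes each patient is described by a numeric feature vector, uses cosine similarity as the edge weight, and keeps only edges above a hypothetical `threshold`. The names `cosine_similarity` and `build_psn` are made up for this sketch.

```python
import math
from itertools import combinations


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def build_psn(features, threshold=0.5):
    """Return weighted edges {(patient_a, patient_b): similarity}.

    Assumption: pairs with similarity below the threshold are left unconnected.
    """
    edges = {}
    for p, q in combinations(sorted(features), 2):
        weight = cosine_similarity(features[p], features[q])
        if weight >= threshold:
            edges[(p, q)] = weight
    return edges


# Toy patient profiles (made-up numbers).
features = {"p1": [1.0, 0.0], "p2": [1.0, 0.0], "p3": [0.0, 1.0]}
psn = build_psn(features)
```

Both the similarity measure and the threshold are design choices; with heterogeneous clinical data, a different measure per data type would often be more appropriate than a single cosine similarity.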


Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.
