Stud Health Technol Inform
August 2024
Real-world data (RWD) has the potential to revolutionize healthcare by offering valuable insights into patient outcomes and treatment efficacy. However, leveraging RWD effectively presents challenges, including its inherent limitations, diverse stakeholders, and insufficient data management pipelines. A proposed framework advocates three essential elements: adherence to FAIR principles (Findable, Accessible, Interoperable, and Reusable), stakeholder engagement and education, and inclusive, pragmatic federated hybrid pipelines.
Individual Specific Networks (ISNs) are a tool used in computational biology to infer Individual Specific relationships between biological entities from omics data. ISNs provide insights into how the interactions among these entities affect their respective functions. To address the scarcity of solutions for efficiently computing ISNs on large biological datasets, we present ISN-tractor, a data-agnostic, highly optimized Python library to build and analyse ISNs.
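The abstract does not say how an individual-specific edge is computed. One widely used recipe is the LIONESS-style leave-one-out estimate, which compares a network metric computed on everyone with the same metric computed without one individual. The sketch below assumes Pearson correlation as the metric and uses made-up expression values; `pearson` and `lioness_edges` are hypothetical helper names and are not part of the ISN-tractor API.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lioness_edges(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """One edge weight per individual for the pair of features (x, y).

    Leave-one-out formula: edge_q = n * (full - without_q) + without_q,
    where `full` uses all n individuals and `without_q` omits individual q.
    """
    n = len(x)
    full = pearson(x, y)
    edges = []
    for q in range(n):
        x_rest = list(x[:q]) + list(x[q + 1:])
        y_rest = list(y[:q]) + list(y[q + 1:])
        without_q = pearson(x_rest, y_rest)
        edges.append(n * (full - without_q) + without_q)
    return edges


# Hypothetical expression values for two genes in five individuals.
# The last individual breaks the otherwise increasing trend.
gene_a = [1.0, 2.0, 3.0, 4.0, 10.0]
gene_b = [1.1, 1.9, 3.2, 3.8, 2.0]

edges = lioness_edges(gene_a, gene_b)
for individual, weight in enumerate(edges):
    print(f"individual {individual}: edge weight {weight:+.3f}")
```

Edge weights obtained this way are not bounded to [-1, 1]; they express how strongly each individual pulls the population-level correlation up or down.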
Background: The integrity and reliability of clinical research outcomes rely heavily on access to vast amounts of data. However, the fragmented distribution of these data across multiple institutions, along with ethical and regulatory barriers, presents significant challenges to accessing relevant data. While federated learning offers a promising solution to leverage insights from fragmented data sets, its adoption faces hurdles due to implementation complexities, scalability issues, and inclusivity challenges.
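To make the federated idea concrete, here is a minimal sketch of federated averaging, a common aggregation scheme in which each institution trains locally and shares only model parameters and sample counts. It is an illustration under assumptions, not the method of the work summarized above: the function name `federated_average`, the three sites, and all numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(local_weights: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Combine per-site parameter vectors into one global vector.

    Each site's parameters are weighted by its number of training samples.
    Only parameters and counts are passed in, never patient-level records.
    """
    if len(local_weights) != len(sample_counts):
        raise ValueError("need exactly one sample count per site")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(local_weights[0])
    global_weights = [0.0] * n_params
    for weights, count in zip(local_weights, sample_counts):
        if len(weights) != n_params:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            global_weights[i] += w * count / total
    return global_weights


# Hypothetical parameters from three hospitals after one local training round.
site_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
site_sizes = [100, 100, 200]

global_model = federated_average(site_weights, site_sizes)
print(global_model)
```

In a full system this aggregation step would be repeated over many rounds, with the global vector sent back to each site as the starting point for the next round of local training.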
Genome interpretation (GI) encompasses the computational attempts to model the relationship between genotype and phenotype with the goal of understanding how the first leads to the second. While traditional approaches have focused on sub-problems such as predicting the effect of single nucleotide variants or finding genetic associations, recent advances in neural networks (NNs) have made it possible to develop end-to-end GI models that take genomic data as input and predict phenotypes as output. However, technical and modeling issues still need to be fixed for these models to be effective, including the widespread underdetermination of genomic datasets, making them unsuitable for training large, overfitting-prone, NNs.
High-throughput sequencing allowed the discovery of many disease variants, but it is now becoming clear that the abundance of genomics data has mostly just moved the bottleneck in Genetics and Precision Medicine from a data availability issue to a data interpretation issue. To resolve this impasse, it would be beneficial to apply the latest Deep Learning (DL) methods to the Genome Interpretation (GI) problem, similarly to what AlphaFold did for Structural Biology. Unfortunately, DL requires large datasets to be viable, and aggregating genomics datasets poses several legal, ethical and infrastructural complications.
Background: Investigating low-prevalence diseases such as multiple sclerosis is challenging because of the rather small number of individuals affected by this disease and the scattering of real-world data across numerous data sources. These obstacles impair data integration, standardization, and analysis, which negatively impact the generation of significant meaningful clinical evidence.
Objective: This study aims to present a comprehensive, research question-agnostic, multistakeholder-driven end-to-end data analysis pipeline that accommodates 3 prevalent data-sharing streams: individual data sharing, core data set sharing, and federated model sharing.
Background: Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination ([Formula: see text]). Millions of variants are present in each individual while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects.
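Underdetermination can be seen in a toy case: when there are more variants (features) than individuals (samples), different weight vectors fit the training data exactly yet disagree on a new individual. The genotype matrix, phenotypes, and weights below are invented purely for illustration.

```python
from typing import List, Sequence


def predict(X: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Linear model: one weighted sum of variant values per individual."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


# Two individuals (rows) but three variants (columns): more unknowns than equations.
X = [[1, 0, 1],
     [0, 1, 1]]
y = [1, 2]

# Two different weight vectors that both reproduce y exactly.
w_a = [1, 2, 0]
w_b = [0, 1, 1]
print(predict(X, w_a), predict(X, w_b))

# On a previously unseen individual the two "perfect" models disagree.
x_new = [[1, 1, 0]]
pred_a = predict(x_new, w_a)[0]
pred_b = predict(x_new, w_b)[0]
print(pred_a, pred_b)
```

With millions of variants and only thousands of individuals the same ambiguity holds at scale, which is why the training data alone cannot single out one model.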
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security.
Motivation: The prediction of reliable Drug-Target Interactions (DTIs) is a key task in computer-aided drug design and repurposing. Here, we present a new approach based on data fusion for DTI prediction built on top of the NXTfusion library, which generalizes the Matrix Factorization paradigm by extending it to the nonlinear inference over Entity-Relation graphs.
Results: We benchmarked our approach on five datasets and we compared our models against state-of-the-art methods.
In vitro non-cellular permeability models such as the parallel artificial membrane permeability assay (PAMPA) are widely applied tools for early-phase drug candidate screening. In addition to the commonly used porcine brain polar lipid extract for modeling the blood-brain barrier's permeability, the total and polar fractions of bovine heart and liver lipid extracts were investigated in the PAMPA model by measuring the permeability of 32 diverse drugs. The zeta potential of the lipid extracts and the net charge of their glycerophospholipid components were also determined.
Background And Objectives: Certain demographic and clinical characteristics, including the use of some disease-modifying therapies (DMTs), are associated with severe acute respiratory syndrome coronavirus 2 infection severity in people with multiple sclerosis (MS). Comprehensive exploration of these relationships in large international samples is needed.
Methods: Clinician-reported demographic/clinical data from 27 countries were aggregated into a data set of 5,648 patients with suspected/confirmed coronavirus disease 2019 (COVID-19).
Background: Interferon-β, a disease-modifying therapy (DMT) for MS, may be associated with less severe COVID-19 in people with MS.
Results: Among 5,568 patients (83.4% confirmed COVID-19), interferon-treated patients had lower risk of severe COVID-19 compared to untreated, but not to glatiramer-acetate, dimethyl-fumarate, or pooled other DMTs.
Current human Single Amino acid Variant (SAV) databases provide a link between SAVs and their effect on the carrier individual's phenotype, often dividing them into Deleterious/Neutral variants. This is a very coarse-grained description of the genotype-to-phenotype relationship because it relies on unrealistic assumptions, such as the perfect Mendelian behavior of each SAV, and considers only dichotomic phenotypes. Moreover, the link between the effect of a SAV on a protein (its molecular phenotype) and the individual phenotype is often very complex, because multiple levels of biological abstraction connect the protein-level and individual-level phenotypes.
The complexity and heterogeneity of cancers lead to variable responses of patients to treatments and interventions. Developing models that accurately predict patients' care pathways using prognostic and predictive biomarkers is increasingly important in both clinical practice and scientific research. The main objective of the ATHENA project is to: (1) accelerate data-driven precision medicine for two use cases, bladder cancer and multiple myeloma, (2) apply distributed and privacy-preserving analytical methods/algorithms to stratify patients (decision support), (3) help healthcare professionals deliver earlier and better targeted treatments, and (4) explore care pathway automations and improve outcomes for each patient.
Motivation: Transcriptional regulation mechanisms allow cells to adapt and respond to external stimuli by altering gene expression. The possible cell transcriptional states are determined by the underlying gene regulatory network (GRN), and reliably inferring such a network would be invaluable for understanding biological processes and disease progression.
Results: In this article, we present a novel method for the inference of GRNs, called PORTIA, which is based on robust precision matrix estimation, and we show that it compares favorably with state-of-the-art methods while being orders of magnitude faster.
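The reason a precision matrix (the inverse of the covariance matrix) is useful for network inference can be shown on a three-gene chain: two genes may be correlated only through an intermediate, and the corresponding precision entry is then zero. The sketch below is a generic illustration of that idea and does not reproduce PORTIA's robust estimator; the gene names, the covariance values, and the helpers `invert` and `partial_correlations` are assumptions made for the example.

```python
from fractions import Fraction
from math import sqrt


def invert(matrix):
    """Exact Gauss-Jordan inverse of a small square matrix."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(precision):
    """Partial correlation of i and j given all others: -P_ij / sqrt(P_ii * P_jj)."""
    n = len(precision)
    return [[1.0 if i == j else
             -float(precision[i][j]) / sqrt(float(precision[i][i] * precision[j][j]))
             for j in range(n)]
            for i in range(n)]


# Covariance of a hypothetical chain geneA -> geneB -> geneC with unit-variance
# noise at each step: A and C covary, but only through B.
genes = ["geneA", "geneB", "geneC"]
covariance = [[1, 1, 1],
              [1, 2, 2],
              [1, 2, 3]]

precision = invert(covariance)
pcorr = partial_correlations(precision)
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        print(genes[i], genes[j],
              "covariance:", covariance[i][j],
              "partial correlation:", round(pcorr[i][j], 3))
```

In this toy case the covariance between geneA and geneC is nonzero, while their partial correlation vanishes, so only the two direct links of the chain would be reported as edges.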
Structural bioinformatics suffers from the lack of interfaces connecting biological structures and machine learning methods, making the application of modern neural network architectures impractical. This negatively affects the development of structure-based bioinformatics methods, causing a bottleneck in biological research. Here we present PyUUL ( https://pyuul.
Polygenic risk score analyses on embryos (PGT-P) are being marketed by some private testing companies to parents using in vitro fertilisation as being useful in selecting the embryos that carry the least risk of disease in later life. It appears that at least one child has been born after such a procedure. But the utility of a PRS in this respect is severely limited, and to date, no clinical research has been performed to assess its diagnostic effectiveness in embryos.