The COVID-19 pandemic is marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains due, among other factors, to increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all the viral sequences added in a given week to GISAID, a public database supporting viral genetic sequence sharing.
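The FDL definition is a share threshold on weekly submissions. The Python sketch below shows how lineages above the 10% share could be identified from per-week lineage labels, for instance to derive ground-truth labels after a week has passed. The helper `dominant_lineages` is hypothetical and is not part of DeepAutoCoV. Its input format, a mapping from week labels to lists of lineage labels with one label per sequence, is an assumption. The anomaly detection model itself is not reproduced.

```python
from collections import Counter

FDL_SHARE = 0.10  # "more than 10%" of the sequences added in a given week

def dominant_lineages(weekly_lineages, share=FDL_SHARE):
    """Map each week to the sorted lineages whose share of that week's sequences exceeds `share`.

    `weekly_lineages` maps a week label to the lineage labels of the sequences
    added in that week, one label per sequence (hypothetical input format).
    """
    result = {}
    for week, labels in weekly_lineages.items():
        counts = Counter(labels)  # sequences per lineage in this week
        total = len(labels)       # all sequences added in this week
        # Strictly greater than the threshold, as in the definition.
        result[week] = sorted(
            lineage for lineage, n in counts.items() if n / total > share
        )
    return result
```

With 20 sequences split 12/6/2 across hypothetical lineages A, B and C, `dominant_lineages({"w": ["A"] * 12 + ["B"] * 6 + ["C"] * 2})` returns `{"w": ["A", "B"]}`. C sits at exactly 10% and is excluded because the definition requires more than 10%.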
Artificial Intelligence (AI) and Machine Learning (ML) approaches that can learn from large data sources have been identified as useful tools to support clinicians in their decision-making process; AI and ML implementations accelerated rapidly during the recent COVID-19 pandemic. However, many ML classifiers are "black boxes" to the end user, since their underlying reasoning process is often obscure. Additionally, the performance of such models suffers from poor generalization ability in the presence of dataset shifts.
Background: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating.
Identifying disease-causing variants in a Rare Disease patient's genome is a challenging problem. To accomplish this task, we describe a machine learning framework, which we call "Suggested Diagnosis", that prioritizes genetic variants in an exome/genome based on their probability of being disease-causing. To do so, our method leverages standard guidelines for germline variant interpretation as defined by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP), inheritance information, phenotypic similarity, and variant quality.
We show, for the first time, radio measurements of the depth of shower maximum (X_{max}) of air showers induced by cosmic rays that are compared to measurements of the established fluorescence method at the same location. Using measurements at the Pierre Auger Observatory we show full compatibility between our radio and the previously published fluorescence dataset, and between a subset of air showers observed simultaneously with both radio and fluorescence techniques, a measurement setup unique to the Pierre Auger Observatory. Furthermore, we show radio X_{max} resolution as a function of energy and demonstrate the ability to make competitive high-resolution X_{max} measurements with even a sparse radio array.
The coronavirus disease of 2019 (COVID-19) pandemic is characterized by the sequential emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, lineages, and sublineages that outcompete previously circulating ones because of, among other factors, increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute more than 10% of all the viral sequences added to the GISAID database in a given week.
Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery.
Background: Artificial intelligence (AI) has proved to be of great value in diagnosing and managing infection. ALFABETO (ALL-FAster-BEtter-TOgether) is a tool created to support healthcare professionals in triage, mainly by optimizing hospital admissions.
Methods: The AI was trained during the pandemic's "first wave" (February-April 2020).
Instantons, which are nonperturbative solutions to Yang-Mills equations, provide a signal for the occurrence of quantum tunneling between distinct classes of vacua. They can give rise to decays of particles otherwise forbidden. Using data collected at the Pierre Auger Observatory, we search for signatures of such instanton-induced processes that would be suggestive of super-heavy particles decaying in the Galactic halo.
The data in this article include 10,000 synthetic patients with liver disorders, characterized by 70 different variables, including clinical features and patient outcomes such as hospital admission or surgery. Patient data were generated to resemble real patient data as closely as possible, using a publicly available Bayesian network describing a causal model for liver disorders. By varying the network parameters, we also generated an additional set of 500 patients with characteristics that deviated from the initial patient population.
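Generating patients from a Bayesian network is usually done by ancestral (forward) sampling. Each variable is drawn in topological order from its conditional probability table, given the values already sampled for its parents. The sketch below illustrates this on a hypothetical three-node network with made-up probabilities. It is not the published liver-disorder network. The hypothetical `shifted` helper only mimics the idea of varying network parameters to obtain a population that deviates from the baseline.

```python
import random

# Hypothetical toy network with made-up probabilities (NOT the published liver-disorder network):
#   alcohol_abuse -> cirrhosis -> hospital_admission
# Each entry: (parent names, CPT mapping a tuple of parent values to P(node is True)).
NETWORK = {
    "alcohol_abuse": ((), {(): 0.2}),
    "cirrhosis": (("alcohol_abuse",), {(True,): 0.3, (False,): 0.05}),
    "hospital_admission": (("cirrhosis",), {(True,): 0.6, (False,): 0.1}),
}
ORDER = ["alcohol_abuse", "cirrhosis", "hospital_admission"]  # parents before children

def sample_patient(network, order, rng):
    """Ancestral sampling: draw each binary variable given its already-sampled parents."""
    patient = {}
    for node in order:
        parents, cpt = network[node]
        p_true = cpt[tuple(patient[p] for p in parents)]
        patient[node] = rng.random() < p_true
    return patient

def sample_patients(n, network=NETWORK, order=ORDER, seed=0):
    """Draw n synthetic patients reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_patient(network, order, rng) for _ in range(n)]

def shifted(network, node, p_true):
    """Copy of `network` with the prior of root `node` set to p_true (a parameter shift)."""
    parents, _ = network[node]
    if parents:
        raise ValueError("this sketch only shifts priors of root nodes")
    new_network = dict(network)
    new_network[node] = ((), {(): p_true})
    return new_network
```

For example, `sample_patients(500, network=shifted(NETWORK, "alcohol_abuse", 0.6), seed=1)` draws a population in which the root variable is more prevalent than in the baseline. Downstream variables then shift through the network's dependencies.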
Our knowledge of the role proteins play in the mutual relationship among oocytes, surrounding follicle cells, stroma, and the vascular network inside the ovary is still limited, and obtaining insights into this context would significantly aid our understanding of folliculogenesis. Here, we describe a spatial proteomics approach to characterize the proteome of individual follicles at different growth stages in a whole prepubertal 25-day-old mouse ovary. A total of 401 proteins were identified by nano-scale liquid chromatography-electrospray ionization-tandem mass spectrometry (nLC-ESI-MS/MS), 69 of which have a known function in ovary biology, as demonstrated by earlier proteomics studies.
Increasingly complex learning methods such as boosting, bagging and deep learning have made ML models more accurate, but harder to interpret and explain, culminating in black-box machine learning models. Model developers and users alike are often presented with a trade-off between performance and intelligibility, especially in high-stakes applications like medicine. In the present article we propose a novel methodological approach for generating explanations for the predictions of a generic machine learning model, given a specific instance for which the prediction has been made.
Objectives: The objective of this study is to implement an automatic procedure to detect, on a weekly basis, new SARS-CoV-2 variants and non-neutral variants (variants of concern (VOC) and variants of interest (VOI)).
Methods: We downloaded spike protein primary sequences from the public resource GISAID and represented each sequence as k-mer counts. For each week since 1 July 2020, we evaluated whether each sequence represented an anomaly using a One-Class support vector machine (SVM) classification algorithm trained on neutral protein sequences collected from February to June 2020.
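The k-mer count representation can be sketched in a few lines of Python. The value of k, the function names and the choice of a vocabulary covering all 20^k standard amino-acid k-mers are assumptions for illustration, since the abstract does not report them. A fixed vocabulary keeps the vectors the same length and lets k-mers unseen during training still register. The resulting vectors are the kind of input a One-Class SVM expects. In scikit-learn, for example, `sklearn.svm.OneClassSVM` can be fitted on vectors from the training period, and its `predict` method returns -1 for outliers. That step is omitted here to keep the example dependency-free.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter amino-acid codes

def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a protein sequence (empty if shorter than k)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def kmer_vector(sequence, k=3):
    """Fixed-length count vector over all 20**k standard k-mers, in lexicographic order.

    k-mers containing non-standard letters (e.g. 'X' for unresolved residues)
    have no slot in this vocabulary and are therefore ignored.
    """
    counts = kmer_counts(sequence, k)
    return [counts.get("".join(p), 0) for p in product(AMINO_ACIDS, repeat=k)]
```

For instance, `kmer_counts("MFVFV", 2)` counts `FV` twice and `MF` and `VF` once each. `kmer_vector("MFVFV", 2)` places those counts in a 400-element vector.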
This study aims to investigate the correlation between intravoxel incoherent motion diffusion-weighted imaging (IVIM-DWI) parameters in magnetic resonance imaging (MRI) and programmed death-ligand 1 (PD-L1) expression in non-small cell lung cancer (NSCLC). Twenty-one patients diagnosed with stage III NSCLC from April 2021 to April 2022 were included. The tumors were divided into two groups: no PD-L1 expression (<1%) and positive PD-L1 expression (≥1%).
Stud Health Technol Inform
May 2022
In this work we show that Incremental Machine Learning can be used to predict the classification of emerging SARS-CoV-2 lineages, dynamically distinguishing between neutral variants and non-neutral ones, i.e. variants of interest or variants of concern.
Genomic variant interpretation is a critical step of the diagnostic procedure, often supported by tools that may predict the damaging impact of each variant or provide a guidelines-based classification. We propose the application of Machine Learning methodologies, in particular Penalized Logistic Regression, to support variant classification and prioritization. Our approach combines ACMG/AMP guidelines for germline variant interpretation with variant annotation features and provides a probabilistic score of pathogenicity, thus supporting the prioritization and classification of variants that would be interpreted as uncertain by the ACMG/AMP guidelines.
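A penalized logistic regression that outputs a pathogenicity probability can be sketched in pure Python. The sketch makes several assumptions that the abstract does not state: an L2 (ridge) penalty, batch gradient descent, and three hypothetical binary features named after ACMG/AMP criteria (PVS1, PM2, BP4). The authors' actual penalty type, solver and feature set may differ, and the toy labels are invented for illustration. The fitted probability serves as the score used to rank variants.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_l2_logistic(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X: list of feature lists; y: list of 0/1 labels. The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient step on the mean loss plus the L2 penalty term lam * w.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def pathogenicity_score(x, w, b):
    """Predicted probability that a variant with features x is pathogenic."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy training set: columns are [PVS1, PM2, BP4] criterion flags.
X_train = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 1, 1]]
y_train = [1, 1, 1, 0, 0, 0]
w, b = fit_l2_logistic(X_train, y_train)

# Rank hypothetical unlabeled variants by score, highest first.
candidates = {"var_a": [0, 1, 1], "var_b": [1, 1, 0], "var_c": [0, 0, 1]}
ranking = sorted(candidates, key=lambda v: pathogenicity_score(candidates[v], w, b), reverse=True)
```

The penalty shrinks the weights toward zero. This keeps scores away from 0 and 1 when evidence is sparse, which suits a graded prioritization better than hard labels.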
Interest in Machine Learning applications to tackle clinical and biological problems is increasing. This is driven by promising results reported in many research papers, the increasing number of AI-based software products, and the general interest in Artificial Intelligence to solve complex problems. It is therefore important to improve the quality of machine learning output and to add safeguards that support its adoption.
Machine Learning research applied to the medical field is increasing. However, few of the proposed approaches are actually deployed in clinical settings. One reason is that current methods may not generalize to new, unseen instances that differ from the training population, thus providing unreliable classifications.
Research Question: Can artificial intelligence and advanced image analysis extract and harness novel information derived from cytoplasmic movements of the early human embryo to predict development to blastocyst?
Design: In a proof-of-principle study, 230 human preimplantation embryos were retrospectively assessed using an artificial neural network. After intracytoplasmic sperm injection, embryos underwent time-lapse monitoring for 44 h. For comparison, standard embryo assessment of each embryo by a single embryologist was carried out to predict development to blastocyst stage based on a single picture frame taken at 42 h of development.
In recent years, high-throughput sequencing technologies have provided an unprecedented opportunity to characterize cancer samples at multiple molecular levels. The integration and analysis of these multi-omics datasets is a critical step toward gaining actionable knowledge in a precision medicine framework. This paper explores recent data-driven methodologies that have been developed and applied to address major challenges of stratified medicine in oncology, including patient phenotyping, biomarker discovery, and drug repurposing.
The integration of genomic and clinical data to model disease progression is now possible thanks to the increasing availability of patients' molecular profiles. This may lead to novel decision support tools able to tailor therapeutic interventions based on a "precise" risk stratification of patients, given the evolution of their health status. However, longitudinal analysis requires long-term data collection and curation, which can be time-consuming, expensive, and sometimes unfeasible.
Navigation is a vital cognitive function for animals to find resources and avoid danger, and navigational processes are theorized to be a critical evolutionary foundation of episodic memory. Path integration, the continuous updating of position and orientation during self-motion, is a major contributor to spatial navigation. However, the most common paradigm for testing path integration, triangle completion, includes potential sources of error that cannot be disentangled.
Variant interpretation for the diagnosis of genetic diseases is a complex process. The American College of Medical Genetics and Genomics, with the Association for Molecular Pathology, have proposed a set of evidence-based guidelines to support variant pathogenicity assessment and reporting in Mendelian diseases. Cardiovascular disorders are a field of application of these guidelines, but practical implementation is challenging due to the genetic disease heterogeneity and the complexity of information sources that need to be integrated.