Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data.
Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data.
Characterizing the effect of combination therapies is vital for treating diseases like cancer. We introduce correlated drug action (CDA), a baseline model for the study of drug combinations in both cell cultures and patient populations, which assumes that the efficacy of drugs in a combination may be correlated. We apply temporal CDA (tCDA) to clinical trial data, and demonstrate the utility of this approach in identifying possible synergistic combinations and others that can be explained in terms of monotherapies.
The Centers for Disease Control and Prevention promoted the Test-to-Stay (TTS) program to facilitate in-person instruction in K-12 schools during COVID-19. This program delineates guidelines for schools to regularly test students and staff to minimize risks of infection transmission. TTS enrollment can be implemented via two different consent models: opt-in, in which students do not test regularly by default, and the opposite, opt-out model.
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data.
Importance: With a shortfall in fellowship-trained breast radiologists, mammography screening programs are looking toward artificial intelligence (AI) to increase efficiency and diagnostic accuracy. External validation studies provide an initial assessment of how promising AI algorithms perform in different practice settings.
Objective: To externally validate an ensemble deep-learning model using data from a high-volume, distributed screening program of an academic health system with a diverse patient population.
Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers.
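As a rough illustration of the functional form referred to above, the sketch below evaluates a Fermi-Dirac-shaped probability as a function of an item's rank. It assumes that rank 1 is the item most confidently assigned to the positive class, and the parameters `beta` and `mu` (playing the roles of inverse temperature and chemical potential) are hypothetical values chosen only for demonstration; the abstract does not state how they are obtained, and this is not the authors' implementation.

```python
import math


def fermi_dirac_class_probability(rank: float, beta: float, mu: float) -> float:
    """Return 1 / (1 + exp(beta * (rank - mu))), the Fermi-Dirac functional form.

    Assumption for this sketch: lower ranks correspond to higher confidence
    in the positive class, so the probability decreases as rank grows.
    """
    x = beta * (rank - mu)
    # Evaluate the logistic in a numerically stable way for large |x|.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# Hypothetical parameters, for illustration only.
probs = [fermi_dirac_class_probability(r, beta=0.5, mu=5.0) for r in range(1, 11)]
for r, p in enumerate(probs, start=1):
    print(f"rank {r:2d}: P(positive) = {p:.3f}")
```

With these illustrative parameters the probability equals one half at `rank == mu` and falls off smoothly on either side, which is the qualitative behavior of the occupation probability of a single-particle state around the chemical potential.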
Cancer testis antigens (CTAs) are an extensive gene family with a unique expression pattern restricted to germ cells, but aberrantly reactivated in cancer tissues. Studies indicate that the expression (or re-expression) of CTAs within the MAGE-A family is common in hepatocellular carcinoma (HCC). However, no systematic characterization has yet been reported.
Ongoing research efforts have been examining how to utilize artificial intelligence technology to help healthcare consumers make sense of their clinical data, such as diagnostic radiology reports. How to promote the acceptance of such novel technology is a heated research topic. Recent studies highlight the importance of providing local explanations about AI prediction and model performance to help users determine whether to trust AI's predictions.
Background: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximize sensitivity while minimizing human annotation times. The system uses custom data preparation methods, deep learning analytics and electroencephalography (EEG) data.
Summary: The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization.
Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage.
Our ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors in time in the gene regulatory network was implicated.
Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.
Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.
Clonal evolution of a tumor ecosystem depends on different selection pressures that are principally immune and treatment mediated. We integrate RNA-seq, DNA sequencing, TCR-seq and SNP array data across multiple regions of liver cancer specimens to map spatio-temporal interactions between cancer and immune cells. We investigate how these interactions reflect intra-tumor heterogeneity (ITH) by correlating regional neo-epitope and viral antigen burden with the regional adaptive immune response.
The increasing availability of complex data in biology and medicine has promoted the use of machine learning in classification tasks to address important problems in translational and fundamental science. Two important obstacles, however, may limit the unraveling of the full potential of machine learning in these fields: the lack of generalization of the resulting models and the limited number of labeled data sets in some applications. To address these important problems, we developed an unsupervised ensemble algorithm called strategy for unsupervised multiple method aggregation (SUMMA).
Biological and regulatory mechanisms underlying many multi-gene expression-based disease biomarkers are often not readily evident. We describe an innovative framework, NeTFactor, that combines network analyses with gene expression data to identify transcription factors (TFs) that significantly and maximally regulate such a biomarker. NeTFactor uses a computationally-inferred context-specific gene regulatory network and applies topological, statistical, and optimization methods to identify regulator TFs.
The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments.
Background: A total of 10%-20% of patients develop long-term toxicity following radiotherapy for prostate cancer. Identification of common genetic variants associated with susceptibility to radiotoxicity might improve risk prediction and inform functional mechanistic studies.
Methods: We conducted an individual patient data meta-analysis of six genome-wide association studies (n = 3871) in men of European ancestry who underwent radiotherapy for prostate cancer.
Decision models representing the clinical situations where treatment options entail a significant risk of morbidity or mortality should consider the variations in risk preferences of individuals. In this study, we develop a stochastic modeling framework that optimizes risk-sensitive diagnostic decisions after a mammography exam. For a given patient, our objective is to find the utility-maximizing diagnostic decisions, where we define the utility over quality-adjusted survival duration.
To develop a map of cell-cell communication mediated by extracellular RNA (exRNA), the NIH Extracellular RNA Communication Consortium created the exRNA Atlas resource (https://exrna-atlas.org). The Atlas version 4P1 hosts 5,309 exRNA-seq and exRNA qPCR profiles from 19 studies and a suite of analysis and visualization tools.
Extracellular vesicles (EVs) offer many opportunities in early-stage disease diagnosis, treatment monitoring, and precision therapy owing to their high abundance in bodily fluids, accessibility from liquid biopsy, and presence of nucleic acid and protein cargo from their cell of origin. Despite their growing promise, isolation of EVs for analysis remains a labor-intensive and time-consuming challenge given their nanoscale dimensions (30-200 nm) and low buoyant density. Here, we report a simple, size-based EV separation technology that integrates 1024 nanoscale deterministic lateral displacement (nanoDLD) arrays on a single chip capable of parallel processing sample fluids at rates of up to 900 μL h⁻¹.
The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure.