Extracellular vesicles (EVs) are heterogeneous entities secreted by cells into their microenvironment and systemic circulation. Circulating EVs carry functional small RNAs and other molecular footprints from their cell of origin, and thus have evident applications in liquid biopsy, therapeutics, and intercellular communication. Yet, the complete transcriptomic landscape of EVs is poorly characterized due to critical limitations including variable protocols used for EV-RNA extraction, quality control, cDNA library preparation, sequencing technologies, and bioinformatic analyses.
View Article and Find Full Text PDFBackground: Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocol, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives.
View Article and Find Full Text PDFMotivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data.
Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data.
Characterizing the effect of combination therapies is vital for treating diseases like cancer. We introduce correlated drug action (CDA), a baseline model for the study of drug combinations in both cell cultures and patient populations, which assumes that the efficacy of drugs in a combination may be correlated. We apply temporal CDA (tCDA) to clinical trial data, and demonstrate the utility of this approach in identifying possible synergistic combinations and others that can be explained in terms of monotherapies.
View Article and Find Full Text PDFEvery year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals.
View Article and Find Full Text PDFBackground: Although optimal sequencing of systemic therapy in cancer care is critical to achieving maximal clinical benefit, there is a lack of analysis of treatment sequencing in advanced non-small cell lung cancer (aNSCLC) in real-world settings.
Methods: A retrospective cohort study of 13,340 lung cancer patients within the Mount Sinai Health System (MSHS) was performed. Systemic therapy data of aNSCLC in 2,106 patients was the starting point in our analysis to investigate how treatment sequencing has evolved, the impact of sequencing patterns on clinical outcomes, and the effectiveness of 2 line chemotherapy after patients progressed on immune checkpoint inhibitor (ICI)-based therapy as the 1 line of therapy (LOT).
Globally, every year about 11% of infants are born preterm, defined as a birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict: (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi.
View Article and Find Full Text PDFImportance: An automated, accurate method is needed for unbiased assessment quantifying accrual of joint space narrowing and erosions on radiographic images of the hands and wrists, and feet for clinical trials, monitoring of joint damage over time, assisting rheumatologists with treatment decisions. Such a method has the potential to be directly integrated into electronic health records.
Objectives: To design and implement an international crowdsourcing competition to catalyze the development of machine learning methods to quantify radiographic damage in rheumatoid arthritis (RA).
Circulating extracellular vesicles (EVs) contain molecular footprints-lipids, proteins, RNA, and DNA-from their cell of origin. Consequently, EV-associated RNA and proteins have gained widespread interest as liquid-biopsy biomarkers. Yet, an integrative proteo-transcriptomic landscape of EVs and comparison with their cell of origin remains obscure.
View Article and Find Full Text PDFWe now know RNA can survive the harsh environment of biofluids when encapsulated in vesicles or by associating with lipoproteins or RNA binding proteins. These extracellular RNA (exRNA) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communication Consortium (ERCC) hosted a two-day online workshop (April 19-20, 2021) on the unique challenges of exRNA data analysis.
View Article and Find Full Text PDFBinary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers.
View Article and Find Full Text PDFIdentification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. The findings indicate that whole-blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r = 0.
View Article and Find Full Text PDFBackground: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximise sensitivity while minimizing human annotation times. The system uses custom data preparation methods, deep learning analytics and electroencephalography (EEG) data.
View Article and Find Full Text PDFSummary: The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enable the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization.
View Article and Find Full Text PDFSingle-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage.
View Article and Find Full Text PDFOur ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors in time in the gene regulatory network was implicated.
View Article and Find Full Text PDFThe advent of microfluidics in the 1990s promised a revolution in multiple industries from healthcare to chemical processing. Deterministic lateral displacement (DLD) is a continuous-flow microfluidic particle separation method discovered in 2004 that has been applied successfully and widely to the separation of blood cells, yeast, spores, bacteria, viruses, DNA, droplets, and more. Deterministic lateral displacement is conceptually simple and can deliver consistent performance over a wide range of flow rates and particle concentrations.
View Article and Find Full Text PDFCancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge.
View Article and Find Full Text PDFImportance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.
Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.