Publications by authors named "Gustavo Stolovitzky"

Extracellular vesicles (EVs) are heterogeneous entities secreted by cells into their microenvironment and systemic circulation. Circulating EVs carry functional small RNAs and other molecular footprints from their cell of origin, and thus have evident applications in liquid biopsy, therapeutics, and intercellular communication. Yet, the complete transcriptomic landscape of EVs is poorly characterized due to critical limitations including variable protocols used for EV-RNA extraction, quality control, cDNA library preparation, sequencing technologies, and bioinformatic analyses.

Article Synopsis
  • All cells, regardless of being eukaryotic or prokaryotic, release extracellular vesicles (EVs) for various functions like communication and waste disposal, with small EVs containing small RNAs that may serve as important disease markers.
  • This study focuses on identifying unannotated small RNAs in EVs from prostate cancer and benign tissues, overcoming limitations of previous sequencing methods to explore the 'dark matter' of genomes and their role in gene expression regulation.
  • Researchers found that these novel EV-associated small RNAs, termed EV-UGRs, showed a significant reduction in aggressive prostate cancer, but their expression increased after treatment, potentially promising for fluid-based diagnostics in cancer screening.

Background: Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocol, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives.

Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data.

Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data.

Article Synopsis
  • Ground-glass opacities (GGOs) on CT scans may signal lung cancer, and leveraging electronic health records filled with unstructured notes can aid in managing these nodules effectively.
  • Researchers developed an advanced deep learning natural language processing (NLP) tool to extract detailed GGO features from radiology notes of over 13,000 lung cancer patients, achieving high levels of precision and recall in their analysis.
  • The longitudinal study of GGO status showed that about 16.8% of patients experienced increased size of GGOs, while 72.3% had stable conditions, indicating the tool's efficacy in monitoring and analyzing GGO progression over time.
Article Synopsis
  • The study addresses the difficulty in identifying rare genetic disorders in children due to issues like incomplete records and varied symptoms, aiming to create an algorithm called PheIndex using electronic medical records.
  • PheIndex was developed with 13 expert-established criteria and validated through chart reviews, successfully identifying 1,088 children at risk among over 93,000 live births with strong performance metrics: 90% sensitivity, 97% specificity, and 94% accuracy.
  • The algorithm serves as a tool for healthcare providers to detect potential rare genetic disorders, prompting them to consider further diagnostic testing or referrals to genetic specialists.
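
The performance metrics quoted in the synopsis above (sensitivity, specificity, accuracy) all derive from a confusion matrix built during chart review. The sketch below shows the arithmetic only. The counts are hypothetical placeholders chosen for illustration, not figures from the PheIndex study, and the names `ConfusionCounts`, `sensitivity`, `specificity`, and `accuracy` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Chart-review outcome counts for a binary screening rule."""
    tp: int  # flagged by the rule, confirmed on review
    fp: int  # flagged by the rule, not confirmed on review
    tn: int  # not flagged, not confirmed
    fn: int  # not flagged, but confirmed on review


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true cases that the rule flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-cases that the rule leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all reviewed charts classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts, for illustration only (not study data).
counts = ConfusionCounts(tp=40, fp=10, tn=90, fn=10)

print(f"sensitivity = {sensitivity(counts):.3f}")
print(f"specificity = {specificity(counts):.3f}")
print(f"accuracy    = {accuracy(counts):.3f}")
```

Because accuracy pools cases and non-cases, it depends on how common the condition is in the reviewed sample, which is why sensitivity and specificity are usually reported alongside it.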

Characterizing the effect of combination therapies is vital for treating diseases like cancer. We introduce correlated drug action (CDA), a baseline model for the study of drug combinations in both cell cultures and patient populations, which assumes that the efficacy of drugs in a combination may be correlated. We apply temporal CDA (tCDA) to clinical trial data, and demonstrate the utility of this approach in identifying possible synergistic combinations and others that can be explained in terms of monotherapies.
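
The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each simulated patient has a survival-like time under each monotherapy, that the two times are correlated through a shared Gaussian factor with correlation `rho`, and that the combination outcome is the better of the two. The log-normal form and all parameter names (`rho`, `mu_a`, `mu_b`, `sigma`) are assumptions made for this example.

```python
import math
import random
from statistics import median


def simulate_combination(n_patients, rho, mu_a, mu_b, sigma, seed=0):
    """Simulate correlated monotherapy times and a 'best of two' combination.

    Assumed toy model: log-times are Gaussian with means mu_a and mu_b,
    a common standard deviation sigma, and correlation rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    times_a, times_b, times_ab = [], [], []
    for _ in range(n_patients):
        z_a = rng.gauss(0.0, 1.0)
        # Mix in independent noise so that corr(z_a, z_b) equals rho.
        z_b = rho * z_a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        t_a = math.exp(mu_a + sigma * z_a)
        t_b = math.exp(mu_b + sigma * z_b)
        times_a.append(t_a)
        times_b.append(t_b)
        # Baseline combination: each patient gets the better monotherapy time.
        times_ab.append(max(t_a, t_b))
    return times_a, times_b, times_ab


# Compare the baseline for uncorrelated and perfectly correlated responses.
for rho in (0.0, 1.0):
    a, b, ab = simulate_combination(2000, rho, mu_a=1.0, mu_b=1.0, sigma=0.5)
    print(f"rho={rho}: median A={median(a):.2f}, median A+B={median(ab):.2f}")
```

In this toy baseline, a combination whose observed outcome clearly exceeds the simulated curve would be a candidate for synergy, while one that matches it can be explained by the monotherapies alone. As the correlation approaches 1 with identical monotherapies, the baseline benefit of combining disappears.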

Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals.

Background: Although optimal sequencing of systemic therapy in cancer care is critical to achieving maximal clinical benefit, there is a lack of analysis of treatment sequencing in advanced non-small cell lung cancer (aNSCLC) in real-world settings.

Methods: A retrospective cohort study of 13,340 lung cancer patients within the Mount Sinai Health System (MSHS) was performed. Systemic therapy data of aNSCLC in 2,106 patients was the starting point in our analysis to investigate how treatment sequencing has evolved, the impact of sequencing patterns on clinical outcomes, and the effectiveness of second-line chemotherapy after patients progressed on immune checkpoint inhibitor (ICI)-based therapy as the first line of therapy (LOT).

Globally, every year about 11% of infants are born preterm, defined as a birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict: (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi.

Importance: An automated, accurate method is needed for unbiased quantification of the accrual of joint space narrowing and erosions on radiographic images of the hands, wrists, and feet, for use in clinical trials, monitoring of joint damage over time, and assisting rheumatologists with treatment decisions. Such a method has the potential to be directly integrated into electronic health records.

Objectives: To design and implement an international crowdsourcing competition to catalyze the development of machine learning methods to quantify radiographic damage in rheumatoid arthritis (RA).

Circulating extracellular vesicles (EVs) contain molecular footprints (lipids, proteins, RNA, and DNA) from their cell of origin. Consequently, EV-associated RNA and proteins have gained widespread interest as liquid-biopsy biomarkers. Yet, an integrative proteo-transcriptomic landscape of EVs and comparison with their cell of origin remains obscure.

We now know RNA can survive the harsh environment of biofluids when encapsulated in vesicles or by associating with lipoproteins or RNA binding proteins. These extracellular RNA (exRNA) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communication Consortium (ERCC) hosted a two-day online workshop (April 19-20, 2021) on the unique challenges of exRNA data analysis.

Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers.
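
To make the analogy concrete, here is a minimal sketch of the Fermi-Dirac functional form applied to ranks. It assumes that an item's rank plays the role of the energy, so that the probability of belonging to the positive class is 1 / (exp(beta * (rank - mu)) + 1). The parameters `mu` (the rank at which the probability is one half) and `beta` (how sharply the probability falls off) are treated as free inputs here. How they are determined for a given classifier is not stated in the abstract and is not attempted in this example.

```python
import math


def fermi_dirac(rank: float, mu: float, beta: float) -> float:
    """Fermi-Dirac occupation 1 / (exp(beta * (rank - mu)) + 1)."""
    x = beta * (rank - mu)
    # math.exp overflows for very large arguments; the limit there is 0.
    if x > 700.0:
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


def expected_positives(n_items: int, mu: float, beta: float) -> float:
    """Sum of class probabilities over ranks 1..n_items."""
    return sum(fermi_dirac(r, mu, beta) for r in range(1, n_items + 1))


# Illustrative parameters only: 100 ranked items, midpoint at rank 50.
probabilities = [fermi_dirac(r, mu=50.0, beta=0.1) for r in range(1, 101)]
print(f"P(positive | rank 1)   = {probabilities[0]:.3f}")
print(f"P(positive | rank 50)  = {probabilities[49]:.3f}")
print(f"P(positive | rank 100) = {probabilities[99]:.3f}")
```

In this form, top-ranked items receive probabilities near 1, bottom-ranked items receive probabilities near 0, and a larger `beta` corresponds to a sharper transition between the two classes.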

Article Synopsis
  • Early detection of hepatocellular carcinoma (HCC), a form of liver cancer, is inadequate, and there is a crucial need for better biomarkers; extracellular vesicles (EVs) containing small RNA (exRNA) might provide a solution.
  • Researchers isolated EVs and performed genome-wide sequencing to identify novel small RNA clusters (smRCs) that are overexpressed in the blood of HCC patients, with significant specificity and sensitivity for early detection.
  • The study suggests that these unannotated smRCs could be developed into a minimally invasive blood test for HCC monitoring, paving the way for improved cancer biomarker research.

Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. The findings indicate that whole-blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r = 0.

Article Synopsis
  • Accurately identifying and quantifying RNA isoforms in cancer is crucial for understanding genetic variations, analyzing biological pathways, and developing biomarkers.
  • The ICGC-TCGA DREAM SMC-RNA challenge was a collaborative project aimed at evaluating methods for RNA isoform quantification and fusion detection using RNA sequencing data, concluding in 2018 with results from 77 fusion detection and 65 isoform quantification submissions.
  • The challenge provided a collection of benchmark entries and detailed leaderboards, emphasizing the use of containerized workflows for easy accessibility and reproducibility of the methods developed, with supplementary information on the peer review process.
Article Synopsis
  • Despite extensive research, many human kinases remain undrugged, highlighting the need for effective methods to discover new compound-kinase interactions.
  • This study benchmarks various predictive algorithms for kinase inhibitor potencies using unpublished bioactivity data, finding that ensemble models outperform single-dose assays.
  • The research identifies unexpected activities in lesser-studied kinases, and provides open-source resources that enhance our understanding of druggable kinases.

Background: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximize sensitivity while minimizing human annotation times. The system uses custom data preparation methods, deep learning analytics, and electroencephalography (EEG) data.

Summary: The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization.

Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage.

Our ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors in time in the gene regulatory network was implicated.

The advent of microfluidics in the 1990s promised a revolution in multiple industries from healthcare to chemical processing. Deterministic lateral displacement (DLD) is a continuous-flow microfluidic particle separation method discovered in 2004 that has been applied successfully and widely to the separation of blood cells, yeast, spores, bacteria, viruses, DNA, droplets, and more. Deterministic lateral displacement is conceptually simple and can deliver consistent performance over a wide range of flow rates and particle concentrations.

Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge.

Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.

Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.
