Techniques to produce and evaluate realistic multivariate synthetic data.

Sci Rep

Department of Biostatistics and Bioinformatics, Moffitt Cancer Center and Research Institute, 12902 Bruce B. Downs Blvd, Tampa, FL, 33612, USA.

Published: July 2023

Data modeling requires a sufficient sample size for reproducibility, and a small sample size can inhibit model evaluation. A synthetic data generation technique addressing this small sample size problem is evaluated on three points: within the space of arbitrarily distributed samples, a subgroup (class) has a latent multivariate normal characteristic; synthetic data can be generated from this class with univariate kernel density estimation (KDE); and the synthetic samples are statistically similar to their respective samples. Three samples (n = 667) were investigated with 10 input variables (X). KDE was used to augment the sample size in X. Maps produced univariate normal variables in Y. Principal component analysis in Y produced uncorrelated variables in T, where the probability density functions were approximated as normal and characterized; synthetic data were generated with normally distributed univariate random variables in T. Reversing each step produced synthetic data in Y and X. All samples were approximately multivariate normal in Y, permitting the generation of synthetic data. Probability density function and covariance comparisons showed similarity between samples and synthetic samples. These results indicate that a class of samples has a latent normal characteristic. For such samples, this approach offers a solution to the small sample size problem. Further studies are required to understand this latent class.
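The X to Y to T pipeline described in the abstract can be sketched in a few lines of Python. This is only one plausible reading of the abstract, not the authors' implementation: the function names, the Silverman bandwidth, the empirical-CDF form of the maps, and the `n_aug` default are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the X <-> Y maps

def kde_augment(x, n_new, rng):
    """Draw n_new values from a Gaussian KDE of x (smoothed bootstrap).
    Assumption: Silverman's rule-of-thumb bandwidth."""
    h = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)
    return rng.choice(x, size=n_new) + rng.normal(0.0, h, size=n_new)

def to_normal(x_ref, x):
    """Map x to approximately standard normal scores via the empirical CDF of x_ref."""
    ref = np.sort(x_ref)
    p = (np.searchsorted(ref, x, side="right") - 0.5) / len(ref)
    p = np.clip(p, 0.5 / len(ref), 1 - 0.5 / len(ref))  # keep p inside (0, 1)
    return np.array([_STD.inv_cdf(float(v)) for v in p])

def from_normal(x_ref, y):
    """Approximate inverse of to_normal: normal scores back to the scale of x_ref."""
    ref = np.sort(x_ref)
    p = np.array([_STD.cdf(float(v)) for v in y])
    grid = (np.arange(1, len(ref) + 1) - 0.5) / len(ref)
    return np.interp(p, grid, ref)

def synthesize(X, n_syn, n_aug=5000, seed=0):
    """Generate n_syn synthetic rows resembling X (rows = samples, columns = variables)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # 1. Augment each variable in X with univariate KDE draws (used to build the maps).
    X_aug = [kde_augment(X[:, j], n_aug, rng) for j in range(d)]
    # 2. Map each variable to an approximately normal variable in Y.
    Y = np.column_stack([to_normal(X_aug[j], X[:, j]) for j in range(d)])
    # 3. PCA in Y gives uncorrelated variables in T with variances `evals`.
    mu = Y.mean(axis=0)
    evals, V = np.linalg.eigh(np.cov(Y, rowvar=False))
    evals = np.clip(evals, 0.0, None)
    # 4. Sample independent univariate normals in T.
    T_syn = rng.normal(size=(n_syn, d)) * np.sqrt(evals)
    # 5. Reverse each step: T -> Y -> X.
    Y_syn = mu + T_syn @ V.T
    return np.column_stack([from_normal(X_aug[j], Y_syn[:, j]) for j in range(d)])
```

Calling `synthesize(X, n_syn)` on an array `X` of shape `(n, d)` returns an array of shape `(n_syn, d)`. The sketch is only appropriate when the variables are approximately multivariate normal in Y, which is the latent characteristic the abstract describes; comparing the marginal densities and covariance matrices of `X` and the output is the natural check.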

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10382509
DOI: http://dx.doi.org/10.1038/s41598-023-38832-0
