Background: Monogenic diseases have been shown to contribute to complex disease risk and may hold new insights into the underlying biological mechanism of Inflammatory Bowel Disease (IBD).
Methods: We analyzed associations between Mendelian diseases and IBD in more than 55 million patients from Optum's de-identified electronic health records database. Using the significantly associated Mendelian diseases, we performed pathway enrichment analysis and built a model from gene expression datasets to differentiate Crohn's disease (CD), ulcerative colitis (UC), and healthy patient samples.
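The abstract does not say which enrichment method was used. As a rough illustration only, one common choice is an over-representation test based on the hypergeometric distribution: given a background of genes, a pathway gene set, and a list of selected genes, it asks how likely an overlap at least as large as the observed one would be by chance. The function name, argument names, and the toy counts below are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: total number of genes considered (assumed universe)
    n_pathway:    genes in the background annotated to the pathway
    n_selected:   genes in the list of interest
    n_overlap:    selected genes that are also in the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    # Number of ways to draw n_selected genes from the background.
    total = comb(n_background, n_selected)
    # Count draws that contain k pathway genes, for every k >= n_overlap.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (illustrative only): 20 background genes, 5 in the pathway,
# 4 selected genes, 3 of which fall in the pathway.
p = enrichment_p_value(20, 5, 4, 3)
print(f"enrichment p-value: {p:.4f}")
```

In practice such a test is run once per pathway, and the resulting p-values need a multiple-testing correction before any pathway is called enriched.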
Discovery of robust diagnostic or prognostic biomarkers is key to optimizing therapeutic benefit for select patient cohorts, an idea commonly referred to as precision medicine. Most studies that derive such markers from high-dimensional transcriptomics datasets are underpowered, with sample sizes in the tens of patients. Highly regularized statistical approaches are therefore essential for making generalizable predictions.
Summary: Gene-based supervised machine learning classification models have been widely used to differentiate disease states, predict disease progression and determine effective treatment options. However, many of these classifiers are sensitive to noise and frequently do not replicate in external validation sets. For complex, heterogeneous diseases, these classifiers are further limited by being unable to capture varying combinations of genes that lead to the same phenotype.