Purpose: Chronic low back pain (cLBP) is a common health condition worldwide and a leading cause of disability with an estimated lifetime prevalence of 80-90% in industrialized countries. However, we have had limited success in treating cLBP likely due to its non-specific heterogeneous nature that goes beyond detectable anatomical changes. We propose that omics technologies as precision medicine tools are well suited to provide insight into its pathophysiology and provide diagnostic markers and therapeutic targets.
Int Arch Occup Environ Health
October 2022
Purpose: Exposures related to beryllium (Be) are an enduring concern among workers in the nuclear weapons and other high-tech industries, calling for regular and rigorous biological monitoring. Conventional biomonitoring of Be in urine is informative of neither cumulative exposure nor health outcomes. Biomarkers of exposure to Be based on non-invasive biomonitoring could help refine disease risk assessment.
Modeling factors influencing disease phenotypes, from biomarker profiling study datasets, is a critical task in biomedicine. Such datasets are typically generated from high-throughput 'omic' technologies, which help examine disease mechanisms at an unprecedented resolution. These datasets are challenging because they are high-dimensional.
Finding the optimal blood pressure (BP) target and BP treatment after acute ischemic or hemorrhagic stroke is an area of controversy and a significant unmet need in the critical care of stroke victims. Numerous large prospective clinical trials have addressed this question but have generated neutral or conflicting results. One major limitation that may have contributed to so many neutral or conflicting clinical trial results is the "one-size-fits-all" approach to BP targets, while the optimal BP target likely varies between individuals.
Aim: To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine.
Methods: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief-networks (BN) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs.
Deep neural networks are increasingly being used in both supervised learning for classification tasks and unsupervised learning to derive complex patterns from the input data. However, the successful implementation of deep neural networks using neuroimaging datasets requires adequate sample size for training and well-defined signal intensity based structural differentiation. There is a lack of effective automated diagnostic tools for the reliable detection of brain dysmaturation in the neonatal period, related to small sample size and complex undifferentiated brain structures, despite both translational research and clinical importance.
The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain.
Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size.
Human microbiome data from genomic sequencing technologies is fast accumulating, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection from parasitic worms, such as helminths, can help in the detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed.
Background: Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types among lung cancers. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, histopathological analyses are used to distinguish between the two, where the tissue samples are collected based on small endoscopic samples or needle aspirations.
Background: Pediatric cardiomyopathies are a rare, yet heterogeneous group of pathologies of the myocardium that are routinely examined clinically using Cardiovascular Magnetic Resonance Imaging (cMRI). This gold standard powerful non-invasive tool yields high resolution temporal images that characterize myocardial tissue. The complexities associated with the annotation of images and extraction of markers, necessitate the development of efficient workflows to acquire, manage and transform this data into actionable knowledge for patient care to reduce mortality and morbidity.
AMIA Jt Summits Transl Sci Proc
August 2015
In this era of precision medicine, understanding the epigenetic differences in lung cancer subtypes could lead to personalized therapies by possibly reversing these alterations. Traditional methods for analyzing microarray data rely on the use of known pathways. We propose a novel workflow, called Junction trees to Knowledge (J2K) framework, for creating interpretable graphical representations that can be derived directly from in silico analysis of microarray data.
Background: Most 'transcriptomic' data from microarrays are generated from small sample sizes compared to the large number of measured biomarkers, making it very difficult to build accurate and generalizable disease state classification models. Integrating information from different, but related, 'transcriptomic' data may help build better classification models. However, most proposed methods for integrative analysis of 'transcriptomic' data cannot incorporate domain knowledge, which can improve model performance.
AMIA Jt Summits Transl Sci Proc
February 2015
Accurate disease classification and biomarker discovery remain challenging tasks in biomedicine. In this paper, we develop and test a practical approach to combining evidence from multiple models when making predictions using selective Bayesian model averaging of probabilistic rules. This method is implemented within a Bayesian Rule Learning system and compared to model selection when applied to twelve biomedical datasets using the area under the ROC curve measure of performance.
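The idea of averaging predictions over a selected subset of models can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `selective_bma`, the `max_log_gap` selection rule, and the assumption of a uniform prior over models are all hypothetical choices made for illustration.

```python
import math


def selective_bma(log_scores, probs, max_log_gap=3.0):
    """Average per-model probabilities using normalized score weights.

    log_scores: log Bayesian score of each model (up to a shared constant).
    probs: each model's predicted probability of the positive class.
    max_log_gap: hypothetical selection rule; models scoring more than this
        far below the best model are dropped before averaging.
    """
    best = max(log_scores)
    # Selection step: keep only models close enough to the best one.
    kept = [(s, p) for s, p in zip(log_scores, probs) if best - s <= max_log_gap]
    # Subtract the best score before exponentiating for numerical stability.
    weights = [math.exp(s - best) for s, _ in kept]
    total = sum(weights)
    # Weighted average of the predictions from the selected models.
    return sum(w * p for w, (_, p) in zip(weights, kept)) / total
```

Model selection corresponds to the special case where only the single best-scoring model is kept, which is what a `max_log_gap` of 0 gives when there are no ties.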
A major challenge in the diagnosis and treatment of brain tumors is tissue heterogeneity leading to mixed treatment response. Additionally, these tumors are often difficult or very high risk to biopsy, further hindering the clinical management process. To overcome this, novel advanced imaging methods are increasingly being adapted clinically to identify useful noninvasive biomarkers capable of disease stage characterization and treatment response prediction.
Background: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids.
Methodology: A positive set of abstracts was defined by the terms 'breast cancer' and 'lung cancer' in conjunction with 14 separate 'biofluids' (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms '(biofluid) NOT breast cancer' or '(biofluid) NOT lung cancer.
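The construction of the positive and negative search terms described above can be sketched as follows. The biofluid list is taken from the abstract; the function name `build_queries`, the quoting, and the use of an explicit `AND` operator are assumptions, since the exact query syntax submitted to PubMed is not given here.

```python
# The 14 biofluids listed in the abstract.
BIOFLUIDS = [
    "bile", "blood", "breastmilk", "cerebrospinal fluid", "mucus", "plasma",
    "saliva", "semen", "serum", "synovial fluid", "stool", "sweat", "tears",
    "urine",
]


def build_queries(disease):
    """Return {biofluid: (positive_query, negative_query)} for one disease.

    The query strings are illustrative Boolean expressions, not the
    authors' verified search strings.
    """
    queries = {}
    for fluid in BIOFLUIDS:
        positive = f'"{disease}" AND "{fluid}"'
        negative = f'"{fluid}" NOT "{disease}"'
        queries[fluid] = (positive, negative)
    return queries
```

Calling `build_queries("breast cancer")` and `build_queries("lung cancer")` would give one positive and one negative query per biofluid for each disease.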
Background: Esophageal adenocarcinoma (EAC) is associated with a dismal prognosis. The identification of cancer biomarkers can advance the possibility for early detection and better monitoring of tumor progression and/or response to therapy. The authors present results from the development of a serum-based, 4-protein (biglycan, myeloperoxidase, annexin-A6, and protein S100-A9) biomarker panel for EAC.
This editorial provides insights into how informatics can attract highly trained students by involving them in science, technology, engineering, and math (STEM) training at the high school level and continuing to provide mentorship and research opportunities through the formative years of their education. Our central premise is that the trajectory necessary to be expert in the emergent fields in front of them requires acceleration at an early time point. Both pathology (and biomedical) informatics are new disciplines which would benefit from involvement by students at an early stage of their education.
Characterization of regional left ventricular (LV) function may have application in prognosticating timely response and informing choice of therapy in patients with ischemic cardiomyopathy. The purpose of this study is to characterize LV function through a systematic analysis of 4D (3D + time) endocardial motion over the cardiac cycle in an effort to define objective, clinically useful metrics of pathological remodeling and declining cardiac performance, using standard cardiac MRI data for two distinct patient cohorts accessed from CardiacAtlas.org: a) MESA - a cohort of asymptomatic patients; and b) DETERMINE - a cohort of symptomatic patients with a history of ischemic heart disease (IHD) or myocardial infarction.
AMIA Annu Symp Proc
September 2015
Mining high dimensional biomedical data with existing classifiers is challenging and the predictions are often inaccurate. We investigated the use of Bayesian Logistic Regression (B-LR) for mining such data to predict and classify various disease conditions. The analysis was done on twelve biomedical datasets with binary class variables and the performance of B-LR was compared to those from other popular classifiers on these datasets with 10-fold cross validation using the WEKA data mining toolkit.
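For readers unfamiliar with the evaluation protocol, the sketch below shows how a 10-fold cross-validation split can be generated in plain Python. It illustrates the general technique only and does not reproduce WEKA's implementation; the function name `k_fold_indices`, the fixed seed, and the unstratified shuffling are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each sample lands in exactly one test fold; the remaining folds form
    the training set for that round.
    """
    indices = list(range(n_samples))
    # Shuffle with a private generator so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A classifier would be fit on each `train` set and scored on the matching `test` set, with the ten scores then averaged to give the reported performance.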
AMIA Jt Summits Transl Sci Proc
December 2013
Technology is constantly evolving, necessitating the development of workflows for efficient use of high-dimensional data. We develop and test an empirical workflow for predictive modeling based on single nucleotide polymorphisms (SNP) from genome-wide association study (GWAS) datasets. To this aim, we use as a case study SNP-based prediction of survival for non-small cell lung cancer (NSCLC) with a Bayesian rule learner system (BRL+).
Peptide and protein identification via tandem mass spectrometry (MS/MS) lies at the heart of proteomic characterization of biological samples. Several algorithms are able to search, score, and assign peptides to large MS/MS datasets. Most popular methods, however, underutilize the intensity information available in the tandem mass spectrum due to the complex nature of the peptide fragmentation process, thus contributing to loss of potential identifications.
AMIA Jt Summits Transl Sci Proc
August 2012
We propose a novel method called Partitioning based Adaptive Irrelevant Feature Eliminator (PAIFE) for dimensionality reduction in high-dimensional biomedical datasets. PAIFE evaluates feature-target relationships over not only a whole dataset, but also the partitioned subsets and is extremely effective in identifying features whose relevancies to the target are conditional on certain other features. PAIFE adaptively employs the most appropriate feature evaluation strategy, statistical test and parameter instantiation.
Introduction: Clinical decision making in the setting of computed tomography (CT) screening could benefit from accessible biomarkers that help predict the level of lung cancer risk in high-risk individuals with indeterminate pulmonary nodules.
Methods: To identify candidate serum biomarkers, we measured 70 cancer-related proteins by Luminex xMAP (Luminex Corporation) multiplexed immunoassays in a training set of sera from 56 patients with biopsy-proven primary non-small-cell lung cancer and 56 age-, sex-, and smoking-matched CT-screened controls.
Results: We identified a panel of 10 serum biomarkers: prolactin, transthyretin, thrombospondin-1, E-selectin, C-C motif chemokine 5, macrophage migration inhibitory factor, plasminogen activator inhibitor, receptor tyrosine-protein kinase erbB-2, cytokeratin fragment 21.