Genomics-based technologies produce large amounts of data. To interpret the results and identify the most important variates related to phenotypes of interest, various multivariate regression and variate selection methods are used. Although such models are routinely assessed for statistical performance, their relevance for interpreting biological data sets often remains unclear. We compare various multivariate regression and variate selection methods applied to a nutrigenomics data set in terms of performance, utility and biological interpretability. The studied data set comprised hepatic transcriptome (10,072 predictor variates) and plasma protein concentrations [2 dependent variates: Leptin (LEP) and Tissue inhibitor of metalloproteinase 1 (TIMP-1)] collected during a high-fat diet study in ApoE3Leiden mice. The multivariate regression methods used were: partial least squares, "PLS"; a genetic algorithm-based multiple linear regression, "GA-MLR"; two least-angle shrinkage methods, "LASSO" and "ELASTIC NET"; and a variant of PLS that uses covariance-based variate selection, "CovProc." Two methods of ranking the genes for Gene Set Enrichment Analysis (GSEA) were also investigated: ranking by correlation with the protein data and ranking by the stability of the PLS regression coefficients. The regression methods performed similarly, with CovProc and GA-MLR performing the best and worst, respectively (R-squared values based on "double cross-validation" predictions of 0.762 and 0.451 for LEP; and 0.701 and 0.482 for TIMP-1). CovProc, LASSO and ELASTIC NET all produced parsimonious regression models and consistently identified small subsets of variates, with high commonality between the methods. The two gene ranking approaches showed a high degree of agreement, although PLS-based ranking found fewer significant gene sets.
We recommend the use of CovProc for variate selection, in tandem with univariate methods, and the use of correlation-based ranking for GSEA-like pathway analysis methods.
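As an illustration of the correlation-based ranking mentioned above, the following is a minimal sketch, assuming Pearson correlation as the ranking statistic (the abstract does not say which correlation measure was used). The function names `pearson` and `rank_genes_by_correlation`, the gene labels, and the toy values are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile carries no ranking information.
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_genes_by_correlation(expression, protein):
    """Return (gene, r) pairs sorted from most positive to most negative r.

    expression: dict mapping a gene label to its expression values per sample
    protein:    protein concentration per sample, in the same sample order
    """
    scores = {gene: pearson(values, protein) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): three genes measured in four samples.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [4.0, 3.0, 2.0, 1.0],
    "gene_c": [1.0, 3.0, 2.0, 4.0],
}
protein = [10.0, 20.0, 30.0, 40.0]

ranked = rank_genes_by_correlation(expression, protein)
for gene, r in ranked:
    print(f"{gene}\t{r:+.3f}")
```

A signed ranking of this kind, running from strongly positive to strongly negative, is the sort of ordered gene list that GSEA-like methods take as input. A real analysis over roughly 10,000 transcripts would normally use a vectorised numerical library rather than pure Python.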
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3380194 | PMC |
http://dx.doi.org/10.1007/s12263-012-0288-4 | DOI |
J Cancer Policy
January 2025
Institute of Health, Jimma University, Jimma, Ethiopia.
Cervical cancer is the second most prevalent disease among Ethiopian women of reproductive age and a serious gynecological malignancy affecting women regionally. About 3,235 deaths and 4,648 new cases are reported nationwide each year. Despite the severity of the disease, precancerous cervical screening programs face many difficulties in settings with limited resources, such as a lack of medical supplies and equipment, poorly trained healthcare workers, a heavy workload for current staff, low professional compliance, and insufficient support from medical facilities.
Biostatistics
December 2024
Department of Biostatistics, Yale University, 300 George St, New Haven, CT 06511, United States.
Progress in neuroscience has provided unprecedented opportunities to advance our understanding of brain alterations and their correspondence to phenotypic profiles. With data collected from various imaging techniques, studies have integrated different types of information, including brain structure, function, and metabolism. More recently, an emerging way to categorize imaging traits is through a metric hierarchy, including localized node-level measurements and interactive network-level metrics.
J Cardiothorac Vasc Anesth
December 2024
Surgical Anesthesia Center, The First People's Hospital of Longquanyi District, Chengdu, China.
Background: The incidence, mortality, and readmission rates for acute heart failure (AHF) are high, and the in-hospital mortality for AHF patients in the intensive care unit (ICU) is higher still. However, there is currently no method to accurately predict the mortality of AHF patients.
Methods: The Medical Information Mart for Intensive Care Ⅳ (MIMIC-Ⅳ) database was used to perform a retrospective study.
Foods
December 2024
College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China.
Rapid and accurate detection of protein content is essential for ensuring the quality of maize. Near-infrared spectroscopy (NIR) technology faces limitations due to surface effects and sample homogeneity issues when measuring the protein content of whole maize grains. Focusing on maize grain powder can significantly improve the quality of data and the accuracy of model predictions.
Spectrochim Acta A Mol Biomol Spectrosc
December 2024
College of Artificial Intelligence, Nankai University, Tianjin 300350, China.
The main objective of this study was to evaluate the potential of near infrared (NIR) spectroscopy and machine learning in detecting microplastics (MPs) in chicken feed. The application of machine learning techniques in building optimal classification models for MPs-contaminated chicken feeds was explored. Eighty non-contaminated chicken feed samples and 240 MPs-contaminated chicken feed samples, containing polypropylene (PP), polyvinyl chloride (PVC), or polyethylene terephthalate (PET), were prepared, and the NIR diffuse reflectance spectra of all the samples were collected.