Motivation: This work addresses two common issues in building classification models for biological or medical studies: learning a sparse model, in which only a subset of a large number of candidate predictors is used, and training in the presence of missing data. The focus is on supervised generative binary classification models, specifically linear discriminant analysis (LDA). The parameters are estimated with an expectation-maximization (EM) algorithm, which both handles missing data and allows priors that promote sparsity to be introduced. The proposed algorithm, expectation-maximization sparse discriminant analysis (EM-SDA), produces a sparse LDA model for datasets with and without missing data.
Results: EM-SDA is tested via simulations and case studies. In the simulations, EM-SDA is compared with nearest shrunken centroids (NSC) and sparse discriminant analysis (SDA), each using k-nearest-neighbor imputation, under varying mechanisms and amounts of missing data. In three case studies using published biomedical data, the results are compared with NSC and SDA models combined with four different types of imputation, all of which are common approaches in the field. In most of the experiments, EM-SDA is more accurate and sparser than the competing methods, both with and without missing data. Furthermore, the EM-SDA results are mostly consistent between the missing-data and full-data cases. The biological relevance of the resulting models, as quantified via a literature search, is also presented.
Availability and Implementation: A Matlab implementation, published under the GNU GPL v.3 license, is available at http://web.mit.edu/braatzgroup/links.html.
Contact: braatz@mit.edu.
Supplementary Information: Supplementary data are available at Bioinformatics online.
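Illustration: The abstract does not give the EM-SDA update equations, so the following is only a minimal sketch of the general idea named under Motivation: EM estimation of LDA parameters (class means and a shared covariance) when some feature values are missing. It assumes Gaussian class-conditional densities and values missing at random, and it omits the sparsity-promoting priors entirely. The function name `em_lda_missing` and the `ridge` parameter are hypothetical and are not taken from the authors' Matlab implementation.

```python
import numpy as np


def em_lda_missing(X, y, n_iter=50, ridge=1e-6):
    """EM estimates of class means and a shared covariance; NaN marks missing.

    Illustrative only: plain maximum-likelihood EM without sparsity priors.
    Assumes every feature is observed at least once in every class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    classes = np.unique(y)
    miss = np.isnan(X)

    # Initialise from observed entries: class-wise means, diagonal covariance.
    mu = {c: np.nanmean(X[y == c], axis=0) for c in classes}
    sigma = np.diag(np.nanvar(X, axis=0) + ridge)

    for _ in range(n_iter):
        X_hat = X.copy()
        C = np.zeros((p, p))  # summed conditional covariances of missing parts

        # E-step: expected missing values given observed ones and the class.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            mu_i = mu[y[i]]
            if o.any():
                S_oo = sigma[np.ix_(o, o)]
                S_mo = sigma[np.ix_(m, o)]
                # K = S_mo @ inv(S_oo), computed with a linear solve.
                K = np.linalg.solve(S_oo, S_mo.T).T
                X_hat[i, m] = mu_i[m] + K @ (X[i, o] - mu_i[o])
                C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ S_mo.T
            else:
                # Nothing observed in this row: fall back to the class mean.
                X_hat[i, m] = mu_i[m]
                C += sigma

        # M-step: update class means and the pooled (shared) covariance.
        mu = {c: X_hat[y == c].mean(axis=0) for c in classes}
        R = X_hat - np.vstack([mu[c] for c in y])
        sigma = (R.T @ R + C) / n + ridge * np.eye(p)

    return mu, sigma
```

Given the returned estimates for two classes labelled 0 and 1, the usual LDA direction would be obtained as `np.linalg.solve(sigma, mu[1] - mu[0])`. A sparse variant would modify the M-step so that some entries of this direction are driven to zero, which is the part of EM-SDA that this sketch does not attempt to reproduce.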
DOI: http://dx.doi.org/10.1093/bioinformatics/btx224