Sparse feature tables, in which many features are present in very few samples, are common in big biological data (e.g. metagenomics). Ignoring the issues posed by zero-laden datasets can result in biased statistical estimates and decreased power in downstream analyses. Zeros are also a particular problem for compositional data analysis using log-ratios, since the log of zero is undefined. Researchers typically deal with this issue by removing low-frequency features, but the thresholds for removal differ markedly between studies, with little or no justification. Here, we present CurvCut, an unsupervised, data-driven approach with human confirmation for rare-feature removal. CurvCut implements two distinct approaches for determining natural breaks in the feature distributions: a method based on curvature analysis borrowed from thermodynamics, and the Fisher-Jenks statistical method. Our results show that CurvCut rapidly identifies data-specific breaks in these distributions that can be used as cutoff points for low-frequency feature removal while maximizing feature retention. We show that CurvCut works across different biological data types and rapidly generates clear visual results that allow researchers to confirm and apply feature removal cutoffs to individual datasets.
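To make the idea of a "natural break" concrete, the following is a minimal sketch, not CurvCut's implementation. It assumes the distribution of interest is feature prevalence (the number of samples in which a feature is non-zero) and illustrates only the two-class case of the Fisher-Jenks criterion: choosing the split of the sorted values that minimizes the within-class sum of squared deviations. The function names `feature_prevalence` and `two_class_break` and the toy table are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def feature_prevalence(table: Sequence[Sequence[float]]) -> List[int]:
    """For each feature (column), count the samples (rows) with a non-zero value."""
    n_features = len(table[0])
    return [sum(1 for row in table if row[j] > 0) for j in range(n_features)]


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean of a non-empty sequence."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_class_break(values: Sequence[float]) -> Tuple[float, float]:
    """Return (largest value of lower class, smallest value of upper class)
    for the two-class split that minimizes the within-class SSE."""
    ordered = sorted(values)
    best_i, best_cost = None, float("inf")
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            continue  # tied values cannot be placed in different classes
        cost = _sse(ordered[:i]) + _sse(ordered[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    if best_i is None:
        raise ValueError("need at least two distinct values to find a break")
    return ordered[best_i - 1], ordered[best_i]


if __name__ == "__main__":
    # Hypothetical toy table: 6 samples (rows) x 5 features (columns).
    table = [
        [5, 0, 3, 0, 0],
        [2, 0, 4, 0, 1],
        [7, 1, 1, 0, 0],
        [3, 0, 2, 0, 0],
        [4, 0, 6, 1, 0],
        [9, 0, 2, 0, 0],
    ]
    prevalence = feature_prevalence(table)
    low_max, high_min = two_class_break(prevalence)
    # Candidate cutoff: keep features whose prevalence reaches the upper class.
    kept = [j for j, p in enumerate(prevalence) if p >= high_min]
    print(prevalence, (low_max, high_min), kept)
```

In this sketch the break is only a candidate cutoff. In keeping with the human-confirmation step described above, a researcher would still inspect the distribution before removing any features.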
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10768885 | PMC |
| http://dx.doi.org/10.1093/nargab/lqad110 | DOI Listing |