Background: Data anonymization is an important building block for ensuring privacy, and it fosters the reuse of data. However, transforming data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging, particularly when processing complex datasets that contain a large number of attributes. In this article, we describe how we extended the open source software ARX to improve its support for high-dimensional biomedical datasets.

Findings: To improve ARX's capability to find optimal transformations when processing high-dimensional data, we implemented 2 novel search algorithms. The first is a greedy top-down approach modeled on the previously implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets.

Conclusion: With our additions, we have significantly enhanced ARX's ability to handle high-dimensional data in terms of both processing performance and usability, and can thus further facilitate data sharing.
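
To make the idea of a greedy top-down search more concrete, the following is a minimal sketch and not ARX's implementation. It assumes a toy dataset with two attributes, hypothetical three-level generalization hierarchies, k-anonymity as the privacy model, and a simple information-loss score based on normalized generalization levels. All names (`RECORDS`, `generalize_age`, `generalize_zip`, `transform`, `is_k_anonymous`, `loss`, `greedy_top_down`) are illustrative assumptions. The search starts at the most generalized transformation and repeatedly lowers the level of one attribute as long as the privacy model still holds.

```python
from collections import Counter

# Hypothetical toy records: (age, zip code). Not taken from the article.
RECORDS = [
    (34, "81667"), (36, "81675"), (31, "81925"),
    (45, "80331"), (47, "80339"), (42, "80335"),
]


def generalize_age(age, level):
    """Level 0: exact age, level 1: decade, level 2: suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zip_code, level):
    """Level 0: full code, level 1: first three digits, level 2: suppressed."""
    if level == 0:
        return zip_code
    if level == 1:
        return zip_code[:3] + "**"
    return "*"


HIERARCHIES = [generalize_age, generalize_zip]
MAX_LEVELS = (2, 2)  # highest generalization level per attribute


def transform(levels):
    """Apply one generalization level per attribute to every record."""
    return [
        tuple(h(value, level) for h, value, level in zip(HIERARCHIES, rec, levels))
        for rec in RECORDS
    ]


def is_k_anonymous(levels, k):
    """Every combination of generalized values must occur at least k times."""
    counts = Counter(transform(levels))
    return min(counts.values()) >= k


def loss(levels):
    """Assumed quality score: mean normalized generalization level (lower is better)."""
    return sum(l / m for l, m in zip(levels, MAX_LEVELS)) / len(levels)


def greedy_top_down(k):
    """Start fully generalized and greedily specialize one attribute at a time."""
    current = MAX_LEVELS
    if not is_k_anonymous(current, k):
        return None  # even the most generalized transformation violates the model
    while True:
        candidates = []
        for i, level in enumerate(current):
            if level > 0:
                candidate = current[:i] + (level - 1,) + current[i + 1:]
                if is_k_anonymous(candidate, k):
                    candidates.append(candidate)
        if not candidates:
            return current  # no less-generalized neighbor satisfies the model
        current = min(candidates, key=loss)


if __name__ == "__main__":
    print(greedy_top_down(3))
```

Because each step only inspects the direct neighbors of the current transformation, the number of checks per step grows with the number of attributes rather than with the size of the whole search space. The price is that a greedy search of this kind may stop at a transformation that is not globally optimal.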

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8489190
DOI: http://dx.doi.org/10.1093/gigascience/giab068
