An Open Source Python Library for Anonymizing Sensitive Data.

Judith Sáinz-Pardo Díaz, Álvaro López García

Sci Data

Instituto de Física de Cantabria (IFCA), CSIC-UC Avda. los Castros s/n, 39005, Santander, Spain.

Published: November 2024

Open science is a fundamental pillar of scientific progress and collaboration, built on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet while complying with strict data protection regulations. Consequently, researchers need proven methods that allow them to anonymize their data themselves, without first sharing it with third parties. To this end, this paper presents a Python library for the anonymization of sensitive tabular data. The framework provides a wide range of anonymization methods that can be applied to a given dataset. The user supplies the set of identifiers, the quasi-identifiers, the generalization hierarchies and the allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for continuous integration and development, and uses workflows to measure code coverage based on unit and functional tests.
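The inputs named in the abstract (identifiers, quasi-identifiers, generalization hierarchies, a suppression limit and a required level of anonymity) can be illustrated with a minimal k-anonymity sketch. This is not the library's API: the function names, the toy records, the hierarchy definitions and the simple strategy of raising every quasi-identifier by one generalization level at a time are all assumptions made for illustration, using only the Python standard library.

```python
from collections import Counter

# Toy records; every column name and value here is invented for illustration.
records = [
    {"name": "Ana",   "age": 23, "zip": "39005", "diagnosis": "flu"},
    {"name": "Luis",  "age": 27, "zip": "39007", "diagnosis": "asthma"},
    {"name": "Marta", "age": 31, "zip": "39012", "diagnosis": "flu"},
    {"name": "Pablo", "age": 36, "zip": "39015", "diagnosis": "diabetes"},
    {"name": "Elena", "age": 38, "zip": "39011", "diagnosis": "asthma"},
    {"name": "Iker",  "age": 62, "zip": "28001", "diagnosis": "flu"},
]

IDENTIFIERS = ["name"]             # removed outright
QUASI_IDENTIFIERS = ["age", "zip"]  # generalized step by step
K = 2                              # required level of anonymity
SUPPRESSION_LIMIT = 0.2            # at most 20% of records may be dropped


def generalize_age(age, level):
    """Hypothetical hierarchy: exact age -> decade range -> '*'."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zip_code, level):
    """Hypothetical hierarchy: mask one more trailing digit per level."""
    level = min(level, len(zip_code))
    return zip_code[: len(zip_code) - level] + "*" * level


# Each quasi-identifier maps to (generalization function, highest level).
HIERARCHIES = {"age": (generalize_age, 2), "zip": (generalize_zip, 5)}


def anonymize(rows, k, suppression_limit):
    """Return (anonymized rows, level used) for a uniform-level search."""
    max_level = max(top for _, top in HIERARCHIES.values())
    for level in range(max_level + 1):
        table = []
        for row in rows:
            # Drop direct identifiers, then generalize quasi-identifiers.
            new_row = {c: v for c, v in row.items() if c not in IDENTIFIERS}
            for col in QUASI_IDENTIFIERS:
                func, top = HIERARCHIES[col]
                new_row[col] = func(row[col], min(level, top))
            table.append(new_row)

        def key(r):
            return tuple(r[c] for c in QUASI_IDENTIFIERS)

        # Equivalence classes: rows sharing the same quasi-identifier values.
        sizes = Counter(key(r) for r in table)
        kept = [r for r in table if sizes[key(r)] >= k]
        # Accept this level if the rows in too-small classes fit the limit.
        if len(table) - len(kept) <= suppression_limit * len(table):
            return kept, level
    return [], max_level


anonymized, level = anonymize(records, K, SUPPRESSION_LIMIT)
for row in anonymized:
    print(row)
```

The sensitive attribute (`diagnosis` in this toy table) is left untouched. The sketch enforces only k-anonymity on the quasi-identifiers, whereas stronger guarantees that constrain the sensitive attribute within each equivalence class would need additional checks.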

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11599594
DOI: http://dx.doi.org/10.1038/s41597-024-04019-z

Similar Publications

Open-source Large Language Models can Generate Labels from Radiology Reports for Training Convolutional Neural Networks.

Acad Radiol

January 2025

Department of Radiology and Nuclear Medicine, German Heart Center Munich, Lazarettstraße 36, 80636 Munich, Germany (K.K.B.).

Fares Al Mohamad, Leonhard Donle, Felix Dorfner, Laura Romanescu, Kristin Drechsler

Rationale And Objectives: Training Convolutional Neural Networks (CNN) requires large datasets with labeled data, which can be very labor-intensive to prepare. Radiology reports contain a lot of potentially useful information for such tasks. However, they are often unstructured and cannot be directly used for training.

The IBEX Imaging Knowledge-Base: A Community Resource Enabling Adoption and Development of Immunofluoresence Imaging Methods.

ArXiv

December 2024

Ziv Yaniv, Ifeanyichukwu U Anidi, Leanne Arakkal, Armando J Arroyo-Mejías, Rebecca T Beuschel

The iterative bleaching extends multiplexity (IBEX) Knowledge-Base is a central portal for researchers adopting IBEX and related 2D and 3D immunofluorescence imaging methods. The design of the Knowledge-Base is modeled after efforts in the open-source software community and includes three facets: a development platform (GitHub), static website, and service for data archiving. The Knowledge-Base facilitates the practice of open science throughout the research life cycle by providing validation data for recommended and non-recommended reagents, e.

Digital Monitoring of Anemia Control Measures in a District Using Anemia Mukt Bharat Health Management Information System Indicators.

Cureus

December 2024

Centre for Population Research, Institute of Economic Growth, Delhi University, New Delhi, IND.

Surabhi Puri, Kapil Yadav, Shashi Kant, Sanjay Rai, William Joe

Introduction: Anemia is a severe public health problem in India, affecting more than 50% of individuals across most age groups. The Anemia Mukt Bharat (AMB) program, with a target of a three-percentage point reduction in anemia prevalence per year, developed a monitoring mechanism based on a set of 18 indicators and six key performance indicators (KPIs) derived from routine reporting in the Health Management Information System (HMIS). The study's objective was to assess the status of anemia control measures in the district of Faridabad, Haryana, India, using AMB HMIS indicators from April 2018 to March 2019.

Me-LLaMA: Medical Foundation Large Language Models for Comprehensive Text Analysis and Beyond.

Res Sq

December 2024

Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu

Recent advancements in large language models (LLMs) like ChatGPT and LLaMA have shown significant potential in medical applications, but their effectiveness is limited by a lack of specialized medical knowledge due to general-domain training. In this study, we developed Me-LLaMA, a new family of open-source medical LLMs that uniquely integrate extensive domain-specific knowledge with robust instruction-following capabilities. Me-LLaMA comprises foundation models (Me-LLaMA 13B and 70B) and their chat-enhanced versions, developed through comprehensive continual pretraining and instruction tuning of LLaMA2 models using both biomedical literature and clinical notes.

MorphoDiff: Cellular Morphology Painting with Diffusion Models.

bioRxiv

December 2024

Zeinab Navidi, Jun Ma, Esteban A Miglietta, Le Liu, Anne E Carpenter

Understanding cellular responses to external stimuli is critical for parsing biological mechanisms and advancing therapeutic development. High-content image-based assays provide a cost-effective approach to examine cellular phenotypes induced by diverse interventions, which offers valuable insights into biological processes and cellular states. In this paper, we introduce MorphoDiff, a generative pipeline to predict high-resolution cell morphological responses under different conditions based on perturbation encoding.
