Background: A unique study identifier serves as a key for linking research data about a study subject without revealing protected health information in the identifier. While sufficient for single-site and limited-scale studies, common unique study identifiers have several drawbacks in large multicenter studies, where thousands of research participants may be recruited from multiple sites. An important property of study identifiers is error tolerance (validatability): inadvertent editing mistakes during their transmission and use will most likely result in invalid study identifiers.

Objective: This paper introduces a novel method called "Randomized N-gram Hashing (NHash)" for generating unique study identifiers in a distributed and validatable fashion in multicenter research. NHash has a unique set of properties: (1) it is a pseudonym that links research data about a study participant for research purposes; (2) it can be generated automatically in a completely distributed fashion with virtually no risk of identifier collision; (3) it incorporates a set of cryptographic hash functions based on N-grams, combined with additional encryption techniques such as a shift cipher; and (4) it is validatable (error tolerant), in the sense that inadvertent edit errors will mostly result in invalid identifiers.

Methods: NHash consists of 2 phases. In Phase 1, an intermediate string is generated using randomized N-gram hashing, built from a collection of N-gram hash functions f1, f2, ..., fk. The input to each function fi has 3 components: a random number r, an integer n, and input data m. The result, fi(r, n, m), is the n-gram of m starting at position s = r mod |m|, where |m| is the length of m. The output of Phase 1 is the concatenation of the sequence f1(r1, n1, m1), f2(r2, n2, m2), ..., fk(rk, nk, mk). In Phase 2, the intermediate string from Phase 1 is encrypted using techniques such as a shift cipher. The encrypted string, concatenated with the random number r, is the final NHash study identifier.
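
The two phases can be illustrated with a minimal Python sketch. Several details are assumptions not given in the abstract: a single random r shared by every fi, wrap-around when an n-gram would run past the end of m, a 36-symbol alphabet (A to Z, then 0 to 9) for the shift cipher, the shift key, and a fixed-width 4-digit encoding of r. The function names (normalize, ngram, shift_encrypt, nhash) and the example input fields are hypothetical.

```python
import secrets
import string

# Assumed alphabet for the shift cipher: A-Z followed by 0-9 (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def normalize(m: str) -> str:
    """Upper-case m and keep only symbols that belong to ALPHABET."""
    return "".join(ch for ch in m.upper() if ch in ALPHABET)


def ngram(r: int, n: int, m: str) -> str:
    """f(r, n, m): the n-gram of m starting at s = r mod |m|.

    Assumption: an n-gram that runs past the end of m wraps around to the start.
    """
    if not m:
        raise ValueError("input data m must be non-empty")
    s = r % len(m)
    return "".join(m[(s + j) % len(m)] for j in range(n))


def shift_encrypt(text: str, key: int) -> str:
    """Shift (Caesar) cipher over ALPHABET."""
    size = len(ALPHABET)
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % size] for ch in text)


def nhash(fields, ns, key, r=None, r_digits=4):
    """Build an NHash-style identifier from input fields m_i and n-gram sizes n_i.

    Assumptions: one random r is shared by all f_i, and r is appended as a
    zero-padded decimal number of r_digits digits.
    """
    if r is None:
        r = secrets.randbelow(10 ** r_digits)
    # Phase 1: concatenate f_1(r, n_1, m_1), ..., f_k(r, n_k, m_k).
    intermediate = "".join(ngram(r, n, normalize(m)) for m, n in zip(fields, ns))
    # Phase 2: encrypt the intermediate string, then append r.
    return shift_encrypt(intermediate, key) + f"{r:0{r_digits}d}"
```

For example, nhash(["Smith", "John", "19800101"], [3, 2, 4], key=5, r=7) builds the intermediate string ITHNJ1198 and returns NYMSO66ED0007. Keeping r at a fixed width lets a reader split an identifier back into its encrypted part and r.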

Results: We performed experiments on a large synthesized dataset comparing NHash with random strings, and demonstrated a negligible probability of collision. We implemented NHash for the Center for SUDEP Research (CSR), a National Institute of Neurological Disorders and Stroke-funded Center Without Walls for Collaborative Research in the Epilepsies. This multicenter collaboration involves 14 institutions across the United States and Europe, bringing together extensive and diverse expertise to understand sudden unexpected death in epilepsy (SUDEP).
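
As a baseline for the random-string comparison, the collision probability of uniformly random identifiers can be estimated with the standard birthday approximation and checked empirically by counting duplicates. This is a sketch, not the authors' experiment: the 36-symbol alphabet and the function names are assumptions, and the synthesized dataset is not reproduced. Any list of identifiers, including NHash identifiers, can be passed to count_collisions.

```python
import math
import secrets
import string

# Assumed identifier alphabet: A-Z followed by 0-9 (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def birthday_collision_prob(num_ids: int, alphabet_size: int, length: int) -> float:
    """Approximate P(at least one collision) among num_ids uniform random strings.

    Birthday approximation: 1 - exp(-N(N-1) / (2D)), with D = alphabet_size ** length.
    expm1 keeps the result accurate when the probability is tiny.
    """
    space = alphabet_size ** length
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


def random_ids(num_ids: int, length: int) -> list[str]:
    """Generate num_ids uniformly random strings over ALPHABET."""
    return ["".join(secrets.choice(ALPHABET) for _ in range(length)) for _ in range(num_ids)]


def count_collisions(ids) -> int:
    """Number of identifiers that duplicate an earlier one in the list."""
    return len(ids) - len(set(ids))
```

Comparing count_collisions on a large batch of generated identifiers against birthday_collision_prob for the same length shows whether a scheme behaves at least as well as random strings of equal size.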

Conclusions: The CSR Data Repository has successfully used NHash to link deidentified multimodal clinical data collected in participating CSR institutions, meeting all desired objectives of NHash.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4704892
DOI: http://dx.doi.org/10.2196/medinform.4959