The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or "beacon") is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertain and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable.
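The allele-presence query described above can be pictured with a minimal sketch. The `ToyBeacon` class, its in-memory storage, and the example variants are hypothetical illustrations, not the GA4GH Beacon API or any site's implementation; the only behavior taken from the text is that a query names a chromosome, a position, and a nucleotide, and receives a yes/no answer.

```python
from typing import Iterable, Set, Tuple

# (chromosome, 1-based position, allele); this tuple layout is an assumption
Variant = Tuple[str, int, str]


class ToyBeacon:
    """Hypothetical in-memory beacon; not the GA4GH API."""

    def __init__(self, variants: Iterable[Variant]) -> None:
        # Normalize alleles to upper case so lookups are case-insensitive.
        self._variants: Set[Variant] = {
            (chrom, pos, allele.upper()) for chrom, pos, allele in variants
        }

    def query(self, chromosome: str, position: int, allele: str) -> bool:
        """Return only a yes/no answer: is this allele present at this position?"""
        return (chromosome, position, allele.upper()) in self._variants


# Made-up example data for illustration only.
beacon = ToyBeacon([("1", 12345, "A"), ("X", 500, "T")])
print(beacon.query("1", 12345, "a"))
print(beacon.query("1", 12345, "G"))
```

Because the response carries a single bit, the re-identification concern mentioned above comes from combining many such answers rather than from any one query.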
In this paper, we propose PRivacy-preserving EstiMation of Individual admiXture (PREMIX), a framework that uses Intel Software Guard Extensions (SGX). SGX is a suite of software and hardware architectures to enable efficient and secure computation over confidential data. PREMIX enables multiple sites to securely collaborate on estimating individual admixture within a secure enclave inside Intel SGX.
Comparing genomes of closely related genotypes from populations with distinct demographic histories can help reveal the impact of effective population size on genome evolution. For this purpose, we present a high-quality genome assembly of (PA42), and compare this with the first sequenced genome of this species (TCO), which was derived from an isolate from a population with >90% reduction in nucleotide diversity. PA42 has numerous similarities to TCO at the gene level, with an average amino acid sequence identity of 98.
Motivation: We introduce PRINCESS, a privacy-preserving international collaboration framework for analyzing rare disease genetic data that are distributed across different continents. PRINCESS leverages Software Guard Extensions (SGX) and hardware for trustworthy computation. Unlike a traditional international collaboration model, where individual-level patient DNA data are physically centralized at a single site, PRINCESS performs a secure and distributed computation over encrypted data, fulfilling institutional policies and regulations for protected health information.
Transposable elements (TEs) constitute a substantial portion of many eukaryotic genomes, and can in principle contribute to evolutionary innovation as well as genomic deterioration. Daphnia pulex serves as a useful model for studying TE dynamics as a potential cause and/or consequence of asexuality. We analyzed insertion polymorphisms of TEs in 20 sexual and 20 asexual isolates of D.
Acute coronary syndrome (ACS) is a life-threatening disease that affects more than half a million people in the United States. We currently lack molecular biomarkers to distinguish unstable angina (UA) from acute myocardial infarction (AMI), the two subtypes of ACS. MicroRNAs play significant roles in biological processes and serve as good candidates for biomarkers.
Background: Medical concepts are inherently ambiguous and error-prone due to human fallibility, which makes it hard for them to be fully used by classical machine learning methods (eg, for tasks like early stage disease prediction).
Objective: Our work was to create a new machine-friendly representation that resembles the semantics of medical concepts. We then developed a sequential predictive model for medical events based on this new representation.
IEEE J Biomed Health Inform
September 2017
Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic work. However, a number of obstacles make it hard to access and process a large genomic database for these applications. First, genome sequencing is a time-consuming and expensive process.
The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users.
Accessing and integrating human genomic data with phenotypes are important for biomedical research. Making genomic data accessible for research purposes, however, must be handled carefully to avoid leakage of sensitive individual information to unauthorized parties and improper use of data. In this article, we focus on data sharing within the scope of data accessibility for research.
Background: Online health community (OHC) moderators help facilitate conversations and provide information to members. However, the necessity of the moderator in helping members achieve goals by providing the support they need remains unclear, with some prior research suggesting that moderation is unnecessary or even harmful for close-knit OHCs. Similarly, members' perceptions of moderator roles are underexplored.
PAtients Like My gEnome (PALME) is a web service that matches patients based on their genome and healthcare profiles. We support two types of inputs: (1) dual query (a variant + phenotype), and (2) genome sequences. For the first type of query, we show the patient profile matching the inputs.
BMC Med Inform Decis Mak
July 2016
Background: Accurately assessing pain in those who cannot self-report it, such as minimally responsive or severely brain-injured patients, is challenging. In this paper, we attempted to address this challenge by answering the following questions: (1) whether pain has dependency structures in electronic signals and, if so, (2) how to apply this pattern in predicting the state of pain. To this end, we have been investigating and comparing the performance of several machine learning techniques.
BMC Med Inform Decis Mak
July 2016
Background: In biomedical research, data sharing and information exchange are very important for improving quality of care, accelerating discovery, and promoting the meaningful secondary use of clinical data. A big concern in biomedical data sharing is the protection of patient privacy because inappropriate information leakage can put patient privacy at risk.
Methods: In this study, we deployed a grid logistic regression framework based on Secure Multi-party Computation (SMAC-GLORE).
Biomed Res Int
February 2017
The advent of the human genome sequence and the resulting ~20,000 genes provide a crucial framework for a transition from traditional biology to an integrative "OMICs" arena (Lander et al., 2001; Venter et al., 2001; Kitano, 2002).
Objectives: We assessed the real-world effectiveness and safety of vedolizumab (VDZ) in moderate-severe Crohn's disease (CD).
Methods: Retrospective cohort study of seven medical centers, from May 2014 to December 2015. Adults with moderate-severe CD treated with VDZ, with follow-up after initiation of therapy, were included.
Proc ACM Int Conf Inf Knowl Manag
October 2015
Differential privacy has recently become a de facto standard for private statistical data release. Many algorithms have been proposed to generate differentially private histograms or synthetic data. However, most of them focus on "one-time" release of a static dataset and do not adequately address the increasing need of releasing series of dynamic datasets in real time.
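To make the "one-time" static release mentioned above concrete, the following is a minimal sketch of a differentially private histogram using the standard Laplace mechanism. It is not the method proposed in this paper, which targets dynamic data. The function names, the example bins, and the choice of sensitivity 1 (neighboring datasets differ by adding or removing one record) are assumptions made for illustration, and the bins are assumed to be fixed in advance, independent of the data.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(
    values: Iterable[Hashable],
    bins: Sequence[Hashable],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, float]:
    """One-time noisy histogram over a fixed, data-independent list of bins."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    counts = Counter(values)
    # Assumption: adding or removing one record changes one count by 1,
    # so the L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}


# Made-up example data; smaller epsilon means more noise.
noisy = dp_histogram(["A", "A", "B"], bins=["A", "B", "C"], epsilon=0.5)
print(noisy)
```

Repeating such a release for every snapshot of a changing dataset spends privacy budget each time, which is one reason static methods do not transfer directly to real-time release.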
BMC Med Inform Decis Mak
October 2016
Background: The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy is a widespread concern when sharing sensitive information in a cloud environment.
BMC Med Inform Decis Mak
October 2016
Background: The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS).
Due to the limited solubility of phosphorus (P) in soil, understanding its binding in fine colloids is vital to better forecast P dynamics and losses in agricultural systems. We hypothesized that water-dispersible P is present as nanoparticles and that iron (Fe) plays a crucial role for P binding to these nanoparticles. To test this, we isolated water-dispersible fine colloids (WDFC) from an arable topsoil (Haplic Luvisol, Germany) and assessed colloidal P forms after asymmetric flow field-flow fractionation coupled with ultraviolet and an inductively coupled plasma mass spectrometer, with and without removal of amorphous and crystalline Fe oxides using oxalate and dithionite, respectively.
J Am Med Inform Assoc
November 2015
Biomedical Informatics is a growing interdisciplinary field in which research topics and citation trends have been evolving rapidly in recent years. To analyze these data in a fast, reproducible manner, automation of certain processes is needed. JAMIA is a "generalist" journal for biomedical informatics.
Objective: To develop an accurate logistic regression (LR) algorithm to support federated data analysis of vertically partitioned distributed data sets.
Materials and Methods: We propose a novel technique that solves the binary LR problem by dual optimization to obtain a global solution for vertically partitioned data. We evaluated this new method, VERTIcal Grid lOgistic regression (VERTIGO), in artificial and real-world medical classification problems in terms of the area under the receiver operating characteristic curve, calibration, and computational complexity.
Motivation: Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individuals' privacy at risk.
AMIA Jt Summits Transl Sci Proc
August 2015
Automatically assigning MeSH (Medical Subject Headings) to articles is an active research topic. Recent work demonstrated the feasibility of improving the existing automated Medical Text Indexer (MTI) system, developed at the National Library of Medicine (NLM). Encouraged by this work, we propose a novel data-driven approach that uses semantic distances in the MeSH ontology for automated MeSH assignment.
Proceedings VLDB Endowment
August 2014
Differential privacy has recently emerged in private statistical data release as one of the strongest privacy guarantees. Releasing synthetic data that mimic original data with differential privacy provides a promising way for privacy-preserving data sharing and analytics while providing a rigorous privacy guarantee. However, to date there are no open-source tools that allow users to generate differentially private synthetic data, in particular for high-dimensional and large-domain data.