Publications by authors named "Kesselman C"

Purpose: To develop and test a deep learning (DL) algorithm for detecting referable glaucoma in the Los Angeles County (LAC) Department of Health Services (DHS) teleretinal screening program.

Methods: Fundus photographs and patient-level labels of referable glaucoma (defined as cup-to-disc ratio [CDR] ≥ 0.6) provided by 21 trained optometrist graders were obtained from the LAC DHS teleretinal screening program.
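
The patient-level labeling rule quoted above (referable when CDR ≥ 0.6) can be sketched in a few lines; the record layout below is illustrative, not the LAC DHS schema.

```python
# Assign referable-glaucoma labels from graded cup-to-disc ratios (CDR).
# The 0.6 threshold follows the definition quoted above; the field names
# are hypothetical stand-ins for the actual grading records.
REFERABLE_CDR = 0.6

def label_referable(cdr: float) -> int:
    """Return 1 (referable) when CDR >= 0.6, else 0."""
    return 1 if cdr >= REFERABLE_CDR else 0

gradings = [
    {"patient": "A", "cdr": 0.45},
    {"patient": "B", "cdr": 0.60},
    {"patient": "C", "cdr": 0.72},
]
labels = {g["patient"]: label_referable(g["cdr"]) for g in gradings}
print(labels)  # {'A': 0, 'B': 1, 'C': 1}
```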

IHMCIF (github.com/ihmwg/IHMCIF) is a data information framework that supports archiving and disseminating macromolecular structures determined by integrative or hybrid modeling (IHM), and making them Findable, Accessible, Interoperable, and Reusable (FAIR). IHMCIF is an extension of the Protein Data Bank Exchange/macromolecular Crystallographic Information Framework (PDBx/mmCIF), which serves as the framework for the Protein Data Bank (PDB) to archive experimentally determined atomic structures of biological macromolecules and their complexes with one another and with small molecule ligands.
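
To make the underlying format concrete, here is a toy reader for a PDBx/mmCIF-style `loop_` block, the key-value/table syntax that IHMCIF extends. The category and item names below are illustrative; the real IHMCIF dictionaries live at github.com/ihmwg/IHMCIF.

```python
# Parse a minimal mmCIF-style "loop_" table into a list of dicts.
# This handles only the simple single-loop case shown here, not the
# full PDBx/mmCIF grammar.
import shlex

CIF_FRAGMENT = """\
loop_
_ihm_model_list.model_id
_ihm_model_list.model_name
1 'Integrative model 1'
2 'Integrative model 2'
"""

def parse_loop(text: str) -> list:
    lines = [ln for ln in text.splitlines() if ln and ln != "loop_"]
    # Header lines start with "_"; keep the item name after the category.
    headers = [ln.split(".", 1)[1] for ln in lines if ln.startswith("_")]
    # Data rows use whitespace-separated, optionally quoted tokens.
    rows = [shlex.split(ln) for ln in lines if not ln.startswith("_")]
    return [dict(zip(headers, row)) for row in rows]

models = parse_loop(CIF_FRAGMENT)
print(models[0])  # {'model_id': '1', 'model_name': 'Integrative model 1'}
```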

The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data.
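
The federation pattern described above can be sketched as a mapping from each DCC's local records into one shared model. The field set here is a deliberate simplification, not the actual C2M2 schema.

```python
# Map heterogeneous, DCC-specific metadata records into a single uniform
# catalog model, in the spirit of the C2M2. Field names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    id_namespace: str   # which DCC issued the identifier
    local_id: str       # the DCC's own identifier, kept as-is
    data_type: str
    description: str

def to_catalog(dcc: str, raw: dict) -> CatalogEntry:
    """Map one DCC-specific record into the shared model."""
    return CatalogEntry(
        id_namespace=dcc,
        local_id=str(raw["id"]),
        data_type=raw.get("type", "unknown"),
        description=raw.get("desc", ""),
    )

entries = [
    to_catalog("dcc_a", {"id": 17, "type": "RNA-seq", "desc": "liver"}),
    to_catalog("dcc_b", {"id": "s-9", "desc": "imaging study"}),
]
print(entries[1].data_type)  # unknown
```

Because every entry carries its issuing namespace, the central portal can index and search without data owners moving or rehosting anything.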

The broad sharing of research data is widely viewed as critical for the speed, quality, accessibility, and integrity of science. Despite increasing efforts to encourage data sharing, both the quality of shared data and the frequency of data reuse remain stubbornly low. We argue here that a significant reason for this unfortunate state of affairs is that the organization of research results in the findable, accessible, interoperable, and reusable (FAIR) form required for reuse is too often deferred to the end of a research project when preparing publications, by which time essential details are no longer accessible.

Despite much creative work on methods and tools, reproducibility (the ability to repeat the computational steps used to obtain a research result) remains elusive. One reason for these difficulties is that extant tools for capturing research processes, while powerful, often fail to capture vital connections as research projects grow in extent and complexity. We explain here how these interstitial connections can be preserved via simple methods that integrate easily with current work practices to capture basic information about every data product consumed or produced in a project.

The FaceBase Consortium, funded by the National Institute of Dental and Craniofacial Research of the National Institutes of Health, was established in 2009 with the recognition that dental and craniofacial research is an increasingly data-intensive discipline. Data sharing is critical for the validation and reproducibility of results as well as to enable reuse of data. In service of these goals, data ought to be FAIR: Findable, Accessible, Interoperable, and Reusable.

Defining the structural and functional changes in the nervous system underlying learning and memory represents a major challenge for modern neuroscience. Changes in neuronal activity following memory formation have been studied.

Structures of many complex biological assemblies are increasingly determined using integrative approaches, in which data from multiple experimental methods are combined. A standalone system, called PDB-Dev, has been developed for archiving integrative structures and making them publicly available. Here, the data standards and software tools that support PDB-Dev are described along with the new and updated components of the PDB-Dev data-collection, processing and archiving infrastructure.

Comprehensive modeling of a whole cell requires an integration of vast amounts of information on various aspects of the cell and its parts. To divide and conquer this task, we introduce Bayesian metamodeling, a general approach to modeling complex systems by integrating a collection of heterogeneous input models. Each input model can in principle be based on any type of data and can describe a different aspect of the modeled system using any mathematical representation, scale, and level of granularity.

The FaceBase Consortium was established by the National Institute of Dental and Craniofacial Research in 2009 as a 'big data' resource for the craniofacial research community. Over the past decade, researchers have deposited hundreds of annotated and curated datasets on both normal and disordered craniofacial development in FaceBase, all freely available to the research community on the FaceBase Hub website. The Hub has developed numerous visualization and analysis tools designed to promote integration of multidisciplinary data while remaining dedicated to the FAIR principles of data management (findability, accessibility, interoperability and reusability) and providing a faceted search infrastructure for locating desired data efficiently.

Characterizing the tissue-specific binding sites of transcription factors (TFs) is essential to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting enables the prediction of genome-wide binding sites for hundreds of TFs simultaneously. Despite the public availability of high-quality DNase-seq data from hundreds of samples, a comprehensive, up-to-date resource for the locations of genomic footprints is lacking.

Database evolution is a notoriously difficult task, and it is exacerbated by the necessity to evolve database-dependent applications. As science becomes increasingly dependent on sophisticated data management, the need to evolve an array of database-driven systems will only intensify. In this paper, we present an architecture for data-centric ecosystems that allows the components to seamlessly co-evolve by centralizing the models and mappings at the data service and pushing model-adaptive interactions to the database clients.

Sharing of bioinformatics data within research communities holds the promise of facilitating more rapid discovery, yet the volume of data is growing exponentially, far outpacing what traditional biocuration can support. We present here an approach that we have used to empower data-producing researchers to curate high-quality shared data that are ready for reuse and re-analysis.

Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data.
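
One core idea above, assigning identifiers to data throughout the lifecycle, can be illustrated with a content-derived identifier. The real tools in this line of work offer resolvable, registered identifiers with richer services; this sketch shows only the checksum-naming pattern, and the `sketch:` prefix is hypothetical.

```python
# Derive a stable identifier from file content: the same bytes always
# yield the same identifier, so re-runs of an analysis can be checked
# for consistency. This is a pattern sketch, not a registered ID scheme.
import hashlib

def content_identifier(data: bytes, prefix: str = "sketch") -> str:
    digest = hashlib.sha256(data).hexdigest()[:16]
    return f"{prefix}:{digest}"

# A toy stand-in for one line of a binding-site atlas file.
peaks = b"chr1\t100\t180\tfootprint_1\n"
ident = content_identifier(peaks)
print(ident)
```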

Predictive analytics in health is a complex, transdisciplinary field requiring collaboration across diverse scientific and stakeholder groups. We piloted a participatory-research approach to foster team science in predictive analytics through a partnered symposium and a funding competition. In total, 85 stakeholders were engaged across diverse translational domains, with a significant increase in perceived importance of early inclusion of patients and communities in research.

Creating and maintaining an accurate description of data assets and the relationships between assets is a critical aspect of making data findable, accessible, interoperable, and reusable (FAIR). Typically, such metadata are created and maintained in a data catalog by a curator as part of data publication. However, allowing metadata to be created and maintained by data producers as the data is generated, rather than waiting for publication, can have significant advantages in terms of productivity and repeatability.
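
A minimal sketch of producer-side metadata capture: the writer records catalog metadata at the moment the file is generated, rather than at publication time. Paths and field names are illustrative, not any particular catalog's schema.

```python
# Capture metadata (name, checksum, size, timestamp) as a side effect
# of writing each data file, so the catalog is built during production.
import hashlib
import tempfile
import time
from pathlib import Path

def write_with_metadata(path: Path, data: bytes, catalog: list) -> None:
    """Write data to path and append a catalog record describing it."""
    path.write_bytes(data)
    catalog.append({
        "file": path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "created": time.time(),
    })

catalog = []
with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "measurement_001.csv"
    write_with_metadata(out, b"t,signal\n0,1.2\n", catalog)

print(catalog[0]["file"], catalog[0]["size"])
```

Because the checksum is recorded at write time, later re-runs can verify that a derived file is byte-identical to what was cataloged.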

The pace of discovery in eScience is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. It is all too common for investigators to spend inordinate amounts of time developing ad hoc procedures to manage their data. In previous work, we presented Deriva, a Scientific Asset Management System designed to accelerate data-driven discovery.

Human kidney function is underpinned by approximately 1,000,000 nephrons, although the number varies substantially, and low nephron number is linked to disease. Human kidney development initiates around 4 weeks of gestation and ends around 34-37 weeks of gestation. Over this period, a reiterative inductive process establishes the nephron complement.

Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC).
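
The linear support-vector-machine idea behind the classifier above can be sketched in pure Python: minimize hinge loss plus an L2 penalty by subgradient descent. The toy 2-D points stand in for the MRI-derived regional features; this is an illustration of the technique, not the authors' pipeline.

```python
# Train a linear SVM by subgradient descent on the regularized hinge
# loss: lam*|w|^2 + max(0, 1 - y*(w.x + b)), with labels in {-1, +1}.
def train_linear_svm(xs, ys, epochs=300, lr=0.05, lam=0.01):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:
                # Point violates the margin: hinge subgradient is active.
                w = [w[i] - lr * (2 * lam * w[i] - y * x[i]) for i in range(2)]
                b += lr * y
            else:
                # Only the L2 penalty shrinks the weights.
                w = [w[i] - lr * 2 * lam * w[i] for i in range(2)]
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Two well-separated clusters standing in for the two classes.
xs = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.0), (4.0, 4.2), (4.5, 3.8), (3.9, 4.0)]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(xs, ys)
accuracy = sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(xs)
print(accuracy)  # 1.0 on this separable toy set
```

In practice one would use an established SVM implementation with cross-validation; the point here is only the hinge-loss geometry that makes the classifier sensitive to multivariate patterns rather than single features.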

(Re)Building a Kidney is a National Institute of Diabetes and Digestive and Kidney Diseases-led consortium to optimize approaches for the isolation, expansion, and differentiation of appropriate kidney cell types and the integration of these cells into complex structures that replicate human kidney function. The ultimate goals of the consortium are two-fold: to develop and implement strategies for engineering of replacement kidney tissue, and to devise strategies to stimulate regeneration of nephrons to restore failing kidney function. Projects within the consortium will answer fundamental questions regarding human gene expression in the developing kidney, essential signaling crosstalk between distinct cell types of the developing kidney, how to derive the many cell types of the kidney through directed differentiation of human pluripotent stem cells, which bioengineering or scaffolding strategies have the most potential for kidney tissue formation, and basic parameters of the regenerative response to injury.

Background: A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle.

The FaceBase Consortium, funded by the National Institute of Dental and Craniofacial Research, National Institutes of Health, is designed to accelerate understanding of craniofacial developmental biology by generating comprehensive data resources to empower the research community, exploring high-throughput technology, fostering new scientific collaborations among researchers and human/computer interactions, facilitating hypothesis-driven research and translating science into improved health care to benefit patients. The resources generated by the FaceBase projects include a number of dynamic imaging modalities, genome-wide association studies, software tools for analyzing human facial abnormalities, detailed phenotyping, anatomical and molecular atlases, global and specific gene expression patterns, and transcriptional profiling over the course of embryonic and postnatal development in animal models and humans. The integrated data visualization tools, faceted search infrastructure, and curation provided by the FaceBase Hub offer flexible and intuitive ways to interact with these multidisciplinary data.

Modern biomedical data collection is generating exponentially more data in a multitude of formats. This flood of complex data poses significant opportunities to discover and understand the critical interplay among such diverse domains as genomics, proteomics, metabolomics, and phenomics, including imaging, biometrics, and clinical data. The Big Data for Discovery Science Center is taking an "-ome to home" approach to discover linkages between these disparate data sources by mining existing databases of proteomic and genomic data, brain images, and clinical assessments.
