Publications by Marcin Joachimiak

AI-readiness for Biomedical Data: Bridge2AI Recommendations.

Timothy Clark Harry Caufield Jillian A Parker Sadnan Al Manir Edilberto Amorim Marcin Joachimiak

bioRxiv

November 2024

Article Synopsis

Biomedical research is increasingly integrating artificial intelligence (AI) and machine learning (ML) to tackle complex challenges, necessitating a focus on ethical and explainable AI (XAI) due to the complexities of deep learning methods.
The NIH's Bridge2AI program is working on creating new flagship datasets aimed at enhancing AI/ML applications in biomedicine while establishing best practices, tools, standards, and criteria for assessing the data's AI readiness, including legal and ethical considerations.
The article outlines foundational criteria developed by the NIH Bridge2AI Standards Working Group to ensure the scientific rigor and ethical use of AI in biomedical research, emphasizing the need for ongoing adaptation as the field evolves.

Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI).

Sabrina Toro Anna V Anagnostopoulos Susan M Bello Kai Blumberg Rhiannon Cameron Marcin P Joachimiak

J Biomed Semantics

October 2024

Article Synopsis

Ontologies are key for managing consensus knowledge in areas like biomedical, environmental, and food sciences, but creating and maintaining them requires significant resources and collaboration among experts.
Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI) uses Large Language Models and Retrieval Augmented Generation to automate the generation of ontology components, showing high precision in relationship creation and an ability to produce acceptable definitions.
While DRAGON-AI can significantly support ontology development, expert curators remain essential for overseeing the quality and accuracy of the generated content.

Integrating biological knowledge for mechanistic inference in the host-associated microbiome.

Brook E Santangelo Madison Apgar Angela Sofia Burkhart Colorado Casey G Martin John Sterrett Marcin P Joachimiak

Front Microbiol

April 2024

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease.

An open source knowledge graph ecosystem for the life sciences.

Tiffany J Callahan Ignacio J Tripodi Adrianne L Stefanski Luca Cappelletti Sanya B Taneja Marcin P Joachimiak

Sci Data

April 2024

Article Synopsis

Translational research needs data from different levels of biological systems, but integrating those data is difficult for scientists.
New technologies help gather more data, but researchers struggle to organize all the information effectively.
PheKnowLator is a tool that helps scientists create customizable knowledge graphs easily, making it better for managing complex health information without slowing down their work.

Estimating geographic variation of infection fatality ratios during epidemics.

Joshua Ladau Eoin L Brodie Nicola Falco Ishan Bansal Elijah B Hoffman Marcin P Joachimiak

Infect Dis Model

June 2024

Objectives: We aim to estimate geographic variability in total numbers of infections and infection fatality ratios (IFR; the number of deaths caused by an infection per 1,000 infected people) when the availability and quality of data on disease burden are limited during an epidemic.
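As a small worked illustration of the IFR definition above (deaths per 1,000 infected people), the ratio can be computed as in the sketch below. The function name and the example counts are hypothetical and are not taken from the paper; the paper's estimation framework itself is not reproduced here.

```python
def infection_fatality_ratio(deaths: int, infections: int, per: int = 1000) -> float:
    """Return deaths per `per` infected people (per=1000 matches the definition above)."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    if deaths < 0 or deaths > infections:
        raise ValueError("deaths must lie between 0 and infections")
    return per * deaths / infections


# Hypothetical counts, for illustration only.
ifr = infection_fatality_ratio(deaths=50, infections=10_000)
print(ifr)  # 5.0 deaths per 1,000 infected people
```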

Methods: We develop a noncentral hypergeometric framework that accounts for differential probabilities of positive tests and reflects the fact that symptomatic people are more likely to seek testing. We demonstrate the robustness, accuracy, and precision of this framework, and apply it to the United States (U.

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

J Harry Caufield Harshad Hegde Vincent Emonet Nomi L Harris Marcin P Joachimiak

Bioinformatics

March 2024

Motivation: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas.

Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema.

GRAPE for fast and scalable graph processing and random-walk-based embedding.

Luca Cappelletti Tommaso Fontana Elena Casiraghi Vida Ravanmehr Tiffany J Callahan Marcin P Joachimiak

Nat Comput Sci

June 2023

Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods.

A bacterial sensor taxonomy across earth ecosystems for machine learning applications.

Helen Park Marcin P Joachimiak Sean P Jungbluth Ziming Yang William J Riehl

mSystems

January 2024

Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and "transduce" signals to adjust internal processes.

KG-Hub: building and exchanging biological knowledge graphs.

J Harry Caufield Tim Putman Kevin Schaper Deepak R Unni Harshad Hegde Marcin P Joachimiak

Bioinformatics

July 2023

Motivation: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.

Results: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects.
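The modular extract-transform-load pattern mentioned above can be pictured with a minimal sketch. This is not KG-Hub code: the input table, the `EX:` identifiers, and the function names are all hypothetical, and the node and edge column names are an assumption about a typical Biolink-style tabular layout.

```python
import csv
import io

# Hypothetical upstream source: one gene-disease association per row.
RAW = (
    "gene_id\tgene_name\tdisease_id\tdisease_name\n"
    "EX:g1\tgeneA\tEX:d1\tdiseaseX\n"
)


def extract(raw_text):
    """Parse the tab-separated source into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text), delimiter="\t"))


def transform(rows):
    """Turn source rows into node and edge records with Biolink-style labels."""
    nodes = {}
    edges = []
    for row in rows:
        nodes[row["gene_id"]] = {
            "id": row["gene_id"],
            "category": "biolink:Gene",
            "name": row["gene_name"],
        }
        nodes[row["disease_id"]] = {
            "id": row["disease_id"],
            "category": "biolink:Disease",
            "name": row["disease_name"],
        }
        edges.append({
            "subject": row["gene_id"],
            "predicate": "biolink:related_to",
            "object": row["disease_id"],
        })
    return list(nodes.values()), edges


def load(records, fieldnames):
    """Serialize records as TSV text (a real pipeline would write files)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


nodes, edges = transform(extract(RAW))
nodes_tsv = load(nodes, ["id", "category", "name"])
edges_tsv = load(edges, ["subject", "predicate", "object"])
print(nodes_tsv)
print(edges_tsv)
```

Keeping the three steps as separate functions is what makes such a pattern modular: a new source would only need its own `extract` and `transform`.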

Gene Set Summarization Using Large Language Models.

Marcin P Joachimiak J Harry Caufield Nomi L Harris Hyeongsik Kim Christopher J Mungall

ArXiv

July 2024

Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling Large Language Models (LLMs) to use scientific texts directly and avoid reliance on a KB.
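For readers unfamiliar with the statistical enrichment analysis described above, one common form is a one-sided hypergeometric test for over-representation of a term in a gene list. The sketch below is a generic illustration of that test, not the method of this article; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric P(X >= hits) for over-representation.

    population: number of genes in the background set
    annotated:  background genes annotated with the term
    sample:     number of genes in the list being interpreted
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for `hits` or more annotated genes in the list.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 40 of 20,000 background genes carry the term,
# and 5 of the 100 genes in the list do.
p = enrichment_p_value(population=20_000, annotated=40, sample=100, hits=5)
print(p)
```

In practice such p-values are computed for many terms and then corrected for multiple testing.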

Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions.

Sanya B Taneja Tiffany J Callahan Mary F Paine Sandra L Kane-Gill Halil Kilicoglu Marcin P Joachimiak

J Biomed Inform

April 2023

Background: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events.

Why was this cited? Explainable machine learning applied to COVID-19 research literature.

Lucie Beranová Marcin P Joachimiak Tomáš Kliegr Gollam Rabby Vilém Sklenák

Scientometrics

April 2022

Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles (mostly from biology and medicine) applicable to the COVID-19 crisis.

Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.

Vida Ravanmehr Hannah Blau Luca Cappelletti Tommaso Fontana Leigh Carmody Marcin Joachimiak

NAR Genom Bioinform

December 2021

Article Synopsis

Research on inhibiting protein kinases (PKs) has been crucial in cancer therapy, with about 8% of PKs targeted by FDA-approved drugs and numerous inhibitors in clinical trials.
A new approach using natural language processing and machine learning is presented to analyze relationships between PKs and various cancers, predicting which PKs to inhibit for effective treatment.
This method represents PKs and cancers as 100-dimensional vectors derived from PubMed abstracts, and uses data from clinical trials to accurately forecast PK-cancer associations, aiding in the design of targeted clinical trials for novel therapies.

Correction for Vangay et al., "Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities".

Pajau Vangay Josephine Burgin Anjanette Johnston Kristen L Beck Daniel C Berrios Marcin P Joachimiak

mSystems

May 2021

Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities.

Pajau Vangay Josephine Burgin Anjanette Johnston Kristen L Beck Daniel C Berrios Marcin P Joachimiak

mSystems

February 2021

Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community.

Zinc against COVID-19? Symptom surveillance and deficiency risk groups.

Marcin P Joachimiak

PLoS Negl Trop Dis

January 2021

A wide variety of symptoms is associated with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection, and these symptoms can overlap with other conditions and diseases. Knowing the distribution of symptoms across diseases and individuals can support clinical actions on timelines shorter than those for drug and vaccine development. Here, we focus on zinc deficiency symptoms, symptom overlap with other conditions, as well as zinc effects on immune health and mechanistic zinc deficiency risk groups.

KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response.

Justin T Reese Deepak Unni Tiffany J Callahan Luca Cappelletti Vida Ravanmehr Marcin P Joachimiak

Patterns (N Y)

January 2021

Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians.

KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response.

Justin Reese Deepak Unni Tiffany J Callahan Luca Cappelletti Vida Ravanmehr Marcin P Joachimiak

bioRxiv

August 2020

Integrated, up-to-date data about SARS-CoV-2 and coronavirus disease 2019 (COVID-19) is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians.

How many rare diseases are there?

Melissa Haendel Nicole Vasilevsky Deepak Unni Cristian Bologa Nomi Harris Marcin P Joachimiak

Nat Rev Drug Discov

February 2020

A lack of robust knowledge of the number of rare diseases and the number of people affected by them limits the development of approaches to ameliorate the substantial cumulative burden of rare diseases. Here, we call for coordinated efforts to more precisely define rare diseases.

The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.

Kent A Shefchek Nomi L Harris Michael Gargano Nicolas Matentzoglu Deepak Unni Marcin Joachimiak

Nucleic Acids Res

January 2020

In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search.

Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery.

Xingmin Aaron Zhang Amy Yates Nicole Vasilevsky J P Gourdine Tiffany J Callahan Marcin P Joachimiak

NPJ Digit Med

May 2019

Electronic Health Record (EHR) systems typically define laboratory test results using the Laboratory Observation Identifier Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resource (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms.

KBase: The United States Department of Energy Systems Biology Knowledgebase.

Adam P Arkin Robert W Cottingham Christopher S Henry Nomi L Harris Rick L Stevens Marcin P Joachimiak

Nat Biotechnol

July 2018

Effects of genetic variation on the E. coli host-circuit interface.

Stefano Cardinale Marcin Pawel Joachimiak Adam Paul Arkin

Cell Rep

July 2013

Predictable operation of engineered biological circuitry requires the knowledge of host factors that compete or interfere with designed function. Here, we perform a detailed analysis of the interaction between constitutive expression from a test circuit and cell-growth properties in a subset of genetic variants of the bacterium Escherichia coli. Differences in generic cellular parameters such as ribosome availability and growth rate are the main determinants (89%) of strain-specific differences of circuit performance in laboratory-adapted strains but are responsible for only 35% of expression variation across 88 mutants of E.

Characterization of NaCl tolerance in Desulfovibrio vulgaris Hildenborough through experimental evolution.

Aifen Zhou Edward Baidoo Zhili He Aindrila Mukhopadhyay Jason K Baumohl Marcin P Joachimiak

ISME J

September 2013

Article Synopsis

Researchers evolved a strain of Desulfovibrio vulgaris (ES9-11) to withstand higher levels of NaCl by culturing it for 1200 generations in saline conditions.
The study found that the NaCl-evolved strain showed enhanced tolerance compared to a control strain, with significant changes in gene expression related to amino acid synthesis, energy production, and reduced motility.
Key findings include the role of glutamate as a primary osmoprotectant, increased membrane fluidity from specific fatty acids, and an overall mechanism involving osmolyte accumulation and sodium ion exclusion that contribute to increased NaCl tolerance.

Deletion of the Desulfovibrio vulgaris carbon monoxide sensor invokes global changes in transcription.

Lara Rajeev Kristina L Hillesland Grant M Zane Aifen Zhou Marcin P Joachimiak

J Bacteriol

November 2012

The carbon monoxide-sensing transcriptional factor CooA has been studied only in hydrogenogenic organisms that can grow using CO as the sole source of energy. Homologs for the canonical CO oxidation system, including CooA, CO dehydrogenase (CODH), and a CO-dependent Coo hydrogenase, are present in the sulfate-reducing bacterium Desulfovibrio vulgaris, although it grows only poorly on CO. We show that D.
