Publications by authors named "Marcin Joachimiak"

Article Synopsis
  • Biomedical research is increasingly integrating artificial intelligence (AI) and machine learning (ML) to tackle complex challenges, necessitating a focus on ethical and explainable AI (XAI) due to the complexities of deep learning methods.
  • The NIH's Bridge2AI program is working on creating new flagship datasets aimed at enhancing AI/ML applications in biomedicine while establishing best practices, tools, standards, and criteria for assessing the data's AI readiness, including legal and ethical considerations.
  • The article outlines foundational criteria developed by the NIH Bridge2AI Standards Working Group to ensure the scientific rigor and ethical use of AI in biomedical research, emphasizing the need for ongoing adaptation as the field evolves.
Article Synopsis
  • Ontologies are key for managing consensus knowledge in areas like biomedical, environmental, and food sciences, but creating and maintaining them requires significant resources and collaboration among experts.
  • The Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI) method leverages Large Language Models and Retrieval Augmented Generation to automate the generation of ontology components, showing high precision in relationship creation and an ability to produce acceptable definitions.
  • While DRAGON-AI can significantly support ontology development, expert curators remain essential for overseeing the quality and accuracy of the generated content.

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease.

Article Synopsis
  • Translational research needs data from different levels of biological systems, but combining that data is tough for scientists.
  • New technologies help gather more data, but researchers struggle to organize all the information effectively.
  • PheKnowLator is a tool that helps scientists create customizable knowledge graphs easily, making it better for managing complex health information without slowing down their work.

Objectives: We aim to estimate geographic variability in total numbers of infections and infection fatality ratios (IFR; the number of deaths caused by an infection per 1,000 infected people) when the availability and quality of data on disease burden are limited during an epidemic.

Methods: We develop a noncentral hypergeometric framework that accounts for differential probabilities of positive tests and reflects the fact that symptomatic people are more likely to seek testing. We demonstrate the robustness, accuracy, and precision of this framework, and apply it to the United States (U.


Motivation: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrarily complex nested knowledge schemas.

Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema.


Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods.
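
As a rough illustration of what a random-walk-based method samples, the sketch below generates a first-order, uniform random walk over a small adjacency list. It is not GRAPE's implementation or API: the function name `random_walk`, the dictionary-based graph, and the toy node labels are all assumptions made for this example, and none of the specialized data structures or parallelism mentioned in the abstract are reproduced here.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Return a uniform random walk of at most `length` nodes.

    `adjacency` maps each node to a list of its neighbors (hypothetical
    toy representation). The walk stops early at a node with no neighbors.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break  # dead end: no outgoing edges to follow
        walk.append(rng.choice(neighbors))
    return walk


# Toy graph, invented purely for illustration.
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
walk = random_walk(toy_graph, "a", 5, random.Random(42))
print(walk)
```

Walks like this are commonly fed to a sequence-embedding model to learn node vectors; seeding the generator through an explicit `random.Random` instance keeps the sampled walks reproducible.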


Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and "transduce" signals to adjust internal processes.


Motivation: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.

Results: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects.


Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling Large Language Models (LLMs) to use scientific texts directly and avoid reliance on a KB.
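
The statistical enrichment analysis described above is often carried out with a one-sided hypergeometric test. The sketch below shows that calculation for over-representation under stated assumptions: the function name `enrichment_p_value` and the example counts are hypothetical, and the abstract does not say which test or software the authors used.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric tail probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the term of interest
    sample:     size of the gene list being interpreted
    hits:       genes in the list annotated with the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)  # all ways to draw the gene list
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 200 annotated with a term,
# a list of 100 genes of which 8 carry the annotation.
p = enrichment_p_value(20000, 200, 100, 8)
print(f"p = {p:.3g}")
```

A small p-value suggests that the term appears in the list more often than expected by chance. In practice the test is repeated across many terms, so a multiple-testing correction is normally applied as well.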


Background: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events.


Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles, mostly from biology and medicine, applicable to the COVID-19 crisis.

Article Synopsis
  • Research on inhibiting protein kinases (PKs) has been crucial in cancer therapy, with about 8% of PKs targeted by FDA-approved drugs and numerous inhibitors in clinical trials.
  • A new approach using natural language processing and machine learning is presented to analyze relationships between PKs and various cancers, predicting which PKs to inhibit for effective treatment.
  • This method represents PKs and cancers as 100-dimensional vectors derived from PubMed abstracts, and uses data from clinical trials to accurately forecast PK-cancer associations, aiding in the design of targeted clinical trials for novel therapies.

Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community.


A wide variety of symptoms is associated with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection, and these symptoms can overlap with other conditions and diseases. Knowing the distribution of symptoms across diseases and individuals can support clinical actions on timelines shorter than those for drug and vaccine development. Here, we focus on zinc deficiency symptoms, symptom overlap with other conditions, as well as zinc effects on immune health and mechanistic zinc deficiency risk groups.


Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community varies drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians.


Integrated, up-to-date data about SARS-CoV-2 and coronavirus disease 2019 (COVID-19) is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community varies drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians.


A lack of robust knowledge of the number of rare diseases and the number of people affected by them limits the development of approaches to ameliorate the substantial cumulative burden of rare diseases. Here, we call for coordinated efforts to more precisely define rare diseases.


In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search.


Electronic Health Record (EHR) systems typically define laboratory test results using the Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resources (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms.


Predictable operation of engineered biological circuitry requires the knowledge of host factors that compete or interfere with designed function. Here, we perform a detailed analysis of the interaction between constitutive expression from a test circuit and cell-growth properties in a subset of genetic variants of the bacterium Escherichia coli. Differences in generic cellular parameters such as ribosome availability and growth rate are the main determinants (89%) of strain-specific differences of circuit performance in laboratory-adapted strains but are responsible for only 35% of expression variation across 88 mutants of E.

Article Synopsis
  • Researchers evolved a strain of Desulfovibrio vulgaris (ES9-11) to withstand higher levels of NaCl by culturing it for 1200 generations in saline conditions.
  • The study found that the NaCl-evolved strain showed enhanced tolerance compared to a control strain, with significant changes in gene expression related to amino acid synthesis, energy production, and reduced motility.
  • Key findings include the role of glutamate as a primary osmoprotectant, increased membrane fluidity from specific fatty acids, and an overall mechanism involving osmolyte accumulation and sodium ion exclusion that contribute to increased NaCl tolerance.

The carbon monoxide-sensing transcriptional factor CooA has been studied only in hydrogenogenic organisms that can grow using CO as the sole source of energy. Homologs for the canonical CO oxidation system, including CooA, CO dehydrogenase (CODH), and a CO-dependent Coo hydrogenase, are present in the sulfate-reducing bacterium Desulfovibrio vulgaris, although it grows only poorly on CO. We show that D.
