Publications by authors named "Leihong Wu"

Pharmacogenomics (PGx) holds the promise of personalizing medical treatments based on individual genetic profiles, thereby enhancing drug efficacy and safety. However, the current landscape of PGx research is hindered by fragmented data sources, time-consuming manual data extraction processes, and the need for comprehensive and up-to-date information. This study aims to address these challenges by evaluating the ability of Large Language Models (LLMs), specifically Llama3.

Background: Artificial intelligence (AI) is rapidly being adopted to build products and aid in the decision-making process across industries. However, AI systems have been shown to exhibit and even amplify biases, causing a growing concern among people worldwide. Thus, investigating methods of measuring and mitigating bias within these AI-powered tools is necessary.
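
One common way to quantify the kind of bias the abstract refers to is a group-fairness metric. The sketch below is a minimal, illustrative implementation of demographic parity difference; the metric choice, data, and group labels are assumptions for illustration, not the study's actual method.

```python
def positive_rate(y_pred, group, g):
    """Fraction of positive predictions within group g."""
    members = [p for p, gr in zip(y_pred, group) if gr == g]
    return sum(members) / len(members)

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between groups 0 and 1.

    Zero means the model predicts positives at equal rates for both
    groups; larger values indicate disparate treatment.
    """
    return abs(positive_rate(y_pred, group, 0) - positive_rate(y_pred, group, 1))

# Toy predictions for two demographic groups (values are illustrative)
preds = [1, 1, 0, 1, 0, 0, 0, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_difference(preds, groups))  # 0.5
```

A mitigation step would then aim to drive this gap toward zero, for example by reweighting training data or adjusting decision thresholds per group.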

Introduction: The accurate identification and timely updating of adverse reactions in drug labeling are crucial for patient safety and effective drug use. Postmarketing surveillance plays a pivotal role in identifying previously undetected adverse events (AEs) that emerge when a drug is used in broader and more diverse patient populations. However, traditional methods of updating drug labeling with new AE information have been manual, time consuming, and error prone.

Text summarization is crucial in scientific research, drug discovery and development, regulatory review, and more. This task demands domain expertise, language proficiency, semantic prowess, and conceptual skill. The recent advent of large language models (LLMs), such as ChatGPT, offers unprecedented opportunities to automate this process.

In the rapidly evolving field of artificial intelligence (AI), explainability has traditionally been assessed in a post-modeling process and is often subjective. In contrast, many quantitative metrics are routinely used to assess a model's performance. We proposed a unified formula named PERForm that incorporates explainability as a weight into existing statistical metrics, providing an integrated and quantitative measure of both predictivity and explainability to guide model selection, application, and evaluation.
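
To make the weighting idea concrete, here is a minimal sketch of combining a performance metric with an explainability score. This is an illustrative stand-in for the general concept, not the published PERForm definition; the function name, the linear blend, and the `alpha` parameter are assumptions.

```python
def perform_score(predictivity, explainability, alpha=0.5):
    """Blend a performance metric with an explainability score.

    NOTE: illustrative only -- the actual PERForm formula may differ.
    Both inputs are assumed to lie in [0, 1]; alpha sets the weight
    given to explainability relative to predictivity.
    """
    return (1 - alpha) * predictivity + alpha * explainability

# A highly predictive black box vs. a slightly weaker transparent model:
black_box = perform_score(0.90, 0.20)     # strong AUC, weak explainability
transparent = perform_score(0.85, 0.80)   # slightly weaker AUC, explainable
print(black_box, transparent)             # the transparent model wins overall
```

Under such a score, a modestly less accurate but far more explainable model can be preferred for selection, which is the trade-off the unified metric is meant to surface.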

Regulatory agencies consistently deal with extensive document reviews, ranging from product submissions to both internal and external communications. Large Language Models (LLMs) such as ChatGPT can be invaluable tools for these tasks; however, they present several challenges, particularly around proprietary information, combining customized functions with specific review needs, and the transparency and explainability of the model's output. Hence, a localized and customized solution is imperative.

The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Due to its extensive volume and the presence of free text, conventional text mining analyses have encountered challenges in processing these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) have provided an unprecedented opportunity to identify key information from drug labeling, thereby enhancing safety reviews and supporting regulatory decisions.

Artificial intelligence (AI) is increasingly being used in decision making across various industries, including the public health arena. Bias in any decision-making process can significantly skew outcomes, and AI systems have been shown to exhibit biases at times. The potential for AI systems to perpetuate and even amplify biases is a growing concern.

The pathology of animal studies is crucial for toxicity evaluations and regulatory assessments, but the manual examination of slides by pathologists remains time-consuming and requires extensive training. One inherent challenge in this process is the interobserver variability, which can compromise the consistency and accuracy of a study. Artificial intelligence (AI) has demonstrated its ability to automate similar examinations in clinical applications with enhanced efficiency, consistency, and accuracy.

The US Food and Drug Administration (FDA) regulatory process often involves several reviewers who focus on sets of information related to their respective areas of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize the contents using a structure that enables the information to be easily allocated, retrieved, and reviewed. However, this practice is not always followed correctly; as such, some documents are not well structured, with similar information spread across different sections, hindering efficient access to and review of all the relevant data as a whole.

Background: There are two general types of total ankle replacement (TAR) designs with respect to the polyethylene insert: mobile-bearing (MB) and fixed-bearing (FB) TARs. The aim of this study is to compare polyethylene-related adverse events (AEs), particularly revisions, reported for MB TARs and FB TARs using the US Food and Drug Administration's (FDA's) Manufacturer and User Facility Device Experience (MAUDE) database.

Methods: A text mining method was applied to the medical device reporting (MDR) in the MAUDE database from 1991 to 2020, followed by manual reviews to identify, characterize, and describe all polyethylene-related AEs, including revisions, reported for MB and FB TARs.
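
The text mining step described above can be sketched as a keyword screen over free-text device reports, followed by manual review of the flagged subset. The term list and report text below are illustrative assumptions, not the study's actual lexicon or MAUDE data.

```python
import re

# Hypothetical screening terms for polyethylene-related adverse events;
# a real lexicon would be curated and validated against manual review.
POLY_TERMS = re.compile(
    r"\b(polyethylene|poly\s*insert|bearing (?:fracture|dislocation|wear))\b",
    re.IGNORECASE,
)

def flag_polyethylene_ae(report_text):
    """Return True if a device report mentions a polyethylene-related term."""
    return bool(POLY_TERMS.search(report_text))

reports = [
    "Patient underwent revision due to polyethylene insert fracture.",
    "Tibial component loosening observed; no insert damage noted.",
]
print([flag_polyethylene_ae(r) for r in reports])  # [True, False]
```

In practice the flagged reports would then be manually reviewed to characterize each AE, as the Methods paragraph describes.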

In the field of regulatory science, literature review is an essential step that is most often conducted by manually reading hundreds of articles. Although this process is highly time-consuming and labor-intensive, most of its output is not transformed into a machine-readable format. This limited availability of data has largely constrained the development of artificial intelligence (AI) systems to facilitate literature review in the regulatory process.

Campylobacter coli is a leading bacterial cause of human gastroenteritis. We reported the circularized 1.8-Mbp complete genome of MLST type 1055 C.

Food samples are routinely screened for food-contaminating beetles (i.e., pantry beetles) due to their adverse impact on the economy, environment, public health and safety.

COVID-19 can lead to multiple severe outcomes including neurological and psychological impacts. However, it is challenging to manually scan hundreds of thousands of COVID-19 articles on a regular basis. To update our knowledge, provide sound science to the public, and communicate effectively, it is critical to have an efficient means of following the most current published data.

The primary objective of the FDA-led Sequencing and Quality Control Phase 2 (SEQC2) project is to develop standard analysis protocols and quality control metrics for use in DNA testing to enhance scientific research and precision medicine. This study reports a targeted next-generation sequencing (NGS) method that will enable more accurate detection of actionable mutations in circulating tumor DNA (ctDNA) clinical specimens. To accomplish this, a synthetic internal standard spike-in was designed for each actionable mutation target, suitable for use in NGS following hybrid capture enrichment and unique molecular index (UMI) or non-UMI library preparation.

The United States Food and Drug Administration (FDA) regulates a broad range of consumer products, which account for about 25% of the United States market. FDA regulatory activities often involve producing and reading a large number of documents, which is time-consuming and labor-intensive. To support regulatory science at the FDA, we evaluated artificial intelligence (AI)-based natural language processing (NLP) of regulatory documents for text classification and compared deep learning-based models with a conventional keyword-based model.
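
A conventional keyword-based classifier of the kind used as a baseline here can be sketched as counting per-class keyword hits and taking the best-scoring label. The class names and keyword lists below are illustrative assumptions, not the study's actual categories.

```python
def keyword_classifier(text, keyword_map):
    """Label a document by counting hits from per-class keyword lists.

    A simple baseline of the kind deep learning NLP models are often
    compared against; returns None when no keyword matches.
    """
    text = text.lower()
    scores = {label: sum(text.count(kw) for kw in kws)
              for label, kws in keyword_map.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical classes and keywords for a regulatory-document task
keywords = {
    "adverse_event": ["adverse", "reaction", "toxicity"],
    "efficacy": ["efficacy", "response rate", "endpoint"],
}
print(keyword_classifier("The adverse reaction rate was low.", keywords))
# adverse_event
```

A deep learning comparison would replace this scoring rule with a trained text encoder and classification head, trading the baseline's transparency for learned representations.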

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers.

Background: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzed ten diverse cancer cell lines individually and as a pool, termed Sample A, to develop a reference sample with a suitably large number of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance.

Background: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S.

Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity, and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments, and proficiency testing on standardized, cell-line-derived reference samples.

Identifying the exact species of pantry beetle responsible for food contamination is imperative in assessing the risks associated with contamination scenarios. Each beetle species has unique patterns on its hardened forewings (known as elytra) through which it can be identified. Currently, this is done through manual microanalysis of the insect or its fragments in contaminated food samples.

Exposure to cigarette smoke (CS) is strongly associated with impaired mucociliary clearance (MCC), which has been implicated in the pathogenesis of CS-induced respiratory diseases, such as chronic obstructive pulmonary diseases (COPD). In this study, we aimed to identify microRNAs (miRNAs) that are associated with impaired MCC caused by CS in an in vitro human air-liquid-interface (ALI) airway tissue model. ALI cultures were exposed to CS (diluted with 0.

Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study to assess the influence of algorithms and features on model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds.
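
A sweep of that scale amounts to enumerating algorithm and feature-set combinations and recording both a performance score and an explainability rating for each. The skeleton below illustrates the bookkeeping; the algorithm names, feature-set names, explainability ratings, and `evaluate` stub are all placeholders, not the study's actual pipeline.

```python
from itertools import product

# Hypothetical grid dimensions (illustrative names only)
ALGORITHMS = ["logistic_regression", "random_forest", "deep_net"]
FEATURE_SETS = ["fingerprints", "descriptors", "embeddings"]
# Assumed explainability ratings in [0, 1], higher = more transparent
EXPLAINABILITY = {"logistic_regression": 0.9, "random_forest": 0.5, "deep_net": 0.2}

def evaluate(algorithm, features):
    """Stand-in for cross-validated performance on a Tox21 assay."""
    return 0.7  # placeholder score

grid = [
    {
        "algorithm": algo,
        "features": feats,
        "score": evaluate(algo, feats),
        "explainability": EXPLAINABILITY[algo],
    }
    for algo, feats in product(ALGORITHMS, FEATURE_SETS)
]
print(len(grid))  # 9 algorithm/feature combinations for one assay
```

Repeating such a grid across 65 assays is how a study reaches thousands of models, after which performance and explainability can be compared across the two axes.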

The mechanisms leading to organ level toxicities are poorly understood. In this study, we applied an integrated approach to deduce the molecular targets and biological pathways involved in chemically induced toxicity for eight common human organ level toxicity end points (carcinogenicity, cardiotoxicity, developmental toxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, reproductive toxicity, and skin toxicity). Integrated analysis of in vitro assay data, molecular targets and pathway annotations from the literature, and toxicity-molecular target associations derived from text mining, combined with machine learning techniques, were used to generate molecular targets for each of the organ level toxicity end points.
