Publications by authors named "Eric Fosler-Lussier"

Introduction: Advances in natural language understanding have facilitated the development of Virtual Standardized Patients (VSPs) that may soon rival human patients in conversational ability. We describe herein the development of an artificial intelligence (AI) system for VSPs enabling students to practice their history-taking skills.

Methods: Our system consists of (1) Automated Speech Recognition (ASR), (2) hybrid AI for question identification, (3) classifier to choose between the two systems, and (4) automated speech generation.


Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface.
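The neighbor-based mode of comparison described above can be illustrated with a minimal sketch. This is not TextEssence's implementation: the function names, the toy two-dimensional vectors, and the use of Jaccard overlap as the comparison score are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, query, k):
    """Return the k terms most similar to `query` within one embedding set."""
    scored = [
        (cosine(embeddings[query], vec), term)
        for term, vec in embeddings.items()
        if term != query
    ]
    # Sort by descending similarity, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


def neighbor_overlap(emb_a, emb_b, query, k):
    """Jaccard overlap of the query's k nearest neighbors in two corpora."""
    a = set(nearest_neighbors(emb_a, query, k))
    b = set(nearest_neighbors(emb_b, query, k))
    return len(a & b) / len(a | b)


# Hypothetical toy embeddings trained on two different corpora.
corpus_a = {"fever": [1.0, 0.0], "cough": [0.9, 0.1],
            "chills": [0.8, 0.3], "invoice": [0.0, 1.0]}
corpus_b = {"fever": [1.0, 0.0], "cough": [0.1, 0.9],
            "chills": [0.9, 0.2], "invoice": [0.8, 0.1]}

print(nearest_neighbors(corpus_a, "fever", 2))
print(nearest_neighbors(corpus_b, "fever", 2))
print(neighbor_overlap(corpus_a, corpus_b, "fever", 2))
```

A low overlap for a term suggests that it is used differently in the two corpora, which is the kind of signal a comparative embedding analysis looks for.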


Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts, such as functional outcomes and social determinants of health, lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of medical information in under-studied domains, and demonstrate its applicability through a case study on physical mobility function.


Objectives: Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research.


Exploration and analysis of potential data sources is a significant challenge in the application of NLP techniques to novel information domains. We describe HARE, a system for highlighting relevant information in document collections to support ranking and triage, which provides tools for post-processing and qualitative analysis for model development and tuning. We apply HARE to the use case of narrative descriptions of mobility information in clinical data, and demonstrate its utility in comparing candidate embedding features.


Practicing a medical history using standardized patients is an essential component of medical school curricula. Recent advances in technology now allow for newer approaches for practicing and assessing communication skills. We describe herein a virtual standardized patient (VSP) system that allows students to practice their history-taking skills and receive immediate feedback.


Clinical trial coordinators refer to both structured and unstructured sources of data when evaluating a subject for eligibility. While some eligibility criteria can be resolved using structured data, some require manual review of clinical notes. An important step in automating the trial screening process is to be able to identify the right data source for resolving each criterion.


Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard).
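To make the evaluation task concrete, the sketch below scores a sentence splitter against gold boundaries using precision, recall, and F1. It does not reproduce any of the toolkits or corpora evaluated in the study: the regex splitter `naive_sbd`, the sample note, and the gold offsets are hypothetical, chosen to show two failure types one might expect in clinical text (an abbreviation period and a sentence with no terminal punctuation).

```python
import re


def naive_sbd(text):
    """Return the set of character offsets (exclusive) where sentences end.

    A boundary is predicted after '.', '!' or '?' when it is followed by
    whitespace and an uppercase letter, or by the end of the text.
    """
    pattern = r"[.!?](?=\s+[A-Z]|\s*$)"
    return {match.end() for match in re.finditer(pattern, text)}


def boundary_prf(predicted, gold):
    """Precision, recall and F1 of predicted boundary offsets against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical clinical-style note; the second sentence has no final period.
note = "Pt seen by Dr. Smith today. BP 120/80 Denies chest pain."

# Hand-labelled gold boundaries: after "today.", after "120/80", after "pain."
gold = {27, 37, 56}

predicted = naive_sbd(note)
precision, recall, f1 = boundary_prf(predicted, gold)
print(sorted(predicted), precision, recall, f1)
```

Here the splitter wrongly breaks after "Dr." and misses the unpunctuated boundary after "120/80", so both precision and recall fall below 1.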


Clinical trials are essential for determining whether new interventions are effective. In order to determine the eligibility of patients to enroll into these trials, clinical trial coordinators often perform a manual review of clinical notes in the electronic health record of patients. This is a very time-consuming and exhausting task.


The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely-available resources from the community. With a performance (F1=90.


Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications.


Objective: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype.

Materials and Methods: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review.


The manual annotation of clinical narratives is an important step for training and validating the performance of automated systems that utilize these clinical narratives. We build an annotation specification to capture medical events, and coreferences and temporal relations between medical events in clinical text. Unfortunately, the process of clinical data annotation is both time consuming and costly.


Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation.


Function words, especially frequently occurring ones such as "the", "that", "and", and "of", vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.


Synopsis of recent research by authors named "Eric Fosler-Lussier"

  • Eric Fosler-Lussier's recent research focuses on the integration of artificial intelligence and natural language processing in healthcare, emphasizing tools like Virtual Standardized Patients (VSPs) that enhance medical training through realistic conversational interactions.
  • His studies highlight the challenges and advancements in automated coding of medical concepts, specifically addressing under-studied domains within health records, and propose frameworks for improving the normalization and analysis of clinical text data.
  • Fosler-Lussier also investigates linguistic aspects of speech processing, exploring how variations in language affect information retrieval and understanding in clinical contexts, thereby contributing to the intersection of healthcare, education, and computational linguistics.