Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.

Proc (IEEE Int Conf Healthc Inform)

Department of Computer Science, Tennessee State University, Nashville, TN, United States.

Published: June 2024

The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. Although IMV methods have been developed in biostatistics and, more recently, in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values by leveraging either between-feature attention (self-attention) or between-sample attention. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and by 52.6% to 82.6% on two electronic health records data sets. The proposed method shows superior performance across a wide range of missingness rates (10% to 50%) when the values are missing completely at random.
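The evaluation setting named in the abstract (values removed completely at random, imputations scored by normalized root mean squared error) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, column-mean imputation stands in for the attention-based model, the data are synthetic, and dividing the RMSE by the standard deviation of the held-out true values is an assumed normalization, since the abstract does not define one.

```python
import numpy as np


def mcar_mask(shape, missing_rate, rng):
    """Return a boolean mask (True = removed); each cell is dropped independently."""
    return rng.random(shape) < missing_rate


def mean_impute(x_missing):
    """Baseline stand-in: fill each NaN with the observed mean of its column."""
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)


def nrmse(x_true, x_imputed, mask):
    """RMSE over the removed cells, divided by the std of the true removed values.

    The choice of normalizer is an assumption made for this sketch.
    """
    err = x_true[mask] - x_imputed[mask]
    return float(np.sqrt(np.mean(err ** 2)) / np.std(x_true[mask]))


rng = np.random.default_rng(0)
x_true = rng.normal(size=(200, 5))          # synthetic stand-in for a tabular data set
mask = mcar_mask(x_true.shape, 0.3, rng)    # 30% missingness, within the 10% to 50% range
x_missing = np.where(mask, np.nan, x_true)  # hide the selected cells
x_imputed = mean_impute(x_missing)
score = nrmse(x_true, x_imputed, mask)
print(f"NRMSE of mean imputation at 30% MCAR: {score:.3f}")
```

Scoring only the masked cells keeps the metric from being diluted by entries the imputer never had to guess; swapping `mean_impute` for another imputer leaves the rest of the harness unchanged.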


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999
DOI: http://dx.doi.org/10.1109/ichi61247.2024.00030

Publication Analysis

Top Keywords

tabular data: 16
missing values: 12
electronic health: 12
health records: 12
imputation missing: 8
records tabular: 8
machine learning: 8
attention-based missing: 8
missing imputation: 8
imputation method: 8

Similar Publications

BbGSD: Black-boned Sheep Genome SNP Database.

Database (Oxford)

January 2025

College of Big Data, Yunnan Agricultural University, 452 Fengyuan Road, Panlong District, Kunming, Yunnan 650201, China.

Lanping black-boned (LPBB) sheep are a unique and rare ruminant species, characterized by black pigmentation in the skin and internal organs. Thus far, LPBB are the only known animal with heritable melanin characteristics besides the black-boned chicken, and the only mammal known to contain a large amount of melanin in the body. LPBB have therefore attracted substantial research attention, due to their potential contribution to medicine.


In cybersecurity, anomaly detection in tabular data is essential for ensuring information security. While traditional machine learning and deep learning methods have shown some success, they continue to face significant challenges in terms of generalization. To address these limitations, this paper presents an innovative method for tabular data anomaly detection based on large language models, called "Tabular Anomaly Detection via Guided Prompts" (TAD-GP).


Introduction: Escalation to single- or multiple-inhaler triple therapy (SITT; MITT) is a recommended option for patients with asthma who remain uncontrolled by medium-dose inhaled corticosteroid/long-acting β-agonist; however, characterization of elderly users of triple therapy is limited. This real-world cohort study describes demographics and clinical characteristics of elderly patients with asthma with and without comorbid chronic obstructive pulmonary disease (COPD) who are new users of triple therapy, and asthma treatment patterns preceding triple therapy initiation.

Methods: This retrospective cohort study used administrative claims data from the Optum Clinformatics Data Mart database.


Background: Workers may be exposed to different infectious agents, putting them at risk of developing occupational diseases. This can occur in many ways, through deliberate use of specific microorganisms or through potential exposure from close contact with biological material. Infection prevention and control measures against biohazards can reduce the risk of infection among workers.


MambaTab: A Plug-and-Play Model for Learning Tabular Data.

Proc (IEEE Conf Multimed Inf Process Retr)

August 2024

Department of Computer Science, University of Kentucky, Lexington, KY, USA.

Despite the prevalence of images and texts in machine learning, tabular data remains widely used across various domains. Existing deep learning models, such as convolutional neural networks and transformers, perform well but demand extensive preprocessing and tuning, which limits their accessibility and scalability. This work introduces an innovative approach based on a structured state-space model (SSM), MambaTab, for tabular data.

