A Compressed Language Model Embedding Dataset of ICD 10 CM Descriptions.

medRxiv

Department of Medicine VA Greater Los Angeles/UCLA, Los Angeles, USA.

Published: May 2023

This paper presents novel datasets that provide numerical representations of ICD-10-CM codes. The representations are generated by embedding each code description with a large language model and then reducing the dimension of the embeddings with an autoencoder. The embeddings serve as informative input features for machine learning models by capturing relationships among categories and preserving inherent context information. The model generating the data was validated in two ways. First, the dimension reduction was validated using an autoencoder; second, a supervised model was created to estimate the ICD-10-CM hierarchical categories. Results show that the data can be reduced to as few as 10 dimensions while maintaining the ability to reproduce the original embeddings, with fidelity decreasing as the number of retained dimensions decreases. Multiple compression levels are provided so that users can choose the one that fits their requirements. The readily available datasets of ICD-10-CM codes are anticipated to be highly valuable for researchers in biomedical informatics, enabling more advanced analyses in the field. This approach has the potential to significantly improve the utility of ICD-10-CM codes in the biomedical domain.
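The compression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes synthetic vectors in place of real language model embeddings, uses a purely linear autoencoder trained by plain gradient descent, and every name in it (`make_synthetic_embeddings`, `train_linear_autoencoder`, `code_dim`) is hypothetical.

```python
import numpy as np

def make_synthetic_embeddings(n_codes=200, embed_dim=64, latent_dim=10, seed=0):
    """Assumed stand-in for description embeddings: low-rank signal plus noise."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_codes, latent_dim))
    mixing = rng.normal(size=(latent_dim, embed_dim))
    X = latent @ mixing + 0.05 * rng.normal(size=(n_codes, embed_dim))
    # Standardize each embedding dimension to zero mean and unit variance.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def train_linear_autoencoder(X, code_dim=10, lr=0.1, epochs=2000, seed=0):
    """Fit encoder/decoder matrices by full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, code_dim))  # embedding -> code
    W_dec = rng.normal(scale=0.1, size=(code_dim, d))  # code -> reconstruction
    losses = []
    for _ in range(epochs):
        codes = X @ W_enc
        residual = codes @ W_dec - X
        losses.append(float(np.mean(residual ** 2)))
        # Gradients of the mean squared reconstruction error.
        grad_dec = (2.0 / (n * d)) * codes.T @ residual
        grad_enc = (2.0 / (n * d)) * X.T @ (residual @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses

X = make_synthetic_embeddings()
W_enc, W_dec, losses = train_linear_autoencoder(X, code_dim=10)
compressed = X @ W_enc          # one 10-dimensional row per synthetic "code"
print(compressed.shape)         # (200, 10)
print(losses[0], losses[-1])    # reconstruction error before and after training
```

Rerunning `train_linear_autoencoder` with a smaller `code_dim` and comparing the final loss is one way to explore the trade-off between compression level and reconstruction fidelity that the abstract reports.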


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168496
DOI: http://dx.doi.org/10.1101/2023.04.24.23289046


Similar Publications

Background: Human trafficking (HT) survivors are at risk for substance use disorder (SUD), although assessing the SUD epidemiology of HT survivors is difficult. This study used data from the 2019 to 2021 Nationwide Emergency Department Sample to estimate the prevalence of SUD for HT survivors utilizing emergency departments (ED) in the United States of America (US).

Methods: We included visits for patients aged 12-64 years with any International Classification of Diseases 10th Revision, Clinical Modification (ICD-10-CM) codes documenting HT as a cause of morbidity ( 1,688,  141) or history of HT ( 2,524,  218).


The critical role of tumor size in predicting lymph node metastasis in early-stage colorectal cancer.

Am J Surg

December 2024

Department of Colorectal Surgery, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA.

Background: The main purpose of this study is to investigate the impact of tumor size on the risk of lymph node metastasis (LNM) in pT1-stage colorectal cancer (CRC), focusing on the colon, rectosigmoid junction, and rectum.

Method: Patients diagnosed with primary pT1 CRC between 2015 and 2019 were selected from the National Cancer Database, utilizing International Classification of Diseases for Oncology, Third Edition (ICD-O-3) codes. We analyzed factors influencing LNM using uni- and multivariate analysis, then isolated tumor size to study its impact on LNM.


Odds of Metastatic Disease at Diagnosis of Primary Bone and Soft-Tissue Sarcomas of the Extremity and Pelvis Based on Preferred Language and Socioeconomic Factors.

J Am Acad Orthop Surg

December 2024

From the Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (Garcia), and Department of Orthopedic Surgery, Columbia University Irving Medical Center, New York, NY (Tyler).

Introduction: The odds of metastatic disease at diagnosis of bone (BS) and soft-tissue sarcomas (STS) of the extremities and pelvis may vary among patients due to several factors. There is limited research comparing the rates of metastatic disease at diagnosis in patients from different demographic and socioeconomic backgrounds.

Methods: Patients with a primary BS or STS of the extremity or pelvis were identified using International Classification of Diseases codes.


Importance: A growing body of literature suggests the presence of a prodromal period with nonspecific signs and symptoms before onset of multiple sclerosis (MS).

Objective: To systematically assess diseases and symptoms diagnosed in the 5 years before a first MS- or central nervous system (CNS) demyelinating disease-related diagnostic code in pediatric patients compared with controls without MS and controls with another immune-mediated disorder, juvenile idiopathic arthritis (JIA).

Design, Setting, And Participants: This population-based, matched case-control study included children and adolescents (aged <18 years) in Germany with statutory health insurance from January 2010 to December 2020.

