Publications by authors named "Dalianis H"

Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models consist of large numbers of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks.

Background: Computer-assisted clinical coding (CAC) tools are designed to help clinical coders assign standardized codes, such as the ICD-10 (International Statistical Classification of Diseases, Tenth Revision), to clinical texts, such as discharge summaries. Maintaining the integrity of these standardized codes is important both for the functioning of health systems and for ensuring that data used for secondary purposes are of high quality. Clinical coding is an error-prone, cumbersome task, and the complexity of modern classification systems such as the ICD-11 (International Classification of Diseases, Eleventh Revision) presents significant barriers to implementation.

The lack of relevant annotated datasets is a key limitation in applying Natural Language Processing techniques to a broad range of tasks, among them Protected Health Information (PHI) identification in Norwegian clinical text. In this work, the possibility of exploiting resources from Swedish, a language very closely related to Norwegian, is explored. The Swedish dataset is annotated with PHI.

With the recent advances in natural language processing and deep learning, the development of tools that can assist medical coders in ICD-10 diagnosis coding and increase their efficiency in coding discharge summaries is significantly more viable than before. To that end, one important component in the development of these models is the datasets used to train them. In this study, such datasets are presented, and it is shown that one of them can be used to develop a BERT-based language model that can consistently perform well in assigning ICD-10 codes to discharge summaries written in Swedish.

Sepsis is a leading cause of mortality, and early identification improves survival. With the increasing digitalization of health care data, automated sepsis prediction models hold promise to aid in prompt recognition. Most previous studies have focused on the intensive care unit (ICU) setting.

We developed and validated a set of fully automated surveillance algorithms for healthcare-onset Clostridioides difficile infection (CDI) using electronic health records. In a validation data set of 750 manually annotated admissions, the algorithm based on ICD-10 code A04.7 had insufficient sensitivity.
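
To make the notion of sensitivity concrete, the following is a minimal sketch of how a code-based rule could be scored against manual annotation. The function names and the five toy admissions are hypothetical and purely illustrative; they are not data or code from the study.

```python
def code_rule(admission_codes, target="A04.7"):
    """Flag an admission if the target ICD-10 code was assigned to it."""
    return target in admission_codes


def sensitivity(predicted, reference):
    """Share of truly positive admissions that the rule also flags: TP / (TP + FN)."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    if tp + fn == 0:
        raise ValueError("no positive reference cases")
    return tp / (tp + fn)


# Toy admissions (made up): (assigned ICD-10 codes, manual annotation says CDI).
admissions = [
    ({"A04.7"}, True),
    ({"J18.9"}, True),           # CDI present but never coded: a false negative
    ({"A04.7", "N39.0"}, True),
    ({"I10"}, False),
    ({"K52.9"}, True),           # another uncoded case
]

predicted = [code_rule(codes) for codes, _ in admissions]
reference = [label for _, label in admissions]
toy_sensitivity = sensitivity(predicted, reference)
```

A rule that relies on a single diagnosis code misses every infection that was never coded, which is one way low sensitivity can arise.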

Multi-label classification according to the International Classification of Diseases (ICD) is an Extreme Multi-label Classification task aiming to categorise health records according to a set of relevant ICD codes. We implemented PlaBERT, a new multi-label text classification head with per-label attention, on top of a BERT model. The model is assessed on Electronic Health Records comprising discharge summaries in three languages: English, Spanish, and Swedish.
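
As an illustration of what per-label attention generally means, the sketch below gives every label its own attention distribution over the token vectors and its own sigmoid output. It is a generic label-wise attention head written in plain Python under assumed parameter names (`label_queries`, `label_weights`, `label_biases`); it is not PlaBERT's actual architecture, and all numbers are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def per_label_attention(tokens, label_queries, label_weights, label_biases):
    """Return one probability per label.

    tokens: list of T token vectors (e.g. encoder outputs), each of size d
    label_queries: one query vector of size d per label
    label_weights, label_biases: per-label output parameters
    """
    dim = len(tokens[0])
    probs = []
    for query, weight, bias in zip(label_queries, label_weights, label_biases):
        # Attention of this label over all tokens.
        attn = softmax([dot(query, tok) for tok in tokens])
        # Label-specific document vector: attention-weighted sum of tokens.
        doc = [sum(a * tok[i] for a, tok in zip(attn, tokens)) for i in range(dim)]
        # Independent sigmoid per label, since several codes may apply at once.
        logit = dot(weight, doc) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Toy input: 3 tokens, 2 dimensions, 2 labels (all values are made up).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]
weights = [[1.0, -1.0], [-1.0, 1.0]]
biases = [0.0, 0.0]
probs = per_label_attention(tokens, queries, weights, biases)
```

Because each label attends to the document separately, a rare code can focus on the few tokens that matter for it instead of sharing one pooled representation with every other code.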

Objectives: The Sequential Organ Failure Assessment score is the basis of the Sepsis-3 criteria and requires arterial blood gas analysis to assess respiratory function. Peripheral oxygen saturation is a noninvasive alternative but is included in neither the Sequential Organ Failure Assessment score nor Sepsis-3. We aimed to assess the association between worst peripheral oxygen saturation during onset of suspected infection and mortality.

Background: Surveillance for healthcare-associated infections such as healthcare-associated urinary tract infections (HA-UTI) is important for directing resources and evaluating interventions. However, traditional surveillance methods are resource-intensive and subject to bias.

Aim: To develop and validate a fully automated surveillance algorithm for HA-UTI using electronic health record (EHR) data.

Background: The electronic medical record (EMR) offers unique possibilities for clinical research, but some important patient attributes are not readily available due to its unstructured properties. We applied text mining using machine learning to enable automatic classification of unstructured information on smoking status from Swedish EMR data.

Methods: Data on patients' smoking status from EMRs were used to develop 32 different predictive models that were trained using Weka, changing sentence frequency, classifier type, tokenization, and attribute selection in a database of 85,000 classified sentences.

Sensitive data is normally required to develop rule-based models or train machine learning-based models for de-identifying electronic health record (EHR) clinical notes, and this presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks.

Background: Surveillance of sepsis incidence is important for directing resources and evaluating quality-of-care interventions. The aim was to develop and validate a fully automated Sepsis-3-based surveillance system in non-intensive care wards using electronic health record (EHR) data, and to demonstrate its utility by determining the burden of hospital-onset sepsis and variations between wards.

Methods: A rule-based algorithm was developed using EHR data from a cohort of all adult patients admitted at an academic centre between July 2012 and December 2013.

This article describes the development and evaluation of a set of knowledge patterns that provide guidelines and implications of design for developers of mental health portals. The knowledge patterns were based on three foundations: (1) knowledge integration of language technology approaches; (2) experiments with language technology applications and (3) user studies of portal interaction. A mixed-methods approach was employed for the evaluation of the knowledge patterns: formative workshops with knowledge pattern experts and summative surveys with experts in specific domains.

Background: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area.

To enable secondary use of healthcare data in a privacy-preserving manner, there is a need for methods capable of automatically identifying protected health information (PHI) in clinical text. To that end, learning predictive models from labeled examples has emerged as a promising alternative to rule-based systems. However, little is known about differences with respect to PHI prevalence in different types of clinical notes and how potential domain differences may affect the performance of predictive models trained on one particular type of note and applied to another.

Objective: The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs) focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations.

Obscuring protected health information (PHI) in the clinical text of health records facilitates the secondary use of healthcare data in a privacy-preserving manner. Although automatic de-identification of clinical text using machine learning holds much promise, little is known about the relative prevalence of PHI in different types of clinical text and whether there is a need for domain adaptation when learning predictive models from one particular domain and applying it to another. In this study, we address these questions by training a predictive model and using it to estimate the prevalence of PHI in clinical text written (1) in different clinical specialties, (2) in different types of notes (i.

Hospital-acquired infections pose a significant risk to patient health, while their surveillance is an additional workload for hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections. This is to reduce the burden of having the hospital staff manually check patient records.

Background: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events, modeled in an ensemble of semantic spaces, for the purpose of predictive modeling.

Objectives: We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis.

Methods: We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers.

Results: Significant articles published within this time-span were included and are discussed from the perspective of semantic analysis.

For the purpose of post-marketing drug safety surveillance, which has traditionally relied on the voluntary reporting of individual cases of adverse drug events (ADEs), other sources of information are now being explored, including electronic health records (EHRs), which give us access to enormous amounts of longitudinal observations of the treatment of patients and their drug use. Adverse drug events, which can be encoded in EHRs with certain diagnosis codes, are, however, heavily underreported. It is therefore important to develop capabilities to process, by means of computational methods, the more unstructured EHR data in the form of clinical notes, where clinicians may describe and reason around suspected ADEs.

Detection of early symptoms of cervical cancer is crucial for early treatment and survival. To find symptoms of cervical cancer in clinical text, Named Entity Recognition is needed. In this paper, the Clinical Entity Finder, a machine-learning tool trained on annotated clinical text from a Swedish internal medicine emergency unit, is evaluated on cervical cancer records.

Objective: The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish.

Methods and Material: We integrated cues from four external lexicons, along with generated inflections and combinations.
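
The dependence of a cue-based system on its lexicon can be illustrated with a deliberately simplified sketch. The code below is not the pyConTextNLP or pyConTextSwe API; the `CUES` lexicon, the `classify_assertion` function, and the fixed token window are assumptions made for illustration, and English cues are used only for readability.

```python
import re

# Hypothetical miniature cue lexicon; a real lexicon would be far larger.
CUES = {
    "negated": ["no", "denies", "without"],
    "uncertain": ["possible", "suspected", "cannot rule out"],
}


def classify_assertion(sentence, disorder, window=5):
    """Classify a disorder mention as affirmed, negated, or uncertain.

    A cue counts only if it occurs within `window` tokens before the mention.
    Negation cues are checked before uncertainty cues.
    """
    text = sentence.lower()
    pos = text.find(disorder.lower())
    if pos == -1:
        raise ValueError("disorder not found in sentence")
    # Keep only the last `window` word tokens before the mention.
    preceding = re.findall(r"\w+", text[:pos])[-window:]
    scope = " ".join(preceding)
    for label in ("negated", "uncertain"):
        for cue in CUES[label]:
            # Whole-word match so that "no" does not fire inside "cannot".
            if re.search(r"\b" + re.escape(cue) + r"\b", scope):
                return label
    return "affirmed"
```

For example, `classify_assertion("Patient denies chest pain.", "chest pain")` yields `"negated"`, whereas a cue missing from the lexicon would leave the same mention classified as affirmed, which is why lexicon coverage matters.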

Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation: Disorder, Finding, Pharmaceutical Drug, and Body Structure.
