Background: One of the main challenges in maintaining registries is keeping a high follow-up rate, and a reliable strategy to limit dropout is currently lacking. The aim of this study was to use machine learning (ML) models to highlight the characteristics of patients who are most likely to drop out, and to evaluate the potential cost-effectiveness of implementing a follow-up system based on the obtained data.
Methods: All patients recruited in the local spine surgery registry were included and demographic, peri- and postoperative data were collected.
Background: The frequency of hip and knee arthroplasty surgeries has been rising steadily in recent decades. This trend is attributed to an aging population, leading to increased demands on healthcare systems. Fast Track (FT) surgical protocols, perioperative procedures designed to expedite patient recovery and early mobilization, have demonstrated efficacy in reducing hospital stays, convalescence periods, and associated costs.
IEEE Trans Pattern Anal Mach Intell
November 2024
In this article, we propose a conceptual framework to study ensembles of conformal predictors (CP), which we call Ensemble Predictors (EP). Our approach is inspired by the application of imprecise probabilities in information fusion. Based on the proposed framework, we study, for the first time in the literature, the theoretical properties of CP ensembles in a general setting, by focusing on simple and commonly used possibilistic combination rules.
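As a rough illustration of what combining conformal predictors with possibilistic rules could look like, the sketch below computes split-conformal p-values and merges the p-values of two predictors with the minimum (conjunctive) and maximum (disjunctive) rules. The function names, the nonconformity scores, and the choice of these two rules are assumptions made for illustration; the abstract does not specify the framework at this level of detail, and the sketch makes no claim about the validity guarantees of the combined predictor.

```python
from typing import Dict, List


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Split-conformal p-value for one candidate label.

    Counts the calibration nonconformity scores that are at least as large
    as the test score, with the usual +1 correction for the test point.
    """
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def combine(p_values: List[Dict[str, float]], rule: str = "min") -> Dict[str, float]:
    """Combine per-label p-values from several predictors (hypothetical helper).

    rule="min" is the conjunctive combination, rule="max" the disjunctive one.
    """
    aggregate = min if rule == "min" else max
    labels = p_values[0].keys()
    return {label: aggregate(p[label] for p in p_values) for label in labels}


def prediction_set(p: Dict[str, float], epsilon: float) -> List[str]:
    """Labels whose combined p-value exceeds the significance level epsilon."""
    return sorted(label for label, value in p.items() if value > epsilon)


# Made-up calibration nonconformity scores, shared by both predictors for brevity.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Made-up test nonconformity scores for each candidate label.
predictor_a = {
    "pos": conformal_p_value(calibration, 0.25),
    "neg": conformal_p_value(calibration, 0.85),
}
predictor_b = {
    "pos": conformal_p_value(calibration, 0.45),
    "neg": conformal_p_value(calibration, 0.55),
}

conjunctive = combine([predictor_a, predictor_b], rule="min")
disjunctive = combine([predictor_a, predictor_b], rule="max")

print(prediction_set(conjunctive, epsilon=0.3))  # ['pos']
print(prediction_set(disjunctive, epsilon=0.3))  # ['neg', 'pos']
```

In this toy case the conjunctive rule yields a smaller prediction set than the disjunctive one, which is the kind of trade-off a study of combination rules would need to characterize.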
This paper examines a kind of explainable AI, centered around what we term pro-hoc explanations, that is a form of support that consists of offering alternative explanations (one for each possible outcome) instead of a specific post-hoc explanation following specific advice. Specifically, our support mechanism utilizes explanations by examples, featuring analogous cases for each category in a binary setting. Pro-hoc explanations are an instance of what we called frictional AI, a general class of decision support aimed at achieving a useful compromise between the increase of decision effectiveness and the mitigation of cognitive risks, such as over-reliance, automation bias and deskilling.
This paper proposes a user study aimed at evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task, the detection of thoracolumbar (TL) fractures from vertebral X-rays. In particular, we focus on two oft-neglected features of CAMs, granularity and coloring: which features (lower-level or higher-level) the maps should highlight, and which coloring scheme they should adopt, to best support the decision-making process, in terms of both diagnostic accuracy (that is, effectiveness) and user-centered dimensions such as perceived confidence and utility (that is, satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that lower-level feature CAMs, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than higher-level feature CAMs, particularly among experienced physicians.
In this paper, we study human-AI collaboration protocols, a design-oriented construct aimed at establishing and evaluating how humans and AI can collaborate in cognitive tasks. We applied this construct in two user studies involving 12 specialist radiologists (the knee MRI study) and 44 ECG readers of varying expertise (the ECG study), who evaluated 240 and 20 cases, respectively, in different collaboration configurations. We confirm the utility of AI support but find that XAI can be associated with a "white-box paradox", producing a null or detrimental effect.
Total hip (THA) and total knee (TKA) arthroplasty procedures have steadily increased over the past few decades, and their use is expected to grow further, mainly due to an increasing number of elderly patients. Cost-containment strategies, supporting a rapid recovery with positive functional outcomes, high patient satisfaction, and enhanced patient-reported outcomes, are needed. A Fast Track surgical procedure (FT) is a coordinated perioperative approach aimed at expediting early mobilization and recovery following surgery and, accordingly, shortening the length of hospital stay (LOS), convalescence and costs.
Human Activity Recognition (HAR) has been studied extensively, yet current approaches are not capable of generalizing across different domains (i.e., subjects, devices, or datasets) with acceptable performance.
Comput Methods Programs Biomed
June 2022
Background and Objective: Evaluation of AI-based decision support systems (AI-DSS) is of critical importance in practical applications; nonetheless, common evaluation metrics fail to properly consider relevant and contextual information. In this article we discuss a novel utility metric, the weighted Utility (wU), for the evaluation of AI-DSS, which is based on the raters' perceptions of their annotation hesitation and of the relevance of the training cases. Methods: We discuss the relationship between the proposed metric and other previous proposals, and we describe the application of the proposed metric for both model evaluation and optimization, through three realistic case studies.
Stud Health Technol Inform
May 2022
We propose a re-calibration method for Machine Learning models, based on computing confidence intervals for the predicted confidence scores. We show the effectiveness of the proposed method on a COVID-19 diagnosis benchmark.
In this article, we discuss the development of prognostic machine learning (ML) models for COVID-19 progression, by focusing on the task of predicting ICU admission within (any of) the next 5 days. On the basis of 6,625 complete blood count (CBC) tests from 1,004 patients, of which 18% were admitted to the intensive care unit (ICU), we created four ML models, by adopting a robust development procedure which was designed to minimize risks of bias and over-fitting, according to reference guidelines. The best model, a support vector machine, had an AUC of .
Purpose: The rRT-PCR for COVID-19 diagnosis is affected by long turnaround time, potential shortage of reagents, high false-negative rates and high costs. Routine hematochemical tests are a faster and less expensive alternative for diagnosis. Thus, Machine Learning (ML) has been applied to hematological parameters to develop diagnostic tools and help clinicians in promptly managing positive patients.
Background and Objective: Medical machine learning (ML) models tend to perform better on data from the same cohort than on new data, often due to overfitting or covariate shift. For these reasons, external validation (EV) is a necessary practice in the evaluation of medical ML. However, there is still a gap in the literature on how to interpret EV results and hence assess the robustness of ML models.
Objectives: The European Biological Variation Study (EuBIVAS), which includes 91 healthy volunteers from five European countries, estimated high-quality biological variation (BV) data for several measurands. Previous EuBIVAS papers reported no significant differences among laboratories/populations; however, they focused on specific sets of measurands, without a comprehensive overall view. The aim of this paper is to evaluate the homogeneity of EuBIVAS data considering multivariate information by applying Principal Component Analysis (PCA), an unsupervised machine learning algorithm.
This editorial aims to contribute to the current debate about the quality of studies that apply machine learning (ML) methodologies to medical data to extract value from them and provide clinicians with viable and useful tools supporting everyday care practices. We propose a practical checklist to help authors self-assess the quality of their contribution and to help reviewers recognize and appreciate high-quality medical ML studies by distinguishing them from the mere application of ML techniques to medical data.
Medical errors have a huge impact on clinical practice in terms of economic and human costs. As a result, technology-based solutions, such as those grounded in artificial intelligence (AI) or collective intelligence (CI), have attracted increasing interest as a means of reducing error rates and their impacts. Previous studies have shown that a combination of individual opinions based on rules, weighting mechanisms, or other CI solutions could improve diagnostic accuracy with respect to individual doctors.
Treatment and prevention of cardiovascular diseases often rely on Electrocardiogram (ECG) interpretation. Dependent on the physician's variability, ECG interpretation is subjective and prone to errors. Machine learning models are often developed and used to support doctors; however, their lack of interpretability stands as one of the main drawbacks of their widespread operation.
Purpose: The integration of Artificial Intelligence into medical practices has recently been advocated for its promise of bringing increased efficiency and effectiveness to these practices. Nonetheless, little research has so far been aimed at understanding the best human-AI interaction protocols in collaborative tasks, even in currently more viable settings, like independent double-reading screening tasks.
Methods: To this aim, we report on a retrospective case-control study, involving 12 board-certified radiologists, in the detection of knee lesions by means of Magnetic Resonance Imaging, in which we simulated the serial combination of two Deep Learning models with humans in eight double-reading protocols.
Background: The Lombardy region, Italy, has been severely affected by COVID-19. During the epidemic peak, in March 2020, patients needing intensive care unit treatments were approximately 10% of those infected. This fraction decreased to approximately 2% in the second part of April, and to 0.
Objectives: The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), presents with known shortcomings, such as long turnaround time, potential shortage of reagents, false-negative rates around 15-20%, and expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative.
Methods: Three different training data sets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted at San Raphael Hospital (OSR) from February to May 2020, were used for developing machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, hemogasanalysis and CO-Oxymetry values, age, sex and specific symptoms at triage) and two sub-datasets (COVID-specific and CBC dataset, 32 and 21 features respectively).
BMC Med Inform Decis Mak
September 2020
Background: We focus on the importance of interpreting the quality of the labeling used as the input of predictive models to understand the reliability of their output in support of human decision-making, especially in critical domains, such as medicine.
Methods: Accordingly, we propose a framework distinguishing the reference labeling (or Gold Standard) from the set of annotations from which it is usually derived (the Diamond Standard). We define a set of quality dimensions and related metrics: representativeness (are the available data representative of its reference population?); reliability (do the raters agree with each other in their ratings?); and accuracy (are the raters' annotations a true representation?).
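To make the distinction between the two standards concrete, the sketch below derives a reference label per case from a set of rater annotations by majority vote and reports a simple raw agreement figure for each case. Majority voting, the alphabetical tie-break, the pairwise agreement measure, and all names and labels here are illustrative assumptions, not the metrics defined in the framework.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def majority_label(annotations: List[str]) -> str:
    """Reference label for one case: the most frequent annotation.

    Ties are broken alphabetically, an arbitrary choice for this sketch.
    """
    counts = Counter(annotations)
    top = max(counts.values())
    return sorted(label for label, count in counts.items() if count == top)[0]


def pairwise_agreement(annotations: List[str]) -> float:
    """Share of rater pairs that assigned the same label to one case.

    This is raw agreement and is not corrected for chance.
    """
    pairs = list(combinations(annotations, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical multi-rater annotations: three raters label two cases.
diamond_standard: Dict[str, List[str]] = {
    "case_1": ["abnormal", "abnormal", "normal"],
    "case_2": ["normal", "normal", "normal"],
}

# One derived reference label per case.
gold_standard = {
    case: majority_label(labels) for case, labels in diamond_standard.items()
}
agreement = {
    case: pairwise_agreement(labels) for case, labels in diamond_standard.items()
}

print(gold_standard)  # {'case_1': 'abnormal', 'case_2': 'normal'}
print(agreement)
```

Both cases end up with a single reference label, but only the multi-rater annotations reveal that the label for case_1 rests on weaker agreement than the label for case_2.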
Background: Despite the vagueness and uncertainty that is intrinsic in any medical act, interpretation and decision (including acts of data reporting and representation of relevant medical conditions), still little research has focused on how to explicitly take this uncertainty into account. In this paper, we focus on the representation of a general and wide-spread medical terminology, which is grounded on a traditional and well-established convention, to represent severity of health conditions (for instance, pain, visible signs), ranging from Absent to Extreme. Specifically, we will study how both potential patients and doctors perceive the different levels of the terminology in both quantitative and qualitative terms, and if the embedded user knowledge could improve the representation of ordinal values in the construction of machine learning models.
The COVID-19 pandemic due to the SARS-CoV-2 coronavirus, in its first 4 months since its outbreak, has to date reached more than 200 countries worldwide with more than 2 million confirmed cases (probably a much higher number of infected), and almost 200,000 deaths. Amplification of viral RNA by (real time) reverse transcription polymerase chain reaction (rRT-PCR) is the current gold standard test for confirmation of infection, although it presents known shortcomings: long turnaround times (3-4 hours to generate results), potential shortage of reagents, false-negative rates as large as 15-20%, the need for certified laboratories, expensive equipment and trained personnel. Thus there is a need for alternative, faster, less expensive and more accessible tests.
Stud Health Technol Inform
June 2020
In this paper, we present and discuss two new measures of inter- and intra-rater agreement to assess the reliability of the raters, and hence of their labeling, in multi-rater settings, which are common in the production of ground truth for machine learning models. Our proposal is more conservative than other existing agreement measures, as it considers a more articulated notion of agreement by chance, based on an empirical estimation of the precision (or reliability) of the single raters involved. We discuss the measures in light of a realistic annotation task that involved 13 expert radiologists in labeling the MRNet dataset.
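For context on what correcting for agreement by chance means, the sketch below computes Cohen's kappa, a standard existing chance-corrected agreement measure for two raters. It is shown only as a familiar baseline and is not one of the two measures proposed in the paper; the ratings are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_1: Sequence[Hashable], rater_2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each rater's label frequencies.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    freq_1, freq_2 = Counter(rater_1), Counter(rater_2)
    labels = set(freq_1) | set(freq_2)
    expected = sum(freq_1[label] * freq_2[label] for label in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up binary ratings of ten cases by two raters.
ratings_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
ratings_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

kappa = cohen_kappa(ratings_a, ratings_b)
print(round(kappa, 2))  # 0.6
```

Here the raters agree on 8 of 10 cases (raw agreement 0.8), but since both use each label half of the time, chance alone would give 0.5, so kappa drops to 0.6.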
Stud Health Technol Inform
June 2020
As is widely known, regular accuracy is a misleading and shallow indicator of the performance of a predictive model, especially in real-life domains like medicine, where decisions affect health or life. In this paper we present and discuss a new accuracy measure, the H-accuracy, as a more conservative alternative to regular accuracy, which we claim is more informative in the medical domain (and others with similar needs) for the elements it encompasses. In particular, the proposed measure takes into account important information such as the complexity of the cases and the case prevalence in the population.