IEEE Trans Pattern Anal Mach Intell
July 2023
Classic embedded feature selection algorithms are often divided into two large groups: tree-based algorithms and LASSO variants. The two approaches focus on different aspects: while tree-based algorithms provide a clear explanation of which variables are being used to trigger a certain output, LASSO-like approaches sacrifice a detailed explanation in favor of increased accuracy. In this paper, we present a novel embedded feature selection algorithm, called End-to-End Feature Selection (E2E-FS), that aims to provide both accuracy and explainability.
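As an illustration of the LASSO family mentioned above (not of E2E-FS itself, whose details are not given here), the following is a minimal sketch of how an L1 penalty performs embedded feature selection. It assumes a plain cyclic coordinate-descent solver on synthetic data; the function names, the penalty value `lam=0.1`, and the data-generating model are all hypothetical choices made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=200):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(sweeps):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Correlation of feature j with the residual computed without feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_wj
    return w


# Synthetic data (an assumption for this sketch): only features 0 and 1 drive the target.
random.seed(0)
n_samples, n_features = 200, 5
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[1] for row in X]

weights = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(selected, [round(wj, 2) for wj in weights])
```

The selection is a by-product of training: features whose coefficient is driven exactly to zero are discarded, while the surviving coefficients are slightly shrunk toward zero by the penalty.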
The number of interconnected devices surrounding us every day, such as personal wearables, cars, and smart homes, has recently increased. Internet of Things devices monitor many processes and are able to use machine learning models for pattern recognition and even decision making, with the added advantage of reducing network congestion by allowing computations near the data sources. The main restriction is the low computation capacity of these devices.
In this study, we analyze the capability of several state-of-the-art machine learning methods to predict whether patients diagnosed with COVID-19 (coronavirus disease 2019) will need different levels of hospital care assistance (regular hospital admission or intensive care unit admission) during the course of their illness, using only demographic and clinical data. For this research, a data set of 10,454 patients from 14 hospitals in Galicia (Spain) was used. Each patient is characterized by 833 variables, two of which are age and gender, while the others are records of diseases or conditions in their medical history.
Feature selection is a preprocessing technique that identifies the key features of a given problem. It has traditionally been applied in a wide range of problems that include biological data processing, finance, and intrusion detection systems. In particular, feature selection has been successfully used in medical applications, where it can not only reduce dimensionality but also help us understand the causes of a disease.
The current situation in microarray data analysis and prospects for the future are briefly discussed in this chapter, in which the competition between microarray technologies and high-throughput technologies is considered from a data analysis perspective. The current limitations of DNA microarrays are important for forecasting challenges and future trends in microarray data analysis; these include data analysis techniques associated with increasing sample sizes, new feature selection methods, deep learning techniques, covariate significance testing, and false discovery rate methods, among other procedures for better interpretability of the results.
A typical characteristic of microarray data is that it has a very high number of features (on the order of thousands) while the number of examples is usually less than 100. In the context of microarray classification, this poses a challenge for machine learning methods, which can suffer from overfitting and thus degraded performance. A common solution is to apply a dimensionality reduction technique before classification to reduce the number of features.
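The reduction step described above can be sketched with a simple univariate filter. This is only an illustrative example under stated assumptions, not the method of the article: it scores each feature with a signal-to-noise ratio between two classes and keeps the top `k`. The synthetic data (20 samples, 500 features, three informative ones with an exaggerated class shift) and all function names are hypothetical.

```python
import random
import statistics


def signal_to_noise(values, labels):
    """Score one feature: |mean difference| / (sum of per-class standard deviations)."""
    class0 = [v for v, lab in zip(values, labels) if lab == 0]
    class1 = [v for v, lab in zip(values, labels) if lab == 1]
    spread = statistics.stdev(class0) + statistics.stdev(class1)
    return abs(statistics.mean(class0) - statistics.mean(class1)) / (spread + 1e-12)


def top_k_features(X, labels, k):
    """Return the (sorted) indices of the k highest-scoring features."""
    n_features = len(X[0])
    scores = [signal_to_noise([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Synthetic "microarray-like" data (an assumption): far more features than samples.
random.seed(0)
n_samples, n_features = 20, 500
informative = [0, 1, 2]  # hypothetical features that actually differ between classes
labels = [i % 2 for i in range(n_samples)]
X = []
for label in labels:
    row = [random.gauss(0.0, 1.0) for _ in range(n_features)]
    for j in informative:
        row[j] += 10.0 * label  # deliberately large class shift for a clear illustration
    X.append(row)

kept = top_k_features(X, labels, k=3)
X_reduced = [[row[j] for j in kept] for row in X]  # input for any downstream classifier
print(kept, len(X_reduced), len(X_reduced[0]))
```

In practice the filter should be fitted on training data only (for example, inside each cross-validation fold); selecting features on the full data set before evaluation leaks label information and inflates accuracy estimates.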
The advent of DNA microarray datasets has stimulated a new line of research both in bioinformatics and in machine learning. This type of data is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for disease diagnosis or for distinguishing specific types of tumor. Microarray data classification is a difficult challenge for machine learning researchers due to its high number of features and the small sample sizes.
Background: The All Patient-Refined Diagnosis-Related Groups (APR-DRGs) system has adjusted the basic DRG structure by incorporating four severity of illness (SOI) levels, which are used for determining hospital payment. A comprehensive report of all relevant diagnoses, namely the patient's underlying co-morbidities, is a key factor for ensuring that SOI determination will be adequate.
Objective: In this study, we aimed to characterise the individual impact of co-morbidities on APR-DRG classification and hospital funding in the context of respiratory and cardiovascular diseases.
Medicine will experience many changes in the coming years because the so-called "medicine of the future" will be increasingly proactive, featuring four basic elements: predictive, personalized, preventive, and participatory. Drivers for these changes include the digitization of data in medicine and the availability of computational tools that deal with massive volumes of data. Thus, the need to apply machine-learning methods to medicine has increased dramatically in recent years while facing challenges related to an unprecedented large number of clinically relevant features and highly specific diagnostic tests.
Importance: Published definitions of plus disease in retinopathy of prematurity (ROP) reference arterial tortuosity and venous dilation within the posterior pole based on a standard published photograph. One possible explanation for limited interexpert reliability for a diagnosis of plus disease is that experts deviate from the published definitions.
Objective: To identify vascular features used by experts for diagnosis of plus disease through quantitative image analysis.
Purpose: We developed and evaluated the performance of a novel computer-based image analysis system for grading plus disease in retinopathy of prematurity (ROP), and identified the image features, shapes, and sizes that best correlate with expert diagnosis.
Methods: A dataset of 77 wide-angle retinal images from infants screened for ROP was collected. A reference standard diagnosis was determined for each image by combining image grading from 3 experts with the clinical diagnosis from ophthalmoscopic examination.
Background: Heart failure (HF) manifests as at least two subtypes. The current paradigm distinguishes the two by using both the metric ejection fraction (EF) and a constraint for end-diastolic volume. About half of all HF patients exhibit preserved EF.
Dry eye is a symptomatic disease that affects a wide range of the population and has a negative impact on their daily activities. Its diagnosis can be achieved by analyzing the interference patterns of the tear film lipid layer and by classifying them into one of the Guillon categories. The manual process done by experts is not only affected by subjective factors but is also very time consuming.
Gene-expression microarray is a novel technology that allows the examination of tens of thousands of genes at a time. For this reason, manual observation is not feasible, and machine learning methods are being developed to handle these new data. Specifically, since the number of genes is very high, feature selection methods have proven valuable for dealing with these unbalanced (high dimensionality and low cardinality) data sets.