Computer-aided diagnosis of lung cancer: the effect of training data sets on classification accuracy of lung nodules.

Phys Med Biol

University of Shanghai for Science and Technology, School of Medical Instrument and Food Engineering, 516 Jun Gong Road, Shanghai 200093, People's Republic of China.

Published: February 2018

This study aims to develop a computer-aided diagnosis (CADx) scheme for classifying lung nodules as malignant or benign, and to assess whether CADx performance differs between nodules associated with early- and advanced-stage lung cancer. The study involves 243 biopsy-confirmed pulmonary nodules. Among them, 76 are benign, 81 are stage I malignant and 86 are stage III malignant nodules. The cases are separated into three data sets involving: (1) all nodules, (2) benign and stage I malignant nodules, and (3) benign and stage III malignant nodules. A CADx scheme is applied to segment lung nodules depicted on computed tomography images, and 66 3D image features are initially computed. Then, three machine learning models, namely a support vector machine, a naïve Bayes classifier and linear discriminant analysis, are separately trained and tested using the three data sets and a leave-one-case-out cross-validation method embedded with a Relief-F feature selection algorithm. When the three data sets are used separately to train and test the three classifiers, the average areas under the receiver operating characteristic curves (AUC) are 0.94, 0.90 and 0.99, respectively. When the classifiers trained on the data set containing all nodules are used, the average AUC values are 0.88 and 0.99 for detecting early- and advanced-stage nodules, respectively. AUC values computed from the three classifiers trained using the same data set are consistent, with no statistically significant difference (p > 0.05). This study demonstrates (1) the feasibility of applying a CADx scheme to accurately distinguish between benign and malignant lung nodules, and (2) a positive trend between CADx performance and cancer progression stage. Thus, in order to increase CADx performance in detecting subtle and early cancer, training data sets should include more diverse early-stage cancer cases.
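The evaluation protocol described in the abstract (leave-one-case-out cross-validation with feature selection embedded in each fold, scored by AUC) can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions: the data are synthetic, the feature ranking is a simplified Relief with a single nearest hit and miss rather than full Relief-F, only a hand-written Gaussian naïve Bayes classifier is shown, and all function names (`relief_scores`, `fit_gnb`, `malignancy_score`, `loo_scores`, `auc`) are hypothetical. The point it shows is that features are re-ranked on the training cases of every fold, so the held-out case never influences which features are kept.

```python
import math
import random

def relief_scores(X, y):
    """Simplified Relief: one nearest hit and one nearest miss per sample."""
    n, d = len(X), len(X[0])
    # Feature ranges scale per-feature differences into [0, 1].
    spans = [max(r[j] for r in X) - min(r[j] for r in X) or 1.0 for j in range(d)]
    def dist(a, b):
        return sum(abs(a[j] - b[j]) / spans[j] for j in range(d))
    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that separate classes, penalise within-class spread.
            w[j] += (abs(X[i][j] - X[m][j]) - abs(X[i][j] - X[h][j])) / spans[j] / n
    return w

def fit_gnb(X, y):
    """Fit a Gaussian naive Bayes model: per-class log prior, means, variances."""
    model = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mu, var))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model

def malignancy_score(model, x):
    """Log-odds of class 1 (malignant) versus class 0 (benign)."""
    def log_post(c):
        prior, stats = model[c]
        return prior + sum(-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                           for v, (mu, var) in zip(x, stats))
    return log_post(1) - log_post(0)

def loo_scores(X, y, n_keep):
    """Leave-one-case-out CV with feature selection repeated inside each fold."""
    scores = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        w = relief_scores(X_tr, y_tr)  # ranked on training cases only
        keep = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:n_keep]
        model = fit_gnb([[r[j] for j in keep] for r in X_tr], y_tr)
        scores.append(malignancy_score(model, [X[i][j] for j in keep]))
    return scores

def auc(scores, y):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in data (assumption): 2 informative and 4 noise features.
rng = random.Random(0)
def make_case(label):
    return [rng.gauss(1.5 * label, 1.0) for _ in range(2)] + [rng.gauss(0, 1) for _ in range(4)]

y = [0] * 20 + [1] * 20
X = [make_case(t) for t in y]
scores = loo_scores(X, y, n_keep=2)
print(f"Leave-one-out AUC on synthetic data: {auc(scores, y):.2f}")
```

Running feature selection once on all cases before cross-validation would let the held-out case leak into the selected feature set and tend to inflate the AUC, which is why the ranking sits inside the loop here.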

Source
http://dx.doi.org/10.1088/1361-6560/aaa610

Similar Publications

Analysis of nuclear receptor expression in head and neck cancer.

Cancer Genet

December 2024

Department of Otolaryngology, University of Minnesota, MMC396, 420 Delaware St SE, Minneapolis, MN 55455, USA.

Objective: Studies of squamous cell carcinoma of the head and neck (HNSCC) have demonstrated the importance of nuclear receptors and their associated coregulators in the development and treatment of HNSCC. We sought to characterize members of the nuclear receptor super family through interrogation of RNA-Seq and microarray data.

Materials And Methods: TCGA RNA-Seq data within the cBioportal platform comparing HNSCC samples (n = 515 patients with RNA-Seq data) to normal tissue (n = 82 patients) was interrogated for significant differences in nuclear receptor expression.

Plant diseases constantly threaten crops and food systems, while global connectivity further increases the risks of spreading existing and exotic pathogens. Here, we first explore how an integrative approach involving plant pathway knowledge graphs, differential gene expression data, and biochemical data informing Raman spectroscopy could be used to detect plant pathways responding to pathogen attacks. The Plant Reactome (https://plantreactome.

Right-censored models by the expectile method.

Lifetime Data Anal

January 2025

Institut Camille Jordan, UMR 5208, Université Claude Bernard Lyon 1, Bat. Braconnier, 43, blvd du 11 novembre 1918, F - 69622, Villeurbanne Cedex, France.

Based on the expectile loss function and the adaptive LASSO penalty, the paper proposes and studies the estimation methods for the accelerated failure time (AFT) model. In this approach, we need to estimate the survival function of the censoring variable by the Kaplan-Meier estimator. The AFT model parameters are first estimated by the expectile method and afterwards, when the number of explanatory variables can be large, by the adaptive LASSO expectile method which directly carries out the automatic selection of variables.

As combination therapy becomes more common in clinical applications, predicting adverse effects of combination medications is a challenging task. However, there are three limitations of the existing prediction models. First, they rely on a single view of the drug and cannot fully utilize multiview information, resulting in limited performance when capturing complex structures.

Mind the Gap: A Neural Network Framework for Imputing Genotypes in Non-Model Species.

Mol Ecol Resour

January 2025

Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Reduced representation sequencing (RRS) has proven to be a cost-effective solution for sequencing subsets of the genome in non-model species for large-scale studies. However, the targeted nature of RRS approaches commonly introduces large amounts of missing data, leading to reduced statistical power and biased estimates in downstream analyses. Genotype imputation, the statistical inference of missing sites across the genome, is a powerful alternative to overcome the caveats associated with missing sites.
