Background: Pretrained convolutional neural networks and conventional machine learning methods have been widely used to identify various malignancies, typically with a single source of image, biological, or electronic health record data as the only input. In these approaches, the data first pass through a series of preprocessing and image segmentation steps that extract region-of-interest features from noisy inputs. The extracted features are then fed to machine learning and deep learning models for cancer detection.
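The abstract does not name the preprocessing or segmentation techniques used in the reviewed studies. As a rough illustration of the general shape of such a pipeline only, the sketch below min-max normalizes a grayscale image, segments it with a single global intensity threshold (a deliberately simple stand-in, not a method taken from any reviewed study), and summarizes the resulting region of interest as a few hand-crafted features that a downstream classifier could consume. The function names, the 0.5 threshold, and the toy image are hypothetical.

```python
def normalize(image: list[list[float]]) -> list[list[float]]:
    """Preprocessing: min-max scale pixel intensities to the range [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant image
    return [[(p - lo) / span for p in row] for row in image]


def segment(image: list[list[float]], threshold: float = 0.5) -> list[list[bool]]:
    """Stand-in segmentation: True where a pixel is brighter than a global threshold."""
    return [[p > threshold for p in row] for row in image]


def roi_features(image: list[list[float]], mask: list[list[bool]]) -> dict:
    """Summarize the masked region as area, mean intensity, and bounding box."""
    coords = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not coords:
        return {"area": 0, "mean_intensity": 0.0, "bbox": None}
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {
        "area": len(coords),
        "mean_intensity": sum(image[r][c] for r, c in coords) / len(coords),
        # (top, left, bottom, right), inclusive pixel indices
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
    }


# Hypothetical 4x4 "scan" with a bright 2x2 blob in the middle.
scan = [
    [0, 0, 0, 0],
    [0, 8, 10, 0],
    [0, 9, 10, 0],
    [0, 0, 0, 0],
]
norm = normalize(scan)
features = roi_features(norm, segment(norm))
print(features)
```

For the toy scan, the extracted region covers 4 pixels with bounding box (1, 1, 2, 2); a feature dictionary like this is the kind of input that the reviewed pipelines would then pass to a classifier.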

Methods: This work reviews the methods that have been used to develop machine learning algorithms for cancer detection. Because there are more than 100 types of cancer, the review is limited to research on the four most common cancers worldwide: lung, breast, prostate, and colorectal cancer. The study then proposes a new cancer detection methodology built on two state-of-the-art sentence transformers: SBERT (2019) and the unsupervised variant of SimCSE (2021). The method takes raw DNA sequences from matched tumor/normal pairs as its only input. The DNA representations learned by SBERT and SimCSE are then passed to machine learning algorithms (XGBoost, Random Forest, LightGBM, and CNNs) for classification. To the best of our knowledge, SBERT and SimCSE have not previously been applied to represent DNA sequences for cancer detection.
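The abstract does not describe how the DNA sequences are tokenized or how the classifiers are configured, so the following is only a minimal, standard-library sketch of the pipeline's overall shape under stated assumptions. It assumes that each sequence is rewritten as a "sentence" of overlapping k-mers before encoding (k = 6 is an arbitrary choice) and that the task is to label each sequence as tumor or normal. Two stand-ins replace the real components so the sketch runs without model downloads: a hashed bag-of-k-mers vector takes the place of the SBERT/SimCSE embedding, and a nearest-centroid rule takes the place of XGBoost, Random Forest, LightGBM, or a CNN. All function names and toy sequences are hypothetical.

```python
import math
import zlib


def dna_to_sentence(seq: str, k: int = 6) -> str:
    """Rewrite a DNA sequence as a 'sentence' whose 'words' are overlapping k-mers.

    Assumption: k-mer tokenization; the abstract does not specify one.
    """
    seq = seq.upper()
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def embed(sentence: str, dim: int = 64) -> list[float]:
    """Placeholder encoder: hashed bag of k-mers, scaled to unit length.

    This is NOT SBERT or SimCSE; a pretrained sentence transformer would go here.
    """
    vec = [0.0] * dim
    for word in sentence.split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0  # deterministic hash bucket
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fit_centroids(vectors: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Placeholder classifier (nearest centroid), standing in for XGBoost and the others."""
    grouped: dict[str, list[list[float]]] = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(vec: list[float], centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )


# Hypothetical toy sequences and labels, not data from the study.
train_seqs = ["ACGTACGTACGTACGTACGT", "TTTTGGGGCCCCAAAATTTT"]
train_labels = ["tumor", "normal"]
X = [embed(dna_to_sentence(s)) for s in train_seqs]
model = fit_centroids(X, train_labels)

query = embed(dna_to_sentence("ACGTACGTACGTACGT"))
print(predict(query, model))  # prints tumor
```

In a real pipeline, `embed` would presumably be replaced by a pretrained encoder, for example `SentenceTransformer(...).encode(...)` from the `sentence-transformers` library, and `fit_centroids`/`predict` by a gradient-boosted or random-forest classifier trained on the resulting embedding matrix. Keeping the sequence-to-sentence step separate from the encoder makes it easy to swap SBERT for SimCSE while holding the classifier fixed, which is the comparison the abstract reports.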

Results: XGBoost was the best-performing classifier, achieving the highest overall accuracy: 73 ± 0.13% with SBERT embeddings and 75 ± 0.12% with SimCSE embeddings. These findings indicate that using SimCSE sentence representations instead of SBERT representations improved the performance of the machine learning models only marginally.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10037872
DOI: http://dx.doi.org/10.1186/s12859-023-05235-x
