Article Abstract

Background: Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generating comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of the Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands.

Results: We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures.
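
The measures named above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. RMSE, MAE, and Pearson correlation follow their standard definitions. For the context measures, the abstract does not say how Precision, Recall, and FScore are computed, so the sketch assumes a set-based comparison between the attributes a strategy selects and a hypothetical reference set of attributes considered relevant. All function and variable names are illustrative, and the Nodes measure is not covered.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson(actual, predicted):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def precision_recall_fscore(selected, relevant):
    """Set-based precision, recall, and F-score.

    ASSUMPTION: `relevant` is a hypothetical reference set of attributes;
    the abstract does not define how the context measures are computed.
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


if __name__ == "__main__":
    # Made-up numbers for illustration only, not data from the article.
    observed = [1.0, 2.0, 3.0, 4.0]
    estimated = [1.0, 2.0, 3.0, 6.0]
    print(rmse(observed, estimated), mae(observed, estimated))
    print(precision_recall_fscore({"a", "b", "c", "d"}, {"a", "b", "e"}))
```

Lower RMSE and MAE and a higher correlation indicate a better predictive fit. Under the set-based assumption, a higher F-score means the selected attributes agree more closely with the reference set.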

Conclusions: Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909228
DOI: http://dx.doi.org/10.1186/1471-2164-14-S6-S6

Publication Analysis

Top Keywords

data preprocessing: 20
molecular docking: 12
context-based data: 12
data: 9
preprocessing approach: 8
initial set: 8
cfs algorithm: 8
context-based: 5
preprocessing: 5
context-based preprocessing: 4

Similar Publications

Plastic waste management is one of the key issues in global environmental protection. Integrating spectroscopy acquisition devices with deep learning algorithms has emerged as an effective method for rapid plastic classification. However, the challenges in collecting plastic samples and spectroscopy data have resulted in a limited number of data samples and an incomplete comparison of relevant classification algorithms.

White Matter Fiber Bundle Alterations Correlate with Gait and Cognitive Impairments in Parkinson's Disease based on HARDI Data.

Curr Med Imaging

January 2025

Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95, Yong An Road, Xicheng District, Beijing 100050, China.

Background: The neuroanatomical basis of white matter fiber tracts in gait impairments in individuals suffering from Parkinson's Disease (PD) is unclear.

Methods: Twenty-four individuals living with PD and 29 Healthy Controls (HCs) were included. For each participant, two-shell High Angular Resolution Diffusion Imaging (HARDI) and high-resolution 3D structural images were acquired using the 3T MRI.

Purpose: The incidence of cancer, which is a serious public health concern, is increasing. A predictive analysis driven by machine learning was integrated with haematology parameters to create a method for the simultaneous diagnosis of several malignancies at different stages.

Patients And Methods: We analysed a newly collected dataset from various hospitals in Jordan comprising 19,537 laboratory reports (6,280 cancer and 13,257 noncancer cases).

Psoriasis severity assessment: Optimizing diagnostic models with deep learning.

Narra J

December 2024

Department of Pharmacy, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia.

Psoriasis is a chronic skin condition with challenges in the accurate assessment of its severity due to subtle differences between severity levels. The aim of this study was to evaluate deep learning models for automated classification of psoriasis severity. A dataset containing 1,546 clinical images was subjected to pre-processing techniques, including cropping and applying noise reduction through median filtering.

Modern language models such as bidirectional encoder representations from transformers have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. A data-flow-aware hardware accelerator design for language models inspired by Simba, makes use of approximate fixed-point POSIT-based multipliers and uses high bandwidth memory (HBM) in achieving significant improvements in computational efficiency, power consumption, area and latency compared to the hardware-realized scalable accelerator Simba.
