Background: Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for the generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of the Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands.
Results: We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures.
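The measures named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of RMSE, MAE, and Pearson correlation for a numeric prediction target, and it treats the context measures as set-based precision, recall, and F-score of the selected attributes against a reference set of attributes considered relevant. That set-based reading, the function names, and the attribute labels are assumptions made for illustration and are not taken from the article. The Nodes measure is not covered here.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson(actual, predicted):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def precision_recall_fscore(selected, relevant):
    """Set-based precision, recall, and F-score of selected attributes.

    Assumption: `relevant` is a reference set of attributes expected
    to matter in the given context.
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse(actual, predicted), mae(actual, predicted))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(precision_recall_fscore({"attr_a", "attr_b", "attr_c", "attr_d"},
                              {"attr_a", "attr_b", "attr_e"}))
```

Lower RMSE and MAE and a higher correlation indicate better predictions, while higher precision, recall, and F-score indicate that the selected attributes agree more closely with the reference set.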
Conclusions: Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909228 | PMC |
| http://dx.doi.org/10.1186/1471-2164-14-S6-S6 | DOI Listing |
Anal Methods
January 2025
Jiangsu Beier Machinery Co. Ltd, Jiangsu, 215600, China.
Plastic waste management is one of the key issues in global environmental protection. Integrating spectroscopy acquisition devices with deep learning algorithms has emerged as an effective method for rapid plastic classification. However, the challenges in collecting plastic samples and spectroscopy data have resulted in a limited number of data samples and an incomplete comparison of relevant classification algorithms.
Curr Med Imaging
January 2025
Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95, Yong An Road, Xicheng District, Beijing 100050, China.
Background: The neuroanatomical basis of white matter fiber tracts in gait impairments in individuals suffering from Parkinson's Disease (PD) is unclear.
Methods: Twenty-four individuals living with PD and 29 Healthy Controls (HCs) were included. For each participant, two-shell High Angular Resolution Diffusion Imaging (HARDI) and high-resolution 3D structural images were acquired using the 3T MRI.
Adv Appl Bioinform Chem
January 2025
Department of Information Technology, Mutah University, Al-Karak, Jordan.
Purpose: The incidence of cancer, which is a serious public health concern, is increasing. A predictive analysis driven by machine learning was integrated with haematology parameters to create a method for the simultaneous diagnosis of several malignancies at different stages.
Patients And Methods: We analysed a newly collected dataset from various hospitals in Jordan comprising 19,537 laboratory reports (6,280 cancer and 13,257 noncancer cases).
Narra J
December 2024
Department of Pharmacy, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia.
Psoriasis is a chronic skin condition with challenges in the accurate assessment of its severity due to subtle differences between severity levels. The aim of this study was to evaluate deep learning models for automated classification of psoriasis severity. A dataset containing 1,546 clinical images was subjected to pre-processing techniques, including cropping and applying noise reduction through median filtering.
Philos Trans A Math Phys Eng Sci
January 2025
Indian Institute of Technology Gandhinagar, Gandhinagar, Gujarat, India.
Modern language models such as bidirectional encoder representations from transformers have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. A data-flow-aware hardware accelerator design for language models inspired by Simba, makes use of approximate fixed-point POSIT-based multipliers and uses high bandwidth memory (HBM) in achieving significant improvements in computational efficiency, power consumption, area and latency compared to the hardware-realized scalable accelerator Simba.