AI Article Synopsis

  • Single Nucleotide Polymorphisms (SNPs) are widely used markers in biological applications; because SNP datasets are high-dimensional, feature selection is essential before classification.
  • The paper introduces a new method called FIFS (Frequent Item Feature Selection) that identifies the most informative SNPs by selecting frequent and unique genotypes from genomic data in a modular fashion.
  • Tested on a dataset of British pig breeds, FIFS surpassed 95% assignment accuracy with only 28 selected SNPs, fewer than half the number needed by the compared methods (70 to 100 SNPs).

Article Abstract

Background And Objective: Single Nucleotide Polymorphisms (SNPs) are increasingly becoming the marker of choice for biological analyses across a wide range of applications of great medical, biological, economic and environmental interest. Classification tasks, i.e. the assignment of individuals to groups of origin based on their (multi-locus) genotypes, are performed in many fields, such as forensic investigations and discrimination between wild and/or farmed populations. These tasks should be performed with a small number of loci, for computational as well as biological reasons. Thus, feature selection should precede classification tasks, especially for SNP datasets, where the number of features can amount to hundreds of thousands or millions.

Methods: In this paper, we present a novel data mining approach, called FIFS - Frequent Item Feature Selection, based on the use of frequent items for selection of the most informative markers from population genomic data. It is a modular method, consisting of two main components. The first one identifies the most frequent and unique genotypes for each sampled population. The second one selects the most appropriate among them, in order to create the informative SNP subsets to be returned.
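The two components described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how "frequent" is thresholded or how the second component ranks candidates, so the `min_support` cutoff, the `per_population` limit, the ranking by support, and both function names are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def frequent_unique_genotypes(genotypes, labels, min_support=0.8):
    """Component 1 (illustrative): per population, keep (snp_index, genotype)
    items that are frequent there and frequent in no other population.
    min_support is a hypothetical threshold, not taken from the paper."""
    counts = defaultdict(Counter)   # counts[pop][(snp, genotype)] = carriers
    sizes = Counter(labels)         # individuals per population
    for row, pop in zip(genotypes, labels):
        for snp, genotype in enumerate(row):
            counts[pop][(snp, genotype)] += 1

    # Items whose within-population frequency reaches the threshold.
    frequent = {
        pop: {item: n / sizes[pop] for item, n in c.items()
              if n / sizes[pop] >= min_support}
        for pop, c in counts.items()
    }

    # Drop items that are also frequent in any other population.
    unique = {}
    for pop, items in frequent.items():
        elsewhere = set()
        for other, other_items in frequent.items():
            if other != pop:
                elsewhere.update(other_items)
        unique[pop] = {item: s for item, s in items.items()
                       if item not in elsewhere}
    return unique


def select_snps(candidates, per_population=1):
    """Component 2 (illustrative): take the best-supported candidates of each
    population and return the sorted union of their SNP indices.
    Ranking by support is an assumption; the paper may use another criterion."""
    selected = set()
    for items in candidates.values():
        ranked = sorted(items.items(), key=lambda kv: (-kv[1], kv[0]))
        for (snp, _genotype), _support in ranked[:per_population]:
            selected.add(snp)
    return sorted(selected)


# Toy data: 4 individuals, 3 SNPs, 2 populations (entirely made up).
toy_genotypes = [
    ["AA", "AG", "GG"],
    ["AA", "AG", "GG"],
    ["AA", "GG", "TT"],
    ["AA", "GG", "TT"],
]
toy_labels = ["breed1", "breed1", "breed2", "breed2"]

toy_candidates = frequent_unique_genotypes(toy_genotypes, toy_labels)
toy_panel = select_snps(toy_candidates, per_population=1)
```

In the toy data the first SNP carries the same genotype in both populations, so it is discarded as non-unique, while the other two SNPs each yield population-specific genotypes from which the panel is drawn.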

Results: The proposed method (FIFS) was tested on a real dataset comprising a comprehensive coverage of the pig breed types present in Britain. This dataset consisted of 446 individuals divided into 14 sub-populations, genotyped at 59,436 SNPs. Our method outperforms the state-of-the-art and baseline methods in every case. More specifically, our method surpassed the assignment accuracy threshold of 95% with fewer than half the number of SNPs selected by other methods (FIFS: 28 SNPs; Delta: 70 SNPs; Pairwise FST: 70 SNPs; In: 100 SNPs).

Conclusion: Our approach successfully deals with the problem of informative marker selection in high-dimensional genomic datasets. It offers better results than existing approaches and can aid biologists in selecting the most informative markers with maximum discrimination power, for the optimization of cost-effective panels with applications in, for example, species identification, wildlife management, and forensics.

Source
http://dx.doi.org/10.1016/j.compbiomed.2017.09.020

Publication Analysis

Top Keywords

data mining (8)
informative marker (8)
marker selection (8)
selection high (8)
high dimensional (8)
population genomic (8)
genomic data (8)
single nucleotide (8)
nucleotide polymorphism (8)
classification tasks (8)

