Financial Fraud Detection and Prediction in Listed Companies Using SMOTE and Machine Learning Algorithms.

Entropy (Basel)

Faculty of Natural, Mathematical and Engineering Sciences, King's College, London WC2R 2LS, UK.

Published: August 2022

This paper proposes a new machine-learning method for identifying and predicting financial fraud among listed companies. We collected 18,060 transactions described by 363 financial indicators, comprising 362 financial variables and one class variable. We then eliminated 9 indicators that were unrelated to financial fraud and processed the missing values. Next, from the remaining 353 indicators we extracted the 13 with the greatest impact on financial fraud, based on multiple feature selection models and on how frequently each feature was selected across all algorithms. We then built five single classification models, namely logistic regression (LR), random forest (RF), XGBOOST, support vector machine (SVM), and decision tree (DT), together with three ensemble models based on a voting classifier, to predict financial fraud records of listed companies. Finally, we chose the optimal single model among the five machine learning algorithms and the best ensemble model among all hybrid models. Model parameters were selected using the grid search method and by comparing several evaluation metrics. The accuracy of the optimal single model ranged from 97% to 99%, whereas that of the ensemble models exceeded 99%. This shows that the optimal ensemble model performs well and can efficiently predict and detect fraudulent activity of companies. Among all models, the hybrid model combining logistic regression with XGBOOST performed best. In the future, such a model could not only predict fraudulent behavior in company management but also reduce the burden of detecting it.
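The abstract mentions two aggregation steps: keeping the features that are selected most often across several feature selection models, and combining classifiers with a voting classifier. The sketch below illustrates both in plain Python under stated assumptions; it is not the authors' code. The function names `select_by_frequency` and `soft_vote`, the feature names, and the probabilities are all hypothetical. Soft voting (averaging predicted fraud probabilities) is assumed here because the abstract does not say whether soft or hard voting was used. Alphabetical tie-breaking is likewise an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def select_by_frequency(selections: Dict[str, Sequence[str]], k: int) -> List[str]:
    """Rank features by how many selectors chose them and keep the top k.

    `selections` maps a (hypothetical) selector name to the features it kept.
    Ties are broken alphabetically so the result is deterministic; this is an
    assumption, since the abstract does not say how ties are handled.
    """
    counts: Counter = Counter()
    for features in selections.values():
        counts.update(set(features))  # count each feature at most once per selector
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def soft_vote(
    prob_lists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> List[int]:
    """Soft voting: weighted average of each model's fraud probabilities.

    `prob_lists[m][i]` is model m's predicted probability that sample i is
    fraudulent. Returns 1 (fraud) where the average reaches `threshold`.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weights by default
    total = sum(weights)
    labels = []
    for i in range(len(prob_lists[0])):
        avg = sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        labels.append(1 if avg >= threshold else 0)
    return labels


if __name__ == "__main__":
    # Hypothetical selector outputs; the feature names are illustrative only.
    selections = {
        "lasso": ["debt_ratio", "roa", "cash_flow"],
        "random_forest": ["roa", "debt_ratio", "inventory_turnover"],
        "xgboost": ["roa", "cash_flow", "audit_opinion"],
    }
    print(select_by_frequency(selections, k=2))

    # Illustrative (made-up) probabilities from an LR model and an XGBOOST model.
    lr_probs = [0.9, 0.2, 0.55]
    xgb_probs = [0.8, 0.1, 0.35]
    print(soft_vote([lr_probs, xgb_probs]))
```

With these inputs, `roa` is chosen by all three selectors and `cash_flow` and `debt_ratio` by two each, so the alphabetical tie-break keeps `["roa", "cash_flow"]`. For the third sample, the averaged probability of 0.45 falls below the 0.5 threshold, so the ensemble labels it non-fraudulent even though the LR model alone would flag it. In a real pipeline, a library voting classifier and per-model probability outputs would replace these hand-written helpers.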


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407419
DOI: http://dx.doi.org/10.3390/e24081157


