Machine learning misclassification networks reveal a citation advantage of interdisciplinary publications only in high-impact journals.

Alexey Lyutov, Yilmaz Uygun, Marc-Thorsten Hütt

Sci Rep

School of Science, Constructor University, 28759, Bremen, Germany.

Published: September 2024

With enough data and precise categories, training a statistical model for classification is straightforward; when the categories are fuzzy, as scientific disciplines are, the model's systematic failures can be informative about the categories themselves.
The study classifies academic publications using only their abstracts, identifying misclassifications by comparing machine learning-generated categories to journal categories, which creates a network that illustrates relationships among disciplines.
Analysis of this misclassification network shows how disciplines interact and suggests that misclassification is linked to interdisciplinarity: misclassified articles are on average cited more frequently in the top 10 percent of journals in each discipline, but less frequently in the remaining journals.

Given a large enough volume of data and precise, meaningful categories, training a statistical model to solve a classification problem is straightforward and has become a standard application of machine learning (ML). If the categories are not precise, but rather fuzzy, as in the case of scientific disciplines, the systematic failures of ML classification can be informative about properties of the underlying categories. Here we classify a large volume of academic publications using only the abstract as information. From the publications that are classified differently by journal categories and ML categories (i.e., misclassified publications, when using the journal assignment as ground truth) we construct a network among disciplines. Analysis of these misclassifications provides insight into two topics at the core of the science of science: (1) Mapping out the interplay of disciplines. We show that this misclassification network is informative about the interplay of academic disciplines and it is similar to, but distinct from, a citation-based map of science, where nodes are scientific disciplines and an edge indicates a strong co-citation count between publications in these disciplines. (2) Analyzing the success of interdisciplinarity. By evaluating the citation patterns of publications, we show that misclassification can be linked to interdisciplinarity and, furthermore, that misclassified articles have different citation frequencies than correctly classified articles: In the highest 10 percent of journals in each discipline, these misclassified articles are on average cited more frequently, while in the rest of the journals they are cited less frequently.
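The network construction described in the abstract can be illustrated with a minimal sketch. It assumes each publication already has a journal-assigned discipline and an ML-predicted discipline; the function name `misclassification_network`, the `min_count` threshold, and the toy labels are hypothetical and are not taken from the paper, whose classifier and edge-weighting scheme are not specified in the abstract.

```python
from collections import Counter


def misclassification_network(journal_labels, ml_labels, min_count=1):
    """Build weighted directed edges from journal discipline to ML discipline.

    An edge (a, b) counts publications whose journal category is `a`
    but whose ML-predicted category is `b`. Correctly classified
    publications (a == b) contribute no edge.
    Hypothetical helper, not the authors' implementation.
    """
    if len(journal_labels) != len(ml_labels):
        raise ValueError("label lists must have the same length")
    edges = Counter(
        (true, pred)
        for true, pred in zip(journal_labels, ml_labels)
        if true != pred
    )
    # Keep only edges supported by at least `min_count` publications.
    return {edge: n for edge, n in edges.items() if n >= min_count}


# Toy labels, invented purely for illustration.
journal = ["physics", "physics", "biology", "chemistry", "biology", "physics"]
predicted = ["physics", "chemistry", "chemistry", "chemistry", "biology", "chemistry"]

network = misclassification_network(journal, predicted)
for (source, target), weight in sorted(network.items()):
    print(f"{source} -> {target}: {weight}")
```

Raising `min_count` keeps only the disciplinary links supported by several misclassified publications, which is one simple way to suppress noise from isolated classifier errors.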

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11412973
DOI: http://dx.doi.org/10.1038/s41598-024-72364-5

Publication Analysis

Top Keywords

machine learning

large volume

scientific disciplines

misclassified articles

cited frequently

publications

disciplines

Similar Publications

Microfluidic and Computational Tools for Neurodegeneration Studies.

Annu Rev Chem Biomol Eng

January 2025

Department of Chemical & Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA.

Kin Gomez, Victoria R Yarmey, Hrishikesh Mane, Adriana San-Miguel

Understanding the molecular, cellular, and physiological components of neurodegenerative diseases (NDs) is paramount for developing accurate diagnostics and efficacious therapies. However, the complexity of ND pathology and the limitations associated with conventional analytical methods undermine research. Fortunately, microfluidic technology can facilitate discoveries through improved biomarker quantification, brain organoid culture, and small animal model manipulation.

A Novel Preoperative Scoring System to Accurately Predict Cord-Level Intraoperative Neuromonitoring Data Loss During Spinal Deformity Surgery: A Machine-Learning Approach.

J Bone Joint Surg Am

November 2024

Department of Orthopedic Surgery, Columbia University Irving Medical Center, New York, NY.

Nathan J Lee, Lawrence G Lenke, Varun Arvind, Ted Shi, Alexandra C Dionne

Background: An accurate knowledge of a patient's risk of cord-level intraoperative neuromonitoring (IONM) data loss is important for an informed decision-making process prior to deformity correction, but no prediction tool currently exists.

Methods: A total of 1,106 patients with spinal deformity and 205 perioperative variables were included. A stepwise machine-learning (ML) approach using random forest (RF) analysis and multivariable logistic regression was performed.

Correction to: Circulating miRNAs and Machine Learning for Lateralizing Primary Aldosteronism.

Hypertension

February 2025

Improving early prediction of crop yield in Spanish olive groves using satellite imagery and machine learning.

PLoS One

January 2025

Department of Computer Science, University of Jaén, Jaén, Spain.

M Isabel Ramos, Juan J Cubillas, Ruth M Córdoba, Lidia M Ortega

In the production sector, the usefulness of predictive systems as a tool for management and decision-making is well known. In the agricultural sector, a correct economic balance of the farm depends on making the right decisions. For this purpose, having information in advance on crop yields is an extraordinary help.

A novel multi-user collaborative cognitive radio spectrum sensing model: Based on a CNN-LSTM model.

PLoS One

January 2025

School of Electronic Information Engineering, Inner Mongolia University, Hohhot, Inner Mongolia, China.

Kai Wang, Yangyang Chen, Dan Bo, Shubin Wang

Cognitive Radio (CR) technology enables wireless devices to learn about their surrounding spectrum environment through sensing capabilities, thereby facilitating efficient spectrum utilization without interfering with the normal operation of licensed users. This study aims to enhance spectrum sensing in multi-user cooperative cognitive radio systems by leveraging a hybrid model that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. A novel multi-user cooperative spectrum sensing model is developed, utilizing CNN's local feature extraction capability and LSTM's advantage in handling sequential data to optimize sensing accuracy and efficiency.
