Assessing the reproducibility of discriminant function analyses.

PeerJ

Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada; Molecular Ecology Editorial Office, Vancouver, BC, Canada.

Published: August 2015

AI Article Synopsis

  • Data integrity is crucial for empirical research, but many published datasets are unavailable, incorrect, or poorly curated, which hinders validation of results and exploration of new ideas.
  • A study attempted to reproduce Discriminant Function Analyses (DFAs) from organismal biology: of 100 papers initially surveyed, 86 datasets remained after exclusions, and 71 of those could be confidently matched to the published analysis.
  • All three summary statistics were fully reproduced for 46 (65%) of those 71 datasets, while the 15 unmatched datasets suffered from unclear or missing variable labels, unspecified data subsets, or incomplete data.

Article Abstract

Data are the foundation of empirical research, yet all too often the datasets underlying published papers are unavailable, incorrect, or poorly curated. This is a serious issue, because future researchers are then unable to validate published results or reuse data to explore new ideas and hypotheses. Even if data files are securely stored and accessible, they must also be accompanied by accurate labels and identifiers. To assess how often problems with metadata or data curation affect the reproducibility of published results, we attempted to reproduce Discriminant Function Analyses (DFAs) from the field of organismal biology. DFA is a commonly used statistical analysis that has changed little since its inception almost eight decades ago, and therefore provides an opportunity to test reproducibility among datasets of varying ages. Out of 100 papers we initially surveyed, fourteen were excluded because they did not present the common types of quantitative results from their DFA or gave insufficient details of their DFA. Of the remaining 86 datasets, there were 15 cases for which we were unable to confidently relate the dataset we received to the one used in the published analysis. The reasons ranged from incomprehensible or absent variable labels, to the DFA being performed on an unspecified subset of the data, to the dataset we received being incomplete. We focused on reproducing three common summary statistics from DFAs: the percent variance explained, the percentage correctly assigned, and the largest discriminant function coefficient. The reproducibility of the first two was fairly high (20 of 26, and 44 of 60 datasets, respectively), whereas our success rate with the discriminant function coefficients was lower (15 of 26 datasets). When considering all three summary statistics, we were able to completely reproduce 46 (65%) of 71 datasets. While our results show that a majority of studies are reproducible, they highlight the fact that many studies still are not the carefully curated research that the scientific community and public expect.
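The counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and the only inputs are the numbers stated in the abstract.

```python
# Counts taken directly from the abstract; all names are illustrative.
surveyed = 100
excluded = 14    # no common quantitative DFA result, or insufficient detail
unmatched = 15   # dataset could not be related to the published analysis

remaining = surveyed - excluded      # datasets left after exclusions
analysable = remaining - unmatched   # datasets used for reproduction

# (reproduced, attempted) for each summary statistic, as reported.
results = {
    "percent variance explained": (20, 26),
    "percentage correctly assigned": (44, 60),
    "largest discriminant function coefficient": (15, 26),
    "all three statistics": (46, 71),
}


def rate(reproduced, attempted):
    """Return the reproduction rate as a percentage, rounded to one decimal."""
    return round(100 * reproduced / attempted, 1)


rates = {name: rate(*counts) for name, counts in results.items()}

if __name__ == "__main__":
    print(f"{remaining} datasets after exclusions, {analysable} analysable")
    for name, pct in rates.items():
        print(f"{name}: {pct}%")
```

The 71 analysable datasets follow from 100 - 14 - 15, and 46 of 71 rounds to the 65% given in the abstract. The discriminant function coefficient has the lowest rate of the three statistics, at roughly 58%.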

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4540019
DOI: http://dx.doi.org/10.7717/peerj.1137

Publication Analysis

Top Keywords (keyword: count)

discriminant function: 16
function analyses: 8
dataset received: 8
summary statistics: 8
datasets: 6
data: 5
assessing reproducibility: 4
discriminant: 4
reproducibility discriminant: 4
function: 4
