Machine learning to detect invalid text responses: Validation and comparison to existing detection methods.

Behav Res Methods

Department of Psychology, University of Waterloo, Psychology, Anthropology, and Sociology (PAS) Building, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada.

Published: December 2022

AI Article Synopsis

  • The study addresses the detection and removal of invalid texts (e.g., meaningless or irrelevant responses), a crucial step in text analysis for research areas such as autobiographical memory.
  • Few existing data quality indicators (e.g., response time, character/word count) have been empirically validated with text responses, so the authors propose a supervised machine learning approach that approximates human coding without hand-coding entire datasets.
  • The model detected invalid texts with performance near human coding and significantly outperformed existing data quality indicators; the authors make their code and instructions openly available.

Article Abstract

A crucial step in analysing text data is the detection and removal of invalid texts (e.g., texts with meaningless or irrelevant content). To date, research topics that rely heavily on analysis of text data, such as autobiographical memory, have lacked methods of detecting invalid texts that are both effective and practical. Although researchers have suggested many data quality indicators that might identify invalid responses (e.g., response time, character/word count), few of these methods have been empirically validated with text responses. In the current study, we propose and implement a supervised machine learning approach that can mimic the accuracy of human coding, but without the need to hand-code entire text datasets. Our approach (a) trains, validates, and tests on a subset of texts manually labelled as valid or invalid, (b) calculates performance metrics to help select the best model, and (c) predicts whether unlabelled texts are valid or invalid based on the text alone. Model validation and evaluation using autobiographical memory texts indicated that machine learning accurately detected invalid texts with performance near human coding, significantly outperforming existing data quality indicators. Our openly available code and instructions enable new methods of improving data quality for researchers using text as data.
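The three steps (a) to (c) in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name the models or features the authors used, so a small multinomial naive Bayes classifier stands in for them, and every function name, split fraction, and example text below is hypothetical rather than taken from the authors' code.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def split(items, fractions=(0.6, 0.2), seed=0):
    """Step (a): shuffle labelled items into train, validation, and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * fractions[0])
    n_val = int(len(items) * fractions[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # The extra +1 in the denominator reserves mass for unseen tokens.
            denom = total_words + len(self.vocab) + 1
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(text):
                score += math.log((self.word_counts[c][tok] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def metrics(true, pred, positive="invalid"):
    """Step (b): precision, recall, and F1 for the 'invalid' class."""
    tp = sum(t == positive and p == positive for t, p in zip(true, pred))
    fp = sum(t != positive and p == positive for t, p in zip(true, pred))
    fn = sum(t == positive and p != positive for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def main():
    # Invented toy data: (text, human label). Real use needs far more texts.
    labelled = [
        ("i remember my first day at school", "valid"),
        ("we went to the beach last summer", "valid"),
        ("my grandmother taught me to bake bread", "valid"),
        ("i broke my arm falling off a bike", "valid"),
        ("we moved to a new city when i was ten", "valid"),
        ("asdf asdf asdf", "invalid"),
        ("n/a nothing idk", "invalid"),
        ("idk idk nothing", "invalid"),
        ("asdf jkl asdf", "invalid"),
        ("nothing n/a", "invalid"),
    ]
    train, val, test = split(labelled)
    model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
    # Step (b): validation metrics would guide the choice between candidate models.
    print(metrics([y for _, y in val], model.predict([t for t, _ in val])))
    # Step (c): label texts that no human has coded.
    print(model.predict(["asdf idk", "i remember the beach"]))


if __name__ == "__main__":
    main()
```

Holding out a test set that plays no role in model selection is what allows the final performance estimate to be compared fairly against human coding; with a dataset as small as this toy one, the printed metrics are not meaningful.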

Source
http://dx.doi.org/10.3758/s13428-022-01801-y

Publication Analysis

Top Keywords

machine learning: 12
text data: 12
invalid texts: 12
data quality: 12
text responses: 8
autobiographical memory: 8
quality indicators: 8
human coding: 8
valid invalid: 8
invalid: 7
