An Applied Statistics dataset for human vs AI-generated answer classification.

Due to the increasing popularity of Large Language Models (LLMs) like ChatGPT, students from various fields now commonly rely on AI-powered text generation tools to complete their assignments. This poses a challenge for course instructors, who struggle to verify the authenticity of submitted work. Several AI detection tools for differentiating human-generated text from AI-generated text exist for domains such as medicine and coding, but the available generic tools do not perform well on domain-specific tasks. These detection tools depend on an LLM, and training the LLM requires an instruction dataset that helps it learn the differences between the patterns of human-generated text and AI-generated text. To help with the creation of such a tool for Applied Statistics, we have created a dataset containing 4231 question-and-answer combinations. To create the dataset, first, we collected 116 questions, selected by domain experts, covering a wide range of topics in Applied Statistics. Second, we created a framework to randomly distribute the questions to students and collect their answers. Third, we collected answers to fifty assigned questions from each of the 100 students participating in the work. Fourth, we generated an equal number of AI-generated answers using ChatGPT. The prepared dataset will be useful for creating AI-detector tools for the Applied Statistics domain and for benchmarking AI-detector tools, and the proposed data preparation framework will be useful for collecting data in other domains.
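The random distribution step described above can be illustrated with a minimal sketch. The abstract does not describe how the authors' framework works internally, so everything here is an assumption for illustration: the function name `assign_questions`, the integer IDs for students and questions, the fixed seed, and the use of sampling without replacement so that no student receives the same question twice. Only the counts (116 questions, 100 students, fifty questions per student) come from the abstract.

```python
import random


def assign_questions(num_students=100, num_questions=116, per_student=50, seed=0):
    """Hypothetical helper: give each student a random subset of questions.

    Counts default to the figures in the abstract; the sampling scheme
    itself is an assumption, not the authors' documented method.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    question_ids = list(range(1, num_questions + 1))
    return {
        # sample() draws without replacement, so each student's set is unique
        student: sorted(rng.sample(question_ids, per_student))
        for student in range(1, num_students + 1)
    }


assignments = assign_questions()
```

Each entry of `assignments` maps a student ID to the sorted list of question IDs that student would be asked to answer. A real collection framework would also need to record the submitted answers and label each one as human-written or AI-generated.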

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11220852
DOI: http://dx.doi.org/10.1016/j.dib.2024.110240

Publication Analysis

Top Keywords

applied statistics: 16
detection tools: 8
human-generated text: 8
text ai-generated: 8
ai-generated text: 8
ai-detector tools: 8
tools: 6
dataset: 5
text: 5
applied: 4
