Commonsense reasoning has emerged as a challenging problem in Artificial Intelligence (AI). However, one area of commonsense reasoning that has received comparatively little attention in the AI research community is plausibility assessment, which focuses on determining the likelihood of commonsense statements. Human-annotated benchmarks are essential for advancing research in this nascent area, as they enable researchers to develop and evaluate AI models effectively. Because plausibility is a subjective concept, it is important to obtain nuanced annotations rather than a binary label of 'plausible' or 'implausible'. It is also important to obtain multiple human annotations for a given statement, to ensure the validity of the labels. In this data article, we describe the process of re-annotating an existing commonsense plausibility dataset (SemEval-2020 Task 4) using large-scale crowdsourcing on the Amazon Mechanical Turk platform. We obtained 10,000 unique annotations on a corpus of 2,000 sentences (five independent annotations per sentence). Based on these annotations, each sentence was then assigned an overall label. Next, we prompted the GPT-3.5 and GPT-4 models developed by OpenAI. Sentences from the human-annotated files were fed into the models using custom prompt templates, and the labels the models generated were compared with those assigned by humans to determine whether the two were aligned. The PMC-Dataset is meant to serve as a rich resource for analysing and comparing human and machine commonsense reasoning capabilities, specifically on plausibility. Researchers can utilise this dataset to train, fine-tune, and evaluate AI models on plausibility. Applications include determining the likelihood of everyday events, assessing the realism of hypothetical scenarios, and distinguishing between plausible and implausible statements in commonsense text.
Ultimately, we intend for the dataset to support ongoing AI research by offering a robust foundation for developing models that are better aligned with human commonsense reasoning.
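The comparison described above (several human annotations per sentence, reduced to one label and then checked against a model's label) can be sketched roughly as follows. This is a minimal illustration only, not the authors' procedure: the label names, the majority-vote rule, the tie handling, and the sentence IDs are all assumptions made for the example, and the dataset's actual label scheme and file format are not shown here.

```python
from collections import Counter


def aggregate(annotations):
    """Return the most common label among a sentence's annotations.

    Returns None when the top two labels are tied (an assumed rule).
    """
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def agreement_rate(human, model):
    """Fraction of sentences where the model label equals the human label.

    Sentences without an aggregated human label or without a model label
    are skipped.
    """
    shared = [sid for sid, label in human.items()
              if label is not None and sid in model]
    if not shared:
        return 0.0
    matches = sum(human[sid] == model[sid] for sid in shared)
    return matches / len(shared)


# Hypothetical data: five annotations per sentence, with made-up labels.
raw_annotations = {
    "s1": ["plausible", "plausible", "plausible", "plausible", "implausible"],
    "s2": ["implausible", "implausible", "implausible", "plausible", "plausible"],
    "s3": ["plausible", "plausible", "implausible", "implausible", "unsure"],
}
human_labels = {sid: aggregate(a) for sid, a in raw_annotations.items()}

# Hypothetical model outputs for the same sentences.
model_labels = {"s1": "plausible", "s2": "plausible", "s3": "plausible"}

rate = agreement_rate(human_labels, model_labels)
print(human_labels)
print(f"agreement on sentences with a clear majority: {rate:.2f}")
```

Majority voting is only one possible aggregation; because plausibility is subjective, keeping the full distribution of the five annotations may preserve more of the nuance the annotations were collected for.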
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11408755 | PMC |
http://dx.doi.org/10.1016/j.dib.2024.110869 | DOI Listing |
Neural Netw
January 2025
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Key Laboratory of Machine Intelligence and Advanced Computing (SYSU), Ministry of Education, Guangzhou 510006, China.
Language models (LMs) have played an increasingly important role in commonsense understanding and reasoning in the Common Sense Question Answering (CSQA) task. However, given the number of model parameters, adding more training data does little to further improve model performance. Introducing external knowledge through graph neural networks (GNNs) has proved effective in boosting performance, but exploiting different knowledge sources and capturing the contextual information shared between text and knowledge remain a challenge.
Data Brief
December 2024
University of Southern California, 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292, USA.
Scand J Prim Health Care
December 2024
Postdoctoral Researcher at Department of Interdisciplinary health research, Faculty of Medicine and Associate Professor at Department of Criminology and Sociology of Law, Faculty of Law, University of Oslo, Oslo, Norway.
Background: According to the UN Committee Against Torture, all state parties to the Torture Convention have a responsibility to meet the rehabilitation needs of torture victims who have sought asylum within their borders. General practitioners (GPs) can play a crucial role in identifying torture victims and securing rehabilitation when needed. Little is known about the knowledge and practices of GPs vis-à-vis potentially tortured patients, and there is an urgent need for research that investigates GPs' practices of identification, referral, and rehabilitation, in Norway and beyond.
Conscious Cogn
October 2024
Philosophy Department, College of Charleston, Charleston, SC.
Some research suggests that moral behavior can be strongly influenced by trivial features of the environment of which we are completely unaware. Philosophers, psychologists, and neuroscientists have argued that these findings undermine our commonsense notions of agency and responsibility, both of which emphasize the role of practical reasoning and conscious deliberation in action. We present the results of four vignette-based studies (N=1,437) designed to investigate how people think about the metaphysical and moral implications of scientific findings that reveal our susceptibility to automaticity and situational influences.
Sensors (Basel)
June 2024
College of Automotive Engineering, Jilin University, Changchun 130025, China.
Human-level driving is the ultimate goal of autonomous driving. As the top-level decision-making aspect of autonomous driving, behavior decision establishes short-term driving behavior strategies by evaluating road structures, adhering to traffic rules, and analyzing the intentions of other traffic participants. Existing behavior decisions are primarily implemented based on rule-based methods, exhibiting insufficient generalization capabilities when faced with new and unseen driving scenarios.