Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates seven scientific/medical, three general-domain, and five math word problem question-answering datasets.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10409727
DOI: http://dx.doi.org/10.1038/s41597-023-02433-3
Sci Rep
December 2024
Department of Dermatology, Niazi Hospital, Lahore, Pakistan.
With breakthroughs in Natural Language Processing and Artificial Intelligence (AI), the use of Large Language Models (LLMs) in academic research has increased tremendously. Models such as the Generative Pre-trained Transformer (GPT) are used by researchers for literature review, abstract screening, and manuscript drafting. However, these models also raise the attendant challenge of producing ethically questionable scientific information.
NPJ Sci Learn
December 2024
Department of Psychology, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Bandar Sunway, 475000, Malaysia.
This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines the impact of prompt engineering with detailed criteria on grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs can achieve criterion-based grading when given a detailed description of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.
Nat Commun
December 2024
LEADS group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
Deep neural networks drive the success of natural language processing. A fundamental property of language is its compositional structure, allowing humans to systematically produce forms for new meanings. For humans, languages with more compositional and transparent structures are typically easier to learn than those with opaque and irregular structures.
Ecol Lett
January 2025
Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Jerusalem, Israel.
Modelling the dynamics of biological processes is ubiquitous across the ecological and evolutionary disciplines. However, the increasing complexity of these models poses a challenge to the dissemination of model-derived results. Often only a small subset of model results is made available to the scientific community, with further exploration of the parameter space relying on local deployment of code supplied by the authors.
Front Public Health
December 2024
Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore.
Objective: To characterize the public conversations around long COVID, as expressed through X (formerly Twitter) posts from May 2020 to April 2023.
Methods: Using X as the data source, we extracted tweets containing #long-covid, #long_covid, or "long covid" posted from May 2020 to April 2023. We then conducted an unsupervised deep learning analysis using Bidirectional Encoder Representations from Transformers (BERT).