Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates seven scientific/medical, three general-domain and five math word question answering datasets.
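
The abstract's central technique is easy to sketch. Below is a minimal, illustrative example of zero-shot chain-of-thought prompting; query_llm is a hypothetical placeholder for any LLM API call and is not part of the ThoughtSource library.

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM API."""
    raise NotImplementedError("Connect this to your LLM provider.")

def answer_with_cot(question: str) -> str:
    # The trigger phrase asks the model to verbalize intermediate
    # reasoning steps before committing to an answer.
    cot_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = query_llm(cot_prompt)
    # A second call extracts a concise final answer from the reasoning.
    return query_llm(f"{cot_prompt}\n{reasoning}\nTherefore, the answer is")

Reasoning traces elicited in this two-stage pattern (generate reasoning, then extract the answer) are the kind of text that ThoughtSource collects and standardizes across datasets.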

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10409727 (PMC)
http://dx.doi.org/10.1038/s41597-023-02433-3 (DOI)

Publication Analysis

Top Keywords

large language (8), reasoning (5), thoughtsource (4), thoughtsource central (4), central hub (4), hub large (4), language model (4), model reasoning (4), reasoning data (4), data large (4)

Similar Publications

With breakthroughs in Natural Language Processing and Artificial Intelligence (AI), the usage of Large Language Models (LLMs) in academic research has increased tremendously. Models such as Generative Pre-trained Transformer (GPT) are used by researchers in literature review, abstract screening, and manuscript drafting. However, these models also carry the attendant risk of producing ethically questionable scientific information.

Evaluating large language models for criterion-based grading from agreement to consistency.

NPJ Sci Learn

December 2024

Department of Psychology, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Bandar Sunway, 475000, Malaysia.

This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines how prompt engineering with detailed criteria affects grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs can deliver criterion-based grading when prompted with a detailed description of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.
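
As an illustration only (the study's actual prompts and rubric are not reproduced here), prompt engineering with detailed criteria might look like the following sketch; the rubric items and the query_llm helper are hypothetical.

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM API."""
    raise NotImplementedError("Connect this to your LLM provider.")

# Hypothetical rubric; real studies supply domain-specific criteria.
CRITERIA = [
    "Thesis: states a clear, arguable claim (0-2 points)",
    "Evidence: supports the claim with relevant examples (0-2 points)",
    "Clarity: writing is organized and free of major errors (0-1 point)",
]

def grade(essay: str) -> str:
    rubric = "\n".join(f"- {c}" for c in CRITERIA)
    prompt = (
        "Grade the essay strictly against each criterion below. For every "
        "criterion, give a score and a one-sentence justification, then "
        "report the total.\n"
        f"Criteria:\n{rubric}\n\nEssay:\n{essay}"
    )
    return query_llm(prompt)

Scoring each criterion separately with a justification, rather than asking for one holistic score, is what makes agreement and consistency with human raters measurable.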

Deep neural networks drive the success of natural language processing. A fundamental property of language is its compositional structure, allowing humans to systematically produce forms for new meanings. For humans, languages with more compositional and transparent structures are typically easier to learn than those with opaque and irregular structures.

Modelling the dynamics of biological processes is ubiquitous across the ecological and evolutionary disciplines. However, the increasing complexity of these models poses a challenge to the dissemination of model-derived results. Often only a small subset of model results are made available to the scientific community, with further exploration of the parameter space relying on local deployment of code supplied by the authors.

Objective: To characterize the public conversations around long COVID, as expressed through X (formerly Twitter) posts from May 2020 to April 2023.

Methods: Using X as the data source, we extracted tweets containing #long-covid, #long_covid, or "long covid," posted from May 2020 to April 2023. We then conducted an unsupervised deep learning analysis using Bidirectional Encoder Representations from Transformers (BERT).
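
The article does not include code, but one common pipeline matching this description is to embed each post with a pretrained BERT encoder and then cluster the embeddings to surface conversation topics. In the sketch below, the model name, example posts, and cluster count are all assumptions, not details from the study.

import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

# Hypothetical example posts; the study used tweets matching its search terms.
tweets = [
    "my long covid fatigue came back this week",
    "still dealing with brain fog two years after infection",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed model
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state         # (batch, tokens, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)      # ignore padding tokens
    embeddings = (hidden * mask).sum(1) / mask.sum(1)  # mean-pooled per post

# Cluster count is an assumption; topic models or HDBSCAN are alternatives.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings.numpy())
print(labels)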
