Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates seven scientific/medical, three general-domain, and five math word problem question-answering datasets.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10409727
DOI: http://dx.doi.org/10.1038/s41597-023-02433-3
Sci Rep
December 2024
Department of Dermatology, Niazi Hospital, Lahore, Pakistan.
With breakthroughs in Natural Language Processing and Artificial Intelligence (AI), the use of Large Language Models (LLMs) in academic research has increased tremendously. Models such as the Generative Pre-trained Transformer (GPT) are used by researchers for literature review, abstract screening, and manuscript drafting. However, these models also raise the attendant challenge of producing ethically questionable scientific information.
NPJ Sci Learn
December 2024
Department of Psychology, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Bandar Sunway, 475000, Malaysia.
This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines the impact of prompt engineering with detailed criteria on grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs can achieve criterion-based grading when given a detailed description of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.
Nat Commun
December 2024
LEADS group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
Deep neural networks drive the success of natural language processing. A fundamental property of language is its compositional structure, allowing humans to systematically produce forms for new meanings. For humans, languages with more compositional and transparent structures are typically easier to learn than those with opaque and irregular structures.
Ecol Lett
January 2025
Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Jerusalem, Israel.
Modelling the dynamics of biological processes is ubiquitous across the ecological and evolutionary disciplines. However, the increasing complexity of these models poses a challenge to the dissemination of model-derived results. Often only a small subset of model results is made available to the scientific community, with further exploration of the parameter space relying on local deployment of code supplied by the authors.
Front Public Health
December 2024
Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore.
Objective: To characterize the public conversations around long COVID, as expressed through X (formerly Twitter) posts from May 2020 to April 2023.
Methods: Using X as the data source, we extracted tweets containing #long-covid, #long_covid, or "long covid" posted from May 2020 to April 2023. We then conducted an unsupervised deep learning analysis using Bidirectional Encoder Representations from Transformers (BERT).