Publications by authors named "Leon Derczynski"

Deliberately eliciting abnormal outputs from Large Language Models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks, defining LLM red-teaming based on extensive and diverse evidence. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all of whom contribute to this novel effort to make LLMs fail.

Data-driven and machine learning based approaches for detecting, categorising and measuring abusive content such as hate speech and harassment have gained traction due to their scalability, robustness and increasingly high performance. Building effective detection systems for abusive content relies on having the right training datasets, reflecting a widely accepted mantra in computer science: Garbage In, Garbage Out. However, creating training datasets which are large, varied, theoretically informed and that minimise biases is difficult, laborious and requires deep expertise.

We aimed to investigate whether daily fluctuations in mental health-relevant Twitter posts are associated with daily fluctuations in mental health crisis episodes. We conducted a primary and replicated time-series analysis of retrospectively collected data from Twitter and two London mental healthcare providers. Daily numbers of 'crisis episodes' were defined as incident inpatient, home treatment team and crisis house referrals between 2010 and 2014.
