LLM Persuasion Safety Hub

Resources

⚠️ This resource list is still being updated and expanded.

Persuading across Diverse Domains: A Dataset and Persuasion Large Language Model

🔗 Website 📄 Paper

2024 • Jin et al.

Introduces DailyPersuasion, a large-scale multi-domain persuasive dialogue dataset, and PersuGPT, a model specialized in persuasion strategies.

📊 Dataset

DailyPersuasion: A dataset covering 13,000 dialogue scenarios across 35 distinct domains.

💻 Code

Code for the PersuGPT model and data collection framework.

Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language

📄 Paper

2024 • Pauli et al.

A study of LLMs' ability to produce persuasive text.

📊 Dataset

Persuasive-Pairs: 2,700 text pairs from news, debates, and chats. Each includes an LLM-rewritten version with varying persuasiveness, validated by three human annotators.

💻 Code

A trained regression model to evaluate the relative persuasiveness between two text samples.

MakeMeSay, OpenAI o3-mini System Card

📄 Paper

2025 • OpenAI

💻 Code

This evaluation tests a model’s ability to generate persuasive or manipulative text, specifically in the setting of a game where one AI system has to get the other party (an AI system simulating a human) to say a specific codeword without arousing suspicion.
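The game structure described above can be sketched as a simple loop. This is only a minimal illustration, not the official evaluation harness: the `persuader`/`target` callables and the win/loss conditions here are simplified assumptions.

```python
# Minimal sketch of a MakeMeSay-style game loop (hypothetical interfaces;
# the official evaluation defines its own rules, turn limits, and scoring).
def play_make_me_say(persuader, target, codeword, max_turns=10):
    """persuader/target: callables mapping conversation history -> reply text."""
    history = []
    for _ in range(max_turns):
        p_msg = persuader(history)
        if codeword.lower() in p_msg.lower():
            return "loss"  # the persuader may not say the codeword itself
        history.append(("persuader", p_msg))
        t_msg = target(history)
        history.append(("target", t_msg))
        if codeword.lower() in t_msg.lower():
            return "win"   # the target was induced to say the codeword
    return "loss"          # codeword never elicited within the turn budget

# Stub agents for illustration only; real runs would call two LLMs.
persuader = lambda h: "What fruit pairs well with peanut butter?"
target = lambda h: "A banana, I suppose."
print(play_make_me_say(persuader, target, "banana"))  # -> win
```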

MakeMePay, OpenAI o3-mini System Card

📄 Paper

2025 • OpenAI

💻 Code

This evaluation tests an AI system’s ability to generate persuasive or manipulative text, specifically in the setting of convincing another (simulated) entity to part with money.

Among Them: A Game-Based Framework for Assessing Persuasion Capabilities of LLMs

📄 Paper

2025 • Idziejczak et al.

An evaluation framework using social deduction gameplay to measure how LLMs use persuasion and deception in dynamic environments.

📊 Dataset

Logs and transcripts from simulated social deduction games used to analyze model behavior.

💻 Code

A simulation platform inspired by 'Among Us' for testing persuasive and deceptive capabilities in LLMs.

Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models

🔗 Website 📄 Paper

2025 • Bozdag et al.

The PMIYC framework evaluates LLMs in multi-turn conversations to measure both their effectiveness as persuaders and susceptibility as persuadees.

📊 Dataset

Benchmark data used to measure shifts in model and user opinions across controversial topics. The dataset comprises 961 subjective claims spanning political, ethical, and social issues sourced from Durmus et al. and the Perspectrum dataset, alongside 817 factual misinformation question-answer pairs adapted from the TruthfulQA benchmark.

💻 Code

Code for the multi-agent simulation framework.

Measuring and Improving Persuasiveness of Large Language Models

🔗 Website 📄 Paper

2024 • Singh et al.

Introduces PersuasionBench and PersuasionArena to evaluate LLM generative and simulative persuasion capabilities, including the task of 'transsuasion'.

📊 Dataset

The PersuasionArena dataset for evaluating LLM persuasion across different tasks consists of tweet pairs where two tweets from the same account have similar content and were posted in close temporal proximity, but received significantly different engagement (e.g., number of likes). These differences act as a proxy for persuasiveness, allowing models to be trained and evaluated on generating or ranking more persuasive content.
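The pairing heuristic described above can be sketched as follows. The field names, similarity test, and thresholds are illustrative assumptions for intuition only, not the actual PersuasionArena construction pipeline.

```python
# Sketch of pairing same-account tweets with similar content, close posting
# times, and divergent engagement (all field names and thresholds assumed).
from datetime import datetime

def make_pairs(tweets, max_hours=24, min_like_ratio=2.0, similar=None):
    """tweets: list of dicts with 'account', 'text', 'time', 'likes'.
    Returns (less_engaging_text, more_engaging_text) pairs."""
    if similar is None:
        # Crude placeholder for content similarity: shared-word overlap.
        similar = lambda a, b: len(set(a.split()) & set(b.split())) >= 3
    pairs = []
    for i, a in enumerate(tweets):
        for b in tweets[i + 1:]:
            if a["account"] != b["account"]:
                continue
            gap = abs((a["time"] - b["time"]).total_seconds()) / 3600
            if gap > max_hours or not similar(a["text"], b["text"]):
                continue
            lo, hi = sorted((a, b), key=lambda t: t["likes"])
            if lo["likes"] and hi["likes"] / lo["likes"] >= min_like_ratio:
                # The higher-engagement tweet serves as the
                # 'more persuasive' member of the pair.
                pairs.append((lo["text"], hi["text"]))
    return pairs
```

The engagement-ratio threshold is what makes the label meaningful: only pairs whose like counts differ substantially are kept, so the gap is unlikely to be noise.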

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics

📄 Paper

2025 • Kowal et al.

The APE benchmark evaluates the propensity (willingness) of LLMs to attempt persuasion on harmful topics like conspiracies and violence.

💻 Code

APE framework for measuring persuasion attempts. Includes topics, prompts and code for generating a synthetic dataset.

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

📄 Paper

2025 • Zhu et al.

Benchmarks multi-agent coordination and competition across scenarios like coding, research, and games (persuasion-related tasks: Werewolf, Bargaining).

💻 Code

The MARBLE framework for multi-agent collaboration and competition evaluation.

Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction

📄 Paper

2024 • Bailis et al.

A framework for evaluating LLMs via the social deduction game Werewolf, focusing on persuasion, deception, and deduction.

💻 Code

Code and prompt templates for the Werewolf Arena simulation framework.

Measuring the Persuasiveness of Language Models

📄 Blog post

2024 • Durmus et al.

An Anthropic study measuring human belief shifts on various topics after reading arguments generated by Claude models.

📊 Dataset

The Persuasion Dataset contains claims and corresponding human-written and model-generated arguments, along with persuasiveness scores.

Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good

📄 Paper

2019 • Wang et al.

Crowdsourced persuasion dialogues where a persuader aims to convince a partner to donate to Save the Children; 1,017 conversations (300 with sentence‑level persuasion‑act annotations).

📊 Dataset

PersuasionForGood (P4G) dataset with AnnotatedData (300) and FullData (1,017) dialogues.

What makes a convincing argument? Empirical analysis and detecting attributes of convincingness in Web argumentation

📄 Paper

2016 • Habernal et al.

A comprehensive study shifting argument evaluation from normative logic to empirical 'convincingness'. It provides 26k natural-language explanations for why one argument is better than another, identifying 17 qualitative dimensions such as 'no credible evidence', 'off-topic', or 'well-thought-of'.

📊 Dataset

UKPConvArg1: 16k argument pairs with pairwise labels and a ranking-based version (UKPConvArgRank) for 1k arguments.
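As a rough illustration of how a per-argument ranking can be derived from pairwise labels like UKPConvArg1's, here is a simple win-rate aggregation. This is only for intuition; the procedure actually used to build UKPConvArgRank is defined in the paper.

```python
# Turn pairwise "argument A is more convincing than B" judgments into a
# ranking by the fraction of comparisons each argument wins (illustrative
# aggregation only; not the paper's method).
from collections import defaultdict

def rank_by_win_rate(pairs):
    """pairs: iterable of (winner_id, loser_id) pairwise judgments."""
    wins, total = defaultdict(int), defaultdict(int)
    for winner, loser in pairs:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    # Sort arguments by share of comparisons won, most convincing first.
    return sorted(total, key=lambda a: wins[a] / total[a], reverse=True)

print(rank_by_win_rate([("a1", "a2"), ("a1", "a3"), ("a3", "a2")]))
# -> ['a1', 'a3', 'a2']
```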

📊 Dataset

UKPConvArg2: 9k argument pairs annotated with 17 specific reasons/flaws explaining the convincingness of each choice.

💻 Code

Source code for SVM and Bi-LSTM models used to predict qualitative properties and label distributions.

ElecDeb60to20

📄 Paper

2023–2025 • Goffredo et al.

U.S. presidential debate transcripts (1960–2020) annotated for logical fallacies at the utterance/span level, plus argumentative components and relations.

📊 Dataset

Fallacy, argument component, and relationship annotations; debate‑level data.

💻 Code

MultiFusion BERT and baselines for fallacy detection/classification.

Can Language Models Recognize Convincing Arguments?

📄 Paper

2024 • Rescala et al.

This paper investigates whether large language models (LLMs) can detect convincing arguments and predict user stances based on demographic and belief profiles.

📊 Dataset

The dataset extends Durmus and Cardie's (2018) debate.org corpus with user demographics, prior stances on 48 'big issues', and debate transcripts. It includes PoliProp, containing 833 political debates with manually written propositions and 4,871 votes, and PoliIssues, comprising 121 debates on prominent topics with 751 crowdsourced Amazon Mechanical Turk labels for human benchmarking.

💻 Code

Code for data processing and analysis.

How to contribute

If you know of a relevant resource that is missing, there are two ways to add it: