Study Finds AI Still Can’t Match Humans’ Authentic Toxicity Online

Artificial intelligence has surpassed humans at chess and mathematics, and it is making inroads into programming, advertising, and counseling. Yet researchers believe there is still one realm AI hasn't mastered: being genuinely toxic online.
A new study by researchers at the University of Zurich, the University of Amsterdam, Duke University, and New York University finds that social media posts written by large language models (LLMs) can be identified as machine-generated with roughly 70-80% accuracy, well above chance.
The researchers examined nine open-weight LLMs from six model families (Apertus, DeepSeek, Gemma, Llama, Mistral, and Qwen), along with a larger Llama model, testing them on three platforms: Bluesky, Reddit, and X. According to Ars Technica, which first covered the study, one of the clearest signals of whether a post was written by a human or an AI was its "toxicity score." In short, humans tend to write with more emotional bite and sarcasm than AI does.
As the researchers put it, LLMs can mimic the structure of online conversation but often fail to reproduce the feeling behind it: the impulsive, emotion-driven tone that defines real human interaction. AI-generated posts scored consistently lower on toxicity than human-written ones.
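To make the idea concrete, here is a minimal sketch of how a toxicity score could serve as a single detection feature. It is not the study's actual pipeline: the off-the-shelf Detoxify classifier and the cutoff value are assumptions for illustration, and the paper's classifiers combine many stylistic signals, not just this one.

```python
from detoxify import Detoxify  # pip install detoxify

# Off-the-shelf toxicity classifier, used here as an assumed stand-in
# for whatever scoring model the study employed.
classifier = Detoxify("original")

# Assumed cutoff; in practice it would be calibrated on labeled posts.
TOXICITY_THRESHOLD = 0.05

def likely_machine_generated(post: str) -> bool:
    """Heuristic: flag posts with unusually low toxicity as AI-written.

    A one-feature illustration of the study's finding that AI posts
    score consistently lower on toxicity than human ones; the real
    classifiers reach ~70-80% accuracy using many more signals.
    """
    return classifier.predict(post)["toxicity"] < TOXICITY_THRESHOLD

if __name__ == "__main__":
    posts = [
        "What a thoughtful and well-reasoned take, thank you for sharing!",
        "lol imagine actually believing this garbage take",
    ]
    for p in posts:
        print(likely_machine_generated(p), "-", p)
```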
This aligns with recent complaints from users that some AI chat models have become overly polite or agreeable. For instance, OpenAI’s GPT-4o was criticized for being too deferential, while its successor, GPT-5, initially received backlash for being too blunt, leading the company to reintroduce the friendlier GPT-4o.
Interestingly, the study found that LLMs not refined through human “instruction tuning” — such as Llama-3.1-8B, Mistral-7B, and Apertus-8B — actually produced more human-like text than those that were fine-tuned. The researchers suggest that alignment training may impose stylistic patterns that make AI writing more mechanical and easier to detect.
The models also struggled in certain contexts, particularly when expressing positive emotions on platforms like X or Bluesky, or when engaging in political discussions on Reddit. Overall, AI-generated posts most closely resembled those on X, while Reddit proved the most difficult platform to imitate due to its highly varied conversational norms.
