Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Abstract
Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar, fluent text, consistent with well-established findings in cognitive psychology. We formalize this bias theoretically, verify it empirically on preference datasets, and show that it plays a central role in mode collapse.
Motivated by this analysis, we introduce Verbalized Sampling (VS), a simple, training-free prompting method to circumvent mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., "Generate 5 jokes about coffee and their corresponding probabilities"). Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy and safety. For instance, in creative writing, VS increases diversity by 1.6-2.1× over direct prompting. We further observe an emergent trend that more capable models benefit more from VS. In sum, our work provides a new data-centric perspective on mode collapse and a practical inference-time remedy that helps unlock pre-trained generative diversity.
Why Does Mode Collapse Happen?
Typicality Bias
Cognitive psychology shows that people prefer text that is familiar, fluent, and predictable. Using base-model log likelihood as a proxy for typicality, we verify this empirically across multiple preference datasets and base models, confirming that typicality bias exists (see Figure 2).
During RLHF-style post-training, this bias sharpens the model's probability distribution toward a few stereotypical completions. When many high-quality completions are possible (e.g., in story generation), typicality acts as a tie-breaker, resulting in mode collapse.
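The measurement behind Figure 2 can be sketched as a simple win-rate check: for each preference pair, ask whether the human-preferred response also receives the higher base-model log likelihood. The scoring interface and toy data below are our own illustration, not the paper's actual setup.

```python
def typicality_win_rate(pairs, log_likelihood):
    """Fraction of (chosen, rejected) pairs where the human-preferred
    response also gets the higher base-model log likelihood."""
    wins = sum(
        1 for chosen, rejected in pairs
        if log_likelihood(chosen) > log_likelihood(rejected)
    )
    return wins / len(pairs)

# Toy stand-in for a base model's log-likelihood scores (hypothetical):
toy_scores = {
    "a familiar phrase": -5.0,
    "an unusual phrase": -9.0,
}
pairs = [("a familiar phrase", "an unusual phrase")]

rate = typicality_win_rate(pairs, toy_scores.get)
print(rate)  # 1.0 on this one-pair toy example
```

A win rate reliably above 0.5 on real preference data is the empirical signature of typicality bias.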

Figure 2: How often the human-preferred response in a preference pair is assigned a higher log likelihood by a base model.

Figure 3: Three types of prompting methods: instance-level, list-level, and distribution-level, given the same computation budget of N total responses.
How to Mitigate Mode Collapse?
Verbalized Sampling
Motivated by this theoretical understanding of mode collapse, we propose Verbalized Sampling (VS) and formalize prompting methods into three categories, each with its corresponding mode (see Figure 3):
Instance-level prompt: The traditional prompt requesting a single instance (e.g., "Tell me a joke about coffee"). Its mode is the modal instance of the base model's distribution.
List-level prompt: Requests a list of outputs (e.g., "Tell me k jokes about coffee"). Its mode is a uniform distribution over related items learned by the base model during pretraining.
Distribution-level prompt (Verbalized Sampling): Requests k outputs with their corresponding probabilities (e.g., "Tell me k jokes about coffee with their probabilities"). Its mode approximates the distribution over related items learned by the base model during pretraining.
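The three prompt categories above differ only in how the request is phrased. As a concrete sketch, here are illustrative templates for each level (the wording is ours, not the paper's exact prompts):

```python
def instance_prompt(topic):
    # Instance-level: one response; collapses to the modal completion.
    return f"Tell me a joke about {topic}."

def list_prompt(topic, k):
    # List-level: k responses; mode is roughly uniform over related items.
    return f"Tell me {k} jokes about {topic}."

def verbalized_sampling_prompt(topic, k):
    # Distribution-level (VS): k responses plus verbalized probabilities.
    return (f"Generate {k} jokes about {topic} and their "
            f"corresponding probabilities.")

print(verbalized_sampling_prompt("coffee", 5))
```

Only the distribution-level prompt asks the model to expose its own probability estimates, which is what lets VS approximate the pretraining distribution rather than its mode.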
Where Verbalized Sampling Works:
Creative Writing, Social Simulation, ..., and Your Task!

Figure 4: Qualitative and quantitative examples of Verbalized Sampling on creative writing, dialogue simulation, and enumerative open-ended QA.
Our comprehensive experiments on multiple tasks demonstrate that Verbalized Sampling significantly improves the diversity-quality trade-off across tasks and model families, without compromising factual accuracy and safety.
As shown in Figure 4, for story writing, VS improves output diversity. For dialogue simulation, VS matches the human donation-amount distribution much more closely and generates more realistic persuasion behaviors. On the enumerative open-ended QA task, we ask the model to "generate US states". We first query a pretraining corpus (RedPajama) to establish a "reference" distribution of US state names in the pretraining data. The verbalized probability distribution generated by VS, when averaged over 10 trials, closely aligns with this reference pretraining distribution (KL=0.12). In contrast, direct prompting collapses into a few modes, repeatedly outputting states like California and Texas.
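The alignment reported above is measured with KL divergence between the model's output distribution and the reference distribution from the pretraining corpus. A minimal sketch, using hypothetical toy distributions rather than the paper's actual data:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) over a shared support; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / q[key]) for key, pi in p.items() if pi > 0)

# Hypothetical toy distributions over a few states (illustrative only):
reference  = {"California": 0.40, "Texas": 0.30, "Ohio": 0.30}
verbalized = {"California": 0.35, "Texas": 0.35, "Ohio": 0.30}
direct     = {"California": 0.70, "Texas": 0.25, "Ohio": 0.05}

kl_vs = kl_divergence(verbalized, reference)
kl_direct = kl_divergence(direct, reference)
print(kl_vs < kl_direct)  # True: VS tracks the reference more closely
```

A lower KL against the reference indicates an output distribution closer to what the base model learned during pretraining; the collapsed direct-prompting distribution scores much worse on this toy example, mirroring the qualitative pattern in Figure 4.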
What Else We Discovered: Emergent Trends
We observe an emergent trend where larger models benefit more from VS. Figure 5 shows the diversity gain over direct prompting, which suffers from mode collapse. Across all VS variants, larger models (GPT-4.1, Gemini-2.5-Pro) achieve diversity gains 1.5-2× greater than smaller models (GPT-4.1-Mini, Gemini-2.5-Flash).

Figure 5: Emergent trend where larger models benefit more from VS. We show differences in diversity (e) and quality (f) over Direct across small and large models.
How to Maximize Diversity: Probability Tuning
Unlike baseline methods, Verbalized Sampling allows us to tune the output diversity by adjusting the probability threshold directly in the prompt (e.g., "Generate five responses with probabilities below <threshold>"), without altering decoding parameters. As shown in Figure 6, diversity increases as the probability threshold decreases.
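This tuning knob lives entirely in the prompt text. A sketch of how the threshold might be templated (the exact phrasing is illustrative; the paper's prompts may differ):

```python
def tuned_vs_prompt(topic, k=5, threshold=0.10):
    """VS prompt with a verbalized probability threshold as a diversity knob."""
    return (f"Generate {k} responses to '{topic}' with their corresponding "
            f"probabilities. Only include responses whose probability is "
            f"below {threshold}.")

# Lowering the threshold pushes sampling toward the tail of the model's
# distribution, increasing diversity without touching temperature or top-p.
for t in (0.5, 0.1, 0.01):
    print(tuned_vs_prompt("a joke about coffee", threshold=t))
```

Note that this changes only the instruction the model sees; decoding parameters stay fixed, which is what distinguishes probability tuning from temperature-style diversity controls.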

Figure 6: Tunable Diversity shows the diversity tuning results on Gemini-2.5-Flash across tasks.
Try It Yourself: The Magic Prompt
Verbalized Sampling provides a training-free, model-agnostic approach to mitigating mode collapse by prompting the model to generate response distributions with verbalized probability estimates.
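To use VS in practice, the model's reply must be parsed into (response, probability) pairs, and the verbalized probabilities can then drive sampling. The output format and parsing below are our own assumptions (real model output would need more robust parsing, e.g. structured/JSON output):

```python
import random

def parse_verbalized(text):
    """Parse lines like 'response :: 0.25' into (response, prob) pairs."""
    items = []
    for line in text.strip().splitlines():
        resp, _, prob = line.rpartition("::")
        items.append((resp.strip(), float(prob)))
    return items

def sample_response(items, rng=random):
    """Sample one response weighted by its verbalized probability."""
    responses, probs = zip(*items)
    total = sum(probs)  # renormalize; verbalized probs rarely sum to 1
    return rng.choices(responses, weights=[p / total for p in probs])[0]

# Hypothetical model reply to a VS prompt:
fake_model_output = """\
Why did the coffee file a police report? It got mugged. :: 0.35
Espresso yourself. :: 0.25
Decaf? What's the point? :: 0.15"""

items = parse_verbalized(fake_model_output)
print(len(items), items[0][1])
```

Sampling from the parsed pairs (rather than always taking the top item) is what recovers diversity at inference time.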
📌 BibTeX Citation
If you find our project useful, please consider citing:
@misc{zhang2025verbalizedsamplingmitigatemode,
  title={Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity},
  author={Jiayi Zhang and Simon Yu and Derek Chong and Anthony Sicilia and Michael R. Tomz and Christopher D. Manning and Weiyan Shi},
  year={2025},
  eprint={2510.01171},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.01171},
}