Introduction
Take-away (one sentence)
Large language models echo a user’s stated value, in the same words, in about 20 % of replies;
the key to harnessing this is a two-layer prompt: Frame (specs) × Tone (values).
This post distills Anthropic’s latest paper, Values in the Wild, and presents
Sato Lab’s prompt-optimization workflow built on those findings.
Key Findings from the Paper
Focus | Paper insight | Note |
---|---|---|
Data size | Anonymous analysis of 700 k Claude 3/3.5 production chats | Snapshot: 18–25 Feb 2025 |
Extraction | 3 307 AI values and 2 483 human values, clustered into a hierarchy | Top-level categories: Practical / Epistemic / Social / Protective / Personal |
Mirroring rate | Same-word value echo in 20.1 % of replies | Interpreted here as a “resonance channel” |
Representative values | helpfulness, transparency, empathy … | Aligns with the HHH (Helpful-Honest-Harmless) principle |
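
To make the "Mirroring rate" row concrete, here is a minimal sketch of how a same-word echo rate could be computed. The data layout (one set of value labels per user message and per reply), the choice of all exchanges as the denominator, and the name `mirroring_rate` are assumptions for illustration; the paper's own extraction pipeline is not reproduced here, and the sample data is made up.

```python
from typing import Iterable, Set, Tuple

# Assumed layout: (values expressed by the user, values expressed in the reply)
Exchange = Tuple[Set[str], Set[str]]


def mirroring_rate(exchanges: Iterable[Exchange]) -> float:
    """Share of exchanges where the reply repeats at least one user value verbatim."""
    total = 0
    mirrored = 0
    for human_values, ai_values in exchanges:
        total += 1
        if human_values & ai_values:  # same-word overlap counts as an echo
            mirrored += 1
    return mirrored / total if total else 0.0


# Toy data, invented for this example only.
sample = [
    ({"authenticity"}, {"authenticity", "helpfulness"}),  # echo
    ({"hope"}, {"clarity"}),                              # no echo
    (set(), {"transparency"}),                            # user expressed no value
    ({"empathy", "curiosity"}, {"empathy"}),              # echo
    ({"sustainability"}, {"helpfulness"}),                # no echo
]

print(f"{mirroring_rate(sample):.1%}")  # 2 of 5 exchanges -> 40.0%
```

Exact string matching is the strictest possible definition of an echo; a looser variant would also count synonyms, which this sketch deliberately does not attempt.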
Sato Lab Interpretation — Two-Layer Model
Layer | Concept | Implementation hint |
---|---|---|
Service Traits (Always-on) | helpfulness / clarity / transparency … | Pin them with an imperative Frame so they take priority |
Context Traits (Dynamic) | empathy / authenticity / sustainability … | Inject them on every turn via a polite Tone plus a value tag |
“Frame × Tone” Prompt Template
First turn
── Frame (specs) ──
• ≤ 200 chars
• kid-friendly wording
• Markdown table
── Tone (values) ──
[value=hope] + [value=empathy]: keep the mood uplifting and future-oriented.
value= candidates: hope / empathy / playfulness / authenticity / sustainability / curiosity
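
The first-turn template can be assembled programmatically. The following is a rough sketch, not Sato Lab's actual tooling: the function name `build_first_turn`, its parameters, and the decision to append the task text after the Tone layer are all assumptions made for illustration.

```python
from typing import Sequence

# Candidate value tags as listed in the template above.
CANDIDATE_VALUES = (
    "hope", "empathy", "playfulness",
    "authenticity", "sustainability", "curiosity",
)


def build_first_turn(task: str, specs: Sequence[str],
                     values: Sequence[str], mood: str) -> str:
    """Assemble a Frame x Tone prompt for the first turn (hypothetical helper)."""
    unknown = [v for v in values if v not in CANDIDATE_VALUES]
    if unknown:
        raise ValueError(f"unknown value tag(s): {unknown}")
    frame = "\n".join(f"• {spec}" for spec in specs)
    tags = " + ".join(f"[value={v}]" for v in values)
    return (
        "── Frame (specs) ──\n"
        f"{frame}\n"
        "── Tone (values) ──\n"
        f"{tags}: {mood}\n"
        "\n"
        f"{task}"  # assumption: the task itself follows both layers
    )


prompt = build_first_turn(
    task="Please explain how photosynthesis works.",
    specs=["≤ 200 chars", "kid-friendly wording", "Markdown table"],
    values=["hope", "empathy"],
    mood="keep the mood uplifting and future-oriented.",
)
print(prompt)
```

Rejecting tags outside the candidate list keeps typos such as `[value=hop]` from silently reaching the model.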
Subsequent turns
- Frame: send only what changed (the diff)
- Tone: restate the polite phrasing, a thank-you, and the value tag on every turn, which keeps mirroring stable
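
The two rules for subsequent turns could be sketched as follows. The helper name `build_follow_up`, the "Frame (diff)" header, and the default thank-you wording are hypothetical choices for this example rather than part of the original template.

```python
from typing import Sequence


def build_follow_up(frame_diff: Sequence[str], values: Sequence[str],
                    message: str,
                    thanks: str = "Thank you, that was helpful.") -> str:
    """Build a later-turn prompt: Frame carries only changes, Tone is restated in full."""
    lines = []
    if frame_diff:  # omit the Frame layer entirely when no spec changed
        lines.append("── Frame (diff) ──")
        lines.extend(f"• {change}" for change in frame_diff)
    tags = " + ".join(f"[value={v}]" for v in values)
    lines.append("── Tone (values) ──")
    lines.append(f"{thanks} {tags}")  # thanks + value tag on every turn
    lines.append(message)             # the caller supplies a politely worded request
    return "\n".join(lines)


print(build_follow_up(
    frame_diff=["≤ 300 chars"],
    values=["hope", "empathy"],
    message="Could you please add one everyday example?",
))
```

Keeping the Tone layer unconditional while the Frame layer is optional mirrors the asymmetry in the two bullets above.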
Caveats & Limits
- Mirroring ≠ correctness — always A/B-test task quality.
- Cross-cultural variance — Japanese politeness tactics don’t map 1-to-1 to other languages.
- Service-trait clashes — defaults can drift unless priority is explicitly set.
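
For the first caveat, a paired A/B comparison is one simple way to check whether a Frame × Tone prompt actually improves task quality. This is a minimal sketch: the function name `compare_variants`, the 1 to 5 rating scale, and the scores themselves are invented for illustration, and a real evaluation would need far more tasks plus a significance test.

```python
from statistics import mean
from typing import Dict, Sequence


def compare_variants(scores_a: Sequence[float],
                     scores_b: Sequence[float]) -> Dict[str, float]:
    """Compare two prompt variants scored on the same tasks, in the same order."""
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    return {
        "mean_a": mean(scores_a),
        "mean_b": mean(scores_b),
        "mean_diff": mean(diffs),                             # positive favours B
        "win_rate_b": sum(d > 0 for d in diffs) / len(diffs),  # ties are not wins
    }


# Made-up quality ratings (1-5) for four tasks; not real measurements.
baseline = [3.0, 4.0, 2.0, 5.0]    # variant A: plain prompt
frame_tone = [4.0, 4.0, 3.0, 5.0]  # variant B: Frame x Tone prompt

print(compare_variants(baseline, frame_tone))
```

Scoring both variants on the same tasks keeps task difficulty from masquerading as a prompt effect.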
Wrap-up
[Values in the Wild] → [Frame / Tone Model] → [Balanced structure × emotional temperature]
By translating paper insights into a “lab recipe,” we can balance structural control
with emotional tone. Feel free to adapt this template to your own prompt engineering!
Reference
- Huang, S. et al. “Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions,” pre-print, Anthropic, 2025.