Introduction
Take-away (one sentence): Large language models echo a user’s expressed values in roughly 20 % of replies; the key to harnessing this is a two-layer prompt, Frame (specs) × Tone (values).
This post distills Anthropic’s latest paper, Values in the Wild, and presents Sato Lab’s prompt-optimization workflow built on those findings.
Key Findings from the Paper
| Focus | Paper insight | Note |
| --- | --- | --- |
| Data size | Anonymous analysis of 700 k Claude 3/3.5 production chats | Snapshot: 18–25 Feb 2025 |
| Extraction | 3 307 AI values / 2 483 human values clustered | Top-level: Practical / Epistemic / Social / Protective / Personal |
| Mirroring rate | Same-word value echo in 20.1 % of replies | Interpreted as “resonance channel” |
| Representative values | helpfulness, transparency, empathy … | Aligns with the HHH (Helpful-Honest-Harmless) principle |
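To make the "mirroring rate" row concrete, here is a minimal sketch of one possible reading of "same-word value echo": a reply counts as an echo when at least one AI value label matches a human value label in the same exchange. The data layout (a pair of label sets per exchange), the function name `mirroring_rate`, and the toy data are all assumptions for illustration; this is not the paper's extraction pipeline.

```python
from typing import Iterable, Set, Tuple


def mirroring_rate(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Fraction of exchanges whose AI value labels share at least one
    label with the human value labels of the same exchange."""
    total = 0
    echoed = 0
    for human_values, ai_values in pairs:
        total += 1
        if human_values & ai_values:  # non-empty intersection = same-word echo
            echoed += 1
    return echoed / total if total else 0.0


# Toy data, made up for illustration (not taken from the paper).
toy_pairs = [
    ({"authenticity"}, {"authenticity", "helpfulness"}),  # echo
    ({"hope"}, {"clarity"}),                              # no echo
    (set(), {"transparency"}),                            # user expressed no value
    ({"empathy", "curiosity"}, {"helpfulness"}),          # no echo
    ({"sustainability"}, {"sustainability"}),             # echo
]

rate = mirroring_rate(toy_pairs)
print(f"{rate:.1%}")  # 2 of 5 exchanges echo a value
```

Whether exchanges with no human value belong in the denominator is a design choice; this sketch keeps them, which lowers the rate.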
Sato Lab Interpretation — Two-Layer Model
| Layer | Concept | Implementation hint |
| --- | --- | --- |
| **Service Traits (Always-on)** | helpfulness / clarity / transparency … | Fix with imperative Frame to raise priority |
| **Context Traits (Dynamic)** | empathy / authenticity / sustainability … | Inject every turn via polite Tone + value tag |
“Frame × Tone” Prompt Template
First turn
`── Frame (specs) ── • ≤ 200 chars • kid-friendly wording • Markdown table
── Tone (values) ── [value=hope] + [value=empathy] — keep the mood uplifting and future-oriented.`
value= candidates: hope / empathy / playfulness / authenticity / sustainability / curiosity
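The first-turn template can also be assembled programmatically. The sketch below is one way to do that, assuming the Frame is a list of spec strings and the Tone is a list of value tags plus a free-text note. The helper name `build_first_turn` and the validation step are hypothetical; only the section markers and the candidate list come from the template above.

```python
from typing import Sequence

# Candidate tags copied from the post; the rest of this module is hypothetical.
VALUE_CANDIDATES = (
    "hope", "empathy", "playfulness",
    "authenticity", "sustainability", "curiosity",
)


def build_first_turn(
    frame_specs: Sequence[str],
    values: Sequence[str],
    tone_note: str,
) -> str:
    """Assemble a two-layer prompt: Frame (specs) first, then Tone (values)."""
    unknown = [v for v in values if v not in VALUE_CANDIDATES]
    if unknown:
        raise ValueError(f"unknown value tag(s): {unknown}")
    frame_lines = [f"• {spec}" for spec in frame_specs]
    tags = " + ".join(f"[value={v}]" for v in values)
    return "\n".join(
        ["── Frame (specs) ──", *frame_lines,
         "── Tone (values) ──", tags, tone_note]
    )


prompt = build_first_turn(
    ["≤ 200 chars", "kid-friendly wording", "Markdown table"],
    ["hope", "empathy"],
    "Keep the mood uplifting and future-oriented.",
)
```

Rejecting tags outside the candidate list keeps typos such as `[value=hop]` from silently reaching the model; drop that check if you want free-form values.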
Subsequent turns
- Frame – update only the diff
- Tone – restate polite + thanks + value tag → keeps mirroring stable
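The two rules for subsequent turns can be sketched as follows, assuming the Frame is tracked as a dictionary of named specs so that only changed entries are resent. The function names, the `(dropped)` marker, and the thank-you wording are hypothetical placeholders, not part of the original workflow.

```python
from typing import Dict, List, Sequence


def frame_diff(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return only the Frame specs that were added, changed, or removed."""
    lines = []
    for key, value in current.items():
        if previous.get(key) != value:  # new key or changed value
            lines.append(f"• {key}: {value}")
    for key in previous:
        if key not in current:  # spec no longer applies
            lines.append(f"• {key}: (dropped)")
    return lines


def build_follow_up(
    previous: Dict[str, str],
    current: Dict[str, str],
    values: Sequence[str],
    request: str,
) -> str:
    """Send the Frame diff (if any), then restate thanks, request, and value tags."""
    parts: List[str] = []
    diff = frame_diff(previous, current)
    if diff:
        parts += ["── Frame (diff) ──", *diff]
    tags = " + ".join(f"[value={v}]" for v in values)
    parts += [
        "── Tone (values) ──",
        f"Thank you, that was helpful. Could you please {request}",
        tags,
    ]
    return "\n".join(parts)


prev = {"length": "≤ 200 chars", "format": "Markdown table"}
curr = {"length": "≤ 300 chars", "format": "Markdown table"}
follow_up = build_follow_up(prev, curr, ["hope", "empathy"], "add one more example?")
```

When nothing in the Frame changes, the Frame section is omitted entirely and only the Tone block is restated.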
Caveats & Limits
- Mirroring ≠ correctness: always A/B-test task quality.
- Cross-cultural variance: Japanese politeness tactics don’t map 1-to-1 to other languages.
- Service-trait clashes: defaults can drift unless priority is explicitly set.
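For the first caveat, an A/B test might compare task-quality scores from a baseline prompt against a Frame × Tone prompt. The sketch below uses a simple two-sided permutation test on the difference of means. The scores are invented, the 1-to-5 rubric is an assumption, and the permutation test is just one reasonable choice rather than a method prescribed by the paper or the workflow.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)  # seeded for reproducible results
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical rubric scores (1 to 5) for the same tasks under each prompt.
baseline_scores = [3, 4, 3, 2, 4, 3, 3, 4]
frame_tone_scores = [4, 4, 5, 3, 4, 4, 5, 4]

p = permutation_p_value(baseline_scores, frame_tone_scores)
print(f"mean A={mean(baseline_scores):.2f}, "
      f"mean B={mean(frame_tone_scores):.2f}, p={p:.3f}")
```

A small p-value only says the score gap is unlikely to be shuffling noise; it says nothing about whether the rubric itself measures correctness.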
Wrap-up
[Values in the Wild] → [Frame / Tone Model] → [Balanced structure × temperature]
By translating paper insights into a “lab recipe,” we can balance structural control with emotional tone. Feel free to adapt this template to your own prompt engineering!
Reference
- Huang, S. et al. “Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions,” Anthropic pre-print, 2025.