
Claude 4.5 Opus’ Soul Document

5 days ago www.lesswrong.com


Last updated: 5 days ago

A user extracted a lengthy document, internally referred to as the "soul overview," from Claude 4.5 Opus. Rather than being part of the system prompt, the document appears to have been used during the model's training to shape its personality. An Anthropic representative confirmed its authenticity, stating that it is a real document used in supervised learning and that a full version will be released later.

The document emphasizes Anthropic's mission of safe, beneficial, and understandable AI, and frames the company's work as a calculated effort to guide the development of powerful AI responsibly. It describes the goal for Claude to have the good values, comprehensive knowledge, and wisdom needed to act safely across all circumstances. The document also instructs the model to be skeptical of automated queries, to question claims of special permissions, and to remain vigilant against prompt injection attacks, which may explain the model's relative resistance to such exploits.
