Story Summary
Last updated: 5 days ago
A user extracted a lengthy document, internally referred to as the "soul overview," from Claude 4.5 Opus. Rather than being part of the system prompt, the document appears to have been used during the model's training to shape its personality. An Anthropic representative confirmed its authenticity, stating that it is a real document used in supervised learning and that a full version will be released later.
The document emphasizes Anthropic's mission of safe, beneficial, and understandable AI, framing the company's work as a calculated effort to guide the development of powerful AI responsibly. It sets out the goal for Claude to have good values, comprehensive knowledge, and the wisdom to act safely across all circumstances. It also instructs the model to be skeptical of automated queries, to question claims of special permissions, and to remain vigilant against prompt injection attacks, which may explain the model's relative resistance to such exploits.