20. Safety, Governance and Alignment
Keeping deployed AI controlled, accountable, and pointed at intended goals. This branch is the DEFENSE and PRINCIPLES side: the controls you put in place and the values they serve. The attacker's-eye threat model lives in Security & Threat Model. The children split into three groups: defensive controls (guardrails, permissions, sandboxing, audit logs, policy enforcement, access control), governance principles (privacy, responsible AI, transparency), and alignment (alignment and alignment risk: making the model want what we want, the training-time counterpart to runtime guardrails). The distinction from Evaluation & Testing: evaluation measures risk; this branch enforces against that risk and sets the values the enforcement serves.
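To make the defensive-controls group concrete, here is a minimal sketch of how a permission check, a policy enforcement point, and an audit log might fit together at runtime. Every name in it (`AuditLog`, `PolicyEnforcer`, `run_tool`, the actor and action strings) is hypothetical and chosen for illustration; it is not taken from any particular framework, and a real deployment would also need sandboxing and durable, tamper-resistant log storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AuditLog:
    """In-memory, append-only record of every decision (illustrative only)."""
    entries: list = field(default_factory=list)

    def record(self, actor: str, action: str, allowed: bool, reason: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "allowed": allowed,
            "reason": reason,
        })


@dataclass
class PolicyEnforcer:
    """Hypothetical enforcement point consulted before any tool call runs."""
    permissions: dict  # assumed shape: actor name -> set of allowed action names
    audit: AuditLog

    def authorize(self, actor: str, action: str) -> bool:
        # Default-deny: an unknown actor has an empty permission set.
        allowed = action in self.permissions.get(actor, set())
        reason = "permission granted" if allowed else "action not in permission set"
        # Both allowed and denied decisions are logged for accountability.
        self.audit.record(actor, action, allowed, reason)
        return allowed


def run_tool(enforcer: PolicyEnforcer, actor: str, action: str,
             tool: Callable[[], object]) -> object:
    """Run `tool` only if the enforcer authorizes the action; otherwise refuse."""
    if not enforcer.authorize(actor, action):
        raise PermissionError(f"{actor} may not perform {action}")
    return tool()
```

Under these assumptions, `run_tool(enforcer, "support-agent", "read_ticket", fetch)` would execute `fetch` only when `"read_ticket"` is in that actor's permission set, and every attempt, allowed or denied, leaves an audit entry. This is the enforcement side of the split described above: an evaluation suite would measure how often unsafe actions are attempted, while a check like this blocks them.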
Children
- guardrails
- permissions
- sandboxing
- audit logs
- policy enforcement
- access control
- privacy
- responsible AI
- transparency
- alignment
- alignment risk
Related
- Security & Threat Model — the threats these controls defend against
- Evaluation & Testing — measuring risk vs enforcing against it
- Training & Post-Training — alignment and safety tuning