10. Training and Post-Training¶

How a model acquires capability and how its behavior is then shaped. The child topics follow the pipeline in rough temporal order: data work (collection, cleaning, curation) → pretraining (raw capability) → post-training (SFT, instruction tuning, RLHF/RLAIF/DPO, alignment, safety tuning) → distillation. The key split: pretraining supplies knowledge and general ability; post-training supplies instruction following, preference alignment, and safety behavior. Post-training is the stage that turns a base model into an instruct/chat model.
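
The pretraining/post-training split can be made concrete with a toy loss calculation. The sketch below assumes one common convention: pretraining scores every next-token prediction, while SFT scores only the response tokens and masks out the prompt. The helper `token_nll`, the probabilities, and the mask are all made up for illustration and are not taken from any particular training stack.

```python
import math

def token_nll(probs, targets, mask=None):
    """Mean negative log-likelihood over the positions where mask is 1."""
    if mask is None:
        mask = [1] * len(targets)
    losses = [-math.log(p[t]) for p, t, m in zip(probs, targets, mask) if m]
    return sum(losses) / len(losses)

# Toy per-position distributions over a 3-token vocabulary (made-up numbers).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.25, 0.25],
]
targets = [0, 1, 2, 0]

# Pretraining-style objective: every position contributes to the loss.
pretrain_loss = token_nll(probs, targets)

# SFT-style objective (assumed convention): positions 0-1 are the prompt,
# positions 2-3 are the response, and only the response is scored.
response_mask = [0, 0, 1, 1]
sft_loss = token_nll(probs, targets, response_mask)

print(f"pretraining loss: {pretrain_loss:.4f}")
print(f"SFT loss:         {sft_loss:.4f}")
```

The objective is the same next-token likelihood in both cases; what changes is the data and which tokens are scored, which is why SFT is usually described as shaping behavior rather than adding knowledge.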

Children¶

data collection
data cleaning
data curation
pretraining
fine-tuning
supervised fine-tuning / SFT
instruction tuning
RLHF
RLAIF
DPO
preference optimization
alignment
safety tuning
model distillation
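
Of the preference-optimization entries above, DPO is compact enough to sketch directly. The example below is a minimal, per-example version of the DPO loss, assuming the sequence log-probabilities of the chosen and rejected responses are already available from the policy and from a frozen reference model. The function name `dpo_loss`, its argument names, and the numbers are illustrative assumptions, not any library's API.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities.

    margin = (log-ratio of the chosen response) - (log-ratio of the rejected one),
    where each log-ratio compares the policy against the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: margin is 0, so the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy has shifted probability toward the chosen response: lower loss.
improved = dpo_loss(-10.0, -17.0, -12.0, -15.0)

print(f"baseline: {baseline:.4f}")
print(f"improved: {improved:.4f}")
```

Unlike RLHF, this formulation needs no separately trained reward model or sampling loop; the preference pairs are used directly in a supervised-style loss.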

Related¶

Machine Learning — self-supervised learning, RL
Language Models — base vs instruct/chat models
Evaluation & Testing — measuring the result
Safety, Security & Governance — alignment and safety tuning
