22. Cost and Economics
What running AI actually costs and how those costs are reasoned about. Cost is a first-class operating concern because the model is a metered, recurring expense, not a one-time build. The children split into inference economics (per-token pricing; input vs. output vs. reasoning-token cost; context-length cost; prompt-caching discounts), training economics (compute/GPU-hours, data and labeling cost), cost levers (batching, quantization, model tiering/routing, caching), and cost accounting (cost monitoring, unit economics as cost per task, build vs. buy). The organizing question is cost per unit of useful work, which ties this branch to inference, infrastructure, and engineering.
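The per-token arithmetic behind "cost per unit of useful work" can be sketched as follows. This is a minimal illustration, not any vendor's billing formula: the `Pricing` fields, the example rates, and the assumption that reasoning tokens are billed at the output rate are all hypothetical and should be checked against the provider's actual price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (illustrative only)."""
    input_per_mtok: float
    output_per_mtok: float
    # Fraction taken off the input rate for tokens served from a prompt cache.
    cached_input_discount: float = 0.0


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0, cached_input_tokens: int = 0) -> float:
    """Return the USD cost of one request under the assumptions above."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    cached_rate = pricing.input_per_mtok * (1.0 - pricing.cached_input_discount)
    input_cost = (fresh_input * pricing.input_per_mtok
                  + cached_input_tokens * cached_rate) / 1e6
    # Assumption: reasoning tokens are metered like output tokens.
    output_cost = (output_tokens + reasoning_tokens) * pricing.output_per_mtok / 1e6
    return input_cost + output_cost


def cost_per_successful_task(total_cost: float, successful_tasks: int) -> float:
    """Unit economics: total spend divided by tasks that actually succeeded."""
    if successful_tasks <= 0:
        raise ValueError("need at least one successful task")
    return total_cost / successful_tasks


if __name__ == "__main__":
    pricing = Pricing(input_per_mtok=2.0, output_per_mtok=8.0,
                      cached_input_discount=0.5)
    cost = request_cost(pricing, input_tokens=10_000, output_tokens=500,
                        reasoning_tokens=1_500, cached_input_tokens=8_000)
    print(f"cost per request: ${cost:.4f}")
    # 100 such requests, of which 80 produced a usable result.
    print(f"cost per successful task: ${cost_per_successful_task(cost * 100, 80):.4f}")
```

Dividing by successful tasks rather than by requests is the design choice that matters here: a cheaper model that fails more often can end up costing more per unit of useful work.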
Children
- inference economics
  - per-token pricing (input / output / reasoning tokens)
  - context-length cost
  - prompt-caching discounts
- training economics
  - compute / GPU-hours
  - data and labeling cost
- cost levers
  - batching
  - quantization
  - model tiering / routing
  - caching
- cost accounting
  - cost monitoring / observability
  - unit economics (cost per task)
  - build vs buy
Related
- Inference — the token economy being priced
- Infrastructure & Runtime — the compute being paid for
- AI Engineering — cost/latency optimization
- Vendor & Model Ecosystem — who sets the prices