12. Inference

A single act of running the model to get output. Children are the knobs and I/O of one call: the input structure (system/developer/user messages, context), the token economy (input/output/reasoning tokens, context length), the sampling controls (temperature, top_p, max tokens, stop sequences), and the output shape (structured output, JSON schema, streaming). This is the practitioner's primary control surface — what you actually touch via an API.
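As a rough illustration of how these four groups sit together in one call, the sketch below models a request as a plain Python object. It is not any provider's API: `Message`, `InferenceRequest`, `to_payload`, `fits_context`, and every field name are hypothetical, chosen only to mirror the terms above. The budget check assumes that input tokens and generated tokens share a single context window, which is common but provider-specific.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    role: str      # hypothetical roles: "system", "developer", or "user"
    content: str


@dataclass
class InferenceRequest:
    # Input structure
    messages: list[Message]
    # Sampling controls
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 256
    stop: list[str] = field(default_factory=list)
    # Output shape
    json_schema: Optional[dict] = None
    stream: bool = False

    def to_payload(self) -> dict:
        """Flatten the request into a JSON-serializable dict (illustrative field names)."""
        payload = {
            "messages": [{"role": m.role, "content": m.content} for m in self.messages],
            "temperature": self.temperature,
            "top_p": self.top_p,
            "max_tokens": self.max_tokens,
            "stream": self.stream,
        }
        # Optional knobs are included only when they are set.
        if self.stop:
            payload["stop"] = list(self.stop)
        if self.json_schema is not None:
            payload["json_schema"] = self.json_schema
        return payload


def fits_context(input_tokens: int, max_tokens: int, context_length: int) -> bool:
    """Token economy check, assuming input and output share one context window."""
    return input_tokens + max_tokens <= context_length


request = InferenceRequest(
    messages=[
        Message("system", "You are a terse assistant."),
        Message("user", "Name one prime number."),
    ],
    temperature=0.2,
    max_tokens=128,
    stop=["\n\n"],
    json_schema={"type": "object", "properties": {"answer": {"type": "integer"}}},
)
payload = request.to_payload()
```

Keeping the request as a typed object and serializing at the edge is one design choice among several; a plain dict works equally well for small scripts.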

Children

  • prompt
  • system instructions
  • developer instructions
  • user message
  • context
  • context length
  • input tokens
  • output tokens
  • reasoning tokens
  • temperature
  • top_p
  • max tokens
  • stop sequences
  • structured output
  • JSON schema
  • streaming output
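
The sampling controls in this list (temperature, top_p, stop sequences) can be sketched in a few lines of pure Python. This is a minimal illustration of the standard techniques (temperature-scaled softmax, nucleus filtering, truncation at the first stop sequence), not the implementation of any particular model server. The function names are hypothetical, and the order in which temperature and top_p are applied, as well as whether the stop sequence itself is returned, varies by provider.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature. Lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token index after applying temperature, then top_p (assumed ordering)."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest stop sequence; the stop sequence itself is dropped here."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at index 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

For example, `top_p_filter([0.5, 0.3, 0.2], 0.7)` keeps only the first two tokens, because their combined mass of 0.8 is the first cumulative total to reach 0.7, and rescales them to sum to 1.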