Maximum number of tokens to generate per response.
If not explicitly set, the value is taken from max_output_tokens in the
model's profile.
Falls back to 4096 when no profile entry exists for the model or when the
profile is missing max_output_tokens.
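
The resolution order described above can be sketched as follows. This is only an illustration of the documented behavior, not the library's actual code: `MODEL_PROFILES`, `resolve_max_tokens`, and the model names are hypothetical stand-ins, and only the 4096 fallback and the max_output_tokens key come from the description.

```python
from typing import Optional

DEFAULT_MAX_TOKENS = 4096  # fallback value stated in the description

# Hypothetical profile table for illustration; real profiles are defined elsewhere.
MODEL_PROFILES = {
    "example-model-a": {"max_output_tokens": 8192},
    "example-model-b": {},  # profile exists but lacks max_output_tokens
}


def resolve_max_tokens(model: str, max_tokens: Optional[int] = None) -> int:
    """Return the effective token limit for a model (hypothetical helper)."""
    # 1. An explicitly set value always wins.
    if max_tokens is not None:
        return max_tokens
    # 2. Otherwise look up the model's profile, if one exists.
    profile = MODEL_PROFILES.get(model)
    if profile is None:
        return DEFAULT_MAX_TOKENS
    # 3. Use the profile's max_output_tokens, or fall back if it is missing.
    return profile.get("max_output_tokens", DEFAULT_MAX_TOKENS)
```

Under these assumptions, `resolve_max_tokens("example-model-a")` uses the profile value, while an unknown model or a profile without max_output_tokens yields the 4096 fallback.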