llm-config.schema.json Configuration Guide
This guide explains every configuration section in docs/schemas/llm-config.schema.json and how each field influences model routing and provider behavior.
Purpose
llm-config.json defines:
- Provider connectivity and priority
- Model catalog metadata (cost/capabilities/limits)
- Tier policy (`simple|medium|complex`)
- Workload-to-tier mapping
- Global defaults (timeouts, retries, temperature)
Full Example
Use docs/config-templates/llm-config.example.json as a full starter. Key shape:
```json
{
  "$schema": "https://ponybunny.dho.ai/schemas/llm-config.schema.json",
  "providers": {},
  "models": {},
  "providerAliases": {},
  "tiers": {
    "simple": { "primary": "anthropic.claude-sonnet-4-5-20250929", "fallback": ["openai.gpt-5.2"] },
    "medium": { "primary": "openai.gpt-5.2", "fallback": ["anthropic.claude-sonnet-4-5-20250929"] },
    "complex": { "primary": "openai.gpt-5.2", "fallback": ["anthropic.claude-sonnet-4-5-20250929"] }
  },
  "workloads": {
    "planning": { "tier": "complex", "description": "Goal decomposition and planning" }
  },
  "defaults": {
    "timeout": 120000,
    "maxTokens": 4096,
    "maxRetries": 2,
    "retryDelayMs": 1000,
    "temperature": 0.7
  }
}
```
Top-Level Fields
- `$schema`: Schema URI.
- `providers`: Provider endpoint config registry.
- `models`: Model metadata catalog; supports nested provider group maps.
- `providerAliases`: Logical alias -> protocol + concrete provider list.
- `tiers`: Tier-level model policy.
- `workloads`: Workload -> tier mapping and descriptions.
- `defaults`: Baseline completion options.
Field-by-Field Reference
providers.<providerId>
- `enabled` (boolean): Provider availability switch.
- `protocol` (`anthropic|openai|gemini|codex`): Protocol family.
- `type` (`api|oauth`, optional): Auth mode.
- `baseUrl` (string, optional): Custom API base URL.
- `priority` (integer >= 1): Provider priority within its protocol group.
- `rateLimit.requestsPerMinute` (integer >= 1, optional): Request throttle hint.
- `rateLimit.tokensPerMinute` (integer >= 1, optional): Token throttle hint.
- `region` (string, optional): Region for regional providers.
- `costMultiplier` (number >= 0, optional): Cost normalization multiplier.
- `health.available` (boolean, optional): Manual/recorded availability.
- `health.lastCheckedAt` (string, optional): Last health probe timestamp.
- `health.lastError` (string, optional): Last health error message.
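A minimal `providers` entry using these fields might look like the following sketch; the provider ID, URL, and limit values are illustrative, not mandated by the schema:

```json
{
  "providers": {
    "openai-primary": {
      "enabled": true,
      "protocol": "openai",
      "type": "api",
      "baseUrl": "https://api.openai.com/v1",
      "priority": 1,
      "rateLimit": { "requestsPerMinute": 60, "tokensPerMinute": 90000 },
      "costMultiplier": 1.0,
      "health": { "available": true }
    }
  }
}
```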
models
models supports two valid shapes:
- Direct model config map
- Provider-grouped map where each entry is another model config map
Each model config ($defs.ModelConfig) fields:
- `displayName` (string, required): Human-readable model label.
- `costPer1kTokens.input|output` (number, required): Cost metadata.
- `providers` (string[], optional): Provider IDs able to serve this model.
- `endpoints[]` (`name`, `url`, optional): Endpoint-level mapping.
- `maxContextTokens` (integer >= 1, optional)
- `maxOutputTokens` (integer >= 1, optional)
- `contextWindow` (integer >= 1, optional)
- `reasoningTokenSupport` (boolean, optional)
- `reasoningEfforts` (string[], optional)
- `capabilities` (array or `{input,output}` object, optional)
- `parameterSupport.temperature|topP|topK|topN` (boolean, optional)
- `disallowedParams` (string[], optional)
- `features` (string[], optional)
- `tools` (string[], optional)
- `health.lastCheckedAt|available|lastError` (optional)
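As a sketch of the direct (non-grouped) shape, a single model entry could look like this; the cost and limit numbers are placeholders, not authoritative pricing:

```json
{
  "models": {
    "anthropic.claude-sonnet-4-5-20250929": {
      "displayName": "Claude Sonnet 4.5",
      "costPer1kTokens": { "input": 0.003, "output": 0.015 },
      "providers": ["anthropic"],
      "maxContextTokens": 200000,
      "maxOutputTokens": 8192,
      "reasoningTokenSupport": true,
      "parameterSupport": { "temperature": true, "topP": true }
    }
  }
}
```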
providerAliases.<aliasId>
- `protocol` (required): Protocol family.
- `providers` (string[], required, min 1): Concrete provider IDs behind the alias.
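For example, an alias that fans OpenAI-protocol traffic out to two hypothetical provider IDs:

```json
{
  "providerAliases": {
    "openai": {
      "protocol": "openai",
      "providers": ["openai-primary", "azure-openai-eu"]
    }
  }
}
```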
tiers.simple|medium|complex
- `primary` (string, required): First model candidate for the tier.
- `fallback` (string[], optional): Ordered fallback chain.
workloads.<workloadId>
- `tier` (`simple|medium|complex`, optional): Tier binding.
- `description` (string, optional): Human-readable workload purpose.
Note: Workload-level model override keys are intentionally removed; model decisions come from explicit override/agent/tier policy.
defaults
- `timeout` (integer >= 1000): Request timeout in milliseconds.
- `maxTokens` (integer >= 1): Default max output token cap.
- `maxRetries` (integer >= 0): Retry count.
- `retryDelayMs` (integer >= 0): Retry backoff delay in milliseconds.
- `temperature` (0-2): Default generation temperature.
Operational Impact
- `tiers` + `workloads` define baseline routing policy.
- `providers.enabled` and health fields control practical availability.
- Model metadata controls capability checks, context limits, and cost accounting.
/v1 URL Rules and Runtime Handling
For OpenAI-style providers (`protocol = openai`), the runtime composes the request URL from:
- the resolved provider `baseUrl`
- the model endpoint path (for example `/v1/responses`)
Base URL resolution priority (checked in order):
- `credentials.json` -> `providers.<providerId>.baseUrl`
- `credentials.json` -> `providers.<providerId>.endpoint` (mainly Azure)
- `llm-config.json` -> `providers.<providerId>.baseUrl`
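To illustrate the precedence with hypothetical fragments: if both files set a base URL for the same provider, the `credentials.json` value wins.

`credentials.json` (takes precedence):

```json
{ "providers": { "openai-primary": { "baseUrl": "https://proxy.internal/v1" } } }
```

`llm-config.json` (used only when no credentials-level base URL is set):

```json
{ "providers": { "openai-primary": { "baseUrl": "https://api.openai.com/v1" } } }
```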
/v1 rule:
- final request path should contain exactly one version segment where required
Valid patterns:
- `baseUrl = https://host` + endpoint path includes `/v1/...`
- `baseUrl = https://host/v1` + endpoint path omits `/v1` (for example `/responses`)
Avoid:
- `baseUrl` without `/v1` + endpoint without `/v1` (can produce versionless paths)
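The two valid styles expressed as hypothetical provider entries; style A pairs with endpoint paths like `/v1/responses`, style B with paths like `/responses`:

```json
{
  "providers": {
    "style-a": { "enabled": true, "protocol": "openai", "baseUrl": "https://api.example.com", "priority": 1 },
    "style-b": { "enabled": true, "protocol": "openai", "baseUrl": "https://api.example.com/v1", "priority": 2 }
  }
}
```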
System behavior:
- runtime handling avoids duplicating `/v1` when both the base URL and the endpoint include a version prefix
- use one consistent style per provider and keep it stable across config files
Azure note:
- Azure OpenAI uses deployment-style paths and `api-version`; do not apply generic `/v1` assumptions to Azure endpoint formats.
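For reference, a hypothetical Azure entry in `credentials.json` using the `endpoint` field from the resolution list above; the resource name, deployment name, and `api-version` value are placeholders:

```json
{
  "providers": {
    "azure-openai-eu": {
      "endpoint": "https://my-resource.openai.azure.com/openai/deployments/my-deployment/chat/completions?api-version=2024-06-01"
    }
  }
}
```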