Version: 2026.1

Inference Providers

Inference providers define how the agent-server reaches an LLM backend. A provider bundles authentication credentials, an endpoint, a default model, a title model, and (for BYOK providers) an explicit list of available models with per-model limits. Each agent selects a named provider from the map; the agent-server resolves the effective model and limits at session-creation time.

Provider configuration lives entirely in the Symfony config tree under pimcore_agent.inference. It is not editable through the Studio UI — structural configuration is code, secrets are environment variables.

Configuration structure

pimcore_agent:
    inference:
        default_provider: anthropic-cloud # name of the provider agents use when none is specified
        providers:
            anthropic-cloud:
                driver: copilot # only valid value today
                auth_mode: byok
                provider: anthropic # SDK type: openai | anthropic | azure
                base_url: ~ # null → SDK default for the provider
                token: '${ANTHROPIC_API_KEY}'
                default_model: claude-sonnet-4-5
                title_model: claude-haiku-4-5-20251001
                available_models:
                    claude-sonnet-4-5:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 16000
                    claude-haiku-4-5-20251001:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 8000

The providers: key is a map; keys are provider names — arbitrary strings used to reference the provider from agent definitions and from default_provider. Any string works (the examples use names like anthropic-cloud, local-ollama, github-copilot); pick whatever convention you prefer and use it consistently.

Provider fields

| Field | Required | Description |
| --- | --- | --- |
| driver | no | copilot is the only valid value. Selects the Copilot SDK adapter. |
| auth_mode | yes | byok or github. Controls how the token value is used and which model catalog applies. |
| provider | byok only | SDK provider type: openai, anthropic, or azure. Defaults to openai when omitted. |
| base_url | no | Provider endpoint URL. Null uses the SDK's default for the chosen provider. Required for local/self-hosted endpoints. |
| token | no | API key (byok) or GitHub PAT (github). Use ${VAR_NAME} to reference a container environment variable; the ${…} placeholder is resolved agent-server-side and never appears in the PHP config export as a resolved value. For providers that accept any non-empty string (e.g. Ollama), use a literal. |
| default_model | no | Model to use for agents that do not specify a model: field. |
| title_model | no | Model used for generating conversation titles. Falls back to default_model when omitted. Must be from the same provider (byok) or available in the Copilot catalog (github). |
| wire_api | no | completions or responses. Selects the OpenAI Chat Completions or Responses API wire format. Omit unless the provider requires a specific wire format. |
| compat | no | cerebras or openai-strict. Enables the provider compatibility shim for this provider; see Provider Compatibility. |
| available_models | byok | Map of model id → limit overrides. Required for the model dropdown in the Studio UI and for context-window compaction to engage. Not used in github mode (model list is fetched live from the Copilot catalog). |
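
As a minimal sketch of how the optional fields combine, the fragment below configures a BYOK OpenAI provider that pins the Responses wire format. The provider name openai-responses, the OPENAI_API_KEY variable, the model id example-model, and the token limit are placeholders chosen for illustration, not values taken from the product.

```yaml
# Illustrative sketch only: names and numbers below are assumptions.
pimcore_agent:
    inference:
        providers:
            openai-responses:                  # hypothetical provider name
                driver: copilot
                auth_mode: byok
                provider: openai               # also the default when omitted
                token: '${OPENAI_API_KEY}'     # hypothetical env var, resolved agent-server-side
                wire_api: responses            # alternative: completions
                default_model: example-model   # hypothetical model id
                available_models:
                    example-model:
                        max_context_window_tokens: 128000   # illustrative value
```

Because title_model is omitted here, titles would be generated with default_model, as described in the table.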

available_models and per-model limits

Each entry in available_models is a model id (exact string the provider uses) mapped to an optional limits block:

| Field | Type | Description |
| --- | --- | --- |
| max_prompt_tokens | integer | Maximum tokens that may appear in the prompt (input side). |
| max_output_tokens | integer | Maximum tokens the model may generate per response. Must be less than max_prompt_tokens when both are set. |
| max_context_window_tokens | integer | Total context window size. Required for context compaction to engage. |
| background_compaction_threshold | float (0–1 exclusive) | Fraction of the context window at which background compaction is triggered. |
| buffer_exhaustion_threshold | float (0–1 exclusive) | Fraction at which the session considers the buffer exhausted and forces compaction. |
| large_output_max_bytes | integer | Cap on large output responses in bytes. |
| reasoning_efforts | string[] | Reasoning-effort levels this model accepts (e.g. [low, medium, high]). The list is model-specific: values are free-form strings defined by the provider. Omit or leave empty when the model has no reasoning-effort control. |

All fields are optional within each model entry — omit any field to leave the SDK default in effect for that field.

For context compaction to engage, max_context_window_tokens must be set. Without it the session manager cannot determine when to compact.
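
A minimal sketch of a model entry that sets the compaction-related fields. The model id and all numeric values are illustrative assumptions. The only constraints stated above are that both thresholds lie strictly between 0 and 1 and that max_output_tokens is less than max_prompt_tokens; placing the background threshold below the exhaustion threshold is an assumption, not a documented rule.

```yaml
available_models:
    example-model:                              # hypothetical model id
        max_context_window_tokens: 128000       # must be set for compaction to engage
        max_prompt_tokens: 120000
        max_output_tokens: 8000                 # less than max_prompt_tokens
        background_compaction_threshold: 0.8    # illustrative; strictly between 0 and 1
        buffer_exhaustion_threshold: 0.95       # illustrative; strictly between 0 and 1
```

Any field left out of the entry keeps the SDK default for that field.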

The reasoning_efforts list controls which values are valid for the agent-level reasoningEffort field (see Agents). When a model entry omits reasoning_efforts (or provides an empty list), the Studio UI shows no reasoning-effort control for that model. Example — a model that exposes three reasoning levels:

available_models:
    gpt-oss:
        max_context_window_tokens: 128000
        max_prompt_tokens: 120000
        max_output_tokens: 16000
        reasoning_efforts: [low, medium, high]
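
On the agent side, a definition might then pick one of those levels. This is a sketch under assumptions: the agent name, the provider name local-ollama, and the top-level placement of reasoningEffort are illustrative; the Agents page is the authority on the agent schema.

```yaml
name: research-agent        # hypothetical agent name
displayName: Research
provider: local-ollama      # hypothetical provider that lists gpt-oss in available_models
model: gpt-oss
reasoningEffort: medium     # assumed placement; must be one of the model's reasoning_efforts
```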

In github mode the reasoning-effort levels are not configured here — they are read from the Copilot model catalog's supportedReasoningEfforts metadata for each model. The reasoning_efforts field has no effect in github mode.

Authentication modes

auth_mode: byok

"Bring Your Own Key". The token value is the API key for the provider named in provider: (e.g. sk-ant-… for Anthropic, sk-… for OpenAI). The agent-server builds an SDK provider block from provider, base_url, token, and wire_api at session-creation time.

The model list for the Studio UI dropdown comes entirely from available_models. The SDK has no model catalog in BYOK mode.

auth_mode: github

Inference is routed through the GitHub Copilot catalog. The token value is a GitHub PAT with the Copilot Requests permission. The provider: and base_url fields are not used in this mode.

The model list is fetched live from the Copilot catalog — available_models has no effect in github mode. Multiple agents on the same github provider share a single GitHub identity; per-provider distinct GitHub identities are not supported.

The default_provider

When an agent does not specify a provider: field, the agent-server uses the provider named by default_provider. If default_provider is null or absent and the agent has no provider, no inference config is applied and the agent-server falls back to legacy environment-variable config (see Environment Variables).

If default_provider names a provider that does not exist in the providers: map, an error is logged at startup, and agents that rely on the default fail at session creation.

Agent → provider relationship

An agent selects its provider with the provider: field in its YAML. An empty or absent value means "use default_provider":

name: data-management
displayName: Data Management
provider: anthropic-cloud # uses this provider; omit to use default_provider
model: claude-haiku-4-5-20251001 # resolved within anthropic-cloud's available_models

When provider is set to a name that does not exist in the providers: map, session creation fails with an explicit error — there is no silent fallback.

The model: field on an agent is resolved within the chosen provider. If the agent omits model:, the provider's default_model is used. The Studio UI model dropdown is populated from the chosen provider's model list (byok: available_models; github: live Copilot catalog).
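
To illustrate the resolution rules, consider a hypothetical agent that omits both fields. Assuming the anthropic-cloud configuration shown under Configuration structure, this agent would resolve to the anthropic-cloud provider (through default_provider) and to claude-sonnet-4-5 (through that provider's default_model). The agent name is an assumption made for the example.

```yaml
name: catalog-helper            # hypothetical agent name
displayName: Catalog Helper
# provider: omitted, so default_provider applies
# model: omitted, so the resolved provider's default_model applies
```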

Examples

BYOK — Anthropic Claude

pimcore_agent:
    inference:
        default_provider: anthropic-cloud
        providers:
            anthropic-cloud:
                driver: copilot
                auth_mode: byok
                provider: anthropic
                token: '${ANTHROPIC_API_KEY}'
                default_model: claude-sonnet-4-5
                title_model: claude-haiku-4-5-20251001
                available_models:
                    claude-sonnet-4-5:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 16000
                    claude-haiku-4-5-20251001:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 8000

Set ANTHROPIC_API_KEY in the agent-server container's environment (e.g. .env.local).

BYOK — OpenAI-compatible self-hosted endpoint (Ollama)

pimcore_agent:
    inference:
        default_provider: local-ollama
        providers:
            local-ollama:
                driver: copilot
                auth_mode: byok
                provider: openai
                base_url: 'http://host.docker.internal:11434/v1'
                token: 'ollama' # Ollama accepts any non-empty string
                compat: openai-strict # enables the strict-compat shim
                default_model: 'gemma3:27b'
                title_model: 'llama3.1:8b'
                available_models:
                    'gemma3:27b':
                        max_context_window_tokens: 128000
                        max_prompt_tokens: 120000
                        max_output_tokens: 8000
                    'llama3.1:8b':
                        max_context_window_tokens: 131072
                        max_prompt_tokens: 120000
                        max_output_tokens: 8000

For Ollama, vLLM, HuggingFace TGI, and other strict OpenAI-compatible endpoints, set compat: openai-strict to enable the compatibility shim. See Provider Compatibility.

BYOK — HuggingFace Inference Endpoints (OpenAI-compat)

pimcore_agent:
    inference:
        default_provider: hf-endpoint
        providers:
            hf-endpoint:
                driver: copilot
                auth_mode: byok
                provider: openai
                base_url: '%env(HF_ENDPOINT_URL)%' # e.g. https://xyz.us-east-1.aws.endpoints.huggingface.cloud/v1
                token: '${HF_API_TOKEN}'
                compat: openai-strict
                default_model: 'meta-llama/Llama-3.1-70B-Instruct'
                available_models:
                    'meta-llama/Llama-3.1-70B-Instruct':
                        max_context_window_tokens: 131072
                        max_prompt_tokens: 120000
                        max_output_tokens: 8000

GitHub Copilot catalog

pimcore_agent:
    inference:
        default_provider: github-copilot
        providers:
            github-copilot:
                driver: copilot
                auth_mode: github
                token: '${GH_COPILOT_TOKEN}'
                default_model: claude-haiku-4.5
                title_model: claude-haiku-4.5

Set GH_COPILOT_TOKEN in the agent-server container environment. The token must be a fine-grained PAT with the "Copilot Requests" permission. The model list is fetched live from the catalog — no available_models block is needed or used.

Multiple providers — mixing BYOK and GitHub

pimcore_agent:
    inference:
        default_provider: anthropic-cloud
        providers:
            anthropic-cloud:
                driver: copilot
                auth_mode: byok
                provider: anthropic
                token: '${ANTHROPIC_API_KEY}'
                default_model: claude-sonnet-4-5
                title_model: claude-haiku-4-5-20251001
                available_models:
                    claude-sonnet-4-5:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 16000
                    claude-haiku-4-5-20251001:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 8000
            github-copilot:
                driver: copilot
                auth_mode: github
                token: '${GH_COPILOT_TOKEN}'
                default_model: claude-haiku-4.5

Agents can then select a provider:

name: search-agent
displayName: Search
provider: github-copilot # uses GitHub Copilot catalog
model: claude-haiku-4.5

---

name: edit-agent
displayName: Editor
provider: anthropic-cloud # uses direct Anthropic key
model: claude-sonnet-4-5

Token placeholders

The token field in a provider block supports ${VAR_NAME} placeholders. PHP exports these placeholder strings verbatim — they are never resolved on the PHP side. The agent-server resolves them against its own container environment at session-config build time.

token: '${ANTHROPIC_API_KEY}'   # resolved agent-server-side

Other string fields (base_url, default_model, etc.) are structural and do not participate in ${…} interpolation. Set non-secret values directly, or use Symfony's own %env(...)% syntax for structural fields that vary by environment:

base_url: '%env(HF_ENDPOINT_URL)%'   # resolved PHP-side at container build time

Do not mix the two syntaxes for the same field. token must always use ${VAR} (agent-server-side); structural fields may use either a literal or %env(...)%.
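
The fragment below sketches the two syntaxes side by side within one provider entry, with the combinations to avoid left as comments. It is a partial entry for illustration only, reusing the hf-endpoint name and the HF_ENDPOINT_URL and HF_API_TOKEN variables from the examples on this page; the consequences noted in the comments follow from the rules stated above rather than from observed behaviour.

```yaml
hf-endpoint:
    base_url: '%env(HF_ENDPOINT_URL)%'   # structural field: Symfony resolves it PHP-side
    token: '${HF_API_TOKEN}'             # secret: exported verbatim, resolved agent-server-side
    # Avoid: base_url: '${HF_ENDPOINT_URL}'  (base_url does not take part in ${...} interpolation)
    # Avoid: token: '%env(HF_API_TOKEN)%'    (the secret would be resolved PHP-side)
```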