Version: 2026.1

Inference Providers

Inference providers define how the agent-server reaches an LLM backend. A provider bundles authentication credentials, an endpoint, a default model, a title model, and (for BYOK providers) an explicit list of available models with per-model limits. Each agent selects a named provider from the map; the agent-server resolves the effective model and limits at session-creation time.

Provider configuration lives entirely in the Symfony config tree under pimcore_agent.inference. It is not editable through the Studio UI — structural configuration is code, secrets are environment variables.

Configuration structure

pimcore_agent:
    inference:
        default_provider: anthropic-cloud   # name of the provider agents use when none is specified
        providers:
            anthropic-cloud:
                driver: copilot             # only valid value today
                auth_mode: byok
                provider: anthropic         # SDK type: openai | anthropic | azure
                base_url: ~                 # null → SDK default for the provider
                token: '${ANTHROPIC_API_KEY}'
                default_model: claude-sonnet-4-5
                title_model: claude-haiku-4-5-20251001
                available_models:
                    claude-sonnet-4-5:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 16000
                    claude-haiku-4-5-20251001:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 8000

The providers: key is a map keyed by provider name. Names are arbitrary strings used to reference a provider from agent definitions and from default_provider (the examples use anthropic-cloud, local-ollama, and github-copilot); pick a convention and use it consistently.

Provider fields

Field	Required	Description
`driver`	no	`copilot` — the only valid value. Selects the Copilot SDK adapter.
`auth_mode`	yes	`byok` or `github`. Controls how the `token` value is used and which model catalog applies.
`provider`	no (byok only)	SDK provider type: `openai`, `anthropic`, or `azure`. Defaults to `openai` when omitted. Not used in `github` mode.
`base_url`	no	Provider endpoint URL. Null uses the SDK's default for the chosen `provider`. Required for local/self-hosted endpoints.
`token`	no	API key (byok) or GitHub PAT (github). Use `${VAR_NAME}` to reference a container environment variable; the `${…}` placeholder is resolved agent-server-side and never appears in the PHP config export as a resolved value. For providers that accept any non-empty string (e.g. Ollama), use a literal.
`default_model`	no	Model to use for agents that do not specify a `model:` field.
`title_model`	no	Model used for generating conversation titles. Falls back to `default_model` when omitted. Must be from the same provider (byok) or available in the Copilot catalog (github).
`wire_api`	no	`completions` or `responses`. Selects the OpenAI Chat Completions or Responses API wire format. Omit unless the provider requires a specific wire format.
`compat`	no	`cerebras` or `openai-strict`. Enables the provider compatibility shim for this provider — see Provider Compatibility.
`available_models`	byok only	Map of model id → limit overrides. Required for the model dropdown in the Studio UI and for context-window compaction to engage. Not used in `github` mode (model list is fetched live from the Copilot catalog).
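
The two fallbacks in the table (provider defaulting to openai in byok mode, title_model falling back to default_model) can be sketched as follows. This is a minimal illustration only: apply_provider_defaults is a hypothetical helper, not part of the agent-server, and it assumes a provider block has already been parsed into a plain dict.

```python
def apply_provider_defaults(cfg: dict) -> dict:
    """Return a copy of one provider block with the documented fallbacks applied.

    Hypothetical helper for illustration; the real agent-server may differ.
    """
    out = dict(cfg)
    # `provider` is only meaningful in byok mode and defaults to `openai`.
    if out.get("auth_mode") == "byok" and not out.get("provider"):
        out["provider"] = "openai"
    # `title_model` falls back to `default_model` when omitted.
    if not out.get("title_model"):
        out["title_model"] = out.get("default_model")
    return out


ollama = apply_provider_defaults({"auth_mode": "byok", "default_model": "gemma3:27b"})
print(ollama["provider"], ollama["title_model"])
```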

`available_models` and per-model limits

Each entry in available_models is a model id (exact string the provider uses) mapped to an optional limits block:

Field	Type	Description
`max_prompt_tokens`	integer	Maximum tokens that may appear in the prompt (input side).
`max_output_tokens`	integer	Maximum tokens the model may generate per response. Must be less than `max_prompt_tokens` when both are set.
`max_context_window_tokens`	integer	Total context window size. Required for context compaction to engage.
`background_compaction_threshold`	float (0–1 exclusive)	Fraction of the context window at which background compaction is triggered.
`buffer_exhaustion_threshold`	float (0–1 exclusive)	Fraction at which the session considers the buffer exhausted and forces compaction.
`large_output_max_bytes`	integer	Cap on large output responses in bytes.
`reasoning_efforts`	string[]	Reasoning-effort levels this model accepts (e.g. `[low, medium, high]`). The list is model-specific — values are free-form strings defined by the provider. Omit or leave empty when the model has no reasoning-effort control.

All fields are optional within each model entry — omit any field to leave the SDK default in effect for that field.
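
The constraints stated in the table can be checked before deployment. The sketch below is a hypothetical lint (validate_model_limits is not an agent-server function) that assumes one model entry has been loaded into a dict. It checks only the two rules the table states: max_output_tokens below max_prompt_tokens, and thresholds strictly between 0 and 1.

```python
def validate_model_limits(limits: dict) -> list[str]:
    """Return a list of human-readable problems for one model entry."""
    errors = []
    prompt = limits.get("max_prompt_tokens")
    output = limits.get("max_output_tokens")
    # Only enforced when both values are set; every field is optional.
    if prompt is not None and output is not None and output >= prompt:
        errors.append("max_output_tokens must be less than max_prompt_tokens")
    for key in ("background_compaction_threshold", "buffer_exhaustion_threshold"):
        value = limits.get(key)
        # The documented range is 0-1, exclusive at both ends.
        if value is not None and not (0.0 < value < 1.0):
            errors.append(f"{key} must be strictly between 0 and 1")
    return errors


print(validate_model_limits({"max_prompt_tokens": 180000, "max_output_tokens": 16000}))
```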

For context compaction to engage, max_context_window_tokens must be set. Without it the session manager cannot determine when to compact.
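
To make the role of max_context_window_tokens concrete, here is a rough sketch of how the two thresholds could be compared against current usage. Everything in it is an assumption made for illustration: the function name, the state labels, the use of >= for the comparison, and the explicit threshold arguments (the SDK's default threshold values are not documented here).

```python
from typing import Optional


def compaction_state(
    used_tokens: int,
    context_window: Optional[int],
    background_threshold: float,
    exhaustion_threshold: float,
) -> Optional[str]:
    """Classify usage as 'none', 'background', or 'forced' (hypothetical labels)."""
    if not context_window:
        # Without max_context_window_tokens there is nothing to measure against.
        return None
    fraction = used_tokens / context_window
    if fraction >= exhaustion_threshold:
        return "forced"
    if fraction >= background_threshold:
        return "background"
    return "none"


print(compaction_state(170000, 200000, 0.8, 0.95))
```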

The reasoning_efforts list controls which values are valid for the agent-level reasoningEffort field (see Agents). When a model entry omits reasoning_efforts (or provides an empty list), the Studio UI shows no reasoning-effort control for that model. Example — a model that exposes three reasoning levels:

available_models:
    gpt-oss:
        max_context_window_tokens: 128000
        max_prompt_tokens: 120000
        max_output_tokens: 16000
        reasoning_efforts: [low, medium, high]

In github mode the reasoning-effort levels are not configured here — they are read from the Copilot model catalog's supportedReasoningEfforts metadata for each model. The reasoning_efforts field has no effect in github mode.
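
The split between the two sources of reasoning-effort levels can be sketched like this. The helper name and the dict shapes are hypothetical; only the reasoning_efforts and supportedReasoningEfforts keys come from the text above.

```python
from typing import Optional


def allowed_reasoning_efforts(
    auth_mode: str,
    model_entry: Optional[dict] = None,
    catalog_entry: Optional[dict] = None,
) -> list[str]:
    """Return the effort levels valid for a model; empty means no UI control."""
    if auth_mode == "github":
        # In github mode the configured reasoning_efforts list is ignored.
        source = (catalog_entry or {}).get("supportedReasoningEfforts")
    else:
        source = (model_entry or {}).get("reasoning_efforts")
    return list(source or [])


levels = allowed_reasoning_efforts("byok", {"reasoning_efforts": ["low", "medium", "high"]})
print("medium" in levels)
```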

Authentication modes

`auth_mode: byok`

"Bring Your Own Key". The token value is the API key for the provider named in provider: (e.g. sk-ant-… for Anthropic, sk-… for OpenAI). The agent-server builds an SDK provider block from provider, base_url, token, and wire_api at session-creation time.

The model list for the Studio UI dropdown comes entirely from available_models. The SDK has no model catalog in BYOK mode.
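
As a rough picture of the provider-block assembly described above, the sketch below collects the four inputs into one dict. The function name and the output key names are placeholders chosen to mirror the config fields; the actual shape the SDK expects is not documented on this page, and raising in non-byok mode is an assumption.

```python
def build_sdk_provider_block(cfg: dict, resolved_token: str) -> dict:
    """Assemble provider, base_url, token, and wire_api for a byok provider.

    Key names are illustrative only, not the SDK's real field names.
    """
    if cfg.get("auth_mode") != "byok":
        raise ValueError("this sketch only covers auth_mode: byok")
    block = {
        "provider": cfg.get("provider") or "openai",  # documented default
        "base_url": cfg.get("base_url"),              # None -> SDK default endpoint
        "token": resolved_token,                      # already resolved from ${VAR}
        "wire_api": cfg.get("wire_api"),
    }
    # Drop unset fields so SDK defaults stay in effect.
    return {key: value for key, value in block.items() if value is not None}


print(build_sdk_provider_block({"auth_mode": "byok", "provider": "anthropic"}, "sk-test"))
```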

`auth_mode: github`

Inference is routed through the GitHub Copilot catalog. The token value is a GitHub PAT with the Copilot Requests permission. The provider: and base_url fields are not used in this mode.

The model list is fetched live from the Copilot catalog — available_models has no effect in github mode. Multiple agents on the same github provider share a single GitHub identity; per-provider distinct GitHub identities are not supported.

The `default_provider`

When an agent does not specify a provider: field, the agent-server uses the provider named by default_provider. If default_provider is null or absent and the agent has no provider, no inference config is applied and the agent-server falls back to legacy environment-variable config (see Environment Variables).

If default_provider names a provider that does not exist in the providers: map, startup logs an error and agents that rely on the default will fail session creation.
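
The lookup order can be summarised in a few lines. This is a sketch under stated assumptions: resolve_provider_name and ProviderNotFound are hypothetical names, the agent and the inference block are plain dicts, and returning None stands in for the legacy environment-variable fallback.

```python
from typing import Optional


class ProviderNotFound(Exception):
    """Raised when a named provider is missing from the providers map."""


def resolve_provider_name(agent: dict, inference: dict) -> Optional[str]:
    """Pick the provider for an agent, or None to signal the legacy fallback."""
    providers = inference.get("providers") or {}
    # An empty or absent agent-level provider means "use default_provider".
    name = agent.get("provider") or inference.get("default_provider")
    if not name:
        return None  # no inference config applies
    if name not in providers:
        # No silent fallback: session creation fails with an explicit error.
        raise ProviderNotFound(f"unknown inference provider: {name}")
    return name


inference = {"default_provider": "anthropic-cloud", "providers": {"anthropic-cloud": {}}}
print(resolve_provider_name({"name": "data-management"}, inference))
```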

Agent → provider relationship

An agent selects its provider with the provider: field in its YAML. An empty or absent value means "use default_provider":

name: data-management
displayName: Data Management
provider: anthropic-cloud   # uses this provider; omit to use default_provider
model: claude-haiku-4-5-20251001  # resolved within anthropic-cloud's available_models

When provider is set to a name that does not exist in the providers: map, session creation fails with an explicit error — there is no silent fallback.

The model: field on an agent is resolved within the chosen provider. If the agent omits model:, the provider's default_model is used. The Studio UI model dropdown is populated from the chosen provider's model list (byok: available_models; github: live Copilot catalog).
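
Model resolution and the dropdown source follow the same pattern. The helpers below are hypothetical and assume the provider block is a dict and the live Copilot catalog has already been fetched into a list of model ids; how the agent-server reacts to a model id missing from available_models is not covered here.

```python
from typing import Iterable, Optional


def resolve_model(agent: dict, provider_cfg: dict) -> Optional[str]:
    """Use the agent's model if set, otherwise the provider's default_model."""
    return agent.get("model") or provider_cfg.get("default_model")


def dropdown_models(provider_cfg: dict, copilot_catalog: Iterable[str] = ()) -> list[str]:
    """Return the model ids offered in the UI for the chosen provider."""
    if provider_cfg.get("auth_mode") == "github":
        # github mode: available_models is ignored, the catalog is the source.
        return list(copilot_catalog)
    return list((provider_cfg.get("available_models") or {}).keys())


provider = {
    "auth_mode": "byok",
    "default_model": "claude-sonnet-4-5",
    "available_models": {"claude-sonnet-4-5": {}, "claude-haiku-4-5-20251001": {}},
}
print(resolve_model({"name": "edit-agent"}, provider), dropdown_models(provider))
```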

Examples

BYOK — Anthropic Claude

pimcore_agent:
    inference:
        default_provider: anthropic-cloud
        providers:
            anthropic-cloud:
                driver: copilot
                auth_mode: byok
                provider: anthropic
                token: '${ANTHROPIC_API_KEY}'
                default_model: claude-sonnet-4-5
                title_model: claude-haiku-4-5-20251001
                available_models:
                    claude-sonnet-4-5:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 16000
                    claude-haiku-4-5-20251001:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 8000

Set ANTHROPIC_API_KEY in the agent-server container's environment (e.g. .env.local).

BYOK — OpenAI-compatible self-hosted endpoint (Ollama)

pimcore_agent:
    inference:
        default_provider: local-ollama
        providers:
            local-ollama:
                driver: copilot
                auth_mode: byok
                provider: openai
                base_url: 'http://host.docker.internal:11434/v1'
                token: 'ollama'                # Ollama accepts any non-empty string
                compat: openai-strict          # enables the strict-compat shim
                default_model: 'gemma3:27b'
                title_model: 'llama3.1:8b'
                available_models:
                    'gemma3:27b':
                        max_context_window_tokens: 128000
                        max_prompt_tokens: 120000
                        max_output_tokens: 8000
                    'llama3.1:8b':
                        max_context_window_tokens: 131072
                        max_prompt_tokens: 120000
                        max_output_tokens: 8000

For Ollama, vLLM, HuggingFace TGI, and other strict OpenAI-compatible endpoints, set compat: openai-strict to enable the compatibility shim. See Provider Compatibility.

BYOK — HuggingFace Inference Endpoints (OpenAI-compat)

pimcore_agent:
    inference:
        default_provider: hf-endpoint
        providers:
            hf-endpoint:
                driver: copilot
                auth_mode: byok
                provider: openai
                base_url: '%env(HF_ENDPOINT_URL)%'  # resolved PHP-side; e.g. https://xyz.us-east-1.aws.endpoints.huggingface.cloud/v1
                token: '${HF_API_TOKEN}'
                compat: openai-strict
                default_model: 'meta-llama/Llama-3.1-70B-Instruct'
                available_models:
                    'meta-llama/Llama-3.1-70B-Instruct':
                        max_context_window_tokens: 131072
                        max_prompt_tokens: 120000
                        max_output_tokens: 8000

GitHub Copilot catalog

pimcore_agent:
    inference:
        default_provider: github-copilot
        providers:
            github-copilot:
                driver: copilot
                auth_mode: github
                token: '${GH_COPILOT_TOKEN}'
                default_model: claude-haiku-4.5
                title_model: claude-haiku-4.5

Set GH_COPILOT_TOKEN in the agent-server container environment. The token must be a fine-grained PAT with the "Copilot Requests" permission. The model list is fetched live from the catalog — no available_models block is needed or used.

Multiple providers — mixing BYOK and GitHub

pimcore_agent:
    inference:
        default_provider: anthropic-cloud
        providers:
            anthropic-cloud:
                driver: copilot
                auth_mode: byok
                provider: anthropic
                token: '${ANTHROPIC_API_KEY}'
                default_model: claude-sonnet-4-5
                title_model: claude-haiku-4-5-20251001
                available_models:
                    claude-sonnet-4-5:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 16000
                    claude-haiku-4-5-20251001:
                        max_context_window_tokens: 200000
                        max_prompt_tokens: 180000
                        max_output_tokens: 8000
            github-copilot:
                driver: copilot
                auth_mode: github
                token: '${GH_COPILOT_TOKEN}'
                default_model: claude-haiku-4.5

Agents can then select a provider:

name: search-agent
displayName: Search
provider: github-copilot   # uses GitHub Copilot catalog
model: claude-haiku-4.5

---

name: edit-agent
displayName: Editor
provider: anthropic-cloud  # uses direct Anthropic key
model: claude-sonnet-4-5

Token placeholders

The token field in a provider block supports ${VAR_NAME} placeholders. PHP exports these placeholder strings verbatim — they are never resolved on the PHP side. The agent-server resolves them against its own container environment at session-config build time.

token: '${ANTHROPIC_API_KEY}'   # resolved agent-server-side
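
A minimal sketch of this kind of agent-server-side substitution is shown below. It is not the agent-server's implementation: resolve_token is a hypothetical name, the variable-name pattern is an assumption, and raising on an unset variable is one possible policy among several.

```python
import re
from typing import Mapping

# Matches ${VAR_NAME}; the allowed character set is an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_token(token: str, env: Mapping[str, str]) -> str:
    """Replace ${VAR_NAME} placeholders with values from the given environment."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    # Literal tokens such as 'ollama' contain no placeholder and pass through.
    return _PLACEHOLDER.sub(substitute, token)


print(resolve_token("${ANTHROPIC_API_KEY}", {"ANTHROPIC_API_KEY": "sk-ant-example"}))
```

In a real process the second argument would typically be os.environ.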

Other string fields (base_url, default_model, etc.) are structural and do not participate in ${…} interpolation. Set non-secret values directly, or use Symfony's own %env(...)% syntax for structural fields that vary by environment:

base_url: '%env(HF_ENDPOINT_URL)%'   # resolved PHP-side at container build time

Do not mix the two syntaxes for the same field. token must always use ${VAR} (agent-server-side); structural fields may use either a literal or %env(...)%.
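
The rule above lends itself to a simple pre-deployment check on the raw YAML values of a provider block. The sketch below is a hypothetical lint, not something shipped with the bundle; it only inspects top-level string fields and uses plain substring checks.

```python
def check_interpolation_syntax(provider_cfg: dict) -> list[str]:
    """Flag fields that use the wrong placeholder syntax for their kind."""
    problems = []
    for field, value in provider_cfg.items():
        if not isinstance(value, str):
            continue  # nested maps such as available_models are skipped
        if field == "token":
            # token is resolved agent-server-side and must use ${VAR}.
            if "%env(" in value:
                problems.append("token: use ${VAR}, not %env(...)%")
        elif "${" in value:
            # Structural fields are not interpolated by the agent-server.
            problems.append(
                f"{field}: ${{...}} placeholders are not interpolated here; "
                "use a literal or %env(...)%"
            )
    return problems


print(check_interpolation_syntax({"token": "${HF_API_TOKEN}", "base_url": "%env(HF_ENDPOINT_URL)%"}))
```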

Related docs

Agents — provider: and model: fields on an agent; how the agent selects a provider.
Environment Variables — env vars read by PHP and the agent-server; migration reference from legacy single-provider env vars.
Provider Compatibility — the compat shim for strict OpenAI-compatible endpoints.
Architecture → Authentication — per-provider auth wiring in the session lifecycle.
Architecture → Configuration System — how the inference block flows from PHP through the export endpoint to the agent-server.
