Troubleshooting
Symptoms, likely causes, and where to look. Start with the first matching row; the "diagnose" column tells you which signal rules the cause in or out.
Agent server won't start or seems unhealthy
| Symptom | Likely cause | Diagnose |
|---|---|---|
| Container restarts repeatedly | Missing required env var (e.g. AGENT_SERVER_ADMIN_TOKEN, BYOK API key) | docker compose logs agent-server — error lines name the missing variable. |
| Startup succeeds but /agent-server/api/health returns 500 | Session store cannot reach Pimcore DB | Check that the PHP container is up; docker compose logs php. |
| /agent-server/api/agents returns [] after a fresh start | Initial registry fetch failed; background retry is running | Logs show Agent registry retry on an exponential schedule (5 s → 10 s → … → 60 s). Wait, or hit the manual reload endpoint. |
| Running with a BYOK provider, dropdown shows no models | available_models not defined in the provider block | Add an available_models map to the provider in pimcore_agent.inference.providers.<name>. See Inference Providers. |
| Session creation fails with "unknown provider" error | Agent's provider: field references a name not in the providers map | Check that the provider name in the agent YAML matches a key in pimcore_agent.inference.providers. Provider names are case-sensitive. |
See Architecture → Configuration System → Reload & recovery paths for the retry schedule and manual reload command.
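
To judge how long to wait before reaching for the manual reload, it can help to picture the retry schedule. The sketch below is illustrative only: the source states the 5 s start, the 10 s second step, and the 60 s ceiling, while the doubling factor between steps and the name `retry_delays` are assumptions, not the agent-server's actual code.

```python
from itertools import islice
from typing import Iterator


def retry_delays(initial_s: float = 5.0, cap_s: float = 60.0,
                 factor: float = 2.0) -> Iterator[float]:
    """Yield retry delays in seconds, growing by `factor` up to `cap_s`.

    ASSUMPTION: factor=2.0 is a guess that fits the documented 5 s -> 10 s
    steps; the intermediate steps are not stated in the source.
    """
    delay = initial_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)


# First six delays under the assumed doubling schedule.
first_six = list(islice(retry_delays(), 6))
print(first_six)
```

Under these assumptions the delay reaches the 60 s ceiling within a handful of attempts, so a registry that is still empty after a few minutes is more likely a persistent failure than a slow retry.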
Agent not responding
| Symptom | Likely cause | Diagnose |
|---|---|---|
| No text appears after sending a message | Task is still running, SSE stream is live but slow | docker compose logs -f agent-server — look for Task started / timing entries. Raise AGENT_SERVER_LOG_LEVEL=debug for per-event detail. |
| HTTP 409 on POST /chat/:sessionId | A task for this session is already running | Either wait for completion, or POST /chat/:sessionId/cancel to cancel the current task. |
| HTTP 401 on every agent-server call | Session cookie is missing or invalid | Log in to Pimcore Studio again. See Architecture → Authentication. |
| Anthropic BYOK mode, response appears all-at-once instead of streaming | Known SDK limitation: no message_delta events from Anthropic | Expected behaviour, tracked in copilot-sdk#637. The UI still appears to stream because the server synthesises deltas, but in wall-clock terms the text only arrives at end-of-response. |
| streamThinking: true ignored in BYOK mode | SDK does not expose Anthropic thinking events in BYOK | Expected, no workaround. |
Tool calls fail or are refused
| Symptom | Likely cause | Diagnose |
|---|---|---|
| This tool is not available in the agent environment | Always-denied tool (bash, task, etc.) | Expected; see Architecture → Tool Security. Not recoverable, not configurable. |
| Access denied: path is outside the allowed scope | Path sandbox violation (SDK file tool) | Confirm the path is inside /app/uploads/{sessionId}/{uploaded,staged}/ or a specific file in /tmp/. Directory listings on /tmp/ are denied. |
| MCP tool returns permission error | Pimcore user permissions | The agent runs as the logged-in user. Grant permissions in Pimcore user settings. |
| Chat-scoped tool (stage_asset, propose_*, …) returns "no chat session context" | Tool was called outside the agent-server chat flow (e.g. directly from a PAT-authenticated MCP client) | Expected: chat-scoped tools require the bearer-authenticated request that binds the chat session id. They cannot be called from stand-alone MCP clients. |
| Adding a new MCP tool — tool not found at runtime | Compiler pass hasn't picked up the tag | docker compose exec php bin/console cache:clear and restart the PHP container. |
| First tool call after a long idle returns "Session not found" / 404 from an MCP server, then the next user turn works again | Upstream MCP transport session was garbage-collected; the agent-server detects it and resets the SDK session on the next turn | Expected — see Architecture → MCP Integration → Transport-session recovery. The user only sees a small recovery delay; conversation history is preserved. |
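
When an "Access denied" error is unclear, it can help to reason about the path rule in isolation. The following is a minimal sketch of the scope described in the table, not the SDK's actual sandbox: the function name `is_path_allowed` is hypothetical, and treating "a specific file in /tmp/" as "any path below /tmp, but not /tmp itself" is an assumption.

```python
from pathlib import PurePosixPath


def is_path_allowed(path: str, session_id: str) -> bool:
    """Approximate the documented path scope for SDK file tools (illustrative)."""
    p = PurePosixPath(path)
    # Reject relative paths and any ".." segment instead of trying to resolve it.
    if not p.is_absolute() or ".." in p.parts:
        return False
    # Allowed: the session's uploaded/ and staged/ directories and anything below.
    for sub in ("uploaded", "staged"):
        root = PurePosixPath("/app/uploads") / session_id / sub
        if p == root or root in p.parents:
            return True
    # ASSUMPTION: files below /tmp are allowed, but /tmp itself (a directory
    # listing) is not.
    return PurePosixPath("/tmp") in p.parents


print(is_path_allowed("/app/uploads/abc/staged/img/a.png", "abc"))
print(is_path_allowed("/tmp/", "abc"))
```

Note that a path belonging to a different session id fails the check, which is a common cause of this error when a path is copied from an older chat.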
Proposals
| Symptom | Likely cause | Diagnose |
|---|---|---|
| Proposal widget shows "Proposal not found" on approve | Stored payload was lost (session deleted?) | Check bundle_agent_proposal_statuses. Sessions cascade-delete proposals. |
| Proposal approve fails with a permission error | User permissions changed between propose and approve | Expected — resolvers re-check permissions. Reject, fix permissions, re-prompt the agent. |
| Proposal approve fails with "stale data" | The element was modified after proposal creation | Expected — modificationDate mismatch prevents silent overwrites. Reject, re-prompt the agent with the latest state. |
| Proposal card renders with empty element paths | Bulk fetch failed or returned incomplete data | Check /pimcore-studio/api/bundle/agent/proposals/{sid}/data in DevTools. The bundle-fetched payload is the single source of truth; LLM-supplied metadata is ignored intentionally. |
See Features → HITL Proposals for the expected lifecycle and Extending → Custom Proposal Types for custom flows.
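
The "stale data" failure is an optimistic concurrency check: the proposal remembers the element's modificationDate at creation time, and approval is refused if the element has changed since. A minimal sketch of that idea follows; the class and function names are hypothetical and the real resolver logic in the bundle may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposalSnapshot:
    """What a proposal remembers about its target element (hypothetical shape)."""
    element_id: int
    modification_date: int  # element's modificationDate when the proposal was created


class StaleProposalError(Exception):
    """Raised when the element changed after the proposal was created."""


def assert_fresh(snapshot: ProposalSnapshot, current_modification_date: int) -> None:
    # Any mismatch means someone edited the element in the meantime, so
    # applying the proposal would silently overwrite their change.
    if current_modification_date != snapshot.modification_date:
        raise StaleProposalError(
            f"element {snapshot.element_id} was modified after proposal creation"
        )


snapshot = ProposalSnapshot(element_id=42, modification_date=1_700_000_000)
assert_fresh(snapshot, 1_700_000_000)  # unchanged element: approval may proceed
```

Because the check compares against the element's current state, retrying the same approval will keep failing; rejecting and re-prompting the agent creates a new proposal with a fresh snapshot.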
Sessions and reconnection
| Symptom | Likely cause | Diagnose |
|---|---|---|
| Sessions disappear after container restart | Session data is stored in Pimcore DB — not the agent-server | Verify DB connectivity and that the bundle is installed (pimcore:bundle:install PimcoreAgentBundle). |
| After a container recreate, resuming an existing chat makes the agent "start over" — re-runs the same tool calls, ignores earlier results | The Copilot runtime session store (events.jsonl, session.db, checkpoints) is not on a durable mount, so it was wiped and resumeSession reloaded an empty conversation. The PHP chat transcript still shows the old messages, but the model lost its working context. | Confirm AGENT_SERVER_COPILOT_STATE_DIR (default /app/.copilot-state) is bind-mounted (./var/tmp/copilot-state) and that events.jsonl files appear under it — not in the container's ~/.copilot. See Session Storage → Copilot runtime session state. |
| Reconnect to /stream?seq=N returns 204 | No active task for that session | The task finished before you reconnected. This seq-based endpoint is server-to-server / eval-CLI only; the browser uses GET /sessions/:id catch-up instead. |
| Reconnect replays nothing | TaskRunner in-memory buffer TTL expired (5 min after completion) | The buffer is only the live tail; fetch the record via GET /sessions/:id. The assistant message was persisted incrementally and finalized by onComplete. |
| Internal MCP calls return 401 mid-conversation | Bearer was reminted but the cached SDK session still has the old one baked in | The next user turn auto-rebuilds the SDK session (tokenReminted: true). If it persists, check pimcore_agent.chat_session_token.ttl and confirm the maintenance task isn't GC-ing rows mid-turn. |
| Long overnight run completed but result missing from chat | Should not occur after the mcp-token-authentication change (bearer-bound persistence survives cookie expiry) | If it does: (1) check that security.yaml has pimcore_agent_bundle_api: '%pimcore_agent.bundle_api_firewall_settings%' placed before the pimcore_studio firewall; (2) check that AGENT_SERVER_MCP_TOKEN_TTL matches pimcore_agent.chat_session_token.ttl; (3) check agent-server logs for Token refresh tick failed warnings (the server-driven refresh timer fires every max(60s, ttl/2)). |
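
To know how often to expect refresh activity in the logs, the max(60s, ttl/2) rule can be worked through for a given TTL. This is a small sketch of that arithmetic only; the function name is hypothetical, and it assumes the TTL is expressed in seconds.

```python
def refresh_interval_s(ttl_s: float) -> float:
    """Refresh period in seconds: half the token TTL, but never below 60 s.

    ASSUMPTION: ttl_s is the token TTL in seconds.
    """
    return max(60.0, ttl_s / 2)


for ttl in (90, 600, 3600):
    print(f"ttl={ttl}s -> refresh every {refresh_interval_s(ttl):.0f}s")
```

With a very short TTL the 60 s floor dominates, so the refresh period can exceed half the TTL; keep that in mind when correlating Token refresh tick failed warnings with token expiry.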
Real-time / multi-client sync
| Symptom | Likely cause | Diagnose |
|---|---|---|
| Live updates do not arrive on the first login after a fresh auth (work in another tab is invisible until reload) | Studio's GlobalMessageBus opened its Mercure subscription before the Mercure cookie was set, so the hub did not authorise the user topic for private delivery | Reload the page — it re-runs fetchMercureCookie() and the subscription is re-authorised for the user topic. PHP catch-up still reconstructs the session, so no data is lost. |
| No live updates on any tab, but chat works and reload shows the result | Mercure publisher disabled — MERCURE_JWT_KEY unset/blank | Logs show MERCURE_JWT_KEY not set — live cross-client chat sync disabled at startup. Set the shared key (≥ 32 chars; the same secret the hub validates against) and forward it into the agent-server service. See Architecture → Real-time Sync. |
| Logs show Mercure publish non-OK with status 401 | Publisher JWT rejected by the hub: blank/short/mismatched MERCURE_JWT_KEY, or a wrong publish selector | The key must be ≥ 32 chars and identical to the hub's MERCURE_PUBLISHER_JWT_KEY. The publish selector is the URI Template studio-backend-default/user/{id} (a trailing-/* glob is rejected 401). |
| A reopened session shows an assistant bubble stuck "streaming" forever | Either an agent-server restart interrupted the turn (out of scope to recover — surfaced as interrupted), or the terminal complete flush never reached PHP | Check agent-server logs around the turn for Incremental message flush failed / Persist sink finalize failed. A reconnect-refetch (reload, or online/visibilitychange) re-reads PHP. |
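
The key requirements from the two Mercure rows (set, at least 32 characters, identical to the hub's key) can be checked before restarting anything. The helper below is a hypothetical pre-flight check, not part of the agent-server; how you obtain the two values, for example from your env files, is left to you.

```python
from typing import List, Optional

MIN_KEY_LENGTH = 32  # the ">= 32 chars" requirement from the table above


def check_mercure_key(publisher_key: Optional[str],
                      hub_key: Optional[str] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means the key looks fine."""
    if publisher_key is None or not publisher_key.strip():
        return ["MERCURE_JWT_KEY is unset or blank"]
    problems = []
    if len(publisher_key) < MIN_KEY_LENGTH:
        problems.append(f"MERCURE_JWT_KEY is shorter than {MIN_KEY_LENGTH} chars")
    # Only compare when the hub's key is known to the caller.
    if hub_key is not None and publisher_key != hub_key:
        problems.append("MERCURE_JWT_KEY differs from the hub's MERCURE_PUBLISHER_JWT_KEY")
    return problems


print(check_mercure_key("too-short", hub_key="x" * 32))
```

An empty result does not rule out a 401 caused by a wrong publish selector, which this sketch does not inspect.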
Frontend
| Symptom | Likely cause | Diagnose |
|---|---|---|
| Frontend plugin does not appear in Studio | Build output not picked up | npm run build in assets/, then bin/console cache:clear. |
| Widget renders as plain text | Widget type not registered in the renderer registry | Verify container.get('AgentChat/RichChatWidgetRegistry').register(...) ran. |
| SSE stream closes early | Nginx buffering | Nginx needs proxy_buffering off and chunked_transfer_encoding off on the /agent-server/api/ location. See Installation → Configure the Nginx Proxy. |
Config changes not taking effect
| Symptom | Likely cause | Diagnose |
|---|---|---|
| Agent YAML edit not visible after save | No reload was triggered | Studio UI auto-reloads on save (AgentServerProxyService::triggerReload()). For code-level edits, POST /agent-server/api/admin/reload-agents. |
| Added a new pimcore_agent.agents.paths entry, but presets are still missing | Path list is a compiled container parameter | docker compose exec php bin/console cache:clear. Subsequent edits at an already-registered path do not need cache:clear. |
| Skill content change not visible | Agent reload required | Hit the reload endpoint. Skill files are materialized on every reload. |
| Env var change not picked up | env_file is read only at container start | docker compose restart agent-server. |
Reading the logs
Every request logs a timing summary at info:
{ "msg": "Request timing summary",
"data": { "totalMs": 179111,
"timeToFirstEventMs": 6,
"modelMs": 10579,
"totalToolMs": 86643,
"toolCallCount": 21,
"askUserPausedMs": 81889,
"slowTools": [{"tool": "ask_user", "ms": 78027}] } }
See Architecture → Agent Framework → Performance instrumentation for field meanings.
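
A quick way to read a timing summary is to express each duration as a share of totalMs. The sketch below parses the sample entry above and does that; it assumes the Ms-suffixed fields are millisecond durations, and it makes no claim about whether the categories overlap (the referenced section defines the fields).

```python
import json

# The sample entry from above, as one JSON log line.
LOG_LINE = (
    '{"msg": "Request timing summary", "data": {"totalMs": 179111, '
    '"timeToFirstEventMs": 6, "modelMs": 10579, "totalToolMs": 86643, '
    '"toolCallCount": 21, "askUserPausedMs": 81889, '
    '"slowTools": [{"tool": "ask_user", "ms": 78027}]}}'
)


def time_shares(entry: dict) -> dict:
    """Return selected durations as fractions of totalMs, rounded to 3 places."""
    data = entry["data"]
    total = data["totalMs"]
    return {
        key: round(data[key] / total, 3)
        for key in ("modelMs", "totalToolMs", "askUserPausedMs")
    }


entry = json.loads(LOG_LINE)
shares = time_shares(entry)
slowest = max(entry["data"]["slowTools"], key=lambda t: t["ms"])
print(shares)
print("slowest tool:", slowest["tool"])
```

In this sample the model itself accounts for only a small share of the request, while the slowest tool is ask_user, which suggests the long wall-clock time came from waiting on the user rather than from a slow model or tool.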
Audit events (authentication, admin actions, tool denials) are logged at "level": "audit". See Architecture → Authentication → Audit log.
Still stuck?
- Run docker compose logs -f agent-server php nginx to watch all three at once.
- Run curl http://localhost/agent-server/api/health to check whether the server is up.
- Run curl -H "Authorization: Bearer $AGENT_SERVER_ADMIN_TOKEN" http://localhost/agent-server/api/admin/models to check whether the configured models validate.
- Read the Architecture section for the subsystem you suspect is failing.