Troubleshooting

“Runtime Not Healthy (Health 502)” During Setup?

A deterministic recovery flow for the exact setup loop where your agent appears online but backend health checks keep returning 502.

If support or system messages say your runtime is not healthy with health 502, you're usually dealing with a backend startup or reachability issue, not a personality prompt problem.

Important: repeated re-prompts and random config edits rarely fix this. Use a clean isolation flow first.

Fast isolation flow (5 minutes)

1) Confirm scope: one agent or all agents?

Open a second known-good agent (or canary workspace). If only the original agent fails, the problem is likely specific to that workspace or runtime. If every agent fails, it may be a broader infrastructure issue.

2) Confirm surface: dashboard only or all channels?

Test the dashboard plus one connected channel (Discord/Telegram). Capture the exact behavior on each surface. This quickly separates UI path issues from runtime issues.
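The observations from steps 1 and 2 can be written down as a small triage helper. This is only an illustrative sketch: the function names and return labels are hypothetical, and the "channel-specific" branch is an assumption, since this guide only distinguishes UI path issues from runtime issues.

```python
def classify_scope(other_agent_ok: bool) -> str:
    """Step 1: does a second known-good agent (or canary workspace) work?"""
    if other_agent_ok:
        return "workspace/runtime specific"
    return "possibly broader infrastructure"


def classify_surface(dashboard_ok: bool, channel_ok: bool) -> str:
    """Step 2: compare the dashboard with one connected channel."""
    if dashboard_ok and channel_ok:
        return "no current repro"
    if channel_ok:
        # Only the dashboard fails, so the runtime itself is answering.
        return "likely UI path (dashboard only)"
    if dashboard_ok:
        # Assumption: this case is not covered by the guide.
        return "likely channel-specific"
    return "likely runtime (all surfaces failing)"


if __name__ == "__main__":
    print(classify_scope(other_agent_ok=True))
    print(classify_surface(dashboard_ok=False, channel_ok=False))
```

Recording both answers before changing anything gives you the "Scope check" and "Tested surfaces" lines of the escalation packet for free.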

3) Run one clean browser pass

Try an incognito/private window, then one normal window after a hard refresh. Avoid rapid reconnect loops while testing.

4) Wait 60–120 seconds, then retest once

Cold starts and host restarts can produce temporary 502s. One delayed retry is useful; ten immediate retries are noise.
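If you script this step, the shape is "check, wait once, check once more". The sketch below assumes you supply your own `check` callable that returns an HTTP-style status code; this guide does not name a health endpoint, so none is hard-coded here. The 90-second default is just a midpoint of the 60–120 second range.

```python
import time
from typing import Callable


def retest_once(
    check: Callable[[], int],
    wait_seconds: float = 90.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run one check; if it returns 502, wait and run exactly one more."""
    first = check()
    if first != 502:
        return f"first check returned {first}; no delayed retest needed"

    # A single delayed retry covers cold starts and host restarts.
    sleep(wait_seconds)
    second = check()
    if second == 502:
        return "still 502 after one delayed retest: escalate"
    return f"recovered after wait (status {second}): likely a cold start"
```

The function deliberately has no loop. If the second check still returns 502, the next action is escalation, not another retry.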

5) If still failing, escalate with evidence (not guesses)

At this point, the likely fix is host-side/runtime intervention. Send a complete packet so support can act immediately.

What this usually means

Runtime/container is not fully booted yet
Upstream dependency is timing out during startup
Agent host health is degraded (server-side)
Network path or control-UI route is unstable while runtime is recovering

High-friction mistakes to avoid

Changing SOUL.md/AGENTS.md repeatedly (not related to runtime health)
Reconnecting integrations over and over before proving scope
Rapid model-switch retries while runtime is unhealthy (often creates extra 500 noise)
Posting “still broken” with no timestamp, no surface test, no exact error text
Mixing old and new errors in one thread without a clean latest repro

Advanced edge case: 502 + model-change 500 combo

A recurring pattern: users see health 502, then try switching models and hit "Gateway tool invoke failed: 500". In many cases, the model switch fails because runtime health is degraded, not because your chosen model ID is wrong.

First, stabilize runtime health (this guide) before testing model changes.
Use one canary model test only after runtime is healthy again.
If both 502 and 500 occur in the same window, report both with timestamps.
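To decide whether a 502 and a 500 belong in the same report, it can help to compare their timestamps explicitly. A minimal sketch follows, assuming you have noted each error as a (timestamp, status code) pair. The 10-minute window is an arbitrary assumption; this guide does not define how wide "the same window" is.

```python
from datetime import datetime, timedelta, timezone


def errors_in_same_window(events, window=timedelta(minutes=10)):
    """Return (time_of_502, time_of_500) pairs that fall within `window`."""
    times_502 = [t for t, code in events if code == 502]
    times_500 = [t for t, code in events if code == 500]
    return [
        (t502, t500)
        for t502 in times_502
        for t500 in times_500
        if abs(t500 - t502) <= window
    ]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    observed = [(start, 502), (start + timedelta(minutes=3), 500)]
    for t502, t500 in errors_in_same_window(observed):
        print(f"502 at {t502.isoformat()}, 500 at {t500.isoformat()}")
```

Using timezone-aware timestamps keeps the pairs unambiguous when you paste them into a support thread.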

Copy/paste escalation packet

I'm seeing: runtime not healthy / health 502.
Started around: [time + timezone]
Agent/workspace: [name or link]
Tested surfaces:
- Dashboard: [result]
- Discord/Telegram: [result]
Scope check:
- Other agent works? [yes/no]
- This agent only? [yes/no]
Retest steps done:
- hard refresh
- incognito test
- waited 60-120s and retried once
Related errors in same time window:
- model-change 500 seen? [yes/no]
- exact message: [paste exact text]
Current status: [still failing / intermittent]
Please check host/runtime health server-side.

Security reminder: never include API keys, tokens, or passwords in support messages.

Loop-break rule

If the same health 502 persists after one clean retest, stop making random local changes and escalate once with the packet above. This saves hours of avoidable churn.

Bottom line: runtime health 502 is usually an infrastructure/runtime condition, not user prompt quality. Prove scope quickly, retest once, then escalate with precise evidence.