Troubleshooting
Agent Burning Too Many Tokens?
Fix runaway token usage with a fast proof-first workflow: smaller turns, tighter outputs, and no-repeat context handoffs.
Community pattern: users hit limits fast, assume billing is broken, and keep retrying giant prompts. Most of the time, this is a prompt-shape and workflow issue—not a model bug.
Reality check: one oversized turn can cost more than ten small turns. The fastest way to save tokens is to force short, staged responses.
2-minute diagnosis checklist
- Are you pasting huge chat history repeatedly?
- Are you asking for giant outputs in one message ("full strategy", "all code", "everything")?
- Are you retrying the same failed prompt without reducing scope?
- Are you mixing multiple tasks in one request?
- Did you ask the agent to be concise and return only final output?
Known-good low-token prompt pattern
Token-aware mode ON.
Goal: [one concrete task only]
Context (max 5 bullets):
- ...
- ...
Constraints:
- Keep response under 180 words
- No markdown tables
- Ask at most one clarifying question (only if blocking)
- If not blocked, execute immediately
Output format:
1) Result
2) Evidence (paths/links/commands used)
3) Next smallest step
High-friction mistakes that spike token burn
- "Summarize everything" loops: asking for giant retrospectives every turn.
- Attachment pileups: re-uploading many files without narrowing target files.
- Multi-goal mega prompts: strategy + implementation + QA + publish in one shot.
- Verbose style lock: forcing long-form explanations when a short checklist would do.
Advanced edge case: external-tool retry spiral
Recurring pattern from community chat: users try to connect external generators (like Midjourney/social tools), then keep retrying the same broad prompt while setup or capability is still unresolved.
- Why it burns tokens: each retry repeats long context, same explanation, and no new signal.
- Fix: run one tiny capability canary first (yes/no), then branch.
Canary check (copy/paste):
1) In one sentence, confirm whether you can directly perform [specific external action] in this session.
2) If NO, return only:
- exact manual handoff step I should do
- exact output artifact you can still produce now
3) Keep response under 120 words.
Do not continue with long production prompts until the canary returns a clear capability answer.
Loop-break rule: after two failed retries, stop. Split into smaller steps or start a fresh session with a compact handoff instead of trying the same giant prompt again.
Compact handoff when starting fresh
New session handoff:
- Objective: [single outcome]
- Completed so far: [3 bullets max]
- Current blocker: [1 sentence]
- Required output: [exact format]
- Hard limits: <= 180 words, no tables, no repeated recap
When to escalate to #help
- Token use stays unusually high even with short prompts and tight output limits.
- You see inconsistent quota behavior across the same model/provider with identical prompts.
- External-tool workflows keep looping between “can do” and “can’t do” across retries.
- You need help separating platform limits from provider key caps/reset windows.
Bottom line: small scoped turns + hard output limits + compact handoffs solve most token-burn loops quickly.