Troubleshooting

Agent Burning Too Many Tokens?

Fix runaway token usage with a fast proof-first workflow: smaller turns, tighter outputs, and no-repeat context handoffs.

Community pattern: users hit limits fast, assume billing is broken, and keep retrying giant prompts. Most of the time, this is a prompt-shape and workflow issue—not a model bug.

Reality check: one oversized turn can cost more than ten small turns. The fastest way to save tokens is to force short, staged responses.

2-minute diagnosis checklist

Are you pasting huge chat history repeatedly?
Are you asking for giant outputs in one message ("full strategy", "all code", "everything")?
Are you retrying the same failed prompt without reducing scope?
Are you mixing multiple tasks in one request?
Did you ask the agent to be concise and return only final output?

Known-good low-token prompt pattern

Token-aware mode ON. Goal: [one concrete task only] Context (max 5 bullets): - ... - ... Constraints: - Keep response under 180 words - No markdown tables - Ask at most one clarifying question (only if blocking) - If not blocked, execute immediately Output format: 1) Result 2) Evidence (paths/links/commands used) 3) Next smallest step

High-friction mistakes that spike token burn

"Summarize everything" loops: asking for giant retrospectives every turn.
Attachment pileups: re-uploading many files without narrowing target files.
Multi-goal mega prompts: strategy + implementation + QA + publish in one shot.
Verbose style lock: forcing long-form explanations when a short checklist would do.

Advanced edge case: external-tool retry spiral

Recurring pattern from community chat: users try to connect external generators (like Midjourney/social tools), then keep retrying the same broad prompt while setup or capability is still unresolved.

Why it burns tokens: each retry repeats long context, same explanation, and no new signal.
Fix: run one tiny capability canary first (yes/no), then branch.

Canary check (copy/paste): 1) In one sentence, confirm whether you can directly perform [specific external action] in this session. 2) If NO, return only: - exact manual handoff step I should do - exact output artifact you can still produce now 3) Keep response under 120 words.

Do not continue with long production prompts until the canary returns a clear capability answer.

Loop-break rule: after two failed retries, stop. Split into smaller steps or start a fresh session with a compact handoff instead of trying the same giant prompt again.

Compact handoff when starting fresh

New session handoff: - Objective: [single outcome] - Completed so far: [3 bullets max] - Current blocker: [1 sentence] - Required output: [exact format] - Hard limits: <= 180 words, no tables, no repeated recap

When to escalate to #help

Token use stays unusually high even with short prompts and tight output limits.
You see inconsistent quota behavior across the same model/provider with identical prompts.
External-tool workflows keep looping between “can do” and “can’t do” across retries.
You need help separating platform limits from provider key caps/reset windows.

Bottom line: small scoped turns + hard output limits + compact handoffs solve most token-burn loops quickly.