Troubleshooting
Heartbeat/Cron Broken or Just Misconfigured?
A deterministic 5-minute checklist to tell the difference between a platform issue and a local setup problem.
When reminders miss, heartbeat runs go quiet, or schedules drift, most people immediately assume “the scheduler is down.” Sometimes that’s true. Usually, it’s a misconfig you can prove in minutes.
Most common failure mode
Users change timezone, delivery destination, or session context and keep retesting old assumptions. One clean canary run with proof beats ten blind retries.
Step 1) Confirm What “Failed” Actually Means
- Never use “didn’t work” as the signal. Identify exactly what failed:
- job never executed
- job executed but delivered nowhere
- executed late (timezone/window issue)
- executed in wrong context/session
- Capture one concrete timestamp (with timezone) from your expected run.
Step 2) Run a Tiny Canary (Now + 2 Minutes)
Create one disposable reminder with obvious text, short delay, and known destination. Keep it minimal.
Reminder text: "CANARY: scheduler test from [surface/context]"
When: now + 2 minutes
Destination: same place you're watching
Expected result: one message, once, with exact canary text
Interpretation rule
If canary fires correctly, your scheduler pipeline is healthy. Your original job likely has configuration drift (time expression, destination, or context mismatch).
Step 3) Verify the Three Drift Zones
A) Time drift (cron expression + timezone)
- Confirm schedule and timezone are paired intentionally (e.g.,
America/New_York vs server default UTC).
- If the message arrives at the wrong hour/day, this is almost always timezone mismatch, not scheduler outage.
B) Destination drift (where output is delivered)
- Check whether delivery goes to main session, isolated run announce, or a specific channel.
- If run history shows success but no visible message, destination is wrong or not monitored.
C) Context drift (wrong session)
- Jobs bound to a different session can appear “missing” even when successful.
- Recreate with explicit session target instead of assuming the current chat thread is implicit.
Step 4) Use This One-Paste Triage Packet
When escalating to support, paste this once:
Symptom: (never ran / ran late / ran but no message / wrong place)
Expected time + timezone:
Actual observed time + timezone:
Canary test run (now+2m): (pass/fail + exact output)
Schedule expression used:
Session target + delivery mode:
Where I looked for output:
1 screenshot or log snippet:
Fast Decision Tree
- Canary passes: local config issue (time, destination, or context drift).
- Canary fails + no run evidence: possible scheduler/runtime issue, escalate with packet.
- Run evidence exists but no message: delivery/session mismatch, not core scheduler failure.
Loop-break rule
Don’t spam retries with the same broken config. Change one variable at a time and rerun a fresh canary so you can prove causality.