Canister Unresponsive Recovery
- Last Updated: 2025-12-04
- Alert: CanisterUnresponsive
- Severity: Critical
- Response Time: < 15 minutes
Overview
This runbook covers the recovery procedure when a canister becomes unresponsive and stops answering requests.
Symptoms
- Alert: `CanisterUnresponsive`
- 502/503 errors when accessing the canister
- `dfx canister status` shows "Stopped" or times out
- Frontend shows connection errors
- Users cannot access DAO features
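Diagnosis
Step 1: Check Canister Status
```bash
# Disable dfx's mainnet plaintext-identity check (only relevant for plaintext identities)
export DFX_WARNING=-mainnet_plaintext_identity
dfx canister --network ic status <canister-id>
# Possible statuses:
# - Running: Canister is operational
# - Stopped: Canister rejects all calls (needs `start`)
# - Stopping: Canister is waiting for outstanding calls to finish before stopping
# A timeout or error from this command is not a canister status:
# continue with Steps 2-3 and see Scenarios C and E.
```
Step 2: Check Cycles Balance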
```bash
# 2>&1 in case your dfx version prints status output to stderr
dfx canister --network ic status <canister-id> 2>&1 | grep Balance
# If the balance is 0 or very low, see cycles-topup.md
```
Step 3: Check Controller Access
```bash
# Verify your identity is a controller
dfx identity get-principal
# Check canister controllers (your principal must be listed)
dfx canister --network ic info <canister-id>
```
Step 4: Check Recent Deployments
- Review GitHub Actions for recent deployments
- Check whether the canister became unresponsive after a deployment
- Note the commit SHA of the current deployment
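
Steps 1 to 3 can be partly scripted for a faster first look. This is a sketch, not project tooling: the helper names are hypothetical, and it assumes dfx prints `Status: Running`, `Balance: 1_234 Cycles` and `Controllers: <principals>` lines, a format that may vary between dfx versions.

```bash
# Triage sketch for Steps 1-3 (hypothetical helpers).
# Assumption: dfx prints "Key: value" lines such as "Status: Running",
# "Balance: 1_234 Cycles" and "Controllers: <p1> <p2>".

# Print the value of the first "<key>: value" line read from stdin.
status_field() {
  sed -n "s/^[[:space:]]*$1: *//p" | head -n 1
}

# Suggest a Resolution scenario from `dfx canister status` output on stdin.
suggest_scenario() {
  text=$(cat)
  state=$(printf '%s\n' "$text" | status_field Status)
  # Strip digit separators: "3_091_657 Cycles" -> "3091657"
  balance=$(printf '%s\n' "$text" | status_field Balance | tr -d '_' | awk '{print $1}')
  if [ -z "$state" ]; then
    echo "no status (error/timeout): check controllers (Step 3), then Scenario E"
  elif [ "$balance" = "0" ]; then
    echo "Scenario B: out of cycles"
  elif [ "$state" = "Stopped" ]; then
    echo "Scenario A: start the canister"
  elif [ "$state" = "Stopping" ]; then
    echo "Stopping: wait, then re-check status"
  else
    echo "Running: check cycles (Step 2) and recent deployments (Step 4)"
  fi
}

# Succeed if principal $1 appears on the "Controllers:" line of stdin.
is_controller() {
  status_field Controllers | tr -s ' ' '\n' | grep -qx -- "$1"
}

# Usage (2>&1 because some dfx versions print status to stderr):
#   dfx canister --network ic status <canister-id> 2>&1 | suggest_scenario
#   dfx canister --network ic info <canister-id> 2>&1 \
#     | is_controller "$(dfx identity get-principal)" && echo "controller: yes"
```

The scenario mapping is deliberately coarse; a `Running` canister that still does not answer can be frozen, trapping on every call, or broken by a deployment, so the last branch points back to Steps 2 and 4 rather than guessing.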
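Resolution
Scenario A: Canister is Stopped
Most common case. A Stopped canister rejects all calls until it is started again. Note that a trap (panic) does not by itself stop a canister: the IC rolls back the failed message and the canister keeps running. A Stopped status means a stop was requested, for example manually or by a deployment script that did not restart the canister: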
```bash
# Start the canister
dfx canister --network ic start <canister-id>
# Verify it's running
dfx canister --network ic status <canister-id>
# Test functionality
dfx canister --network ic call <canister-id> <test-method>
```
Scenario B: Canister Out of Cycles
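If the canister's cycles balance is 0 or close to it: a canister whose balance falls below its freezing threshold is frozen and rejects calls until it is topped up. If the balance actually reaches 0, the IC uninstalls the canister's code and deletes its state, so a top-up and `start` will not restore it; redeploying is then required (see Scenario C), and you should escalate it as suspected data loss.
```bash
# Top up cycles first (--amount is in ICP, converted to cycles)
dfx ledger --network ic top-up <canister-id> --amount 0.1
# Then start the canister
dfx canister --network ic start <canister-id>
```
See Cycles Top-Up Procedure for details.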
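
After `dfx canister --network ic start` in Scenarios A and B, you can poll until the status settles before running the test call. A minimal sketch, not project tooling: `wait_until_running`, its defaults (10 attempts, 3 seconds apart) and the assumed `Status: <state>` output line are all illustrative.

```bash
# Sketch: poll until the canister reports Running (hypothetical helper).
# Assumption: `dfx canister status` prints a "Status: <state>" line.
# Arguments: canister ID, max attempts (default 10), delay in seconds (default 3).
wait_until_running() {
  id="$1"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # 2>&1: some dfx versions write status output to stderr.
    state=$(dfx canister --network ic status "$id" 2>&1 \
      | sed -n 's/^[[:space:]]*Status: *//p' | head -n 1)
    if [ "$state" = "Running" ]; then
      echo "running after $i attempt(s)"
      return 0
    fi
    echo "attempt $i: status=${state:-unknown}" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still not running after $attempts attempts" >&2
  return 1
}
```

Example: `wait_until_running <canister-id> && dfx canister --network ic call <canister-id> <test-method>` only runs the test call once the canister reports `Running`.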
Scenario C: Canister in Error State
If the canister keeps failing after a crash (for example, `start` fails or calls keep trapping):
```bash
# Try to start it
dfx canister --network ic start <canister-id>
# If that fails, you may need to reinstall
# WARNING: This will reset canister state!
dfx canister install <canister-id> --network ic --mode reinstall \
  --wasm /path/to/canister.wasm --yes
```
Note: Reinstall wipes all canister state. Before reinstalling, try an upgrade to a fixed or previous build (Scenario D), which keeps stable state. Reinstall only if no other option remains and data loss is acceptable.
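
Because reinstall is destructive, a small wrapper can force the operator to re-type the canister ID before it runs. `guarded_reinstall` is a hypothetical helper, not part of dfx; it runs the same `dfx canister install --mode reinstall` command shown in this scenario and passes `--yes` only because it does its own confirmation first.

```bash
# Hypothetical safety wrapper (not part of dfx) around the destructive reinstall.
# Refuses to run unless the wasm file exists and the operator re-types the ID.
guarded_reinstall() {
  id="$1"
  wasm="$2"
  if [ ! -f "$wasm" ]; then
    echo "wasm not found: $wasm" >&2
    return 2
  fi
  echo "Reinstall WIPES ALL STATE of $id. Re-type the canister ID to continue:" >&2
  read -r answer
  if [ "$answer" != "$id" ]; then
    echo "aborted: ID did not match" >&2
    return 1
  fi
  # --yes is acceptable here only because confirmation was already requested above.
  dfx canister install "$id" --network ic --mode reinstall --wasm "$wasm" --yes
}
```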
Scenario D: Recent Deployment Caused Issue
If the canister became unresponsive after a deployment, roll back to the previous version.
Via GitHub Actions: go to Actions > Emergency Rollback > Run workflow.
Or manually:
```bash
git checkout <previous-commit>
cargo build --release --target wasm32-unknown-unknown
# Cargo replaces hyphens in the crate name with underscores in the .wasm file name
dfx canister install <canister-id> --network ic --mode upgrade \
  --wasm target/wasm32-unknown-unknown/release/<canister>.wasm --yes
```
See Deployment Failure Recovery for details.
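
After a rollback (or any upgrade), it helps to confirm that the module now running is the file you just built. The module hash the IC reports is the SHA-256 of the installed wasm module, so it can be compared with a local `sha256sum`. This sketch assumes `dfx canister info` prints a `Module hash: 0x<hex>` line and that the file was installed byte-for-byte, as with `--wasm` above; the function names are hypothetical.

```bash
# Hypothetical check: is the deployed module the wasm file you just built?
# Assumptions: `dfx canister info` prints "Module hash: 0x<hex>", and the
# file was installed byte-for-byte (as with --wasm above).
local_wasm_hash() {
  # sha256sum prints "<hex>  <file>"; keep only the digest.
  sha256sum "$1" | awk '{print $1}'
}

deployed_module_hash() {
  dfx canister --network ic info "$1" 2>&1 \
    | sed -n 's/^[[:space:]]*Module hash: *0x//p' | head -n 1 \
    | tr '[:upper:]' '[:lower:]'
}

# Arguments: canister ID, path to the local wasm file.
module_matches() {
  want=$(local_wasm_hash "$2")
  have=$(deployed_module_hash "$1")
  if [ -n "$have" ] && [ "$have" = "$want" ]; then
    echo "match: $have"
  else
    echo "MISMATCH: deployed=${have:-none} local=$want" >&2
    return 1
  fi
}
```

Example: `module_matches <canister-id> target/wasm32-unknown-unknown/release/<canister>.wasm`. On macOS, `shasum -a 256` replaces `sha256sum`.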
Scenario E: IC Network Issues
If multiple canisters are affected:
- Check IC status: https://status.internetcomputer.org
- Check the IC Dashboard for subnet status
- Wait for the IC network to recover
- Canisters should recover on their own once the network is stable
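
To tell a single-canister fault from a wider outage, check several canisters in one pass; if most of them fail to report `Running` at the same time, the IC status page and subnet status are the next stop. `count_unhealthy` is a hypothetical helper and assumes the `Status: <state>` output line.

```bash
# Hypothetical helper: count canisters that do not report "Status: Running".
# A status call that errors or prints no Status line counts as unhealthy.
count_unhealthy() {
  bad=0
  for id in "$@"; do
    state=$(dfx canister --network ic status "$id" 2>&1 \
      | sed -n 's/^[[:space:]]*Status: *//p' | head -n 1)
    if [ "$state" != "Running" ]; then
      echo "$id: ${state:-no status}" >&2
      bad=$((bad + 1))
    fi
  done
  echo "$bad"
}
```

Example: `count_unhealthy <id-1> <id-2> <id-3>` prints the number of unhealthy canisters and lists them on stderr. A status call that hangs blocks the loop, so running the script under `timeout` is one way to bound it.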
Canister Recovery Commands
Basic Commands
```bash
# Check status
dfx canister --network ic status <canister-id>
# Start stopped canister
dfx canister --network ic start <canister-id>
# Stop running canister
dfx canister --network ic stop <canister-id>
# View logs (if available)
dfx canister --network ic logs <canister-id>
```
Emergency Commands
```bash
# Reinstall (WARNING: loses state)
dfx canister install <canister-id> --network ic --mode reinstall \
  --wasm <path-to-wasm> --yes
# Upgrade (preserves stable memory; other state only if the canister saves it across upgrades)
dfx canister install <canister-id> --network ic --mode upgrade \
  --wasm <path-to-wasm> --yes
# Add cycles (--amount is in ICP)
dfx ledger --network ic top-up <canister-id> --amount 0.1
```
Common Panic Causes
| Panic Message | Cause | Resolution |
|---|---|---|
| "out of memory" | State too large | Optimize storage, archive old data |
| "instruction limit exceeded" | Computation too heavy | Optimize algorithms, paginate |
| "trap: integer overflow" | Arithmetic bug | Fix code, deploy patch |
| "cannot find method" | API mismatch | Verify frontend matches backend |
| "assertion failed" | Logic bug | Check logs, fix code |
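
For on-call use, the table can double as a lookup. This sketch does a case-insensitive substring match of a trap or log message against the table's patterns. Real trap texts vary by language and replica version, so treat a miss as "unknown" rather than "no problem"; `classify_panic` is hypothetical.

```bash
# Hypothetical lookup: cause and resolution for a trap/log message, using the
# patterns from the table above. "instruction limit" is matched loosely so that
# wordings such as "exceeded the instruction limit" also hit.
classify_panic() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"out of memory"*)      echo "State too large | Optimize storage, archive old data" ;;
    *"instruction limit"*)  echo "Computation too heavy | Optimize algorithms, paginate" ;;
    *"integer overflow"*)   echo "Arithmetic bug | Fix code, deploy patch" ;;
    *"cannot find method"*) echo "API mismatch | Verify frontend matches backend" ;;
    *"assertion failed"*)   echo "Logic bug | Check logs, fix code" ;;
    *)                      echo "Unknown | See Escalation" ;;
  esac
}
```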
Post-Resolution
Step 1: Verify Recovery
```bash
# Confirm canister is running
dfx canister --network ic status <canister-id>
# Test basic functionality
dfx canister --network ic call <canister-id> <test-method>
```
Step 2: Test User Flows
- Test frontend access
- Test form submissions
- Test authentication
- Verify no data loss
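
The "frontend access" check can be partly automated with an HTTP status probe. Assumption: the frontend is served at `https://<canister-id>.icp0.io/`, the default domain for asset canisters; substitute a custom domain if you use one. `http_status` and `frontend_ok` are hypothetical helpers, and a 2xx answer only shows the page loads, not that forms or authentication work.

```bash
# Hypothetical smoke test for the frontend over HTTP.
# Assumption: the frontend is served at https://<canister-id>.icp0.io/.
http_status() {
  # Prints the status code; curl prints 000 if the request itself fails.
  curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$1"
}

frontend_ok() {
  code=$(http_status "https://$1.icp0.io/") || true
  case "$code" in
    2??) echo "ok ($code)" ;;
    *) echo "FAIL (${code:-none}); 502/503 match the alert symptoms" >&2; return 1 ;;
  esac
}
```

Example: `frontend_ok vlmti-wqaaa-aaaad-acoiq-cai` probes the staging frontend listed under Canister IDs Reference.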
Step 3: Monitor
- Watch Grafana dashboard for 30 minutes
- Verify error rate is normal
- Confirm no new alerts fire
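
As a terminal-side complement to the Grafana watch, you can sample the canister status over the same 30-minute window. Sketch only: `watch_status`, the 60-second interval and the pass/fail rule (any non-`Running` sample fails) are assumptions, and the `Status: <state>` output line is assumed as in the other examples.

```bash
# Hypothetical monitor: sample status every $interval seconds for $duration
# seconds; print the number of non-Running samples and fail if any occurred.
# Assumption: `dfx canister status` prints a "Status: <state>" line.
watch_status() {
  id="$1"
  duration="${2:-1800}"   # 30 minutes
  interval="${3:-60}"     # must be > 0
  elapsed=0
  failures=0
  while [ "$elapsed" -lt "$duration" ]; do
    state=$(dfx canister --network ic status "$id" 2>&1 \
      | sed -n 's/^[[:space:]]*Status: *//p' | head -n 1)
    if [ "$state" != "Running" ]; then
      failures=$((failures + 1))
      echo "$(date -u +%H:%M:%SZ) $id: ${state:-no status}" >&2
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$failures"
  [ "$failures" -eq 0 ]
}
```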
Step 4: Root Cause Analysis
For critical incidents:
- Document the timeline
- Identify what caused the crash
- Create tickets for:
  - Bug fixes
  - Better error handling
  - Monitoring improvements
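
For the timeline, timestamps captured during the incident are more reliable than ones reconstructed afterwards. A minimal sketch; the `note` helper and the `incident-timeline.log` file name are hypothetical.

```bash
# Hypothetical helper: append a UTC-timestamped line to the incident timeline.
# Override INCIDENT_LOG to write somewhere else.
INCIDENT_LOG="${INCIDENT_LOG:-incident-timeline.log}"

note() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$INCIDENT_LOG"
}

# Usage:
#   note "CanisterUnresponsive fired for <canister-id>"
#   note "ran dfx canister start; status Running"
```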
Prevention
Code Quality
- Use `Result` types instead of panicking
- Add comprehensive input validation
- Set reasonable computation limits
- Test with large datasets
Monitoring
- Set up alerts for cycles balance
- Monitor memory usage trends
- Track error rates by method
- Enable canister logging
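
A cycles-balance alert reduces to comparing the `Balance:` figure against a threshold. This sketch assumes the `Balance: 1_234 Cycles` output format; `cycles_ok` is hypothetical, and the 1 trillion cycle default is a placeholder to size from your canister's burn rate and freezing threshold.

```bash
# Extract the cycles balance (digits only) from `dfx canister status` output on stdin.
# Assumption: the line looks like "Balance: 3_091_657_516_484 Cycles".
cycles_balance() {
  sed -n 's/^[[:space:]]*Balance: *//p' | head -n 1 | tr -d '_' | awk '{print $1}'
}

# Hypothetical alert check: non-zero exit if the balance is below the threshold
# (default 1T cycles, a placeholder) or cannot be parsed.
cycles_ok() {
  threshold="${1:-1000000000000}"
  bal=$(cycles_balance)
  if [ -z "$bal" ]; then
    echo "ALERT: balance not found" >&2
    return 2
  fi
  if [ "$bal" -lt "$threshold" ]; then
    echo "ALERT: $bal cycles < $threshold" >&2
    return 1
  fi
  echo "ok: $bal cycles"
}
```

Example: `dfx canister --network ic status <canister-id> 2>&1 | cycles_ok` can run from a cron job or CI schedule, with a non-zero exit wired to your alerting.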
Deployment Safety
- Test thoroughly on staging
- Monitor closely after deployment
- Have rollback procedure ready
- Use incremental upgrades
Escalation
| Condition | Action |
|---|---|
| Canister won't start | Contact senior engineer |
| Data loss suspected | Contact team lead immediately |
| Multiple canisters affected | Check IC network status |
| IC subnet issue | Contact DFINITY support |
Canister IDs Reference
| Canister | ID | Purpose |
|---|---|---|
| Frontend (Staging) | vlmti-wqaaa-aaaad-acoiq-cai | UI hosting |
| Add more as deployed | | |