Canister Unresponsive Recovery ​

Last Updated: 2025-12-04
Alert: CanisterUnresponsive
Severity: Critical
Response Time: < 15 minutes

Overview ​

This runbook covers the recovery procedure when a canister becomes unresponsive and stops answering requests.

Symptoms ​

  • Alert: CanisterUnresponsive
  • 502/503 errors when accessing canister
  • dfx canister status shows "Stopped" or times out
  • Frontend showing connection errors
  • Users unable to access DAO features

Diagnosis ​

Step 1: Check Canister Status ​

bash
export DFX_WARNING=-mainnet_plaintext_identity
dfx canister --network ic status <canister-id>

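# Possible statuses:
# - Running: Canister is operational
# - Stopped: Canister is stopped (needs restart)
# - Stopping: Canister is finishing in-flight calls before it stops
# If the command errors or times out instead of printing a status,
# treat the canister as being in an error state (see Scenario C or E)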
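
The status check can be turned into a first triage decision. The following is a minimal sketch, assuming the status report contains a line of the form `Status: Running`; `classify_status` is a hypothetical helper name, not part of dfx.

```shell
#!/usr/bin/env bash
# Sketch only: classify_status is a hypothetical helper, not a dfx command.
# Assumption: the status report contains a line such as "Status: Running".

# Reads the status report on stdin and prints the next step to take.
classify_status() {
  local status
  status="$(sed -n 's/^Status:[[:space:]]*//p' | head -n 1)"
  case "$status" in
    Running)  echo "running: check cycles and recent deployments" ;;
    Stopped)  echo "stopped: follow Scenario A" ;;
    Stopping) echo "stopping: wait, then re-check" ;;
    *)        echo "unknown: status call failed or timed out" ;;
  esac
}

# Usage (requires dfx and controller access):
#   dfx canister --network ic status <canister-id> 2>&1 | classify_status
```

An empty or unparseable report falls through to the `unknown` branch, which matches the case where the status call itself fails.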

Step 2: Check Cycles Balance ​

bash
# 2>&1 because some dfx versions print the status report to stderr
dfx canister --network ic status <canister-id> 2>&1 | grep Balance

# If balance is 0 or very low, see cycles-topup.md
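
"Very low" is easier to act on with an explicit threshold. A minimal sketch, assuming the report has a line such as `Balance: 3_092_211_597_987 Cycles`; the helper names and the default threshold of 1 trillion cycles are illustrative assumptions, not values from this runbook.

```shell
#!/usr/bin/env bash
# Sketch only: parse_cycles and cycles_state are hypothetical helpers.
# Assumption: the report has a line like "Balance: 3_092_211_597_987 Cycles".

# Reads the status report on stdin and prints the balance as a plain integer.
parse_cycles() {
  sed -n 's/^Balance:[[:space:]]*\([0-9_,]*\).*/\1/p' | head -n 1 | tr -d '_,'
}

# Prints "low", "ok", or "unknown" for a balance and an optional threshold.
# The default threshold (1T cycles) is an assumed example value.
cycles_state() {
  local balance="$1"
  local threshold="${2:-1000000000000}"
  if [ -z "$balance" ]; then
    echo "unknown"
  elif [ "$balance" -lt "$threshold" ]; then
    echo "low"
  else
    echo "ok"
  fi
}

# Usage (requires dfx):
#   balance="$(dfx canister --network ic status <canister-id> 2>&1 | parse_cycles)"
#   cycles_state "$balance"
```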

Step 3: Check Controller Access ​

bash
# Verify your identity is a controller
dfx identity get-principal

# Check canister controllers
dfx canister --network ic info <canister-id>
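
The two commands above can be compared mechanically rather than by eye. A minimal sketch, assuming the `info` output lists controllers on one line as `Controllers: <principal> <principal> ...`; `is_controller` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: is_controller is a hypothetical helper, not a dfx command.
# Assumption: the info output has a line "Controllers: <id> <id> ...".

# Reads the info output on stdin; succeeds if the given principal is listed.
# Returns 2 when no principal is supplied.
is_controller() {
  local principal="$1"
  [ -n "$principal" ] || return 2
  sed -n 's/^Controllers:[[:space:]]*//p' | tr ' ' '\n' | grep -Fxq -- "$principal"
}

# Usage (requires dfx):
#   dfx canister --network ic info <canister-id> 2>&1 \
#     | is_controller "$(dfx identity get-principal)" \
#     && echo "controller access OK" || echo "NOT a controller"
```

The match is on whole principals (`grep -Fx`), so a principal that is only a prefix of a listed controller does not count.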

Step 4: Check Recent Deployments ​

  1. Review GitHub Actions for recent deployments
  2. Check if canister became unresponsive after deployment
  3. Note the commit SHA of current deployment
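
One way to confirm which build is actually deployed is to compare the canister's module hash with the SHA-256 of a locally built wasm. This is a sketch under two assumptions: the `info` output contains a line `Module hash: 0x<hex>`, and the wasm was installed unmodified (no extra compression or post-processing). All helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of dfx.
# Assumption: the info output has a line "Module hash: 0x<hex>".

# Reads the info output on stdin and prints the module hash in lowercase hex.
deployed_hash() {
  sed -n 's/^Module hash:[[:space:]]*0x//p' | head -n 1 | tr 'A-F' 'a-f'
}

# Prints the SHA-256 of a local file (sha256sum on Linux, shasum on macOS).
local_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Succeeds only when a deployed hash was found and it equals the local hash.
matches_local_build() {
  local info_text="$1"
  local wasm_path="$2"
  local deployed
  deployed="$(printf '%s\n' "$info_text" | deployed_hash)"
  [ -n "$deployed" ] && [ "$deployed" = "$(local_hash "$wasm_path")" ]
}

# Usage (requires dfx):
#   info="$(dfx canister --network ic info <canister-id> 2>&1)"
#   matches_local_build "$info" <path-to-wasm> && echo "same build"
```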

Resolution ​

Scenario A: Canister is Stopped ​

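Most common case: the canister trapped or panicked and now reports Stopped. A trap by itself only rolls back the failing call, so also find out what stopped the canister (for example a deployment or another controller) before restarting: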

bash
# Start the canister
dfx canister --network ic start <canister-id>

# Verify it's running
dfx canister --network ic status <canister-id>

# Test functionality
dfx canister --network ic call <canister-id> <test-method>
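
A start request may not take effect instantly, so the verification step can be wrapped in a bounded retry loop. A minimal sketch, assuming the status report contains `Status: Running`; `wait_until_running` and its default of 10 attempts, 5 seconds apart, are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch only: wait_until_running is a hypothetical helper.
# Assumption: a healthy canister reports a line "Status: Running".

# Polls the canister status until it is Running or attempts run out.
# Args: canister id, max attempts (default 10), delay in seconds (default 5).
wait_until_running() {
  local canister="$1"
  local attempts="${2:-10}"
  local delay="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if dfx canister --network ic status "$canister" 2>&1 | grep -q '^Status: Running'; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (requires dfx):
#   dfx canister --network ic start <canister-id>
#   wait_until_running <canister-id> || echo "still not running: escalate"
```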

Scenario B: Canister Out of Cycles ​

If the canister has 0 cycles:

bash
# Top up cycles first (--amount is in ICP, which is converted to cycles)
dfx ledger --network ic top-up <canister-id> --amount 0.1

# Then start canister
dfx canister --network ic start <canister-id>

See Cycles Top-Up Procedure for details.

Scenario C: Canister in Error State ​

If the canister is in an error state after a crash (its status cannot be read, or calls keep failing):

bash
# Try to start it
dfx canister --network ic start <canister-id>

# If that fails, a reinstall may be needed
# WARNING: This wipes all canister state, including stable memory!
dfx canister install <canister-id> --network ic --mode reinstall \
  --wasm /path/to/canister.wasm --yes

Note: Reinstall wipes all canister state, including stable memory. Use it only when no other option remains and the data loss is acceptable.

Scenario D: Recent Deployment Caused Issue ​

If the canister became unresponsive after a deployment:

  1. Rollback to previous version:

    bash
    # Via GitHub Actions
    # Go to Actions > Emergency Rollback > Run workflow
  2. Or manually:

    bash
    git checkout <previous-commit>
    cargo build --release --target wasm32-unknown-unknown
    dfx canister install <canister-id> --network ic --mode upgrade \
      --wasm target/wasm32-unknown-unknown/release/<canister>.wasm --yes

See Deployment Failure Recovery for details.

Scenario E: IC Network Issues ​

If multiple canisters are affected:

  1. Check IC status: https://status.internetcomputer.org
  2. Check IC Dashboard for subnet status
  3. Wait for IC network to recover
  4. Canisters should auto-recover when network is stable
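
To tell a single-canister fault from a wider outage, the same status check can be run across every known canister. A minimal sketch, assuming `Status: Running` marks a healthy canister; `count_unresponsive` is a hypothetical helper, and the canister IDs are supplied by the caller.

```shell
#!/usr/bin/env bash
# Sketch only: count_unresponsive is a hypothetical helper.
# Assumption: a healthy canister reports a line "Status: Running".

# Prints how many of the given canister ids are not reporting Running.
count_unresponsive() {
  local id
  local count=0
  for id in "$@"; do
    if ! dfx canister --network ic status "$id" 2>&1 | grep -q '^Status: Running'; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Usage (requires dfx):
#   down="$(count_unresponsive <canister-id-1> <canister-id-2>)"
#   [ "$down" -gt 1 ] && echo "multiple canisters affected: check IC status"
```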

Canister Recovery Commands ​

Basic Commands ​

bash
# Check status
dfx canister --network ic status <canister-id>

# Start stopped canister
dfx canister --network ic start <canister-id>

# Stop running canister
dfx canister --network ic stop <canister-id>

# View logs (if available)
dfx canister --network ic logs <canister-id>

Emergency Commands ​

bash
# Reinstall (WARNING: loses state)
dfx canister install <canister-id> --network ic --mode reinstall \
  --wasm <path-to-wasm> --yes

# Upgrade (preserves state)
dfx canister install <canister-id> --network ic --mode upgrade \
  --wasm <path-to-wasm> --yes

# Add cycles (--amount is in ICP, which is converted to cycles)
dfx ledger --network ic top-up <canister-id> --amount 0.1

Common Panic Causes ​

| Panic Message | Cause | Resolution |
| --- | --- | --- |
| "out of memory" | State too large | Optimize storage, archive old data |
| "instruction limit exceeded" | Computation too heavy | Optimize algorithms, paginate |
| "trap: integer overflow" | Arithmetic bug | Fix code, deploy patch |
| "cannot find method" | API mismatch | Verify frontend matches backend |
| "assertion failed" | Logic bug | Check logs, fix code |

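The panic messages listed in this section can be matched against a log line or error message during triage. A minimal sketch: `suggest_resolution` is a hypothetical helper, and the substring patterns are taken from the messages above rather than from any guaranteed log format.

```shell
#!/usr/bin/env bash
# Sketch only: suggest_resolution is a hypothetical helper.
# The patterns are substrings of the panic messages listed in this runbook.

# Prints the likely cause and resolution for a log line or error message.
suggest_resolution() {
  local msg
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *"out of memory"*)
      echo "State too large: optimize storage, archive old data" ;;
    *"instruction limit exceeded"*)
      echo "Computation too heavy: optimize algorithms, paginate" ;;
    *"integer overflow"*)
      echo "Arithmetic bug: fix code, deploy patch" ;;
    *"cannot find method"*)
      echo "API mismatch: verify frontend matches backend" ;;
    *"assertion failed"*)
      echo "Logic bug: check logs, fix code" ;;
    *)
      echo "Unknown cause: inspect logs manually" ;;
  esac
}

# Usage:
#   suggest_resolution "Canister trapped: trap: integer overflow"
```

Matching is case-insensitive and by substring, so the message can be pasted with its surrounding log prefix.
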
Post-Resolution ​

Step 1: Verify Recovery ​

bash
# Confirm canister is running
dfx canister --network ic status <canister-id>

# Test basic functionality
dfx canister --network ic call <canister-id> <test-method>

Step 2: Test User Flows ​

  • Test frontend access
  • Test form submissions
  • Test authentication
  • Verify no data loss

Step 3: Monitor ​

  • Watch Grafana dashboard for 30 minutes
  • Verify error rate is normal
  • Confirm no new alerts fire

Step 4: Root Cause Analysis ​

For critical incidents:

  1. Document timeline
  2. Identify what caused the crash
  3. Create tickets for:
    • Bug fixes
    • Better error handling
    • Monitoring improvements

Prevention ​

Code Quality ​

  • Use Result types instead of panicking
  • Add comprehensive input validation
  • Set reasonable computation limits
  • Test with large datasets

Monitoring ​

  • Set up alerts for cycles balance
  • Monitor memory usage trends
  • Track error rates by method
  • Enable canister logging

Deployment Safety ​

  • Test thoroughly on staging
  • Monitor closely after deployment
  • Have rollback procedure ready
  • Use incremental upgrades

Escalation ​

| Condition | Action |
| --- | --- |
| Canister won't start | Contact senior engineer |
| Data loss suspected | Contact team lead immediately |
| Multiple canisters affected | Check IC network status |
| IC subnet issue | Contact DFINITY support |

Canister IDs Reference ​

| Canister | ID | Purpose |
| --- | --- | --- |
| Frontend (Staging) | vlmti-wqaaa-aaaad-acoiq-cai | UI hosting |
| Add more as deployed | | |

Hello World Co-Op DAO