Canister Unresponsive Recovery

Last Updated: 2025-12-04
Alert: CanisterUnresponsive
Severity: Critical
Response Time: < 15 minutes

Overview

This runbook covers the recovery procedure when a canister becomes unresponsive and stops answering requests.

Symptoms

  • Alert: CanisterUnresponsive
  • 502/503 errors when accessing canister
  • dfx canister status shows "Stopped" or times out
  • Frontend showing connection errors
  • Users unable to access DAO features

Diagnosis

Step 1: Check Canister Status

bash
export DFX_WARNING=-mainnet_plaintext_identity
dfx canister --network ic status <canister-id>

# Possible statuses:
# - Running: Canister is operational
# - Stopped: Canister is stopped (needs restart)
# - Stopping: Canister is in process of stopping
# If the command returns an error or times out instead of one of these
# statuses, the status call itself failed (see Scenarios C and E)
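
When this check is scripted, it can help to reduce the status report to a single word. The helper below is a minimal sketch; it assumes the report contains a line of the form "Status: Running", and the function name canister_state is hypothetical.

```shell
# Sketch: read `dfx canister status` output on stdin and print the status
# as one lowercase word. ASSUMPTION: the report has a "Status: <Word>" line.
canister_state() {
  local state
  state=$(sed -n 's/^[[:space:]]*Status:[[:space:]]*\([A-Za-z]*\).*$/\1/p' \
    | head -n 1 | tr '[:upper:]' '[:lower:]')
  if [ -n "$state" ]; then
    printf '%s\n' "$state"
  else
    # No status line found: the call probably failed or timed out.
    printf 'unknown\n'
    return 1
  fi
}
```

Possible usage: dfx canister --network ic status <canister-id> 2>&1 | canister_state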

Step 2: Check Cycles Balance

bash
# 2>&1 covers dfx versions that print the status report to stderr
dfx canister --network ic status <canister-id> 2>&1 | grep Balance

# If balance is 0 or very low, see cycles-topup.md
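
To turn "very low" into a concrete check, the balance can be parsed and compared against a threshold. This is a sketch only: it assumes a line such as "Balance: 3_092_211_597_987 Cycles" in the status report, and both function names and the threshold used in the usage line are hypothetical.

```shell
# Sketch: extract the cycles balance from `dfx canister status` output on stdin.
# ASSUMPTION: the report has a line like "Balance: 3_092_211_597_987 Cycles".
cycles_balance() {
  sed -n 's/^[[:space:]]*Balance:[[:space:]]*\([0-9_]*\).*$/\1/p' \
    | head -n 1 | tr -d '_'
}

# Print "low", "ok", or "unknown" for a balance ($1) against a threshold ($2),
# both given in cycles as plain integers.
balance_verdict() {
  local balance="$1" threshold="$2"
  case "$balance" in
    ''|*[!0-9]*) echo unknown ;;   # empty or non-numeric: parsing failed
    *)
      if [ "$balance" -lt "$threshold" ]; then echo low; else echo ok; fi
      ;;
  esac
}
```

Possible usage, with an illustrative threshold of 1 trillion cycles: balance_verdict "$(dfx canister --network ic status <canister-id> 2>&1 | cycles_balance)" 1000000000000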

Step 3: Check Controller Access

bash
# Verify your identity is a controller
dfx identity get-principal

# Check canister controllers
dfx canister --network ic info <canister-id>
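
Comparing the two outputs by eye is error-prone because principals look alike. The sketch below does an exact match; it assumes the info output has a "Controllers:" line listing principals separated by spaces, and the function name is_controller is hypothetical.

```shell
# Sketch: succeed if principal $1 appears in the "Controllers:" line of
# `dfx canister info` output read from stdin.
# ASSUMPTION: controllers are listed on one line, separated by whitespace.
is_controller() {
  local principal="$1" controllers c
  controllers=$(sed -n 's/^[[:space:]]*Controllers:[[:space:]]*//p' | head -n 1)
  for c in $controllers; do
    # Exact string comparison, so a partial principal never matches.
    if [ "$c" = "$principal" ]; then
      return 0
    fi
  done
  return 1
}
```

Possible usage: dfx canister --network ic info <canister-id> 2>&1 | is_controller "$(dfx identity get-principal)" && echo "controller access OK"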

Step 4: Check Recent Deployments

  1. Review GitHub Actions for recent deployments
  2. Check if canister became unresponsive after deployment
  3. Note the commit SHA of current deployment
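
One way to tie the running code to a build is to compare the deployed module hash with the SHA-256 of a local wasm file. The sketch below assumes the info output has a line like "Module hash: 0x..." and that the wasm was installed unmodified with --wasm; a mismatch can also mean the build is not reproducible, so treat the result as a hint. Both function names are hypothetical.

```shell
# Sketch: print the deployed module hash (lowercase hex, no 0x prefix) from
# `dfx canister info` output read on stdin.
# ASSUMPTION: the output has a line like "Module hash: 0x<hex>".
deployed_module_hash() {
  sed -n 's/^[[:space:]]*Module hash:[[:space:]]*0x\([0-9a-fA-F]*\).*$/\1/p' \
    | head -n 1 | tr '[:upper:]' '[:lower:]'
}

# Print the SHA-256 of a local wasm file ($1) as lowercase hex.
wasm_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1   # fallback, e.g. on macOS
  fi
}
```

If deployed_module_hash and wasm_sha256 print the same value for a build of a given commit, that commit is very likely what is running.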

Resolution

Scenario A: Canister is Stopped

Most common case: the canister trapped or panicked and is now stopped:

bash
# Start the canister
dfx canister --network ic start <canister-id>

# Verify it's running
dfx canister --network ic status <canister-id>

# Test functionality
dfx canister --network ic call <canister-id> <test-method>
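
A start request may take a moment to show up in the status report, so a single check right after it can mislead. The sketch below retries a check a fixed number of times; retry and is_running are hypothetical helpers, and the "Status: Running" line format is an assumption about dfx output.

```shell
# Sketch: run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command succeeds; fails if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# ASSUMPTION: the status report contains the line "Status: Running".
is_running() {
  dfx canister --network ic status "$1" 2>&1 | grep -q 'Status: Running'
}
```

Possible usage after the start command: retry 5 3 is_running <canister-id> || echo "still not running, escalate"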

Scenario B: Canister Out of Cycles

If canister has 0 cycles:

bash
# Top up cycles first (--amount is in ICP, which is converted to cycles)
dfx ledger --network ic top-up <canister-id> --amount 0.1

# Then start canister
dfx canister --network ic start <canister-id>

See Cycles Top-Up Procedure for details.

Scenario C: Canister in Error State

If canister is in error state after crash:

bash
# Try to start it
dfx canister --network ic start <canister-id>

# If that fails, may need to reinstall
# WARNING: This will reset canister state!
dfx canister install <canister-id> --network ic --mode reinstall \
  --wasm /path/to/canister.wasm --yes

Note: Reinstall wipes all canister state. Use it only when no other option remains and data loss is acceptable.

Scenario D: Recent Deployment Caused Issue

If canister became unresponsive after deployment:

  1. Rollback to previous version:

    bash
    # Via GitHub Actions
    # Go to Actions > Emergency Rollback > Run workflow
  2. Or manually:

    bash
    git checkout <previous-commit>
    cargo build --release --target wasm32-unknown-unknown
    dfx canister install <canister-id> --network ic --mode upgrade \
      --wasm target/wasm32-unknown-unknown/release/<canister>.wasm --yes

See Deployment Failure Recovery for details.

Scenario E: IC Network Issues

If multiple canisters are affected:

  1. Check IC status: https://status.internetcomputer.org
  2. Check IC Dashboard for subnet status
  3. Wait for IC network to recover
  4. Canisters should auto-recover when network is stable
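
Whether "multiple canisters are affected" can be decided mechanically once each canister's status has been collected. The sketch below assumes a hypothetical input format of one "<canister-id> <status>" pair per line, where anything other than "Running" counts as affected; the function name affected_scope is also hypothetical.

```shell
# Sketch: read "<canister-id> <status>" lines on stdin and print
# "none", "single", or "multiple" depending on how many are not Running.
# ASSUMPTION: the caller collected the statuses; this input format is made up.
affected_scope() {
  local id status down=0
  while read -r id status; do
    if [ -z "$id" ]; then
      continue                      # skip blank lines
    fi
    if [ "$status" != "Running" ]; then
      down=$((down + 1))
    fi
  done
  if [ "$down" -eq 0 ]; then
    echo none
  elif [ "$down" -eq 1 ]; then
    echo single
  else
    echo multiple
  fi
}
```

A result of "multiple" points toward this scenario; "single" points back to Scenarios A through D.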

Canister Recovery Commands

Basic Commands

bash
# Check status
dfx canister --network ic status <canister-id>

# Start stopped canister
dfx canister --network ic start <canister-id>

# Stop running canister
dfx canister --network ic stop <canister-id>

# View logs (if available)
dfx canister --network ic logs <canister-id>

Emergency Commands

bash
# Reinstall (WARNING: loses state)
dfx canister install <canister-id> --network ic --mode reinstall \
  --wasm <path-to-wasm> --yes

# Upgrade (preserves state)
dfx canister install <canister-id> --network ic --mode upgrade \
  --wasm <path-to-wasm> --yes

# Add cycles (--amount is in ICP, which is converted to cycles)
dfx ledger --network ic top-up <canister-id> --amount 0.1

Common Panic Causes

Panic Message                | Cause                 | Resolution
"out of memory"              | State too large       | Optimize storage, archive old data
"instruction limit exceeded" | Computation too heavy | Optimize algorithms, paginate
"trap: integer overflow"     | Arithmetic bug        | Fix code, deploy patch
"cannot find method"         | API mismatch          | Verify frontend matches backend
"assertion failed"           | Logic bug             | Check logs, fix code
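
The table above can be applied to a log line with a simple pattern match. This is a sketch: the function name panic_hint is hypothetical, and it matches only the exact message fragments listed in the table.

```shell
# Sketch: map a log line ($1) to the cause and resolution from the table.
# Only the message fragments listed in the table are recognized.
panic_hint() {
  case "$1" in
    *"out of memory"*)
      echo "State too large: optimize storage, archive old data" ;;
    *"instruction limit exceeded"*)
      echo "Computation too heavy: optimize algorithms, paginate" ;;
    *"integer overflow"*)
      echo "Arithmetic bug: fix code, deploy patch" ;;
    *"cannot find method"*)
      echo "API mismatch: verify frontend matches backend" ;;
    *"assertion failed"*)
      echo "Logic bug: check logs, fix code" ;;
    *)
      echo "Unknown: check canister logs" ;;
  esac
}
```

Possible usage: panic_hint "$(dfx canister --network ic logs <canister-id> 2>&1 | tail -n 1)"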

Post-Resolution

Step 1: Verify Recovery

bash
# Confirm canister is running
dfx canister --network ic status <canister-id>

# Test basic functionality
dfx canister --network ic call <canister-id> <test-method>

Step 2: Test User Flows

  • Test frontend access
  • Test form submissions
  • Test authentication
  • Verify no data loss

Step 3: Monitor

  • Watch Grafana dashboard for 30 minutes
  • Verify error rate is normal
  • Confirm no new alerts fire

Step 4: Root Cause Analysis

For critical incidents:

  1. Document timeline
  2. Identify what caused the crash
  3. Create tickets for:
    • Bug fixes
    • Better error handling
    • Monitoring improvements

Prevention

Code Quality

  • Use Result types instead of panicking
  • Add comprehensive input validation
  • Set reasonable computation limits
  • Test with large datasets

Monitoring

  • Set up alerts for cycles balance
  • Monitor memory usage trends
  • Track error rates by method
  • Enable canister logging

Deployment Safety

  • Test thoroughly on staging
  • Monitor closely after deployment
  • Have rollback procedure ready
  • Use incremental upgrades

Escalation

Condition                   | Action
Canister won't start        | Contact senior engineer
Data loss suspected         | Contact team lead immediately
Multiple canisters affected | Check IC network status
IC subnet issue             | Contact DFINITY support

Canister IDs Reference

Canister           | ID                          | Purpose
Frontend (Staging) | vlmti-wqaaa-aaaad-acoiq-cai | UI hosting
(Add more as deployed)

Hello World Co-Op DAO