Skip to content
🔒

Login Required

You need to be logged in to view this content. This page requires Developer access.

Hybrid Architecture: Encrypted Canister + Plaintext Database

Story: 2-5-2 Database Sync After Email Verification Version: 1.0 Last Updated: 2025-11-25

Overview

The Hello World DAO platform uses a hybrid data architecture that balances privacy, security, and operational requirements. User PII exists in two locations with different encryption states:

  1. User-Service Canister (authoritative): Encrypted PII with zero-knowledge architecture
  2. PostgreSQL Database (convenience): Plaintext PII for marketing and operations

This document explains the architectural rationale, security model, and operational implications.


Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         Frontend (React)                            │
│                                                                     │
│  User submits plaintext PII during registration/verification       │
│  - Email (plaintext)                                                │
│  - First Name (plaintext)                                           │
│  - Last Name (plaintext)                                            │
│  - Password (hashed client-side, never sent)                        │
└────────┬──────────────────────────────────────────────┬─────────────┘
         │                                               │
         │ Registration                                  │ Verification
         │                                               │ (email, code, first, last)
         ▼                                               ▼
┌────────────────────────────────────┐   ┌─────────────────────────────────┐
│   User-Service Canister (IC)      │   │   User-Service Canister (IC)    │
│   ENCRYPTED STORAGE (AES-256-GCM)  │   │   ENCRYPTED STORAGE + SYNC      │
├────────────────────────────────────┤   ├─────────────────────────────────┤
│ ✅ email_encrypted                 │   │ 1. Validate verification code   │
│ ✅ first_name_encrypted            │   │ 2. Mark user as verified        │
│ ✅ last_name_encrypted             │   │ 3. Spawn async sync task:       │
│ ✅ recovery_key_encrypted          │   │    - Build JSON with plaintext  │
│ ✅ email_hash (SHA-256)            │   │    - Sign with Ed25519          │
│ ❌ No plaintext stored             │   │    - HTTP POST to oracle-bridge │
└────────────────────────────────────┘   └─────────────┬───────────────────┘
         ▲                                              │
         │                                              │ Signed HTTP POST
         │ Client-side encryption                       │ /sync/individual
         │ (uses user password)                         │
         │                                              ▼
┌────────────────────────────────────┐   ┌─────────────────────────────────┐
│      Frontend Encryption Layer     │   │   Oracle-Bridge (Node.js)       │
│                                    │   │   SIGNATURE VERIFICATION        │
│ Master Key = KDF(password, salt)   │   ├─────────────────────────────────┤
│ Encrypt PII before sending         │   │ 1. Verify Ed25519 signature     │
│ Zero-knowledge: canister never     │   │ 2. Validate timestamp (5 min)   │
│ sees plaintext password or key     │   │ 3. Validate required fields     │
└────────────────────────────────────┘   │ 4. UPSERT to PostgreSQL         │
                                         │ 5. Log audit event              │
                                         └─────────────┬───────────────────┘

                                                       │ UPSERT
                                                       │ (plaintext)

                                         ┌─────────────────────────────────┐
                                         │   PostgreSQL Database           │
                                         │   PLAINTEXT STORAGE             │
                                         ├─────────────────────────────────┤
                                         │ ✅ email (plaintext)            │
                                         │ ✅ first_name (plaintext)       │
                                         │ ✅ last_name (plaintext)        │
                                         │ ✅ email_hash (for lookup)      │
                                         │ ✅ verified, timestamps         │
                                         │ ❌ No recovery keys             │
                                         │ ❌ No passwords                 │
                                         └─────────────────────────────────┘

                                                       │ Queries

                                         ┌─────────────────────────────────┐
                                         │   Marketing Tools               │
                                         │   - Email campaigns             │
                                         │   - CSV exports                 │
                                         │   - Analytics queries           │
                                         │   - CRM integration             │
                                         └─────────────────────────────────┘

Design Rationale

Why Encrypted Canister Storage?

Privacy and Security:

  • Users retain full control over their data via password-derived keys
  • Canister cannot decrypt PII without user's password
  • Even DAO admins cannot access plaintext PII from canister
  • Meets zero-knowledge privacy standards

Regulatory Compliance:

  • GDPR "right to be forgotten" - user can delete their key
  • Data minimization - only encrypted data stored on-chain
  • User consent required for decryption (password entry)

Disaster Recovery:

  • Recovery keys enable account restoration without password
  • Multi-party key splitting possible for enterprise accounts
  • Canister data backed up via IC subnet replication

Why Plaintext Database Storage?

Operational Requirements:

  • Marketing teams need to export email lists
  • CRM systems require plaintext contact data
  • Email campaigns cannot send to encrypted addresses
  • Analytics queries need to group by demographics

Performance:

  • Database queries on encrypted data are slow
  • Text search requires plaintext (LIKE, ILIKE queries)
  • Indexes only work on plaintext columns

Cost Efficiency:

  • IC canister storage is expensive (cycles cost)
  • PostgreSQL is cheaper for high-volume queries
  • Database can be rebuilt from canister (not authoritative)

Why Hybrid (Both)?

Principle: Different storage layers serve different purposes.

RequirementCanister (Encrypted)Database (Plaintext)
User privacy✅ Best❌ Moderate
Marketing operations❌ Impossible✅ Easy
Disaster recovery✅ Authoritative❌ Convenience copy
Query performance❌ Slow✅ Fast
Cost (high volume)❌ Expensive✅ Cheap
Regulatory compliance✅ Strong⚠️ Requires controls

Conclusion: Neither solution alone meets all requirements. Hybrid approach provides:

  • Strong privacy via canister encryption
  • Operational efficiency via database plaintext
  • Clear authority (canister is source of truth)
  • Recovery path (database can be rebuilt)

Security Model

Threat Model

Protected Against:

  • Database breach: Attackers get plaintext PII but cannot access encrypted canister data or recovery keys
  • Canister breach: Attackers get encrypted PII but cannot decrypt without user passwords
  • Insider threat: DAO admins cannot decrypt canister data without user cooperation
  • Man-in-the-middle: HTTP outcalls use HTTPS + Ed25519 signatures
  • Replay attacks: Timestamp validation (5-minute window)

Not Protected Against:

  • Compromised user password: Attacker with password can decrypt canister PII
  • Client-side malware: Keylogger captures password during entry
  • Social engineering: User tricked into revealing password or recovery key
  • Quantum computing (future): AES-256 may be vulnerable in 10-20 years

Trust Boundaries

┌──────────────────────────────────────────────────────────────┐
│ TRUSTED ZONE: User's Browser                                 │
│ - Password never leaves this zone                            │
│ - Encryption/decryption happens here                         │
│ - Master key derived from password (KDF)                     │
└──────────────────────────────────────────────────────────────┘

                           │ Encrypted PII

┌──────────────────────────────────────────────────────────────┐
│ ZERO-KNOWLEDGE ZONE: User-Service Canister                   │
│ - Stores encrypted PII only                                  │
│ - Never sees plaintext password or master key                │
│ - Can verify email_hash without seeing email                 │
└──────────────────────────────────────────────────────────────┘

                           │ Signed HTTPS

┌──────────────────────────────────────────────────────────────┐
│ SEMI-TRUSTED ZONE: Oracle-Bridge + Database                  │
│ - Signature verification prevents tampering                  │
│ - Stores plaintext PII for verified users only               │
│ - Subject to access controls and audit logging               │
│ - NOT authoritative (can be lost and rebuilt)                │
└──────────────────────────────────────────────────────────────┘

Encryption Specifications

Canister Encryption (AES-256-GCM):

  • Algorithm: AES-256-GCM (Authenticated Encryption)
  • Key derivation: Argon2id (password → master key)
  • Salt: Random 16 bytes per user
  • Nonce: Random 12 bytes per field
  • Authentication: GCM tag prevents tampering

Signature Authentication (Ed25519):

  • Algorithm: Ed25519 (Curve25519)
  • Key size: 256 bits
  • Message format: timestamp + method + path + body
  • Signature verification: Oracle-bridge validates before accepting sync

Password Hashing (Argon2id):

  • Not used for database sync (password never leaves client)
  • Used only for client-side key derivation

Data Flow

Registration Flow

1. User enters email, password, first_name, last_name in frontend
2. Frontend derives master key from password (Argon2id)
3. Frontend encrypts PII with master key (AES-256-GCM)
4. Frontend sends encrypted PII to canister
5. Canister stores encrypted PII + email_hash
6. Canister sends verification email via oracle-bridge (plaintext)
7. User remains unverified until code entry

Key Point: Database is NOT synced during registration (only verified users).

Verification Flow (Triggers Database Sync)

1. User receives verification code via email
2. User enters verification code + email + first_name + last_name
3. Frontend sends plaintext PII to canister (verify_code call)
4. Canister validates code against email_hash
5. Canister marks user as verified (encrypted record)
6. Canister spawns async sync task:
   a. Build JSON with plaintext PII
   b. Generate Ed25519 signature
   c. HTTP POST to oracle-bridge /sync/individual
7. Oracle-bridge validates signature
8. Oracle-bridge UPSERTs plaintext PII to PostgreSQL
9. Canister returns success to frontend (sync failure doesn't block)

Key Point: Plaintext PII submitted during verification enables database sync.

Query Flows

Encrypted Query (Canister):

1. User enters password in frontend
2. Frontend derives master key from password
3. Frontend requests encrypted PII from canister
4. Canister returns encrypted PII
5. Frontend decrypts with master key
6. User sees plaintext in browser

Plaintext Query (Database):

1. Marketing tool queries PostgreSQL
2. Database returns plaintext PII
3. Tool displays email list, exports CSV, etc.

No decryption needed - database stores plaintext.


Operational Implications

Database Loss Scenarios

ScenarioImpactRecovery
Complete DB lossMarketing operations stopRebuild from canister (metadata only), users re-verify for PII
Partial corruptionSome users missingTargeted resync via admin endpoint
RansomwareDatabase encrypted by attackerIgnore ransom, rebuild from canister

Recovery Time: 10-30 minutes (automated script).

Data Loss: Plaintext PII requires user re-verification (architectural limitation).

Canister Loss Scenarios

ScenarioImpactRecovery
Canister wipedTotal data lossDeploy from backup WASM + state snapshot
Subnet failureTemporary outageWait for IC subnet recovery (redundant)
Upgrade failureCanister stuckRollback to previous WASM version

Recovery Time: Minutes to hours (depends on IC recovery procedures).

Data Loss: If backup available, none. If no backup, catastrophic (database has PII but no recovery keys).

Sync Failure Scenarios

ScenarioImpactMitigation
Network outageSync fails, verification succeedsAutomatic retry on next verification
Oracle-bridge downSync fails, verification succeedsManual resync via admin endpoint
Invalid signatureSync rejected, verification succeedsRe-register canister public key
Database fullSync fails, verification succeedsIncrease storage, resync

Key Principle: Sync failure NEVER blocks user verification (fire-and-forget pattern).


Privacy and Compliance

GDPR Compliance

Right to Access:

  • User can decrypt their canister PII with password
  • Database PII available via authenticated API

Right to Rectification:

  • User updates trigger sync to database
  • Database is convenience copy, not authoritative

Right to Erasure:

  • User requests deletion via canister
  • Canister soft-deletes (GDPR flag)
  • Sync propagates gdpr_deleted = true to database
  • Marketing queries exclude deleted users

Right to Data Portability:

  • User can export decrypted PII from canister
  • Database export available via admin tools

Opt-In Model:

  • Default: gdpr_marketing_consent = true
  • User can opt-out anytime
  • Database queries respect consent flag

Query Pattern:

sql
SELECT email, first_name, last_name
FROM individuals
WHERE verified = true
  AND gdpr_deleted = false
  AND gdpr_marketing_consent = true;

Audit Trail

Database Audit Log:

  • All sync operations logged
  • Tracks: action, table, record ID, timestamp, canister ID
  • Retention: 90 days minimum

Canister Audit Log:

  • All decryption attempts logged
  • Tracks: user ID, timestamp, IP hash
  • Retention: 1 year

Migration Path

Current State

v1.0 (Story 2-5-2):

  • Canister: Encrypted PII + recovery keys
  • Database: Plaintext PII for verified users only
  • Sync: Automatic after email verification
  • Resync: Manual via admin endpoint + batch script

Future Enhancements

Planned (Epic 2.5, Story 2-5-3):

  • Automated database backups (daily)
  • SSL/TLS enforcement (certificate pinning)
  • Monitoring dashboards (Grafana)
  • Disaster recovery automation
  • Point-in-time recovery

Considered (Epic 3):

  • Homomorphic encryption for database (query encrypted data)
  • Differential privacy for analytics
  • Federated learning for ML models
  • Multi-party computation for admin access

Alternatives Considered

Alternative 1: Encrypted Database Only

Approach: Store encrypted PII in database, decrypt on-demand.

Rejected Because:

  • Database queries on encrypted data are very slow
  • Text search (LIKE queries) impossible
  • Indexes don't work on encrypted columns
  • Marketing tools would need major rework

Alternative 2: Plaintext Canister Only

Approach: Store plaintext PII in canister, no encryption.

Rejected Because:

  • Violates zero-knowledge principle
  • DAO admins could access all PII
  • Regulatory risk (GDPR, CCPA)
  • User privacy expectations not met

Alternative 3: Client-Side Database Sync

Approach: Frontend syncs to database directly after verification.

Rejected Because:

  • Requires database credentials in frontend (security risk)
  • CORS complications
  • No signature authentication (anyone could write)
  • Audit trail more difficult

Alternative 4: No Database, Canister Only

Approach: Store encrypted PII in canister, marketing queries via canister.

Rejected Because:

  • Canister query costs scale linearly with data volume
  • No JOIN support (multi-table queries)
  • Export to CSV very slow (cycles cost)
  • CRM integration requires REST APIs (canister API is Candid)

Conclusion: Hybrid architecture is the optimal balance.


Technical Debt and Limitations

Known Limitations

  1. Plaintext PII Loss on Rebuild:

    • Database rebuild only recovers metadata
    • Users must re-verify to populate plaintext fields
    • Impact: User friction after disaster recovery
    • Mitigation: Encourage early re-verification via dashboard banner
  2. No Homomorphic Encryption:

    • Cannot query encrypted canister data directly
    • Requires decryption in browser for user access
    • Impact: Performance overhead for user-facing queries
    • Future: Explore FHE libraries (expensive computationally)
  3. Single Database Instance:

    • No read replicas yet
    • Single point of failure (mitigated by rebuild capability)
    • Impact: Database downtime affects marketing operations
    • Future: Add PostgreSQL replication (Epic 2.5)
  4. Signature Replay Window:

    • 5-minute timestamp validation allows replay within window
    • Impact: Attacker could replay sync request multiple times
    • Mitigation: Idempotent UPSERT (no harm from replay)

Technical Debt

  1. No Encrypted Database Backup:

    • Database backups are plaintext
    • Risk: Backup leaks expose all user PII
    • Mitigation: Encrypt backups at rest (planned Epic 2.5)
  2. Canister Signing Key Rotation:

    • No automated key rotation
    • Risk: Compromised signing key affects all syncs
    • Mitigation: Manual rotation procedure documented
  3. Monitoring Gaps:

    • No alerting for sync failures yet
    • Risk: Silent sync failures go unnoticed
    • Mitigation: Add monitoring (Epic 2.5, Story 2-5-3)

Performance Characteristics

Sync Performance

Automatic Sync (per user):

  • Trigger: Email verification completion
  • Latency: 100-500ms (HTTP outcall)
  • Throughput: 100+ syncs/minute
  • Cycles Cost: ~50M cycles per sync

Batch Resync (full database):

  • Users: 1000
  • Time: ~10 minutes (pagination overhead)
  • Database Load: Moderate (UPSERT is efficient)

Query Performance

Canister Queries (encrypted):

  • Latency: 200-1000ms (decryption in browser)
  • Throughput: 10-50 queries/second
  • Cycles Cost: Query is free, but decryption is client-side

Database Queries (plaintext):

  • Latency: 5-50ms (indexed lookups)
  • Throughput: 1000+ queries/second
  • Cost: Minimal (PostgreSQL is fast)

Conclusion: Database is 10-100x faster for high-volume queries.



Change Log

VersionDateAuthorChanges
1.02025-11-25SystemInitial architecture documentation

Hello World Co-Op DAO