Hybrid Architecture: Encrypted Canister + Plaintext Database
Story: 2-5-2 Database Sync After Email Verification Version: 1.0 Last Updated: 2025-11-25
Overview
The Hello World DAO platform uses a hybrid data architecture that balances privacy, security, and operational requirements. User PII exists in two locations with different encryption states:
- User-Service Canister (authoritative): Encrypted PII with zero-knowledge architecture
- PostgreSQL Database (convenience): Plaintext PII for marketing and operations
This document explains the architectural rationale, security model, and operational implications.
Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ │
│ User submits plaintext PII during registration/verification │
│ - Email (plaintext) │
│ - First Name (plaintext) │
│ - Last Name (plaintext) │
│ - Password (hashed client-side, never sent) │
└────────┬──────────────────────────────────────────────┬─────────────┘
│ │
│ Registration │ Verification
│ │ (email, code, first, last)
▼ ▼
┌────────────────────────────────────┐ ┌─────────────────────────────────┐
│ User-Service Canister (IC) │ │ User-Service Canister (IC) │
│ ENCRYPTED STORAGE (AES-256-GCM) │ │ ENCRYPTED STORAGE + SYNC │
├────────────────────────────────────┤ ├─────────────────────────────────┤
│ ✅ email_encrypted │ │ 1. Validate verification code │
│ ✅ first_name_encrypted │ │ 2. Mark user as verified │
│ ✅ last_name_encrypted │ │ 3. Spawn async sync task: │
│ ✅ recovery_key_encrypted │ │ - Build JSON with plaintext │
│ ✅ email_hash (SHA-256) │ │ - Sign with Ed25519 │
│ ❌ No plaintext stored │ │ - HTTP POST to oracle-bridge │
└────────────────────────────────────┘ └─────────────┬───────────────────┘
▲ │
│ │ Signed HTTP POST
│ Client-side encryption │ /sync/individual
│ (uses user password) │
│ ▼
┌────────────────────────────────────┐ ┌─────────────────────────────────┐
│ Frontend Encryption Layer │ │ Oracle-Bridge (Node.js) │
│ │ │ SIGNATURE VERIFICATION │
│ Master Key = KDF(password, salt) │ ├─────────────────────────────────┤
│ Encrypt PII before sending │ │ 1. Verify Ed25519 signature │
│ Zero-knowledge: canister never │ │ 2. Validate timestamp (5 min) │
│ sees plaintext password or key │ │ 3. Validate required fields │
└────────────────────────────────────┘ │ 4. UPSERT to PostgreSQL │
│ 5. Log audit event │
└─────────────┬───────────────────┘
│
│ UPSERT
│ (plaintext)
▼
┌─────────────────────────────────┐
│ PostgreSQL Database │
│ PLAINTEXT STORAGE │
├─────────────────────────────────┤
│ ✅ email (plaintext) │
│ ✅ first_name (plaintext) │
│ ✅ last_name (plaintext) │
│ ✅ email_hash (for lookup) │
│ ✅ verified, timestamps │
│ ❌ No recovery keys │
│ ❌ No passwords │
└─────────────────────────────────┘
│
│ Queries
▼
┌─────────────────────────────────┐
│ Marketing Tools │
│ - Email campaigns │
│ - CSV exports │
│ - Analytics queries │
│ - CRM integration │
└─────────────────────────────────┘Design Rationale
Why Encrypted Canister Storage?
Privacy and Security:
- Users retain full control over their data via password-derived keys
- Canister cannot decrypt PII without user's password
- Even DAO admins cannot access plaintext PII from canister
- Meets zero-knowledge privacy standards
Regulatory Compliance:
- GDPR "right to be forgotten" - user can delete their key
- Data minimization - only encrypted data stored on-chain
- User consent required for decryption (password entry)
Disaster Recovery:
- Recovery keys enable account restoration without password
- Multi-party key splitting possible for enterprise accounts
- Canister data backed up via IC subnet replication
Why Plaintext Database Storage?
Operational Requirements:
- Marketing teams need to export email lists
- CRM systems require plaintext contact data
- Email campaigns cannot send to encrypted addresses
- Analytics queries need to group by demographics
Performance:
- Database queries on encrypted data are slow
- Text search requires plaintext (LIKE, ILIKE queries)
- Indexes only work on plaintext columns
Cost Efficiency:
- IC canister storage is expensive (cycles cost)
- PostgreSQL is cheaper for high-volume queries
- Database can be rebuilt from canister (not authoritative)
Why Hybrid (Both)?
Principle: Different storage layers serve different purposes.
| Requirement | Canister (Encrypted) | Database (Plaintext) |
|---|---|---|
| User privacy | ✅ Best | ❌ Moderate |
| Marketing operations | ❌ Impossible | ✅ Easy |
| Disaster recovery | ✅ Authoritative | ❌ Convenience copy |
| Query performance | ❌ Slow | ✅ Fast |
| Cost (high volume) | ❌ Expensive | ✅ Cheap |
| Regulatory compliance | ✅ Strong | ⚠️ Requires controls |
Conclusion: Neither solution alone meets all requirements. Hybrid approach provides:
- Strong privacy via canister encryption
- Operational efficiency via database plaintext
- Clear authority (canister is source of truth)
- Recovery path (database can be rebuilt)
Security Model
Threat Model
Protected Against:
- ✅ Database breach: Attackers get plaintext PII but cannot access encrypted canister data or recovery keys
- ✅ Canister breach: Attackers get encrypted PII but cannot decrypt without user passwords
- ✅ Insider threat: DAO admins cannot decrypt canister data without user cooperation
- ✅ Man-in-the-middle: HTTP outcalls use HTTPS + Ed25519 signatures
- ✅ Replay attacks: Timestamp validation (5-minute window)
Not Protected Against:
- ❌ Compromised user password: Attacker with password can decrypt canister PII
- ❌ Client-side malware: Keylogger captures password during entry
- ❌ Social engineering: User tricked into revealing password or recovery key
- ❌ Quantum computing (future): AES-256 may be vulnerable in 10-20 years
Trust Boundaries
┌──────────────────────────────────────────────────────────────┐
│ TRUSTED ZONE: User's Browser │
│ - Password never leaves this zone │
│ - Encryption/decryption happens here │
│ - Master key derived from password (KDF) │
└──────────────────────────────────────────────────────────────┘
│
│ Encrypted PII
▼
┌──────────────────────────────────────────────────────────────┐
│ ZERO-KNOWLEDGE ZONE: User-Service Canister │
│ - Stores encrypted PII only │
│ - Never sees plaintext password or master key │
│ - Can verify email_hash without seeing email │
└──────────────────────────────────────────────────────────────┘
│
│ Signed HTTPS
▼
┌──────────────────────────────────────────────────────────────┐
│ SEMI-TRUSTED ZONE: Oracle-Bridge + Database │
│ - Signature verification prevents tampering │
│ - Stores plaintext PII for verified users only │
│ - Subject to access controls and audit logging │
│ - NOT authoritative (can be lost and rebuilt) │
└──────────────────────────────────────────────────────────────┘Encryption Specifications
Canister Encryption (AES-256-GCM):
- Algorithm: AES-256-GCM (Authenticated Encryption)
- Key derivation: Argon2id (password → master key)
- Salt: Random 16 bytes per user
- Nonce: Random 12 bytes per field
- Authentication: GCM tag prevents tampering
Signature Authentication (Ed25519):
- Algorithm: Ed25519 (Curve25519)
- Key size: 256 bits
- Message format:
timestamp + method + path + body - Signature verification: Oracle-bridge validates before accepting sync
Password Hashing (Argon2id):
- Not used for database sync (password never leaves client)
- Used only for client-side key derivation
Data Flow
Registration Flow
1. User enters email, password, first_name, last_name in frontend
2. Frontend derives master key from password (Argon2id)
3. Frontend encrypts PII with master key (AES-256-GCM)
4. Frontend sends encrypted PII to canister
5. Canister stores encrypted PII + email_hash
6. Canister sends verification email via oracle-bridge (plaintext)
7. User remains unverified until code entryKey Point: Database is NOT synced during registration (only verified users).
Verification Flow (Triggers Database Sync)
1. User receives verification code via email
2. User enters verification code + email + first_name + last_name
3. Frontend sends plaintext PII to canister (verify_code call)
4. Canister validates code against email_hash
5. Canister marks user as verified (encrypted record)
6. Canister spawns async sync task:
a. Build JSON with plaintext PII
b. Generate Ed25519 signature
c. HTTP POST to oracle-bridge /sync/individual
7. Oracle-bridge validates signature
8. Oracle-bridge UPSERTs plaintext PII to PostgreSQL
9. Canister returns success to frontend (sync failure doesn't block)Key Point: Plaintext PII submitted during verification enables database sync.
Query Flows
Encrypted Query (Canister):
1. User enters password in frontend
2. Frontend derives master key from password
3. Frontend requests encrypted PII from canister
4. Canister returns encrypted PII
5. Frontend decrypts with master key
6. User sees plaintext in browserPlaintext Query (Database):
1. Marketing tool queries PostgreSQL
2. Database returns plaintext PII
3. Tool displays email list, exports CSV, etc.No decryption needed - database stores plaintext.
Operational Implications
Database Loss Scenarios
| Scenario | Impact | Recovery |
|---|---|---|
| Complete DB loss | Marketing operations stop | Rebuild from canister (metadata only), users re-verify for PII |
| Partial corruption | Some users missing | Targeted resync via admin endpoint |
| Ransomware | Database encrypted by attacker | Ignore ransom, rebuild from canister |
Recovery Time: 10-30 minutes (automated script).
Data Loss: Plaintext PII requires user re-verification (architectural limitation).
Canister Loss Scenarios
| Scenario | Impact | Recovery |
|---|---|---|
| Canister wiped | Total data loss | Deploy from backup WASM + state snapshot |
| Subnet failure | Temporary outage | Wait for IC subnet recovery (redundant) |
| Upgrade failure | Canister stuck | Rollback to previous WASM version |
Recovery Time: Minutes to hours (depends on IC recovery procedures).
Data Loss: If backup available, none. If no backup, catastrophic (database has PII but no recovery keys).
Sync Failure Scenarios
| Scenario | Impact | Mitigation |
|---|---|---|
| Network outage | Sync fails, verification succeeds | Automatic retry on next verification |
| Oracle-bridge down | Sync fails, verification succeeds | Manual resync via admin endpoint |
| Invalid signature | Sync rejected, verification succeeds | Re-register canister public key |
| Database full | Sync fails, verification succeeds | Increase storage, resync |
Key Principle: Sync failure NEVER blocks user verification (fire-and-forget pattern).
Privacy and Compliance
GDPR Compliance
Right to Access:
- User can decrypt their canister PII with password
- Database PII available via authenticated API
Right to Rectification:
- User updates trigger sync to database
- Database is convenience copy, not authoritative
Right to Erasure:
- User requests deletion via canister
- Canister soft-deletes (GDPR flag)
- Sync propagates
gdpr_deleted = trueto database - Marketing queries exclude deleted users
Right to Data Portability:
- User can export decrypted PII from canister
- Database export available via admin tools
Marketing Consent
Opt-In Model:
- Default:
gdpr_marketing_consent = true - User can opt-out anytime
- Database queries respect consent flag
Query Pattern:
SELECT email, first_name, last_name
FROM individuals
WHERE verified = true
AND gdpr_deleted = false
AND gdpr_marketing_consent = true;Audit Trail
Database Audit Log:
- All sync operations logged
- Tracks: action, table, record ID, timestamp, canister ID
- Retention: 90 days minimum
Canister Audit Log:
- All decryption attempts logged
- Tracks: user ID, timestamp, IP hash
- Retention: 1 year
Migration Path
Current State
v1.0 (Story 2-5-2):
- Canister: Encrypted PII + recovery keys
- Database: Plaintext PII for verified users only
- Sync: Automatic after email verification
- Resync: Manual via admin endpoint + batch script
Future Enhancements
Planned (Epic 2.5, Story 2-5-3):
- Automated database backups (daily)
- SSL/TLS enforcement (certificate pinning)
- Monitoring dashboards (Grafana)
- Disaster recovery automation
- Point-in-time recovery
Considered (Epic 3):
- Homomorphic encryption for database (query encrypted data)
- Differential privacy for analytics
- Federated learning for ML models
- Multi-party computation for admin access
Alternatives Considered
Alternative 1: Encrypted Database Only
Approach: Store encrypted PII in database, decrypt on-demand.
Rejected Because:
- Database queries on encrypted data are very slow
- Text search (LIKE queries) impossible
- Indexes don't work on encrypted columns
- Marketing tools would need major rework
Alternative 2: Plaintext Canister Only
Approach: Store plaintext PII in canister, no encryption.
Rejected Because:
- Violates zero-knowledge principle
- DAO admins could access all PII
- Regulatory risk (GDPR, CCPA)
- User privacy expectations not met
Alternative 3: Client-Side Database Sync
Approach: Frontend syncs to database directly after verification.
Rejected Because:
- Requires database credentials in frontend (security risk)
- CORS complications
- No signature authentication (anyone could write)
- Audit trail more difficult
Alternative 4: No Database, Canister Only
Approach: Store encrypted PII in canister, marketing queries via canister.
Rejected Because:
- Canister query costs scale linearly with data volume
- No JOIN support (multi-table queries)
- Export to CSV very slow (cycles cost)
- CRM integration requires REST APIs (canister API is Candid)
Conclusion: Hybrid architecture is the optimal balance.
Technical Debt and Limitations
Known Limitations
Plaintext PII Loss on Rebuild:
- Database rebuild only recovers metadata
- Users must re-verify to populate plaintext fields
- Impact: User friction after disaster recovery
- Mitigation: Encourage early re-verification via dashboard banner
No Homomorphic Encryption:
- Cannot query encrypted canister data directly
- Requires decryption in browser for user access
- Impact: Performance overhead for user-facing queries
- Future: Explore FHE libraries (expensive computationally)
Single Database Instance:
- No read replicas yet
- Single point of failure (mitigated by rebuild capability)
- Impact: Database downtime affects marketing operations
- Future: Add PostgreSQL replication (Epic 2.5)
Signature Replay Window:
- 5-minute timestamp validation allows replay within window
- Impact: Attacker could replay sync request multiple times
- Mitigation: Idempotent UPSERT (no harm from replay)
Technical Debt
No Encrypted Database Backup:
- Database backups are plaintext
- Risk: Backup leaks expose all user PII
- Mitigation: Encrypt backups at rest (planned Epic 2.5)
Canister Signing Key Rotation:
- No automated key rotation
- Risk: Compromised signing key affects all syncs
- Mitigation: Manual rotation procedure documented
Monitoring Gaps:
- No alerting for sync failures yet
- Risk: Silent sync failures go unnoticed
- Mitigation: Add monitoring (Epic 2.5, Story 2-5-3)
Performance Characteristics
Sync Performance
Automatic Sync (per user):
- Trigger: Email verification completion
- Latency: 100-500ms (HTTP outcall)
- Throughput: 100+ syncs/minute
- Cycles Cost: ~50M cycles per sync
Batch Resync (full database):
- Users: 1000
- Time: ~10 minutes (pagination overhead)
- Database Load: Moderate (UPSERT is efficient)
Query Performance
Canister Queries (encrypted):
- Latency: 200-1000ms (decryption in browser)
- Throughput: 10-50 queries/second
- Cycles Cost: Query is free, but decryption is client-side
Database Queries (plaintext):
- Latency: 5-50ms (indexed lookups)
- Throughput: 1000+ queries/second
- Cost: Minimal (PostgreSQL is fast)
Conclusion: Database is 10-100x faster for high-volume queries.
Related Documentation
Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-11-25 | System | Initial architecture documentation |