Database Sync Architecture
Story: 2-5-2 Database Sync After Email Verification Version: 1.0 Last Updated: 2025-11-25
Overview
The database sync system replicates verified user PII from the user-service canister to the oracle-bridge PostgreSQL database for operational use (marketing, exports, queries).
Architecture Principles
1. Canister is Authoritative
- Source of Truth: User-service canister (encrypted PII + recovery keys)
- Convenience Copy: PostgreSQL database (plaintext PII for marketing)
- Recovery: Database can be lost/rebuilt from canister data
2. Zero-Knowledge Maintained
- Canister Storage: Encrypted PII (AES-256-GCM with master recovery key)
- Database Storage: Plaintext PII (synced after verification)
- Client-Side Encryption: User password never sent to canister
- Sync Limitation: Only plaintext provided by frontend can be synced
3. Sync After Verification Only
- Privacy-Respecting: No sync of unverified users
- Trigger: Email verification completion (
verify_code()success) - Non-Blocking: Sync failure does NOT block verification success
4. Canister Signature Authentication
- Ed25519 Signatures: All sync requests must be signed by canister
- Prevents Tampering: Oracle-bridge validates signature before accepting data
- Timestamp Validation: 5-minute window to prevent replay attacks
Sync Flow
Automatic Sync (Normal Flow)
┌─────────────────┐
│ Frontend │
│ (React + II) │
└────────┬────────┘
│
│ 1. User submits verification code
│ + email
│ + verification code
│ + first_name (plaintext)
│ + last_name (plaintext)
│
▼
┌─────────────────────────────────────────┐
│ User-Service Canister │
│ verify_code(email, code, first, last) │
└────────┬────────────────────────────────┘
│
│ 2. Validate code + email_hash
│ 3. Mark user as verified
│ 4. Update verified_at timestamp
│
│ 5. Spawn async sync task
│ (fire-and-forget)
│
▼
┌──────────────────────────────────────────┐
│ sync_individual_to_database() │
│ - Build JSON body with plaintext PII │
│ - Generate Ed25519 signature │
│ - HTTP POST to oracle-bridge │
└────────┬─────────────────────────────────┘
│
│ 6. HTTP outcall with signature
│ Headers:
│ - X-Canister-ID: <canister_id>
│ - X-Signature: <ed25519_sig>
│ - X-Timestamp: <milliseconds>
│
▼
┌──────────────────────────────────────────┐
│ Oracle-Bridge │
│ POST /sync/individual │
└────────┬─────────────────────────────────┘
│
│ 7. requireSignature middleware
│ - Fetch canister public key
│ - Verify Ed25519 signature
│ - Check timestamp (5 min window)
│
│ 8. Validate required fields
│ - id, email, first_name, last_name
│ - email_hash, submitted_at, etc.
│
│ 9. UPSERT to PostgreSQL
│ - INSERT ... ON CONFLICT UPDATE
│ - Idempotent (safe to retry)
│
│ 10. Log audit event
│ - Action: SYNC
│ - User: canister
│
▼
┌──────────────────────────────────────────┐
│ PostgreSQL Database │
│ individuals table │
│ - Plaintext PII stored │
│ - synced_at = NOW() │
└──────────────────────────────────────────┘Error Handling:
- Sync failure is logged but does NOT block verification
- User sees "Email verified successfully!" even if sync fails
- Sync can be retried manually via admin endpoint
Manual Resync (Admin Flow)
Option 1: Specific User Resync
curl -X POST https://oracle-bridge.example.com/admin/resync/individual \
-H "Content-Type: application/json" \
-H "X-Canister-ID: <canister_id>" \
-H "X-Signature: <signature>" \
-H "X-Timestamp: <timestamp>" \
-d '{
"id": "user-123",
"email": "john@example.com",
"first_name": "John",
"last_name": "Doe",
"email_hash": "abc123...",
"verified": true,
"verified_at": "1700000000000000000",
"gdpr_marketing_consent": true
}'Use Cases:
- Sync failure during automatic sync
- Database record corruption
- Manual data correction
Option 2: Batch Metadata Resync
npm run resync -- \
--canister-id rrkah-fqaaa-aaaaa-aaaaq-cai \
--verified-only \
--dry-runWhat Gets Synced:
- ✅ email_hash (SHA-256)
- ✅ verified status (boolean)
- ✅ verified_at timestamp
- ✅ submitted_at timestamp
- ❌ Plaintext PII (NOT available - encrypted in canister)
Limitations:
- Cannot recover plaintext email, first_name, last_name
- Users must re-verify email to populate plaintext fields
- Only syncs verification metadata
API Specification
POST /sync/individual
Authentication: Canister signature (Ed25519)
Headers:
Content-Type: application/json
X-Canister-ID: <canister_id>
X-Signature: <base64_signature>
X-Timestamp: <milliseconds>Request Body:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"email": "user@example.com",
"first_name": "John",
"last_name": "Doe",
"email_hash": "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae",
"verified": true,
"verified_at": "1700000000000000000",
"submitted_at": "1699000000000000000",
"encryption_key_id": "key-123",
"gdpr_marketing_consent": true,
"gdpr_deleted": false
}Response (Success - 200 OK):
{
"success": true
}Response (Validation Error - 400 Bad Request):
{
"error": "Missing required fields: id, email, first_name, last_name, email_hash, submitted_at, encryption_key_id"
}Response (Authentication Error - 401 Unauthorized):
{
"error": "Invalid canister signature"
}Response (Server Error - 500 Internal Server Error):
{
"error": "Failed to sync individual data"
}POST /admin/resync/individual
Authentication: Canister signature (Ed25519)
Purpose: Manual resync for specific user (admin operation)
Request/Response: Same as /sync/individual but logs action as MANUAL_RESYNC
GET /admin/resync/status
Authentication: Canister signature (Ed25519)
Purpose: Get database sync statistics
Response (200 OK):
{
"success": true,
"statistics": {
"total_users": "1523",
"verified_users": "1204",
"deleted_users": "12",
"oldest_sync": "2025-10-01T10:30:00.000Z",
"latest_sync": "2025-11-25T14:22:33.000Z",
"stale_syncs": "45"
}
}Signature Authentication
Canister Registration
Before syncing, canister must register its public key:
// 1. Canister generates Ed25519 keypair during init
const keypair = nacl.sign.keyPair();
// 2. Canister registers public key with oracle-bridge
POST /canister/register
{
"canisterId": "rrkah-fqaaa-aaaaa-aaaaq-cai",
"publicKey": "<base64_public_key>",
"signature": "<self_signature>",
"timestamp": 1700000000000
}Signature Generation (Canister)
// user-service/src/lib.rs:3099-3118
fn sign_http_request(
method: &str,
path: &str,
body: &str,
timestamp: u64,
signing_key_bytes: &[u8]
) -> Result<String, String> {
let signing_key = SigningKey::from_bytes(&key_bytes);
// Message format: timestamp + method + path + body
let message = format!("{}{}{}{}", timestamp, method, path, body);
// Sign with Ed25519
let signature = signing_key.sign(message.as_bytes());
// Return base64-encoded signature
Ok(base64_engine.encode(signature.to_bytes()))
}Signature Verification (Oracle-Bridge)
// oracle-bridge/src/middleware/signature-auth.ts:33-118
export async function requireSignature(req, res, next) {
const canisterId = req.headers['x-canister-id'];
const signature = req.headers['x-signature'];
const timestamp = req.headers['x-timestamp'];
// 1. Validate timestamp (5-minute window)
const now = Date.now();
const fiveMinutes = 5 * 60 * 1000;
if (Math.abs(now - parseInt(timestamp)) > fiveMinutes) {
return res.status(401).json({ error: 'Signature expired' });
}
// 2. Fetch canister public key (from cache or storage)
const publicKey = await getCanisterPublicKey(canisterId);
// 3. Reconstruct message
const message = timestamp + req.method + req.path + JSON.stringify(req.body);
// 4. Verify Ed25519 signature
const verified = nacl.sign.detached.verify(
Buffer.from(message, 'utf-8'),
Buffer.from(signature, 'base64'),
Buffer.from(publicKey, 'base64')
);
if (!verified) {
return res.status(401).json({ error: 'Invalid signature' });
}
// 5. Attach canister ID to request
req.canisterId = canisterId;
next();
}Database Operations
UPSERT Implementation
// oracle-bridge/src/db/individuals.ts:27-87
export async function upsertIndividual(data: IndividualData): Promise<void> {
const query = `
INSERT INTO individuals (
id, email, first_name, last_name, email_hash,
verified, verified_at, submitted_at,
canister_id, encryption_key_id,
gdpr_marketing_consent, gdpr_deleted, synced_at
)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, NOW())
ON CONFLICT (id) DO UPDATE SET
email = EXCLUDED.email,
first_name = EXCLUDED.first_name,
last_name = EXCLUDED.last_name,
verified = EXCLUDED.verified,
verified_at = EXCLUDED.verified_at,
gdpr_marketing_consent = EXCLUDED.gdpr_marketing_consent,
gdpr_deleted = EXCLUDED.gdpr_deleted,
synced_at = NOW(),
updated_at = NOW()
`;
await pool.query(query, values);
}Idempotency:
- Multiple syncs with same data = no duplicate rows
- Latest data always wins (EXCLUDED values)
synced_atalways updated to track freshness
Error Handling
Canister-Side Errors
// Sync failure is logged but does NOT block verification
ic_cdk::spawn(async move {
match sync_individual_to_database(...).await {
Ok(_) => ic_cdk::println!("✅ Synced user {} to database", user_id),
Err(e) => ic_cdk::println!("⚠️ Database sync failed: {}", e),
// User verification still succeeds!
}
});Rationale: Email verification is critical user flow, database sync is secondary.
Oracle-Bridge Errors
try {
await upsertIndividual(individualData);
await logAudit('SYNC', 'individuals', id, 'canister', ...);
res.json({ success: true });
} catch (error) {
console.error('Error syncing individual:', error);
res.status(500).json({ error: 'Failed to sync individual data' });
}Audit Logging:
- Audit failures do NOT block sync (internal error handling)
- Logged to console for monitoring
Monitoring
Sync Health Indicators
Stale Syncs: Users with
synced_at> 7 days oldsqlSELECT COUNT(*) FROM individuals WHERE synced_at < NOW() - INTERVAL '7 days';Unverified Users: Should NOT be in database
sqlSELECT COUNT(*) FROM individuals WHERE verified = FALSE;GDPR Compliance: Deleted users excluded from marketing
sqlSELECT COUNT(*) FROM individuals WHERE gdpr_deleted = TRUE;Audit Log Analysis: Track sync operations
sqlSELECT action, COUNT(*) FROM audit_log WHERE table_name = 'individuals' AND created_at > NOW() - INTERVAL '24 hours' GROUP BY action;
Security Considerations
1. Signature Validation
- Algorithm: Ed25519 (collision-resistant, 256-bit security)
- Timestamp Window: 5 minutes (prevents replay attacks)
- Public Key Storage: Cached in oracle-bridge (verified during registration)
2. Data Integrity
- Canister Signature: Ensures data comes from trusted canister
- HTTPS Only: All communication over TLS 1.3
- No Raw Credentials: Database password in environment variables only
3. Privacy
- Zero-Knowledge Canister: Never sees plaintext passwords or keys
- Plaintext Database: Only stores data after user email verification
- GDPR Soft Deletes: Deleted users marked, not hard-deleted
Performance
Throughput
- Async Sync: Non-blocking, spawned as background task
- Database Connection Pool: 20 connections max
- Query Timeout: 2 seconds per operation
Optimization
- Indexes: All query columns indexed (email, email_hash, verified, etc.)
- UPSERT: Single query (no SELECT + INSERT/UPDATE race condition)
- Pagination: Batch resync uses 100 users per page
Production Infrastructure
Implemented: Epic 2.5, Story 2-5-3 (2025-11-25)
Automated Backups
- Schedule: Daily at 2:00 AM UTC, monthly on 1st of each month
- Storage: AWS S3 (
s3://hwdao-database-backups/) - Encryption: Optional GPG (AES-256) or AWS KMS
- Retention: 30 daily backups + 12 monthly backups
- Validation: Monthly restoration tests to staging database
- Scripts:
oracle-bridge/scripts/backup-database.sh,validate-backup.sh
SSL/TLS Security
- Protocol: TLS 1.2+ (TLS 1.0/1.1 disabled)
- Certificates: Let's Encrypt or cloud provider managed
- Connection Mode:
sslmode=requirein DATABASE_URL - Auto-Renewal: Certbot cron job (twice daily)
- Monitoring: Certificate expiration alerts (30 days, 7 days)
Monitoring and Alerting
- Metrics: Uptime (60s checks), disk space, connection pool, slow queries, backup success
- Alerts: Critical (database down, disk >90%, backup failure), Warning (disk >80%, slow queries)
- Dashboard: Grafana with Prometheus metrics
- Configuration:
oracle-bridge/monitoring/(prometheus.yml, alert-rules.yml, grafana-dashboard.json)
Disaster Recovery
- RTO: 4 hours (time to full restoration)
- RPO: 24 hours (maximum data loss)
- Primary Method: Backup restoration (Procedure D in runbook)
- Fallback Method: Canister metadata resync (Procedure A)
- Runbook:
docs/operations/database-rebuild-runbook.md
Production Readiness Checklist
- [x] Automated daily backups configured
- [x] SSL/TLS enforced for all connections
- [x] Monitoring dashboards deployed
- [x] Alerting configured for critical failures
- [x] Disaster recovery procedures documented
- [x] Monthly backup validation scheduled
- [x] Certificate auto-renewal configured
Next Steps for Deployment:
- Configure production environment variables (DATABASE_URL with SSL, BACKUP_S3_BUCKET)
- Set up AWS S3 bucket for backups
- Install and configure Prometheus + Grafana
- Schedule cron jobs for backups and validation
- Configure certbot for SSL certificate management
- Test disaster recovery procedures in staging
- Schedule first quarterly disaster recovery drill
Future Enhancements
- Replication: Multi-region read replicas for high availability
- Point-in-Time Recovery: Transaction log archival for sub-daily RPO
- Query Optimization: Automated slow query analysis and index recommendations
- Capacity Planning: Predictive analytics for storage and connection pool scaling