Database Connectivity Issues
Last Updated: 2025-12-04
Alert: EmailDeliveryFailures, OracleBridgeHighErrorRate
Severity: Warning / High
Response Time: < 1 hour
Overview
This runbook covers troubleshooting connectivity between the oracle-bridge service and its external dependencies (databases and third-party services), including the PostgreSQL database used for user data sync (Epic 2.5).
Affected Systems
- oracle-bridge service (off-chain)
- PostgreSQL database (user data sync)
- External services (Stripe, email provider)
Symptoms
- Alert: `EmailDeliveryFailures` or high error rate
- Email verification emails not being sent
- Stripe webhooks not being processed
- Database sync failures
- Oracle-bridge health check failing
Diagnosis
Step 1: Check Oracle-Bridge Health
```bash
# Check health endpoint
curl -s https://oracle.helloworlddao.com/health
# Expected response: {"status": "ok"}
```
Step 2: Check Database Connectivity
If oracle-bridge can't connect to PostgreSQL:
```bash
# SSH to oracle-bridge server
ssh deploy@oracle-bridge-server
# Test database connection
psql -h <db-host> -U <db-user> -d helloworlddao -c "SELECT 1"
# Check connection pooling
# Review connection pool metrics in logs
```
Step 3: Check External Service Status
| Service | Status Page |
|---|---|
| SendGrid | https://status.sendgrid.com |
| Stripe | https://status.stripe.com |
| PostgreSQL (cloud) | Check cloud provider status |
Step 4: Review Logs
```bash
# Oracle-bridge logs
ssh deploy@oracle-bridge-server
cd /home/deploy/oracle-bridge
npm run logs
# Look for:
# - Connection refused
# - Timeout errors
# - Authentication failures
# - SSL/TLS errors
```
Resolution
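To decide which scenario below applies, it can help to tally which of the error types from Step 4 dominate the logs. The following is a minimal sketch, not part of oracle-bridge: the function name `classify_bridge_errors`, the category labels, and the regular expressions are all assumptions, and the patterns should be adjusted to the messages the service actually logs. It reads log text on stdin and prints one count per category.

```bash
# Hypothetical helper: count log lines per failure category.
# The patterns are guesses at typical PostgreSQL/Node.js error text.
classify_bridge_errors() {
  local log name pattern
  log="$(cat)"   # read the whole log from stdin once
  while IFS=':' read -r name pattern; do
    # grep -c prints the number of matching lines (0 when none match)
    printf '%s=%s\n' "$name" "$(printf '%s\n' "$log" | grep -Eic -- "$pattern")"
  done <<'EOF'
A_connection_refused:ECONNREFUSED|connection refused
B_pool_exhausted:too many connections|too many clients
C_ssl_tls:ssl|tls|certificate|handshake
D_auth_failure:password authentication failed|no pg_hba.conf entry
E_timeout:ETIMEDOUT|timed out|timeout
EOF
}
```

Assuming the logs were saved to a file first (for example `bridge.log`, a hypothetical name), run `classify_bridge_errors < bridge.log` and start with the scenario whose letter has the highest count.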
Scenario A: Database Connection Refused
```bash
# Check if database is running
psql -h <db-host> -U <db-user> -d helloworlddao -c "SELECT 1"
# If connection refused:
# 1. Check if database server is running
# 2. Check firewall rules allow connection
# 3. Check security group settings (cloud)
# 4. Verify database credentials
```
Scenario B: Connection Pool Exhausted
If seeing "too many connections" errors:
```bash
# Check current connections
psql -h <db-host> -U <db-user> -d helloworlddao -c \
  "SELECT count(*) FROM pg_stat_activity WHERE datname = 'helloworlddao'"
# Kill idle connections if needed
psql -h <db-host> -U <db-user> -d helloworlddao -c \
  "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'helloworlddao' AND state = 'idle' AND state_change < NOW() - INTERVAL '10 minutes'"
```
For persistent issues:
- Increase `max_connections` in PostgreSQL
- Reduce connection pool size in oracle-bridge
- Implement a connection pooler (PgBouncer)
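Before changing `max_connections` or the pool size, it is worth knowing how close the server is to its limit. The sketch below compares the number of client connections with `max_connections`. It is illustrative only: the function names, the `DB_HOST` and `DB_USER` environment variables (standing in for `<db-host>` and `<db-user>`), and the 80% default threshold are assumptions, and the `backend_type` filter assumes PostgreSQL 10 or later.

```bash
# Percentage of max_connections in use (integer arithmetic).
pool_usage_percent() {
  local current="$1" max="$2"
  echo $(( current * 100 / max ))
}

# Hypothetical wrapper: query PostgreSQL and fail above a threshold.
# $1 = warning threshold in percent (default 80, an assumed value)
check_pool_usage() {
  local threshold="${1:-80}" current max pct
  # max_connections is server-wide, so count client backends in all databases
  current="$(psql -h "$DB_HOST" -U "$DB_USER" -d helloworlddao -XAt -c \
    "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend'")"
  max="$(psql -h "$DB_HOST" -U "$DB_USER" -d helloworlddao -XAt -c \
    "SHOW max_connections")"
  pct="$(pool_usage_percent "$current" "$max")"
  echo "connections: ${current}/${max} (${pct}%)"
  [ "$pct" -lt "$threshold" ]
}
```

`check_pool_usage 90` exits non-zero once usage reaches 90%, so it can be wired into a cron job or an alerting script.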
Scenario C: SSL/TLS Issues
If seeing SSL handshake failures:
```bash
# Test SSL connection
openssl s_client -connect <db-host>:5432 -starttls postgres
# Verify certificate
# Check if CA cert is installed correctly
# Update SSL mode in connection string if needed
```
Scenario D: Authentication Failures
```bash
# Verify credentials work
psql -h <db-host> -U <db-user> -d helloworlddao
# If password auth fails:
# 1. Verify password in environment variables
# 2. Check pg_hba.conf allows connection method
# 3. Verify user exists with correct permissions
```
Scenario E: Network/Firewall Issues
```bash
# Test network connectivity
nc -zv <db-host> 5432
# If timeout:
# 1. Check security groups (cloud)
# 2. Check network ACLs
# 3. Check local firewall
# 4. Check VPC peering if applicable
```
Scenario F: Oracle-Bridge Service Down
```bash
# SSH to server
ssh deploy@oracle-bridge-server
# Check service status
pm2 status
# Restart if needed
pm2 restart oracle-bridge
# Check logs for startup errors
pm2 logs oracle-bridge
```
Database Recovery
Recovering from Connection Issues
Restart connection pools:
```bash
pm2 restart oracle-bridge
```
Clear stale connections:
```sql
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'helloworlddao' AND state = 'idle' AND state_change < NOW() - INTERVAL '5 minutes';
```
Verify sync is working:
- Check recent records in database
- Compare with canister state
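The "recent records" check can be scripted by measuring how long ago the newest row in `users` was updated. This is a sketch under assumptions: the function names, the `DB_HOST` and `DB_USER` environment variables, and the 900-second default are hypothetical, and it relies on `users.updated_at` being written on every sync.

```bash
# Succeeds when the lag (seconds) is between 0 and the allowed maximum.
# A negative lag is used below to signal "no rows" and counts as a failure.
lag_is_acceptable() {
  local lag="$1" max="$2"
  [ "$lag" -ge 0 ] && [ "$lag" -le "$max" ]
}

# Hypothetical wrapper: seconds since the newest users.updated_at.
# $1 = maximum acceptable lag in seconds (default 900, an assumed value)
check_sync_freshness() {
  local max="${1:-900}" lag
  lag="$(psql -h "$DB_HOST" -U "$DB_USER" -d helloworlddao -XAt -c \
    "SELECT COALESCE(EXTRACT(EPOCH FROM NOW() - MAX(updated_at))::bigint, -1) FROM users")"
  echo "seconds since last user update: ${lag}"
  lag_is_acceptable "$lag" "$max"
}
```

A large lag is only a hint. During quiet periods no users change, so confirm against the canister state before concluding that sync is broken.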
Data Sync Verification
After connectivity is restored:
```sql
-- Check latest sync timestamp
SELECT MAX(updated_at) FROM users;
-- Count records
SELECT COUNT(*) FROM users;
-- Compare with canister count
-- dfx canister call user_service get_stats
```
External Service Issues
Email Provider (SendGrid)
If email delivery is failing:
- Check SendGrid status page
- Review SendGrid dashboard for:
  - Bounces
  - Blocks
  - Spam reports
- Check API key validity
- Review rate limits
Stripe
If Stripe webhooks are failing:
- Check Stripe dashboard > Developers > Webhooks
- Review failed webhook attempts
- Verify webhook secret is correct
- Check endpoint URL is reachable
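One way to verify the webhook secret is to recompute a signature locally and compare it with the one Stripe sent. Stripe signs the string `<timestamp>.<raw body>` with HMAC-SHA256 and sends the result as the `v1` value of the `Stripe-Signature` header, alongside `t=<timestamp>`. The sketch below assumes `openssl` is installed. The function names are hypothetical, and the timestamp, body, and `v1` value would need to be copied from a failed attempt.

```bash
# Compute the v1 signature expected for a webhook delivery.
# $1 = webhook signing secret, $2 = timestamp (t=...), $3 = raw request body
stripe_expected_signature() {
  # openssl prints "<label>= <hex>"; keep only the hex digest
  printf '%s.%s' "$2" "$3" \
    | openssl dgst -sha256 -hmac "$1" \
    | awk '{print $NF}'
}

# Succeeds when the v1 value from the Stripe-Signature header matches.
# $1 = secret, $2 = timestamp, $3 = raw body, $4 = v1 value from the header
stripe_signature_matches() {
  [ "$(stripe_expected_signature "$1" "$2" "$3")" = "$4" ]
}
```

The body must be byte-for-byte what Stripe sent, so a re-serialized or pretty-printed JSON payload will not match. A mismatch with the exact body points to a wrong or rotated secret in the oracle-bridge configuration.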
Post-Resolution
Step 1: Verify Services
```bash
# Health check
curl -s https://oracle.helloworlddao.com/health
# Test email sending (the body is JSON, so declare it as such)
curl -X POST https://oracle.helloworlddao.com/test-email \
  -H "Authorization: Bearer <test-token>" \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com"}'
```
Step 2: Monitor
- Watch Grafana for 30 minutes
- Verify error rates return to normal
- Confirm email delivery is working
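Alongside Grafana, the health endpoint from Step 1 can be polled for the same 30-minute window. This is a rough sketch: the function names are hypothetical, and the defaults of 30 checks at 60-second intervals are simply chosen to match the monitoring period above.

```bash
# Succeeds when the JSON on stdin reports "status": "ok".
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Poll the health endpoint and report how many checks failed.
# $1 = number of checks (default 30), $2 = seconds between checks (default 60)
watch_health() {
  local checks="${1:-30}" interval="${2:-60}" failures=0 i
  for (( i = 1; i <= checks; i++ )); do
    if ! curl -s --max-time 10 https://oracle.helloworlddao.com/health | health_is_ok; then
      failures=$(( failures + 1 ))
      echo "$(date -u +%H:%M:%S) health check ${i}/${checks} failed"
    fi
    # no need to wait after the final check
    if [ "$i" -lt "$checks" ]; then
      sleep "$interval"
    fi
  done
  echo "failed checks: ${failures}/${checks}"
  [ "$failures" -eq 0 ]
}
```

Running `watch_health` in a terminal during the monitoring window gives a simple pass/fail record to attach to the incident notes.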
Step 3: Document
If this was a significant outage:
- Document root cause
- Update monitoring for earlier detection
- Create tickets for improvements
Prevention
Database Best Practices
- Connection pooling - Use PgBouncer or built-in pooling
- Health checks - Regular database health monitoring
- Alerts - Set up connection count alerts
- Backups - Regular automated backups
Network Resilience
- Retry logic - Implement exponential backoff
- Circuit breakers - Fail fast when database is down
- Timeouts - Set appropriate connection timeouts
- Redundancy - Consider read replicas
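The retry-with-backoff idea can be illustrated with a small shell helper for operator scripts. This is only a sketch of the pattern: the function name and parameters are hypothetical, and the oracle-bridge service itself would implement retries in its own code and database client rather than in shell.

```bash
# Run a command until it succeeds, doubling the wait after each failure.
# $1 = max attempts, $2 = initial delay in seconds, remaining args = command
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))        # exponential backoff: 1s, 2s, 4s, ...
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry_with_backoff 5 1 nc -z <db-host> 5432` tries the port check up to five times, waiting 1, 2, 4, and 8 seconds between attempts. Capping the number of attempts keeps a script from hanging indefinitely when the database is down.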
Escalation
| Condition | Action |
|---|---|
| Database unrecoverable | Contact DBA / cloud support |
| Data corruption suspected | Contact team lead immediately |
| Cloud provider issue | Open support ticket |
| Prolonged email outage | Contact SendGrid support |