AI Inference Integration
The oracle-bridge provides AI-powered endpoints for the DAO's governance and collaboration tools. All AI features route through the oracle-bridge (a Node.js off-chain service), which communicates with an Ollama instance running on-cluster. This keeps inference costs off-chain while leaving results verifiable and auditable.
Epic: BL-045 — implemented across oracle-bridge, governance-suite, dao-suite.
Architecture Overview
Browser / Suite frontend
        │
        ▼
oracle-bridge (Node.js, port 3000 / 8787 staging)
        │
        ├── PostgreSQL (proposal summary cache, embeddings)
        │
        └── Ollama HTTP API (Theo node: 192.168.2.160, port 11434)
                │
                ├── mistral:7b (drafting assistant)
                ├── llama3.2:3b (semantic search / embeddings)
                └── tinyllama:1.1b (fallback / health checks)

The oracle-bridge is the only service that talks to Ollama. Frontends never call Ollama directly.
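For orientation, this is roughly what a non-streaming call against Ollama's `/api/generate` endpoint looks like from the bridge side. This is an illustrative sketch, not the actual oracle-bridge client; the `buildGeneratePayload` helper and its default model are assumptions for the example.

```typescript
// Illustrative payload builder for Ollama's /api/generate endpoint.
// stream: false makes Ollama return one JSON object with a `response` field
// instead of a stream of chunks.
interface GeneratePayload {
  model: string;
  prompt: string;
  stream: boolean;
}

function buildGeneratePayload(prompt: string, model = "mistral:7b"): GeneratePayload {
  return { model, prompt, stream: false };
}

// Hypothetical wrapper: POSTs the payload to the configured Ollama host and
// returns the generated text.
async function generate(host: string, prompt: string): Promise<string> {
  const res = await fetch(`${host}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGeneratePayload(prompt)),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}
```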
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | http://192.168.2.160:11434 | Ollama base URL (Theo node internal) |
| OLLAMA_DEFAULT_MODEL | mistral:7b | Model used for drafting and summaries |
| OLLAMA_EMBED_MODEL | llama3.2:3b | Model used for embedding generation |
| OLLAMA_TIMEOUT_MS | 30000 | Request timeout in milliseconds |
| OLLAMA_CIRCUIT_OPEN_THRESHOLD | 3 | Failures before circuit opens |
| OLLAMA_CIRCUIT_RESET_MS | 60000 | Time before circuit half-opens |
| AI_CACHE_TTL_SECONDS | 3600 | PostgreSQL cache TTL for summaries |
For local development against the cluster Theo node, set OLLAMA_HOST in .env.local. For staging/production the value is injected via Docker secrets.
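A config loader over these variables might look like the sketch below. The helper names (`intFromEnv`, `loadAiConfig`) are illustrative, not the oracle-bridge's actual loader; the defaults mirror the table above.

```typescript
type Env = Record<string, string | undefined>;

// Illustrative config reader: falls back to the documented defaults when a
// variable is unset or unparseable, and parses integers with an explicit radix.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const parsed = parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadAiConfig(env: Env) {
  return {
    ollamaHost: env.OLLAMA_HOST ?? "http://192.168.2.160:11434",
    defaultModel: env.OLLAMA_DEFAULT_MODEL ?? "mistral:7b",
    embedModel: env.OLLAMA_EMBED_MODEL ?? "llama3.2:3b",
    timeoutMs: intFromEnv(env, "OLLAMA_TIMEOUT_MS", 30000),
    circuitOpenThreshold: intFromEnv(env, "OLLAMA_CIRCUIT_OPEN_THRESHOLD", 3),
    circuitResetMs: intFromEnv(env, "OLLAMA_CIRCUIT_RESET_MS", 60000),
    cacheTtlSeconds: intFromEnv(env, "AI_CACHE_TTL_SECONDS", 3600),
  };
}
```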
Model Selection
Each cluster node has different model availability:
| Node | Internal Address | Models | Notes |
|---|---|---|---|
| Theo (.160) | ollama-theo-svc.hello-world:11434 | TinyLlama 1.1B, Llama 3.2 3B, Mistral 7B Q4 | Coby's dedicated node, 16GB RAM, GTX 1060 |
| Aurora (.159) | 192.168.2.159:31434 | Various | Graydon's primary, 64GB RAM, GTX 1070 |
Model swaps take 10–30 seconds on Theo (only one model loaded at a time). Design prompts to tolerate cold-start latency.
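One way to tolerate cold-start latency is an explicit per-call timeout, with a larger budget for the first request after a model swap. A minimal sketch (the `withTimeout` helper is an assumption for illustration, not the bridge's actual mechanism, which uses OLLAMA_TIMEOUT_MS):

```typescript
// Illustrative timeout wrapper: rejects if the wrapped promise does not
// settle within `ms` milliseconds. A caller could pass a generous budget
// (covering the 10-30s model swap) for the first request after a cold start.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```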
Circuit Breaker Pattern
The oracle-bridge wraps every Ollama call in a circuit breaker (src/ai/circuitBreaker.ts). This prevents cascading failures when Ollama is unreachable or overloaded.
States
CLOSED ──(3 failures)──► OPEN ──(60s timeout)──► HALF-OPEN ──(success)──► CLOSED
                           ▲                         │
                           └────────(failure)────────┘

| State | Behavior |
|---|---|
|---|---|
| CLOSED | Normal — all requests pass through |
| OPEN | Fast-fail — returns 503 immediately, no Ollama calls |
| HALF-OPEN | One probe request allowed — success closes, failure reopens |
Implementation
import { CircuitBreaker } from '../ai/circuitBreaker';

const breaker = new CircuitBreaker({
  failureThreshold: parseInt(process.env.OLLAMA_CIRCUIT_OPEN_THRESHOLD ?? '3', 10),
  resetTimeoutMs: parseInt(process.env.OLLAMA_CIRCUIT_RESET_MS ?? '60000', 10),
});

async function callOllama(prompt: string): Promise<string> {
  return breaker.execute(() => ollamaClient.generate(prompt));
}

When the circuit is OPEN, endpoints return:
{ "error": "AI service temporarily unavailable", "code": "CIRCUIT_OPEN" }

GlitchTip alerts fire when the circuit opens (tag: ai.circuit_open).
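For illustration, the state machine described above can be sketched as follows. This is not the real implementation in src/ai/circuitBreaker.ts; the class and method names are assumptions, and the injectable clock exists only to make the sketch testable without real waiting.

```typescript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

// Minimal illustrative circuit breaker matching the documented transitions:
// CLOSED -(threshold failures)-> OPEN -(reset timeout)-> HALF_OPEN,
// then success closes and failure reopens.
class SimpleCircuitBreaker {
  private state: CircuitState = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 3,
    private resetTimeoutMs = 60000,
    private now: () => number = () => Date.now(),
  ) {}

  // Effective state; an elapsed reset timeout moves OPEN to HALF_OPEN.
  getState(): CircuitState {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  // While OPEN the caller should fast-fail (e.g. return 503) without calling Ollama.
  canRequest(): boolean {
    return this.getState() !== "OPEN";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.getState() === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
      this.failures = 0;
    }
  }
}
```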
API Reference
All endpoints require a valid session cookie (httpOnly, domain=.helloworlddao.com) unless otherwise noted. Oracle-bridge validates the session against auth-service before serving AI responses.
Base URL
| Environment | URL |
|---|---|
| Local | http://localhost:3000 |
| Staging | https://staging-oracle.helloworlddao.com |
| Production | https://oracle.helloworlddao.com |
POST /api/ai/summarize
Generate a concise summary of a governance proposal. Results are cached in PostgreSQL by proposal_id with a configurable TTL.
Request
POST /api/ai/summarize
Content-Type: application/json
Cookie: session=<token>
{
"proposal_id": "prop_abc123",
"title": "Allocate 500 DOM for community garden",
"body": "We propose to allocate 500 DOM tokens from the treasury..."
}

Response
{
"summary": "Allocates 500 DOM from treasury to fund a shared garden space, benefiting ~40 members. Vote closes in 5 days.",
"cached": false,
"model": "mistral:7b",
"latency_ms": 1240
}

| Field | Type | Description |
|---|---|---|
| summary | string | 1–3 sentence plain-language summary |
| cached | boolean | true if returned from PostgreSQL cache |
| model | string | Ollama model that generated the response |
| latency_ms | number | Time from request to response (0 if cached) |
Error Responses
| Status | Code | Meaning |
|---|---|---|
| 503 | CIRCUIT_OPEN | Ollama unreachable — circuit is open |
| 422 | VALIDATION_ERROR | Missing or invalid proposal_id / body |
| 401 | UNAUTHORIZED | Invalid or expired session |
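A client can use these codes to decide whether a failed call is worth retrying. A sketch of that decision (the helper name and retry budgets are illustrative assumptions, not part of the API):

```typescript
// Illustrative retry policy for the AI endpoints. Returns a suggested delay
// in milliseconds before retrying, or null when a retry cannot succeed.
interface AiError {
  code: string;
}

function retryDelayMs(status: number, body: AiError): number | null {
  // CIRCUIT_OPEN: the breaker half-opens after ~60s, so a delayed retry can work.
  if (status === 503 && body.code === "CIRCUIT_OPEN") return 60000;
  // Validation and auth failures: the caller must fix the request or session first.
  if (status === 422 || status === 401) return null;
  // Other 5xx: transient server error, short backoff.
  if (status >= 500) return 5000;
  return null;
}
```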
POST /api/ai/search
Semantic search across proposals using cosine similarity over stored embeddings. Results are ranked by semantic relevance, not keyword match.
Request
POST /api/ai/search
Content-Type: application/json
Cookie: session=<token>
{
"query": "environmental sustainability funding",
"limit": 10,
"threshold": 0.75
}

Response
{
"results": [
{
"proposal_id": "prop_xyz789",
"title": "Green Energy Infrastructure Grant",
"score": 0.91,
"excerpt": "Funding allocation for solar panel installation..."
}
],
"query_embedding_ms": 340,
"search_ms": 12
}

| Field | Type | Description |
|---|---|---|
| results[].score | float | Cosine similarity (0.0–1.0, higher = more relevant) |
| threshold | float | Request parameter: minimum score to include (default: 0.75) |
| limit | int | Request parameter: max results (default: 10, max: 50) |
Embeddings are generated using llama3.2:3b and stored in PostgreSQL as vector(4096). New proposals are embedded asynchronously after creation.
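The ranking step reduces to cosine similarity between the query embedding and each stored vector. A plain implementation for reference (illustrative; in production the comparison runs against the PostgreSQL-stored vectors, not in application code):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). For embedding vectors, scores
// near 1.0 indicate semantically similar text; results under the request's
// `threshold` would be dropped before ranking.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; report 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```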
POST /api/ai/draft
AI-assisted proposal drafting. Accepts a brief intent and returns a structured first draft.
Request
POST /api/ai/draft
Content-Type: application/json
Cookie: session=<token>
{
"intent": "Propose adding a community library of maker tools",
"category": "infrastructure",
"max_tokens": 400
}

Response
{
"draft": {
"title": "Establish Community Maker Tool Library",
"body": "## Summary\n\nThis proposal establishes a shared library of maker tools...",
"suggested_budget": null,
"tags": ["infrastructure", "community", "tools"]
},
"model": "mistral:7b",
"latency_ms": 2100
}

The draft is a starting point — members must edit and review before submitting. The governance-suite proposal creation form pre-fills from the draft response.
POST /api/ai/embed
Generate an embedding vector for arbitrary text. Used internally for indexing; also available for advanced integrations.
Request
POST /api/ai/embed
Content-Type: application/json
Cookie: session=<token>
{
"text": "Sustainable agriculture and community food systems"
}

Response
{
"embedding": [0.021, -0.043, ...],
"dimensions": 4096,
"model": "llama3.2:3b"
}

GET /api/ai/health
Returns Ollama connectivity status and circuit state. Does not require authentication.
Response
{
"status": "ok",
"circuit": "CLOSED",
"model_available": true,
"ollama_host": "http://192.168.2.160:11434",
"default_model": "mistral:7b"
}

Proposal Summary Caching
Summaries are cached in PostgreSQL (proposal_summaries table) to avoid redundant Ollama calls. Cache key is proposal_id. Cache is invalidated when the proposal body changes (detected by SHA-256 hash of the body).
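The freshness check can be sketched as comparing the hash stored with the cached summary against a hash of the current body. Helper names here are illustrative, not the actual oracle-bridge code:

```typescript
import { createHash } from "node:crypto";

// Illustrative cache-freshness check: a cached summary is reusable only if
// the SHA-256 of the current proposal body matches the hash stored alongside it.
function bodyHash(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

function cacheIsFresh(storedHash: string, currentBody: string): boolean {
  return storedHash === bodyHash(currentBody);
}
```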
A reindex cron job runs nightly to generate summaries for any proposals added without a cached summary:
# Manually trigger reindex (oracle-bridge admin endpoint)
curl -X POST https://oracle.helloworlddao.com/api/ai/admin/reindex \
  -H "X-API-Token: fos_<token>"

Monitoring & Alerts
GlitchTip DSN: https://017a18...@glitchtip.founderyos.dev/4
| Alert | Trigger | Tag |
|---|---|---|
| Circuit opened | Circuit transitions CLOSED → OPEN | ai.circuit_open |
| High latency | Ollama response > 15s | ai.latency_high |
| Cache miss spike | Cache hit rate drops below 40% | ai.cache_miss_rate |
All AI errors are tagged with service:oracle-bridge and component:ai in GlitchTip.
Local Development
To run AI endpoints locally without a cluster connection, use the Ollama desktop app or Docker:
# Install Ollama locally
curl -fsSL https://ollama.ai/install.sh | sh
# Pull required models
ollama pull mistral:7b
ollama pull llama3.2:3b
# Point oracle-bridge at local Ollama
echo 'OLLAMA_HOST=http://localhost:11434' >> oracle-bridge/.env.local
# Start oracle-bridge
cd oracle-bridge && npm run dev

The circuit breaker is disabled in test mode (NODE_ENV=test) to avoid flakiness in unit tests.