AI Integration & RAG Pipeline
This guide covers Visita’s AI integration architecture, focusing on the Retrieval-Augmented Generation (RAG) pipeline for knowledge-based question answering and the governance policies that ensure responsible AI usage.
AI is a Utility, Not a Decision-Maker: AI in Visita summarizes and formats existing data; it never generates new facts or makes authoritative decisions.
Architecture Overview
AI Service Layer
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
│ Components → Server Actions → AI Service Layer │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ AI Service Layer │
│ /services/ai/client.ts (Provider Agnostic) │
│ /services/ai/router.ts (Tiered Routing) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ AI Provider Layer │
│ LiteLLM / OpenRouter (Model Orchestration) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ LLM Layer │
│ Tier 1: Gemini Flash (Cheap, Fast) │
│ Tier 2: Claude Haiku (Mid, Balanced) │
│ Tier 3: Claude Sonnet (Quality, Reasoning) │
└─────────────────────────────────────────────────────────────┘
RAG Pipeline Architecture
┌─────────────────────────────────────────────────────────────┐
│ User Query │
│ "What are the safety concerns in Ward 12?" │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Retrieval Phase │
│ 1. Embed query (text → vector) │
│ 2. Similarity search in vector DB │
│ 3. Retrieve top-k relevant documents │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Augmentation Phase │
│ 1. Combine query + retrieved context │
│ 2. Format prompt with instructions │
│ 3. Add metadata (sources, timestamps) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Generation Phase │
│ 1. Send prompt to LLM │
│ 2. Generate answer based on context │
│ 3. Include source citations │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Response │
│ "Based on recent crime reports, Ward 12 has..." │
│ [Sources: Crime Report #123, Safety Score Q3] │
└─────────────────────────────────────────────────────────────┘
Implementation
1. AI Service Layer
Provider Agnostic: The AI service layer abstracts away specific LLM providers, making it easy to switch models or providers.
// The LiteLLM / OpenRouter proxy exposes an OpenAI-compatible API, so the
// standard OpenAI SDK can be pointed at it via baseURL.
import OpenAI from 'openai';
export interface AIRequest {
prompt: string;
context?: string[];
tier?: 'cheap' | 'mid' | 'quality';
temperature?: number;
max_tokens?: number;
}
export interface AIResponse {
content: string;
model: string;
tokens_used: number;
sources?: string[];
}
class AIClient {
private client: OpenAI;
constructor() {
// Point the OpenAI-compatible client at the LiteLLM / OpenRouter proxy endpoint
this.client = new OpenAI({
apiKey: process.env.LITELLM_API_KEY,
baseURL: process.env.LITELLM_BASE_URL,
});
}
async generate(request: AIRequest): Promise<AIResponse> {
const { tier = 'cheap', prompt, context = [], ...options } = request;
// Select model based on tier
const model = this.getModelForTier(tier);
// Build prompt with context
const fullPrompt = this.buildPrompt(prompt, context);
const response = await this.client.chat.completions.create({
model,
messages: [
{ role: 'system', content: this.getSystemPrompt() },
{ role: 'user', content: fullPrompt }
],
temperature: options.temperature ?? 0.7,
max_tokens: options.max_tokens ?? 1000,
});
return {
content: response.choices[0].message.content ?? '',
model: response.model,
tokens_used: response.usage?.total_tokens ?? 0,
};
}
private getModelForTier(tier: string): string {
const models = {
cheap: 'gemini-flash-1.5', // Fast, inexpensive
mid: 'claude-3-haiku-20240307', // Balanced
quality: 'claude-3-sonnet-20240229' // High quality
};
return models[tier as keyof typeof models];
}
private buildPrompt(prompt: string, context: string[]): string {
if (context.length === 0) return prompt;
const contextText = context
.map((doc, i) => `[Document ${i + 1}]\n${doc}`)
.join('\n\n');
return `
User Question: ${prompt}
Relevant Context:
${contextText}
Instructions:
- Answer based ONLY on the provided context
- If the answer is not in the context, say "I don't have enough information"
- Cite sources using [Document 1], [Document 2], etc.
- Be concise and factual
`;
}
private getSystemPrompt(): string {
return `You are Visita, a civic intelligence assistant. You help users understand their local community by providing information about wards, businesses, safety, and civic matters.
Rules:
1. Only use information provided in the context
2. Never make up facts or speculate
3. Cite your sources clearly
4. If unsure, say "I don't have enough information"
5. Be helpful, factual, and concise`;
}
}
export const aiClient = new AIClient();
2. RAG Pipeline Implementation
import { createClient } from '@/lib/supabase/server';
import { aiClient } from '@/services/ai/client';
export interface RAGRequest {
query: string;
wardCode?: string;
maxDocuments?: number;
tier?: 'cheap' | 'mid' | 'quality';
}
export interface RAGResponse {
answer: string;
sources: DocumentSource[];
metadata: {
documents_retrieved: number;
model_used: string;
tokens_used: number;
};
}
interface DocumentSource {
id: string;
title: string;
snippet: string;
url?: string;
timestamp: string;
}
export class RAGPipeline {
async process(request: RAGRequest): Promise<RAGResponse> {
const { query, wardCode, maxDocuments = 5, tier = 'mid' } = request;
// Step 1: Retrieve relevant documents
const documents = await this.retrieveDocuments(query, wardCode, maxDocuments);
if (documents.length === 0) {
return {
answer: "I don't have enough information to answer that question. Please try a different query or check back later.",
sources: [],
metadata: {
documents_retrieved: 0,
model_used: 'none',
tokens_used: 0
}
};
}
// Step 2: Generate answer with context
const context = documents.map(doc => doc.content);
const aiResponse = await aiClient.generate({
prompt: query,
context,
tier
});
// Step 3: Format response with sources
const sources = documents.map(doc => ({
id: doc.id,
title: doc.title,
snippet: doc.snippet,
url: doc.url,
timestamp: doc.timestamp
}));
return {
answer: aiResponse.content,
sources,
metadata: {
documents_retrieved: documents.length,
model_used: aiResponse.model,
tokens_used: aiResponse.tokens_used
}
};
}
protected async retrieveDocuments(
query: string,
wardCode?: string,
limit: number = 5
): Promise<any[]> {
const supabase = await createClient();
// Embed query for semantic search
const queryEmbedding = await this.embedText(query);
// supabase-js cannot express the pgvector `<->` operator in select/order,
// so the similarity search runs in a Postgres function invoked via RPC
// (see the match_documents sketch after this class); the optional ward filter
// is applied inside that function.
const { data, error } = await supabase.rpc('match_documents', {
query_embedding: queryEmbedding,
match_count: limit,
ward_filter: wardCode ?? null
});
if (error) {
console.error('Error retrieving documents:', error);
return [];
}
return data || [];
}
protected async embedText(text: string): Promise<number[]> {
// Use the OpenAI embeddings API (protected so ingestion and subclasses can reuse it)
const response = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'text-embedding-3-small',
input: text
})
});
if (!response.ok) {
throw new Error(`Embedding request failed: ${response.status} ${response.statusText}`);
}
const data = await response.json();
return data.data[0].embedding;
}
}
export const ragPipeline = new RAGPipeline();
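The retrieval step above assumes a match_documents function in Postgres, following the common Supabase + pgvector pattern. A minimal sketch (column names taken from the ingestion example below; text-embedding-3-small produces 1536-dimensional vectors):
CREATE OR REPLACE FUNCTION match_documents(
query_embedding VECTOR(1536),
match_count INT DEFAULT 5,
ward_filter TEXT DEFAULT NULL
)
RETURNS SETOF knowledge_documents
LANGUAGE sql STABLE
AS $$
SELECT d.*
FROM knowledge_documents d
WHERE ward_filter IS NULL OR d.ward_code = ward_filter
ORDER BY d.embedding <-> query_embedding
LIMIT match_count;
$$;
An ivfflat or HNSW index on the embedding column keeps this query fast as the knowledge base grows.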
3. Knowledge Base Ingestion
import { createClient } from '@/lib/supabase/server';
import { ragPipeline } from '@/lib/ai/rag';
interface Document {
id: string;
title: string;
content: string;
url?: string;
wardCode?: string;
timestamp: string;
}
export async function ingestDocument(doc: Document) {
const supabase = await createClient();
// Step 1: Generate embedding (bracket access sidesteps the TS access modifier;
// exposing a public embed helper on RAGPipeline would be cleaner)
const embedding = await ragPipeline['embedText'](doc.content);
// Step 2: Create snippet (first 200 chars)
const snippet = doc.content.slice(0, 200) + '...';
// Step 3: Insert into database
const { data, error } = await supabase
.from('knowledge_documents')
.insert({
id: doc.id,
title: doc.title,
content: doc.content,
snippet,
url: doc.url,
ward_code: doc.wardCode,
timestamp: doc.timestamp,
embedding
});
if (error) {
console.error('Error ingesting document:', error);
throw error;
}
console.log(`✅ Document ingested: ${doc.title}`);
return data;
}
// Example usage
const document: Document = {
id: 'ward-12-safety-report-2025',
title: 'Ward 12 Safety Report - Q3 2025',
content: 'Ward 12 has seen a 15% decrease in crime...',
url: '/reports/ward-12-safety-q3-2025',
wardCode: 'WARD012',
timestamp: new Date().toISOString()
};
await ingestDocument(document);
AI Governance
Guiding Principles
AI is never a source of truth. All AI outputs must be derived from existing data and clearly labeled as AI-generated.
- AI is never a source of truth
- AI outputs are always derived, explainable, and replaceable
- AI must not silently influence civic, legal, or financial outcomes
- AI usage must be cost-controlled and auditable
- Humans retain ultimate authority over interpretation and action
Permitted AI Use Cases
✅ Summarization:
- Ward intelligence summaries
- Crime trend explanations
- Weather and rain pattern descriptions
- Aggregated alert context
✅ Classification & Tagging:
- Crime categorization assistance
- Topic tagging for listings or institutions
- Content grouping for search and discovery
✅ Pattern Explanation (NOT Detection):
- Explaining already-computed trends
- Translating statistical outputs into human-readable language
AI may explain patterns, but may not detect or infer new facts that are not present in the underlying data.
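For instance, the topic-tagging case above can run on the cheap tier with a fixed, human-defined label set passed in as context, so the model only selects from existing labels. A sketch (the helper name and tag list are illustrative):
import { aiClient } from '@/services/ai/client';
// Suggest tags for a directory listing from a fixed label set; the model never invents categories.
export async function suggestListingTags(description: string): Promise<string> {
const allowedTags = ['healthcare', 'education', 'retail', 'food', 'services'];
const response = await aiClient.generate({
prompt: `Pick up to three tags for this listing: "${description}". Reply with a comma-separated list of tags.`,
context: [`Allowed tags: ${allowedTags.join(', ')}`],
tier: 'cheap'
});
return response.content;
}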
Prohibited AI Use Cases
❌ Generating or modifying source-of-truth data
❌ Making automated decisions affecting:
- User permissions
- Legal classification
- Payment outcomes
- Enforcement actions
❌ Real-time moderation without human oversight
❌ Predictive policing or profiling
❌ Sentiment analysis on individuals
❌ Automated content publishing without review
AI Output Data Model
All AI outputs must be stored as derived data:
Example: AI summaries table
CREATE TABLE ward_ai_summaries (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ward_code TEXT REFERENCES wards(ward_code),
summary_type TEXT NOT NULL, -- 'daily', 'weekly', 'safety', 'weather'
content TEXT NOT NULL,
model_used TEXT NOT NULL,
prompt_hash TEXT,
source_data JSONB, -- References to source documents
generated_at TIMESTAMPTZ DEFAULT NOW(),
expires_at TIMESTAMPTZ, -- When to regenerate
is_current BOOLEAN DEFAULT TRUE
);
-- Index for fast lookups
CREATE INDEX idx_ward_ai_summaries_ward_type
ON ward_ai_summaries(ward_code, summary_type, is_current)
WHERE is_current = TRUE;
Required Properties:
- ✅ Clearly labeled as AI-generated
- ✅ Linked to source data
- ✅ Generation timestamp
- ✅ Model used
- ✅ Safe to delete and regenerate at any time
AI outputs are cacheable artifacts, not permanent records.
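A sketch of persisting a generated ward summary as a derived artifact in the table above (the storeWardSummary helper is illustrative; prompt_hash uses the same tracking hash as the AIRouter audit log below):
import { createClient } from '@/lib/supabase/server';
import { createHash } from 'crypto';
import { ragPipeline } from '@/lib/ai/rag';
export async function storeWardSummary(wardCode: string) {
const supabase = await createClient();
const prompt = `Generate a daily summary for ${wardCode}`;
const result = await ragPipeline.process({ query: prompt, wardCode, tier: 'quality' });
// Supersede the previous summary, then insert the new derived artifact
await supabase
.from('ward_ai_summaries')
.update({ is_current: false })
.eq('ward_code', wardCode)
.eq('summary_type', 'daily');
const { error } = await supabase.from('ward_ai_summaries').insert({
ward_code: wardCode,
summary_type: 'daily',
content: result.answer,
model_used: result.metadata.model_used,
prompt_hash: createHash('md5').update(prompt).digest('hex'),
source_data: { sources: result.sources.map(s => s.id) }, // links back to source documents
expires_at: new Date(Date.now() + 24 * 60 * 60 * 1000).toISOString() // regenerate daily
});
if (error) throw error;
}
Because the row is fully derived, it is safe to delete and regenerate at any time.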
Write Constraints
AI systems may only write to:
- *_ai_summaries
- *_ai_explanations
- *_ai_annotations
- Other explicitly designated derived tables
AI systems must never write directly to:
- Core civic data tables
- Crime incident records
- Directory source data
- User profiles
- Financial records
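One way to enforce these constraints in code is a small allow-list guard that every AI write path calls before touching the database; a minimal sketch (assumes derived tables follow the naming patterns listed above):
// Table-name patterns AI code is permitted to write to
const AI_WRITABLE_PATTERNS = [/_ai_summaries$/, /_ai_explanations$/, /_ai_annotations$/];
export function assertAIWritable(table: string): void {
if (!AI_WRITABLE_PATTERNS.some((pattern) => pattern.test(table))) {
throw new Error(`AI write blocked: "${table}" is not a designated derived table`);
}
}
// assertAIWritable('ward_ai_summaries'); // ok
// assertAIWritable('crime_incidents'); // throws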
Model Tiering & Cost Control
import { createClient } from '@/lib/supabase/server';
import { createHash } from 'crypto';
export class AIRouter {
async routeRequest(request: AIRequest): Promise<string> {
const { tier = 'cheap', prompt, context } = request;
// Log request for auditing
await this.logRequest(request);
// Route to appropriate model
switch (tier) {
case 'cheap':
return 'gemini-flash-1.5'; // $0.075 per 1M tokens
case 'mid':
return 'claude-3-haiku-20240307'; // $0.25 per 1M tokens
case 'quality':
return 'claude-3-sonnet-20240229'; // $3.00 per 1M tokens
default:
throw new Error(`Invalid tier: ${tier}`);
}
}
private async logRequest(request: AIRequest) {
const supabase = await createClient();
await supabase.from('ai_request_logs').insert({
prompt_hash: this.hashPrompt(request.prompt),
tier: request.tier,
context_count: request.context?.length || 0,
timestamp: new Date().toISOString()
});
}
private hashPrompt(prompt: string): string {
// Simple hash for tracking (not sensitive data)
return createHash('md5').update(prompt).digest('hex');
}
}
Best Practices
1. Always Provide Context
Never let the AI generate answers without context; grounding every prompt in retrieved documents reduces hallucination and keeps answers verifiable.
// ❌ Bad: No context
const answer = await aiClient.generate({
prompt: "What's happening in Ward 12?"
});
// ✅ Good: With context
const documents = await retrieveWardDocuments('WARD012');
const answer = await aiClient.generate({
prompt: "What's happening in Ward 12?",
context: documents.map(d => d.content)
});
2. Use Tiered Routing
Route requests to appropriate models based on importance and cost.
// Low-cost for simple tasks
await aiClient.generate({
prompt: "Classify this crime report",
tier: 'cheap' // Gemini Flash
});
// High-quality for public-facing content
await aiClient.generate({
prompt: "Generate daily ward summary",
tier: 'quality' // Claude Sonnet
});
3. Implement Source Citation
Always include source citations so users can verify information.
const response = await ragPipeline.process({
query: "What are the safety concerns?",
wardCode: 'WARD012'
});
// Response includes:
// {
// answer: "Based on recent reports...",
// sources: [
// { id: 'doc-1', title: 'Crime Report Q3', snippet: '...' }
// ]
// }
4. Cache AI Responses
Cache AI responses to reduce costs and improve performance.
import { unstable_cache } from 'next/cache';
export async function getWardSummary(wardCode: string) {
return unstable_cache(
async () => {
return await ragPipeline.process({
query: `Generate a daily summary for ${wardCode}`,
wardCode,
tier: 'quality'
});
},
[`ward-summary-${wardCode}`],
{
revalidate: 3600, // 1 hour
tags: [`ward-summary-${wardCode}`]
}
)();
}
Testing AI Features
Unit Tests
import { RAGPipeline } from '@/lib/ai/rag';
import { describe, it, expect, jest } from '@jest/globals';
describe('RAGPipeline', () => {
it('retrieves relevant documents', async () => {
const pipeline = new RAGPipeline();
const documents = await pipeline['retrieveDocuments'](
'safety concerns',
'WARD012',
5
);
expect(documents).toHaveLength(5);
expect(documents[0]).toHaveProperty('id');
expect(documents[0]).toHaveProperty('content');
});
it('generates answers with context', async () => {
const pipeline = new RAGPipeline();
const response = await pipeline.process({
query: 'What are the safety concerns?',
wardCode: 'WARD012'
});
expect(response).toHaveProperty('answer');
expect(response).toHaveProperty('sources');
expect(response.sources.length).toBeGreaterThan(0);
});
it('handles empty results gracefully', async () => {
const pipeline = new RAGPipeline();
const response = await pipeline.process({
query: 'nonexistent query',
wardCode: 'NONEXISTENT'
});
expect(response.answer).toContain("don't have enough information");
expect(response.sources).toHaveLength(0);
});
});
Integration Tests
__tests__/ai/integration.test.ts
import { test, expect } from '@playwright/test';
test('AI assistant provides sourced answers', async ({ page }) => {
// Navigate to ward page
await page.goto('/ward/WARD012');
// Ask question
await page.fill('[data-testid="ai-question-input"]',
'What are the safety concerns?');
await page.click('[data-testid="ai-ask-button"]');
// Wait for response
await page.waitForSelector('[data-testid="ai-answer"]');
// Verify answer exists
const answer = await page.textContent('[data-testid="ai-answer"]');
expect(answer).toBeTruthy();
// Verify sources are shown
const sources = await page.$$('[data-testid="ai-source"]');
expect(sources.length).toBeGreaterThan(0);
});
Common Issues & Solutions
Issue: AI Hallucinations
Symptom: AI generates facts not present in source data
Solution:
// Strengthen system prompt
const systemPrompt = `
You are a civic intelligence assistant. You MUST:
1. Only use information from the provided context
2. If the answer is not in the context, say "I don't have enough information"
3. Never make up facts, names, or statistics
4. Cite sources for every claim
Failure to follow these rules will result in incorrect information.
`;
// Add validation
function validateAnswer(answer: string, context: string[]): boolean {
// Check if answer makes claims not supported by context
// Flag for human review if suspicious
return true;
}
Issue: High AI Costs
Symptom: Monthly AI costs are exceeding budget
Solution:
// Implement cost controls
class AICostController {
private monthlySpend = 0;
private readonly budget = 1000; // $1000/month
async checkBudget(): Promise<boolean> {
if (this.monthlySpend >= this.budget) {
// Fall back to cheaper model
return false;
}
return true;
}
async trackSpend(tokens: number, model: string) {
const cost = this.calculateCost(tokens, model);
this.monthlySpend += cost;
}
private calculateCost(tokens: number, model: string): number {
// Rates in USD per 1M tokens, matching the tier comments above
const rates = {
'gemini-flash-1.5': 0.075,
'claude-3-haiku-20240307': 0.25,
'claude-3-sonnet-20240229': 3.00
};
return (tokens / 1_000_000) * (rates[model as keyof typeof rates] ?? 0);
}
}
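A hypothetical wiring of the controller into the tiered router, degrading to the cheap tier once the monthly budget is spent rather than failing the request (routeWithBudget is illustrative, and AICostController from the snippet above is assumed to be in scope):
import { AIRouter } from '@/services/ai/router';
import type { AIRequest } from '@/services/ai/client';
const router = new AIRouter();
const costController = new AICostController(); // defined above
export async function routeWithBudget(request: AIRequest): Promise<string> {
// Force the cheapest tier when the budget is exhausted instead of rejecting the request
const withinBudget = await costController.checkBudget();
const effectiveRequest = withinBudget ? request : { ...request, tier: 'cheap' as const };
return router.routeRequest(effectiveRequest);
}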
Issue: Slow AI Responses
Symptom: AI responses take too long (>5 seconds)
Solution:
// Optimize retrieval (a sketch: checkCache/cacheResults are hypothetical caching helpers)
import { createClient } from '@/lib/supabase/server';
import { RAGPipeline } from '@/lib/ai/rag';
class OptimizedRAGPipeline extends RAGPipeline {
protected async retrieveDocuments(query: string, wardCode?: string, limit = 5) {
// Reuse cached results for repeated queries
const cached = await this.checkCache(query, wardCode);
if (cached) return cached;
// Embed the query once, then run the vector search inside Postgres
const embedding = await this.embedText(query);
const supabase = await createClient();
// An ANN index (ivfflat or HNSW) on the embedding column keeps this fast;
// match_documents is the same RPC used by the base pipeline
const { data } = await supabase.rpc('match_documents', {
query_embedding: embedding,
match_count: limit,
ward_filter: wardCode ?? null
});
// Cache results for subsequent identical queries
await this.cacheResults(query, wardCode, data || []);
return data || [];
}
private async checkCache(query: string, wardCode?: string): Promise<any[] | null> {
// e.g. Redis or an in-memory LRU keyed by (query, wardCode); omitted here
return null;
}
private async cacheResults(query: string, wardCode: string | undefined, docs: any[]) {
// Store with a short TTL; omitted here
}
}
Summary
Key Takeaways:
- AI is a utility - Summarizes and formats, never generates facts
- RAG prevents hallucination - Ground answers in retrieved context
- Governance is critical - Strict rules for civic-appropriate AI
- Cost control matters - Use tiered routing and caching
- Always cite sources - Users must be able to verify information