AI Integration & RAG Pipeline
This guide covers Visita’s AI integration architecture, focusing on the Retrieval-Augmented Generation (RAG) pipeline for knowledge-based question answering and the governance policies that ensure responsible AI usage.
AI is a Utility, Not a Decision-Maker: AI in Visita summarizes and formats existing data; it never generates new facts or makes authoritative decisions.
Architecture Overview
AI Service Layer
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
│ Components → Server Actions → AI Service Layer │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ AI Service Layer │
│ /services/ai/client.ts (Provider Agnostic) │
│ /services/ai/router.ts (Tiered Routing) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ AI Provider Layer │
│ LiteLLM / OpenRouter (Model Orchestration) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ LLM Layer │
│ Tier 1: Gemini Flash (Cheap, Fast) │
│ Tier 2: Claude Haiku (Mid, Balanced) │
│ Tier 3: Claude Sonnet (Quality, Reasoning) │
└─────────────────────────────────────────────────────────────┘
RAG Pipeline Architecture
┌─────────────────────────────────────────────────────────────┐
│ User Query │
│ "What are the safety concerns in Ward 12?" │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Retrieval Phase │
│ 1. Embed query (text → vector) │
│ 2. Similarity search in vector DB │
│ 3. Retrieve top-k relevant documents │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Augmentation Phase │
│ 1. Combine query + retrieved context │
│ 2. Format prompt with instructions │
│ 3. Add metadata (sources, timestamps) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Generation Phase │
│ 1. Send prompt to LLM │
│ 2. Generate answer based on context │
│ 3. Include source citations │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Response │
│ "Based on recent crime reports, Ward 12 has..." │
│ [Sources: Crime Report #123, Safety Score Q3] │
└─────────────────────────────────────────────────────────────┘
Implementation
1. AI Service Layer
Provider Agnostic: The AI service layer abstracts away specific LLM providers, making it easy to switch models or providers.
// The LiteLLM / OpenRouter proxy exposes an OpenAI-compatible API, so the
// standard OpenAI SDK can be pointed at it via baseURL.
import OpenAI from 'openai';
export interface AIRequest {
prompt: string;
context?: string[];
tier?: 'cheap' | 'mid' | 'quality';
temperature?: number;
max_tokens?: number;
}
export interface AIResponse {
content: string;
model: string;
tokens_used: number;
sources?: string[];
}
class AIClient {
private client: OpenAI;
constructor() {
// Point the OpenAI-compatible client at the LiteLLM / OpenRouter proxy endpoint
this.client = new OpenAI({
apiKey: process.env.LITELLM_API_KEY,
baseURL: process.env.LITELLM_BASE_URL,
});
}
async generate(request: AIRequest): Promise<AIResponse> {
const { tier = 'cheap', prompt, context = [], ...options } = request;
// Select model based on tier
const model = this.getModelForTier(tier);
// Build prompt with context
const fullPrompt = this.buildPrompt(prompt, context);
const response = await this.client.chat.completions.create({
model,
messages: [
{ role: 'system', content: this.getSystemPrompt() },
{ role: 'user', content: fullPrompt }
],
temperature: options.temperature ?? 0.7,
max_tokens: options.max_tokens ?? 1000,
});
return {
content: response.choices[0].message.content ?? '',
model: response.model,
tokens_used: response.usage?.total_tokens ?? 0,
};
}
private getModelForTier(tier: string): string {
const models = {
cheap: 'gemini-flash-1.5', // Fast, inexpensive
mid: 'claude-3-haiku-20240307', // Balanced
quality: 'claude-3-sonnet-20240229' // High quality
};
return models[tier as keyof typeof models];
}
private buildPrompt(prompt: string, context: string[]): string {
if (context.length === 0) return prompt;
const contextText = context
.map((doc, i) => `[Document ${i + 1}]\n${doc}`)
.join('\n\n');
return `
User Question: ${prompt}
Relevant Context:
${contextText}
Instructions:
- Answer based ONLY on the provided context
- If the answer is not in the context, say "I don't have enough information"
- Cite sources using [Document 1], [Document 2], etc.
- Be concise and factual
`;
}
private getSystemPrompt(): string {
return `You are Visita, a civic intelligence assistant. You help users understand their local community by providing information about wards, businesses, safety, and civic matters.
Rules:
1. Only use information provided in the context
2. Never make up facts or speculate
3. Cite your sources clearly
4. If unsure, say "I don't have enough information"
5. Be helpful, factual, and concise`;
}
}
export const aiClient = new AIClient();
2. RAG Pipeline Implementation
import { createClient } from '@/lib/supabase/server';
import { aiClient } from '@/services/ai/client';
export interface RAGRequest {
query: string;
wardCode?: string;
maxDocuments?: number;
tier?: 'cheap' | 'mid' | 'quality';
}
export interface RAGResponse {
answer: string;
sources: DocumentSource[];
metadata: {
documents_retrieved: number;
model_used: string;
tokens_used: number;
};
}
interface DocumentSource {
id: string;
title: string;
snippet: string;
url?: string;
timestamp: string;
}
export class RAGPipeline {
async process(request: RAGRequest): Promise<RAGResponse> {
const { query, wardCode, maxDocuments = 5, tier = 'mid' } = request;
// Step 1: Retrieve relevant documents
const documents = await this.retrieveDocuments(query, wardCode, maxDocuments);
if (documents.length === 0) {
return {
answer: "I don't have enough information to answer that question. Please try a different query or check back later.",
sources: [],
metadata: {
documents_retrieved: 0,
model_used: 'none',
tokens_used: 0
}
};
}
// Step 2: Generate answer with context
const context = documents.map(doc => doc.content);
const aiResponse = await aiClient.generate({
prompt: query,
context,
tier
});
// Step 3: Format response with sources
const sources = documents.map(doc => ({
id: doc.id,
title: doc.title,
snippet: doc.snippet,
url: doc.url,
timestamp: doc.timestamp
}));
return {
answer: aiResponse.content,
sources,
metadata: {
documents_retrieved: documents.length,
model_used: aiResponse.model,
tokens_used: aiResponse.tokens_used
}
};
}
protected async retrieveDocuments(
query: string,
wardCode?: string,
limit: number = 5
): Promise<any[]> {
const supabase = await createClient();
// Embed query for semantic search
const queryEmbedding = await this.embedText(query);
// supabase-js cannot express the pgvector `<->` operator in select/order,
// so the similarity search runs in a Postgres function invoked via RPC
// (see the match_documents sketch after this class); the optional ward filter
// is applied inside that function.
const { data, error } = await supabase.rpc('match_documents', {
query_embedding: queryEmbedding,
match_count: limit,
ward_filter: wardCode ?? null
});
if (error) {
console.error('Error retrieving documents:', error);
return [];
}
return data || [];
}
protected async embedText(text: string): Promise<number[]> {
// Use the OpenAI embeddings API (protected so ingestion and subclasses can reuse it)
const response = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'text-embedding-3-small',
input: text
})
});
if (!response.ok) {
throw new Error(`Embedding request failed: ${response.status} ${response.statusText}`);
}
const data = await response.json();
return data.data[0].embedding;
}
}
export const ragPipeline = new RAGPipeline();
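The retrieval step above assumes a match_documents function in Postgres, following the common Supabase + pgvector pattern. A minimal sketch (column names taken from the ingestion example below; text-embedding-3-small produces 1536-dimensional vectors):
CREATE OR REPLACE FUNCTION match_documents(
query_embedding VECTOR(1536),
match_count INT DEFAULT 5,
ward_filter TEXT DEFAULT NULL
)
RETURNS SETOF knowledge_documents
LANGUAGE sql STABLE
AS $$
SELECT d.*
FROM knowledge_documents d
WHERE ward_filter IS NULL OR d.ward_code = ward_filter
ORDER BY d.embedding <-> query_embedding
LIMIT match_count;
$$;
An ivfflat or HNSW index on the embedding column keeps this query fast as the knowledge base grows.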
3. Knowledge Base Ingestion
import { createClient } from '@/lib/supabase/server';
import { ragPipeline } from '@/lib/ai/rag';
interface Document {
id: string;
title: string;
content: string;
url?: string;
wardCode?: string;
timestamp: string;
}
export async function ingestDocument(doc: Document) {
const supabase = await createClient();
// Step 1: Generate embedding (bracket access sidesteps the TS access modifier;
// exposing a public embed helper on RAGPipeline would be cleaner)
const embedding = await ragPipeline['embedText'](doc.content);
// Step 2: Create snippet (first 200 chars)
const snippet = doc.content.slice(0, 200) + '...';
// Step 3: Insert into database
const { data, error } = await supabase
.from('knowledge_documents')
.insert({
id: doc.id,
title: doc.title,
content: doc.content,
snippet,
url: doc.url,
ward_code: doc.wardCode,
timestamp: doc.timestamp,
embedding
});
if (error) {
console.error('Error ingesting document:', error);
throw error;
}
console.log(`✅ Document ingested: ${doc.title}`);
return data;
}
// Example usage
const document: Document = {
id: 'ward-12-safety-report-2025',
title: 'Ward 12 Safety Report - Q3 2025',
content: 'Ward 12 has seen a 15% decrease in crime...',
url: '/reports/ward-12-safety-q3-2025',
wardCode: 'WARD012',
timestamp: new Date().toISOString()
};
await ingestDocument(document);
AI Governance
Guiding Principles
AI is never a source of truth. All AI outputs must be derived from existing data and clearly labeled as AI-generated.
- AI is never a source of truth
- AI outputs are always derived, explainable, and replaceable
- AI must not silently influence civic, legal, or financial outcomes
- AI usage must be cost-controlled and auditable
- Humans retain ultimate authority over interpretation and action
Permitted AI Use Cases
✅ Summarization:
- Ward intelligence summaries
- Crime trend explanations
- Weather and rain pattern descriptions
- Aggregated alert context
✅ Classification & Tagging:
- Crime categorization assistance
- Topic tagging for listings or institutions
- Content grouping for search and discovery
✅ Pattern Explanation (NOT Detection):
- Explaining already-computed trends
- Translating statistical outputs into human-readable language
AI may explain patterns, but may not detect or infer new facts that are not present in the underlying data.
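For instance, the topic-tagging case above can run on the cheap tier with a fixed, human-defined label set passed in as context, so the model only selects from existing labels. A sketch (the helper name and tag list are illustrative):
import { aiClient } from '@/services/ai/client';
// Suggest tags for a directory listing from a fixed label set; the model never invents categories.
export async function suggestListingTags(description: string): Promise<string> {
const allowedTags = ['healthcare', 'education', 'retail', 'food', 'services'];
const response = await aiClient.generate({
prompt: `Pick up to three tags for this listing: "${description}". Reply with a comma-separated list of tags.`,
context: [`Allowed tags: ${allowedTags.join(', ')}`],
tier: 'cheap'
});
return response.content;
}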
Prohibited AI Use Cases
❌ Generating or modifying source-of-truth data
❌ Making automated decisions affecting:
- User permissions
- Legal classification
- Payment outcomes
- Enforcement actions
❌ Real-time moderation without human oversight
❌ Predictive policing or profiling
❌ Sentiment analysis on individuals
❌ Automated content publishing without review
AI Output Data Model
All AI outputs must be stored as derived data:
Example: AI summaries table
CREATE TABLE ward_ai_summaries (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ward_code TEXT REFERENCES wards(ward_code),
summary_type TEXT NOT NULL, -- 'daily', 'weekly', 'safety', 'weather'
content TEXT NOT NULL,
model_used TEXT NOT NULL,
prompt_hash TEXT,
source_data JSONB, -- References to source documents
generated_at TIMESTAMPTZ DEFAULT NOW(),
expires_at TIMESTAMPTZ, -- When to regenerate
is_current BOOLEAN DEFAULT TRUE
);
-- Index for fast lookups
CREATE INDEX idx_ward_ai_summaries_ward_type
ON ward_ai_summaries(ward_code, summary_type, is_current)
WHERE is_current = TRUE;
Required Properties:
- ✅ Clearly labeled as AI-generated
- ✅ Linked to source data
- ✅ Generation timestamp
- ✅ Model used
- ✅ Safe to delete and regenerate at any time
AI outputs are cacheable artifacts, not permanent records.
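A sketch of persisting a generated ward summary as a derived artifact in the table above (the storeWardSummary helper is illustrative; prompt_hash uses the same tracking hash as the AIRouter audit log below):
import { createClient } from '@/lib/supabase/server';
import { createHash } from 'crypto';
import { ragPipeline } from '@/lib/ai/rag';
export async function storeWardSummary(wardCode: string) {
const supabase = await createClient();
const prompt = `Generate a daily summary for ${wardCode}`;
const result = await ragPipeline.process({ query: prompt, wardCode, tier: 'quality' });
// Supersede the previous summary, then insert the new derived artifact
await supabase
.from('ward_ai_summaries')
.update({ is_current: false })
.eq('ward_code', wardCode)
.eq('summary_type', 'daily');
const { error } = await supabase.from('ward_ai_summaries').insert({
ward_code: wardCode,
summary_type: 'daily',
content: result.answer,
model_used: result.metadata.model_used,
prompt_hash: createHash('md5').update(prompt).digest('hex'),
source_data: { sources: result.sources.map(s => s.id) }, // links back to source documents
expires_at: new Date(Date.now() + 24 * 60 * 60 * 1000).toISOString() // regenerate daily
});
if (error) throw error;
}
Because the row is fully derived, it is safe to delete and regenerate at any time.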
Write Constraints
AI systems may only write to:
- *_ai_summaries
- *_ai_explanations
- *_ai_annotations
- Other explicitly designated derived tables
AI systems must never write directly to:
- Core civic data tables
- Crime incident records
- Directory source data
- User profiles
- Financial records
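One way to enforce these constraints in code is a small allow-list guard that every AI write path calls before touching the database; a minimal sketch (assumes derived tables follow the naming patterns listed above):
// Table-name patterns AI code is permitted to write to
const AI_WRITABLE_PATTERNS = [/_ai_summaries$/, /_ai_explanations$/, /_ai_annotations$/];
export function assertAIWritable(table: string): void {
if (!AI_WRITABLE_PATTERNS.some((pattern) => pattern.test(table))) {
throw new Error(`AI write blocked: "${table}" is not a designated derived table`);
}
}
// assertAIWritable('ward_ai_summaries'); // ok
// assertAIWritable('crime_incidents'); // throws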
Model Tiering & Cost Control
import { createClient } from '@/lib/supabase/server';
import { createHash } from 'crypto';
export class AIRouter {
async routeRequest(request: AIRequest): Promise<string> {
const { tier = 'cheap', prompt, context } = request;
// Log request for auditing
await this.logRequest(request);
// Route to appropriate model
switch (tier) {
case 'cheap':
return 'gemini-flash-1.5'; // $0.075 per 1M tokens
case 'mid':
return 'claude-3-haiku-20240307'; // $0.25 per 1M tokens
case 'quality':
return 'claude-3-sonnet-20240229'; // $3.00 per 1M tokens
default:
throw new Error(`Invalid tier: ${tier}`);
}
}
private async logRequest(request: AIRequest) {
const supabase = await createClient();
await supabase.from('ai_request_logs').insert({
prompt_hash: this.hashPrompt(request.prompt),
tier: request.tier,
context_count: request.context?.length || 0,
timestamp: new Date().toISOString()
});
}
private hashPrompt(prompt: string): string {
// Simple hash for tracking (not sensitive data)
return createHash('md5').update(prompt).digest('hex');
}
}
Best Practices
1. Always Provide Context
Never let the AI generate answers without context; grounding every prompt in retrieved documents reduces hallucination and keeps answers verifiable.
// ❌ Bad: No context
const answer = await aiClient.generate({
prompt: "What's happening in Ward 12?"
});
// ✅ Good: With context
const documents = await retrieveWardDocuments('WARD012');
const answer = await aiClient.generate({
prompt: "What's happening in Ward 12?",
context: documents.map(d => d.content)
});
2. Use Tiered Routing
Route requests to appropriate models based on importance and cost.
// Low-cost for simple tasks
await aiClient.generate({
prompt: "Classify this crime report",
tier: 'cheap' // Gemini Flash
});
// High-quality for public-facing content
await aiClient.generate({
prompt: "Generate daily ward summary",
tier: 'quality' // Claude Sonnet
});
3. Implement Source Citation
Always include source citations so users can verify information.
const response = await ragPipeline.process({
query: "What are the safety concerns?",
wardCode: 'WARD012'
});
// Response includes:
// {
// answer: "Based on recent reports...",
// sources: [
// { id: 'doc-1', title: 'Crime Report Q3', snippet: '...' }
// ]
// }
4. Cache AI Responses
Cache AI responses to reduce costs and improve performance.
import { unstable_cache } from 'next/cache';
export async function getWardSummary(wardCode: string) {
return unstable_cache(
async () => {
return await ragPipeline.process({
query: `Generate a daily summary for ${wardCode}`,
wardCode,
tier: 'quality'
});
},
[`ward-summary-${wardCode}`],
{
revalidate: 3600, // 1 hour
tags: [`ward-summary-${wardCode}`]
}
)();
}
Testing AI Features
Unit Tests
import { RAGPipeline } from '@/lib/ai/rag';
import { describe, it, expect, jest } from '@jest/globals';
describe('RAGPipeline', () => {
it('retrieves relevant documents', async () => {
const pipeline = new RAGPipeline();
const documents = await pipeline['retrieveDocuments'](
'safety concerns',
'WARD012',
5
);
expect(documents).toHaveLength(5);
expect(documents[0]).toHaveProperty('id');
expect(documents[0]).toHaveProperty('content');
});
it('generates answers with context', async () => {
const pipeline = new RAGPipeline();
const response = await pipeline.process({
query: 'What are the safety concerns?',
wardCode: 'WARD012'
});
expect(response).toHaveProperty('answer');
expect(response).toHaveProperty('sources');
expect(response.sources.length).toBeGreaterThan(0);
});
it('handles empty results gracefully', async () => {
const pipeline = new RAGPipeline();
const response = await pipeline.process({
query: 'nonexistent query',
wardCode: 'NONEXISTENT'
});
expect(response.answer).toContain("don't have enough information");
expect(response.sources).toHaveLength(0);
});
});
Integration Tests
__tests__/ai/integration.test.ts
import { test, expect } from '@playwright/test';
test('AI assistant provides sourced answers', async ({ page }) => {
// Navigate to ward page
await page.goto('/ward/WARD012');
// Ask question
await page.fill('[data-testid="ai-question-input"]',
'What are the safety concerns?');
await page.click('[data-testid="ai-ask-button"]');
// Wait for response
await page.waitForSelector('[data-testid="ai-answer"]');
// Verify answer exists
const answer = await page.textContent('[data-testid="ai-answer"]');
expect(answer).toBeTruthy();
// Verify sources are shown
const sources = await page.$$('[data-testid="ai-source"]');
expect(sources.length).toBeGreaterThan(0);
});
Common Issues & Solutions
Issue: AI Hallucinations
Symptom: AI generates facts not present in source data
Solution:
// Strengthen system prompt
const systemPrompt = `
You are a civic intelligence assistant. You MUST:
1. Only use information from the provided context
2. If the answer is not in the context, say "I don't have enough information"
3. Never make up facts, names, or statistics
4. Cite sources for every claim
Failure to follow these rules will result in incorrect information.
`;
// Add validation
function validateAnswer(answer: string, context: string[]): boolean {
// Check if answer makes claims not supported by context
// Flag for human review if suspicious
return true;
}
Issue: High AI Costs
Symptom: Monthly AI costs are exceeding budget
Solution:
// Implement cost controls
class AICostController {
private monthlySpend = 0;
private readonly budget = 1000; // $1000/month
async checkBudget(): Promise<boolean> {
if (this.monthlySpend >= this.budget) {
// Fall back to cheaper model
return false;
}
return true;
}
async trackSpend(tokens: number, model: string) {
const cost = this.calculateCost(tokens, model);
this.monthlySpend += cost;
}
private calculateCost(tokens: number, model: string): number {
// Rates in USD per 1M tokens, matching the tier comments above
const rates = {
'gemini-flash-1.5': 0.075,
'claude-3-haiku-20240307': 0.25,
'claude-3-sonnet-20240229': 3.00
};
return (tokens / 1_000_000) * (rates[model as keyof typeof rates] ?? 0);
}
}
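A hypothetical wiring of the controller into the tiered router, degrading to the cheap tier once the monthly budget is spent rather than failing the request (routeWithBudget is illustrative, and AICostController from the snippet above is assumed to be in scope):
import { AIRouter } from '@/services/ai/router';
import type { AIRequest } from '@/services/ai/client';
const router = new AIRouter();
const costController = new AICostController(); // defined above
export async function routeWithBudget(request: AIRequest): Promise<string> {
// Force the cheapest tier when the budget is exhausted instead of rejecting the request
const withinBudget = await costController.checkBudget();
const effectiveRequest = withinBudget ? request : { ...request, tier: 'cheap' as const };
return router.routeRequest(effectiveRequest);
}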
Issue: Slow AI Responses
Symptom: AI responses take too long (>5 seconds)
Solution:
// Optimize retrieval (a sketch: checkCache/cacheResults are hypothetical caching helpers)
import { createClient } from '@/lib/supabase/server';
import { RAGPipeline } from '@/lib/ai/rag';
class OptimizedRAGPipeline extends RAGPipeline {
protected async retrieveDocuments(query: string, wardCode?: string, limit = 5) {
// Reuse cached results for repeated queries
const cached = await this.checkCache(query, wardCode);
if (cached) return cached;
// Embed the query once, then run the vector search inside Postgres
const embedding = await this.embedText(query);
const supabase = await createClient();
// An ANN index (ivfflat or HNSW) on the embedding column keeps this fast;
// match_documents is the same RPC used by the base pipeline
const { data } = await supabase.rpc('match_documents', {
query_embedding: embedding,
match_count: limit,
ward_filter: wardCode ?? null
});
// Cache results for subsequent identical queries
await this.cacheResults(query, wardCode, data || []);
return data || [];
}
private async checkCache(query: string, wardCode?: string): Promise<any[] | null> {
// e.g. Redis or an in-memory LRU keyed by (query, wardCode); omitted here
return null;
}
private async cacheResults(query: string, wardCode: string | undefined, docs: any[]) {
// Store with a short TTL; omitted here
}
}
Summary
Key Takeaways:
- AI is a utility - Summarizes and formats, never generates facts
- RAG prevents hallucination - Ground answers in retrieved context
- Governance is critical - Strict rules for civic-appropriate AI
- Cost control matters - Use tiered routing and caching
- Always cite sources - Users must be able to verify information