Intelligence Engine

Visita’s Intelligence Engine is the backbone of the platform’s situational awareness features. It automatically gathers, processes, and geolocates crime and safety news from South African cities, powering features such as Ward Safety Maps, Live Dashboard Briefings, and “On This Day” historical intelligence.

The engine runs autonomously on scheduled GitHub Actions workflows (an hourly scan and a daily sweep) and requires no manual intervention for day-to-day operation.

Architecture

The engine follows a five-stage pipeline: Search → Scrape → Extract → Locate → Store.
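
As a rough illustration of how the five stages chain together, the sketch below passes each stage in as a function. The `PipelineStages` and `StoredPin` types and the `runPipeline` function are hypothetical names invented for this example; they are not taken from `scout-v2.ts`.

```typescript
// Illustrative only: stage names and record shapes are assumptions,
// not the actual VisitaScout internals.
interface StoredPin {
  title: string;
  sourceUrl: string;
  latitude: number;
  longitude: number;
}

interface PipelineStages {
  search(city: string): Promise<string[]>; // candidate article URLs
  scrape(url: string): Promise<string>; // readable article text
  extract(text: string): Promise<{ title: string; locationText: string } | null>;
  locate(locationText: string): Promise<{ latitude: number; longitude: number } | null>;
  store(pin: StoredPin): Promise<void>;
}

// Runs every URL through the stages in order and returns how many pins were stored.
async function runPipeline(city: string, stages: PipelineStages): Promise<number> {
  let stored = 0;
  for (const sourceUrl of await stages.search(city)) {
    const text = await stages.scrape(sourceUrl);
    const intel = await stages.extract(text);
    if (!intel) continue; // nothing usable in the article
    const coords = await stages.locate(intel.locationText);
    if (!coords) continue; // cannot be placed on the map
    await stages.store({ title: intel.title, sourceUrl, ...coords });
    stored++;
  }
  return stored;
}
```

Injecting the stages keeps the control flow testable without network access; the real orchestration may be organized differently.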

Core Components

VisitaScout (`lib/intelligence/scout-v2.ts`)

The main orchestration class that drives the entire pipeline.

import { VisitaScout } from '@/lib/intelligence/scout-v2';

const scout = new VisitaScout();

// Standard scan (cron jobs)
await scout.investigateCity('Johannesburg', 'shallow');

// Deep scan (user-triggered or backfill)
await scout.investigateCity('Cape Town', 'deep');

// Historical backfill with reference date
await scout.investigateCity('Durban', 'shallow', '2026-01-05');

Priority Modes:

- shallow: 2 queries × 2 results each (routine hourly scans)
- deep: 5 queries × 5 results each (adds queries for hijackings, protests, and accidents)
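
One way to encode the two modes is a small lookup table. The names `ScanMode`, `SCAN_BUDGETS`, and `maxArticles` are hypothetical; only the query and result counts come from the description above.

```typescript
// Hypothetical encoding of the two priority modes; names are illustrative.
type ScanMode = 'shallow' | 'deep';

interface ScanBudget {
  queries: number; // search queries issued per city
  resultsPerQuery: number; // results requested per query
}

const SCAN_BUDGETS: Record<ScanMode, ScanBudget> = {
  shallow: { queries: 2, resultsPerQuery: 2 },
  deep: { queries: 5, resultsPerQuery: 5 },
};

// Upper bound on the number of article URLs a single city scan can yield.
function maxArticles(mode: ScanMode): number {
  const { queries, resultsPerQuery } = SCAN_BUDGETS[mode];
  return queries * resultsPerQuery;
}

console.log(maxArticles('shallow')); // 4
console.log(maxArticles('deep')); // 25
```

A deep scan can therefore touch roughly six times as many articles as a shallow one, which is worth keeping in mind when budgeting search and LLM API calls.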

Brave Search (`lib/tools/brave.ts`)

Wrapper for the Brave Search API. Returns structured search results with freshness metadata.

import { searchWeb } from '@/lib/tools/brave';

const results = await searchWeb('crime news Johannesburg today', 5, 'pd'); // past day
// Returns: [{ title, url, description, published_age, meta_url }]

Environment variable: `BRAVE_SEARCH_API_KEY`
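
For orientation, the sketch below builds the kind of request a wrapper like `searchWeb` might send. The endpoint, the `X-Subscription-Token` header, and the `freshness` values follow Brave's public Web Search API; the `buildBraveRequest` function and its types are hypothetical and do not describe the actual contents of `brave.ts`.

```typescript
// Sketch of a Brave Web Search request; not the actual brave.ts implementation.
type Freshness = 'pd' | 'pw' | 'pm' | 'py'; // past day / week / month / year

interface BraveRequest {
  url: string;
  headers: Record<string, string>;
}

function buildBraveRequest(
  query: string,
  count: number,
  freshness: Freshness,
  apiKey: string,
): BraveRequest {
  const url = new URL('https://api.search.brave.com/res/v1/web/search');
  url.searchParams.set('q', query);
  url.searchParams.set('count', String(count));
  url.searchParams.set('freshness', freshness);
  return {
    url: url.toString(),
    headers: {
      Accept: 'application/json',
      'X-Subscription-Token': apiKey, // value of BRAVE_SEARCH_API_KEY
    },
  };
}

// 'demo-key' is a placeholder; a real call would read the key from the environment.
const braveRequest = buildBraveRequest('crime news Johannesburg today', 5, 'pd', 'demo-key');
console.log(braveRequest.url);
```

The result can be passed to `fetch(braveRequest.url, { headers: braveRequest.headers })`.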

Browser Tool (`lib/tools/browser.ts`)

Uses Playwright with Chromium to fetch full page content, then extracts readable text using Mozilla’s Readability library.

import { fetchPageContent } from '@/lib/tools/browser';

const article = await fetchPageContent('https://example.com/news/crime-report');
// Returns: { title, content, siteName }

This requires Playwright browsers to be installed. In CI/CD, run:

npx playwright install chromium --with-deps

LLM Extraction

The scout uses Qwen 2.5 72B Instruct via OpenRouter to extract structured intelligence from raw article text.

Extracted Fields

| Field | Description |
| --- | --- |
| `title` | Short headline (max 50 chars) |
| `summary` | Data-rich summary (max 200 chars) |
| `description` | Full detailed narrative |
| `category` | Robbery, Murder, Hijacking, Public Violence, etc. |
| `severity_level` | 1-5 scale (5 = Mass Casualty, 4 = Murder, 3 = Armed Robbery, 2 = Theft, 1 = Disturbance) |
| `location_text` | Specific street address or landmark |
| `venue_name` | Name of business/venue if applicable |
| `vehicle_details` | Make, model, color, registration |
| `suspects_count` | Number of perpetrators |
| `modus_operandi` | Follow-home, Smash-and-grab, Blue-light gang, etc. |
| `occurred_at` | ISO 8601 timestamp |
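
LLM output is not guaranteed to be well-formed, so it is worth validating before anything is stored. The sketch below is one possible check against the fields listed above. The `ExtractedIntel` interface, the choice of which fields are optional, and the decision to reject (rather than truncate) over-long values are all assumptions made for illustration.

```typescript
// Hypothetical shape for the LLM output. Field names follow the table above;
// types, optionality, and validation rules are assumptions.
interface ExtractedIntel {
  title: string;
  summary: string;
  description: string;
  category: string;
  severity_level: number;
  location_text: string;
  venue_name?: string;
  vehicle_details?: string;
  suspects_count?: number;
  modus_operandi?: string;
  occurred_at: string;
}

// Parses the raw model response and returns null if required fields are missing or invalid.
function parseExtractedIntel(raw: string): ExtractedIntel | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof data !== 'object' || data === null) return null;
  const d = data as Record<string, unknown>;

  // Non-empty string no longer than `max` characters.
  const isText = (v: unknown, max = Infinity): v is string =>
    typeof v === 'string' && v.trim().length > 0 && v.length <= max;

  const ok =
    isText(d.title, 50) &&
    isText(d.summary, 200) &&
    isText(d.description) &&
    isText(d.category) &&
    isText(d.location_text) &&
    typeof d.severity_level === 'number' &&
    Number.isInteger(d.severity_level) &&
    d.severity_level >= 1 &&
    d.severity_level <= 5 &&
    typeof d.occurred_at === 'string' &&
    !Number.isNaN(Date.parse(d.occurred_at));

  return ok ? (d as unknown as ExtractedIntel) : null;
}
```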

Geocoding

After extraction, the `location_text` is geocoded using Mapbox to get precise latitude/longitude coordinates.

import { geocodeAddress } from '@/app/actions/geocoding';

const coords = await geocodeAddress('Sandton City Mall, Johannesburg, South Africa');
// Returns: { latitude: -26.1075, longitude: 28.0567 }

Only locations that can be geocoded are stored. This ensures every incident has a valid “pin” on the map.
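
The "store only what can be pinned" rule can be sketched as a small response parser. This assumes a Mapbox Geocoding v5-style response, in which each feature carries a `center` array ordered `[longitude, latitude]`. The `toLatLng` helper and the trimmed-down response types are hypothetical and are not the actual `geocodeAddress` implementation.

```typescript
interface LatLng {
  latitude: number;
  longitude: number;
}

// Minimal subset of a Mapbox Geocoding v5 response (assumed shape).
interface MapboxFeature {
  center: [number, number]; // [longitude, latitude], in that order
}

interface MapboxGeocodeResponse {
  features: MapboxFeature[];
}

// Returns the best match as { latitude, longitude }, or null when nothing matched.
function toLatLng(response: MapboxGeocodeResponse): LatLng | null {
  const best = response.features[0];
  if (!best) return null; // not geocodable: the incident would not be stored
  const [longitude, latitude] = best.center;
  return { latitude, longitude };
}

console.log(toLatLng({ features: [{ center: [28.0567, -26.1075] }] }));
console.log(toLatLng({ features: [] })); // null
```

Note the coordinate order: swapping longitude and latitude is a common source of pins landing in the wrong place.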

Duplicate Handling

Source Deduplication

Before processing a URL, the scout checks whether it already exists in `crime_intelligence.crime_reports.source_url`; known sources are skipped.

Spatial Consolidation

Incidents that fall within 100 meters of each other (by Haversine distance) and within the last 24 hours are merged to avoid redundant pins.
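
The check can be sketched with the standard Haversine formula. The `PinnedIncident` type and the `findMergeTarget` helper are hypothetical, and measuring the 24-hour window back from a caller-supplied `now` is an assumption about how "the last 24 hours" is interpreted.

```typescript
interface PinnedIncident {
  latitude: number;
  longitude: number;
  incidentDate: Date;
}

const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters

// Great-circle distance between two points in meters (Haversine formula).
function haversineMeters(
  a: { latitude: number; longitude: number },
  b: { latitude: number; longitude: number },
): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.latitude - a.latitude);
  const dLng = toRad(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Returns an existing incident to merge into, or undefined if the candidate needs a new pin.
function findMergeTarget(
  candidate: { latitude: number; longitude: number },
  existing: PinnedIncident[],
  now: Date,
  radiusM = 100,
  windowMs = 24 * 60 * 60 * 1000,
): PinnedIncident | undefined {
  return existing.find((incident) => {
    const ageMs = now.getTime() - incident.incidentDate.getTime();
    const recent = ageMs >= 0 && ageMs <= windowMs;
    return recent && haversineMeters(candidate, incident) <= radiusM;
  });
}
```

At South African latitudes, 100 meters corresponds to roughly 0.0009 degrees of latitude, so two reports a block apart stay separate while reports about the same intersection collapse into one pin.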

Database Schema

The intelligence is stored in the `crime_intelligence` schema:

`incidents` Table

The “pin” on the map. Core location and metadata.

| Column | Type | Description |
| --- | --- | --- |
| `id` | UUID | Primary key |
| `title` | TEXT | Short headline |
| `summary` | TEXT | Data-rich summary |
| `type` | TEXT | “Crime”, “Accident”, etc. |
| `crime_category` | TEXT | Robbery, Murder, Hijacking, etc. |
| `incident_date` | TIMESTAMP | When it occurred |
| `severity_level` | INT | 1-5 scale |
| `latitude` / `longitude` | FLOAT | Coordinates |
| `city` | TEXT | City name |
| `status` | TEXT | Unverified, Confirmed, etc. |
| `source_url` | TEXT | Original article URL |

`crime_reports` Table

Detailed intelligence linked to an incident.

| Column | Type | Description |
| --- | --- | --- |
| `id` | UUID | Primary key |
| `incident_id` | UUID | FK to `incidents` |
| `description` | TEXT | Full narrative |
| `category_of_crime` | TEXT | Crime category |
| `vehicle_details` | TEXT | Vehicle information |
| `perpetrators_details` | TEXT | Suspect descriptions |
| `modus_operandi` | TEXT | Attack pattern |
| `source_url` | TEXT | Original article URL |

Scheduled Automation

Hourly Tier 1 Scan (`.github/workflows/intel-tier1-hourly.yml`)

Runs every hour, targeting high-priority metros:

name: Intel Scout - Tier 1 Metros
on:
  schedule:
    - cron: '0 * * * *' # Every hour
jobs:
  scout:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm install
      - run: npx tsx scripts/run-scout-tier1.ts
        env:
          BRAVE_SEARCH_API_KEY: ${{ secrets.BRAVE_SEARCH_API_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}

Daily News Sweep (`.github/workflows/news-intel-daily.yml`)

Runs at 06:00 SAST (04:00 UTC), covering all monitored cities:

name: Daily News Scout
on:
  schedule:
    - cron: '0 4 * * *' # 06:00 SAST
  workflow_dispatch: # Manual trigger
jobs:
  scout-daily-news:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install chromium --with-deps
      - run: npx tsx scripts/news-scout-daily.ts
        env:
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
          BRAVE_SEARCH_API_KEY: ${{ secrets.BRAVE_SEARCH_API_KEY }}

Required Secrets

Ensure these are configured in your GitHub repository settings:

| Secret | Purpose |
| --- | --- |
| `BRAVE_SEARCH_API_KEY` | Brave Search API access |
| `OPENROUTER_API_KEY` | LLM access via OpenRouter |
| `NEXT_PUBLIC_SUPABASE_URL` | Supabase project URL |
| `SUPABASE_SERVICE_ROLE_KEY` | Admin database access |
| `NEXT_PUBLIC_MAPBOX_TOKEN` | Geocoding API access |

Manual Scripts

Run Scout for a Single City

npx tsx scripts/test-seed.ts

Backfill Historical Data (Time Machine)

npx tsx scripts/seed-recent-intel.ts

This script iterates over 7 major cities for each of the past 7 days, passing the `referenceDate` parameter so that dates are inferred correctly from article text.
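
The iteration can be sketched as a plan of city and date pairs. The helpers `pastDates` and `backfillPlan` are hypothetical, the three cities are placeholders (the full list of 7 is not given here), and starting the window at yesterday rather than today is an assumption.

```typescript
// Lists the ISO dates (YYYY-MM-DD, UTC) for the `days` days before `today`, most recent first.
function pastDates(today: Date, days: number): string[] {
  const dates: string[] = [];
  for (let offset = 1; offset <= days; offset++) {
    const day = new Date(
      Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - offset),
    );
    dates.push(day.toISOString().slice(0, 10));
  }
  return dates;
}

interface BackfillTask {
  city: string;
  referenceDate: string;
}

// One task per (city, day); each would map to one investigateCity call.
function backfillPlan(cities: string[], today: Date, days: number): BackfillTask[] {
  const dates = pastDates(today, days);
  return cities.flatMap((city) => dates.map((referenceDate) => ({ city, referenceDate })));
}

// Placeholder city list for illustration only.
const backfillCities = ['Johannesburg', 'Cape Town', 'Durban'];
const backfillTasks = backfillPlan(backfillCities, new Date('2026-01-05T00:00:00Z'), 7);
console.log(backfillTasks.length); // 21
console.log(backfillTasks[0]); // { city: 'Johannesburg', referenceDate: '2026-01-04' }
```

Each task would then be passed to a call such as `scout.investigateCity(task.city, 'shallow', task.referenceDate)`.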

Monitoring

Console Logs: Each step logs progress with an emoji indicator:

- 🕵️ Scout started
- ⏭️ Skipping known source
- 📍 Pinning new incident
- 🔄 Merging with existing incident
- 📝 Filing detailed report
- ❌ Error occurred

Supabase Dashboard: Monitor record counts in `crime_intelligence.incidents` and `crime_intelligence.crime_reports`.

Future Enhancements

- RSS feed ingestion for major news outlets
- Community-submitted incident validation
- Real-time push notifications for high-severity events
- Integration with SAPS official crime statistics
