Intelligence Engine
Visita’s Intelligence Engine is the backbone of the platform’s awareness capabilities. It automatically gathers, processes, and geo-locates crime and safety news from South African cities, powering features like Ward Safety Maps, Live Dashboard Briefings, and the “On This Day” historical intelligence.
This is an autonomous system that runs 24/7 via GitHub Actions. It requires no manual intervention for daily operation.
Architecture
The engine follows a Search → Scrape → Extract → Locate → Store pipeline:
Core Components
VisitaScout (lib/intelligence/scout-v2.ts)
The main orchestration class that drives the entire pipeline.
```typescript
import { VisitaScout } from '@/lib/intelligence/scout-v2';

const scout = new VisitaScout();

// Standard scan (cron jobs)
await scout.investigateCity('Johannesburg', 'shallow');

// Deep scan (user-triggered or backfill)
await scout.investigateCity('Cape Town', 'deep');

// Historical backfill with reference date
await scout.investigateCity('Durban', 'shallow', '2026-01-05');
```
Priority Modes:
- shallow: 2 queries × 2 results each (routine hourly scans)
- deep: 5 queries × 5 results each (includes hijacking, protests, and accidents)
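The two modes above amount to a per-scan budget. A minimal sketch of how that budget might be encoded (the names `SCAN_BUDGET` and `maxArticles` are illustrative; the real constants live inside scout-v2.ts and may differ):

```typescript
type Priority = 'shallow' | 'deep';

// Hypothetical encoding of the per-mode budgets described above.
const SCAN_BUDGET: Record<Priority, { queries: number; resultsPerQuery: number }> = {
  shallow: { queries: 2, resultsPerQuery: 2 }, // routine hourly scans
  deep: { queries: 5, resultsPerQuery: 5 },    // user-triggered or backfill
};

// Upper bound on articles a single scan will process.
function maxArticles(mode: Priority): number {
  const { queries, resultsPerQuery } = SCAN_BUDGET[mode];
  return queries * resultsPerQuery;
}
```

So a shallow scan touches at most 4 articles per city, while a deep scan can touch up to 25.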
searchWeb (lib/tools/brave.ts)
A wrapper for the Brave Search API. Returns structured search results with freshness metadata.
```typescript
import { searchWeb } from '@/lib/tools/brave';

const results = await searchWeb('crime news Johannesburg today', 5, 'pd'); // past day
// Returns: [{ title, url, description, published_age, meta_url }]
```
Environment Variable: BRAVE_SEARCH_API_KEY
fetchPageContent (lib/tools/browser.ts)
Uses Playwright with Chromium to fetch the full page content, then extracts readable text using Mozilla’s Readability library.
```typescript
import { fetchPageContent } from '@/lib/tools/browser';

const article = await fetchPageContent('https://example.com/news/crime-report');
// Returns: { title, content, siteName }
```
This requires Playwright browsers to be installed. In CI/CD, run `npx playwright install chromium --with-deps`.
LLM Extraction
The scout uses Qwen 2.5 72B Instruct via OpenRouter to extract structured intelligence from raw article text.
| Field | Description |
|---|---|
| title | Short headline (max 50 chars) |
| summary | Data-rich summary (max 200 chars) |
| description | Full detailed narrative |
| category | Robbery, Murder, Hijacking, Public Violence, etc. |
| severity_level | 1-5 scale (5 = Mass Casualty, 4 = Murder, 3 = Armed Robbery, 2 = Theft, 1 = Disturbance) |
| location_text | Specific street address or landmark |
| venue_name | Name of business/venue, if applicable |
| vehicle_details | Make, model, color, registration |
| suspects_count | Number of perpetrators |
| modus_operandi | Follow-home, Smash-and-grab, Blue-light gang, etc. |
| occurred_at | ISO 8601 timestamp |
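The field table above maps naturally onto a TypeScript shape. This is a hypothetical sketch (the interface name, the optional markers, and `isUsable` are assumptions, not the actual type in scout-v2.ts):

```typescript
// Hypothetical shape of one extracted record, mirroring the field table.
interface ExtractedIncident {
  title: string;          // short headline, max 50 chars
  summary: string;        // data-rich summary, max 200 chars
  description: string;    // full detailed narrative
  category: string;       // 'Robbery', 'Murder', 'Hijacking', ...
  severity_level: number; // 1-5 scale
  location_text: string;  // street address or landmark
  venue_name?: string;
  vehicle_details?: string;
  suspects_count?: number;
  modus_operandi?: string;
  occurred_at: string;    // ISO 8601 timestamp
}

// Minimal sanity check before a record moves on to geocoding.
function isUsable(r: ExtractedIncident): boolean {
  return r.title.length > 0 &&
    Number.isInteger(r.severity_level) &&
    r.severity_level >= 1 && r.severity_level <= 5 &&
    r.location_text.length > 0;
}
```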
Geocoding
After extraction, the location_text is geocoded using Mapbox to get precise lat/lng coordinates.
```typescript
import { geocodeAddress } from '@/app/actions/geocoding';

const coords = await geocodeAddress('Sandton City Mall, Johannesburg, South Africa');
// Returns: { latitude: -26.1075, longitude: 28.0567 }
```
Only locations that can be geocoded are stored. This ensures every incident has a valid “pin” on the map.
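The "no coordinates, no pin" rule can be sketched as a filter. The function name `keepPinnable` is illustrative, and the geocoder is passed in as a callback so the example stays self-contained; in the real pipeline it would be the `geocodeAddress` action shown above:

```typescript
interface Coords { latitude: number; longitude: number }

// Keep only incidents whose location_text resolves to coordinates.
async function keepPinnable<T extends { location_text: string }>(
  incidents: T[],
  geocode: (address: string) => Promise<Coords | null>,
): Promise<Array<T & Coords>> {
  const pinned: Array<T & Coords> = [];
  for (const incident of incidents) {
    const coords = await geocode(incident.location_text);
    if (coords) pinned.push({ ...incident, ...coords }); // drop anything that fails to geocode
  }
  return pinned;
}
```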
Duplicate Handling
Source Deduplication
Before processing any URL, the scout checks if it already exists in crime_intelligence.crime_reports.source_url.
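In essence this is a set-membership test. A minimal illustration, with an in-memory Set standing in for the database lookup against crime_intelligence.crime_reports.source_url:

```typescript
// Skip any candidate URL that has already been filed as a source.
function filterNewSources(candidates: string[], knownSources: Set<string>): string[] {
  return candidates.filter((url) => !knownSources.has(url));
}
```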
Spatial Consolidation
Incidents within 100 meters of each other (using Haversine distance) within the last 24 hours are merged to avoid redundant pins.
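The distance half of that rule is a standard Haversine calculation. A self-contained sketch (the 24-hour window check is omitted here, and `shouldMerge` is an illustrative name, not the scout's actual function):

```typescript
// Great-circle distance in meters between two lat/lng points (Haversine formula).
function haversineMeters(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Distance part of the consolidation rule: closer than 100 m means one pin.
function shouldMerge(a: { lat: number; lng: number }, b: { lat: number; lng: number }): boolean {
  return haversineMeters(a.lat, a.lng, b.lat, b.lng) < 100;
}
```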
Database Schema
The intelligence is stored in the crime_intelligence schema:
incidents Table
The “pin” on the map. Core location and metadata.
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| title | TEXT | Short headline |
| summary | TEXT | Data-rich summary |
| type | TEXT | "Crime", "Accident", etc. |
| crime_category | TEXT | Robbery, Murder, Hijacking, etc. |
| incident_date | TIMESTAMP | When it occurred |
| severity_level | INT | 1-5 scale |
| latitude / longitude | FLOAT | Coordinates |
| city | TEXT | City name |
| status | TEXT | Unverified, Confirmed, etc. |
| source_url | TEXT | Original article URL |
crime_reports Table
Detailed intelligence linked to an incident.
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| incident_id | UUID | FK to incidents |
| description | TEXT | Full narrative |
| category_of_crime | TEXT | Crime category |
| vehicle_details | TEXT | Vehicle information |
| perpetrators_details | TEXT | Suspect descriptions |
| modus_operandi | TEXT | Attack pattern |
| source_url | TEXT | Original article URL |
Scheduled Automation
Hourly Tier 1 Scan (.github/workflows/intel-tier1-hourly.yml)
Runs every hour, targeting high-priority metros:
```yaml
name: Intel Scout - Tier 1 Metros
on:
  schedule:
    - cron: '0 * * * *' # Every hour
jobs:
  scout:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm install
      - run: npx tsx scripts/run-scout-tier1.ts
        env:
          BRAVE_SEARCH_API_KEY: ${{ secrets.BRAVE_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_KEY }}
```
Daily News Sweep (.github/workflows/news-intel-daily.yml)
Runs at 06:00 SAST (04:00 UTC), covering all monitored cities:
```yaml
name: Daily News Scout
on:
  schedule:
    - cron: '0 4 * * *' # 06:00 SAST
  workflow_dispatch: # Manual trigger
jobs:
  scout-daily-news:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install chromium --with-deps
      - run: npx tsx scripts/news-scout-daily.ts
        env:
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
          BRAVE_SEARCH_API_KEY: ${{ secrets.BRAVE_SEARCH_API_KEY }}
```
Required Secrets
Ensure these are configured in your GitHub repository settings:
| Secret | Purpose |
|---|---|
| BRAVE_SEARCH_API_KEY | Brave Search API access |
| OPENROUTER_API_KEY | LLM access via OpenRouter |
| NEXT_PUBLIC_SUPABASE_URL | Supabase project URL |
| SUPABASE_SERVICE_ROLE_KEY | Admin database access |
| NEXT_PUBLIC_MAPBOX_TOKEN | Geocoding API access |
Manual Scripts
Run Scout for a Single City
```bash
npx tsx scripts/test-seed.ts
```
Backfill Historical Data (Time Machine)
```bash
npx tsx scripts/seed-recent-intel.ts
```
This script iterates through 7 major cities over the past 7 days, using the referenceDate parameter to correctly infer dates from article text.
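The loop structure can be sketched as follows. This is an assumption about the script's shape, not its actual code: the city list is illustrative, and the investigate callback stands in for `scout.investigateCity` so the example is self-contained:

```typescript
type Investigate = (
  city: string,
  priority: 'shallow' | 'deep',
  referenceDate: string,
) => Promise<void>;

// Walk back over the past 7 days, passing each day as the reference date
// so relative phrases in articles ("last night") resolve against the right day.
async function backfillWeek(
  cities: string[],
  investigate: Investigate,
  today = new Date(),
): Promise<string[]> {
  const referenceDates: string[] = [];
  for (let daysAgo = 0; daysAgo < 7; daysAgo++) {
    const ref = new Date(today.getTime() - daysAgo * 86_400_000)
      .toISOString()
      .slice(0, 10); // YYYY-MM-DD
    referenceDates.push(ref);
    for (const city of cities) {
      await investigate(city, 'shallow', ref); // e.g. scout.investigateCity(...)
    }
  }
  return referenceDates;
}
```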
Monitoring
- Console Logs: Each step logs progress with emoji indicators:
  - 🕵️ Scout started
  - ⏭️ Skipping known source
  - 📍 Pinning new incident
  - 🔄 Merging with existing incident
  - 📝 Filing detailed report
  - ❌ Error occurred
- Supabase Dashboard: Monitor record counts in crime_intelligence.incidents and crime_intelligence.crime_reports.
Future Enhancements