Intelligence Engine

Visita’s Intelligence Engine is the backbone of the platform’s awareness capabilities. It automatically gathers, processes, and geo-locates crime and safety news from South African cities, powering features like Ward Safety Maps, Live Dashboard Briefings, and “On This Day” historical intelligence.
The engine runs autonomously on scheduled GitHub Actions workflows (hourly and daily) and requires no manual intervention for daily operation.

Architecture

The engine follows a Search → Scrape → Extract → Locate → Store pipeline.
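
The flow of a single search query through the five stages can be sketched as below. This is a minimal illustration, not the actual VisitaScout code: the `Stages` interface, the `runPipeline` function, and all type names are hypothetical, and each stage is passed in as a function so the control flow can be read (and tested) without network access.

```typescript
// Hypothetical stage signatures; the real internals of scout-v2.ts are not shown in this document.
interface SearchHit { url: string; title: string }
interface Extracted { title: string; location_text: string }
interface Coords { latitude: number; longitude: number }

interface Stages {
  search: (query: string) => Promise<SearchHit[]>;
  scrape: (url: string) => Promise<string | null>;
  extract: (articleText: string) => Promise<Extracted | null>;
  locate: (locationText: string) => Promise<Coords | null>;
  store: (record: Extracted & Coords & { source_url: string }) => Promise<void>;
}

// Runs every search hit through the stages in order and returns how many were stored.
// A hit is dropped as soon as any stage yields nothing.
async function runPipeline(query: string, stages: Stages): Promise<number> {
  let stored = 0;
  for (const hit of await stages.search(query)) {
    const text = await stages.scrape(hit.url);
    if (!text) continue;
    const extracted = await stages.extract(text);
    if (!extracted) continue;
    const coords = await stages.locate(extracted.location_text);
    if (!coords) continue; // incidents that cannot be geocoded are not stored
    await stages.store({ ...extracted, ...coords, source_url: hit.url });
    stored++;
  }
  return stored;
}
```

Injecting the stages keeps the orchestration independent of Brave, Playwright, the LLM, and Mapbox, so each can be stubbed in a test.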

Core Components

VisitaScout (lib/intelligence/scout-v2.ts)

The main orchestration class that drives the entire pipeline.
import { VisitaScout } from '@/lib/intelligence/scout-v2';

const scout = new VisitaScout();

// Standard scan (cron jobs)
await scout.investigateCity('Johannesburg', 'shallow');

// Deep scan (user-triggered or backfill)
await scout.investigateCity('Cape Town', 'deep');

// Historical backfill with reference date
await scout.investigateCity('Durban', 'shallow', '2026-01-05');
Priority Modes:
  • shallow: 2 queries × 2 results each (routine hourly scans)
  • deep: 5 queries × 5 results each (includes hijacking, protests, accidents)
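
To make the two modes concrete, the sketch below encodes the documented limits as a lookup table and derives the maximum number of pages one city scan can fetch. The names `MODE_LIMITS`, `TOPICS`, `buildQueries`, and `maxPagesPerScan` are hypothetical, and the topic phrases are placeholders: the actual search phrases used by scout-v2.ts are not documented here.

```typescript
type Priority = 'shallow' | 'deep';

// Query and result counts as documented for each priority mode.
const MODE_LIMITS: Record<Priority, { queries: number; resultsPerQuery: number }> = {
  shallow: { queries: 2, resultsPerQuery: 2 },
  deep: { queries: 5, resultsPerQuery: 5 },
};

// Placeholder topics (assumption); deep mode reaches the later entries.
const TOPICS = ['crime news', 'armed robbery', 'hijacking', 'protest', 'accident'];

// Builds the search phrases for one city, limited by the mode's query count.
function buildQueries(city: string, priority: Priority): string[] {
  const { queries } = MODE_LIMITS[priority];
  return TOPICS.slice(0, queries).map((topic) => `${topic} ${city} today`);
}

// Upper bound on article pages fetched in a single city scan.
function maxPagesPerScan(priority: Priority): number {
  const { queries, resultsPerQuery } = MODE_LIMITS[priority];
  return queries * resultsPerQuery;
}
```

Under these limits a shallow scan fetches at most 4 pages per city and a deep scan at most 25, which is why shallow is the mode suited to hourly runs.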

Brave Search (lib/tools/brave.ts)

Wrapper for the Brave Search API. Returns structured search results with freshness metadata.
import { searchWeb } from '@/lib/tools/brave';

const results = await searchWeb('crime news Johannesburg today', 5, 'pd'); // past day
// Returns: [{ title, url, description, published_age, meta_url }]
Environment Variable: BRAVE_SEARCH_API_KEY

Browser Tool (lib/tools/browser.ts)

Uses Playwright with Chromium to fetch full page content, then extracts readable text using Mozilla’s Readability library.
import { fetchPageContent } from '@/lib/tools/browser';

const article = await fetchPageContent('https://example.com/news/crime-report');
// Returns: { title, content, siteName }
This requires Playwright browsers to be installed. In CI/CD, run:
npx playwright install chromium --with-deps

LLM Extraction

The scout uses Qwen 2.5 72B Instruct via OpenRouter to extract structured intelligence from raw article text.

Extracted Fields

| Field | Description |
| --- | --- |
| title | Short headline (max 50 chars) |
| summary | Data-rich summary (max 200 chars) |
| description | Full detailed narrative |
| category | Robbery, Murder, Hijacking, Public Violence, etc. |
| severity_level | 1-5 scale (5=Mass Casualty, 4=Murder, 3=Armed Robbery, 2=Theft, 1=Disturbance) |
| location_text | Specific street address or landmark |
| venue_name | Name of business/venue if applicable |
| vehicle_details | Make, model, color, registration |
| suspects_count | Number of perpetrators |
| modus_operandi | Follow-home, Smash-and-grab, Blue-light gang, etc. |
| occurred_at | ISO 8601 timestamp |
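
Because LLM output does not always respect the stated limits, it can help to check each extraction before geocoding. The sketch below types the fields listed above and validates the documented constraints. `ExtractedIncident` and `validateExtraction` are hypothetical names, and which fields may be null is an assumption: the source only marks venue_name as "if applicable".

```typescript
interface ExtractedIncident {
  title: string;
  summary: string;
  description: string;
  category: string;
  severity_level: number;
  location_text: string;
  venue_name: string | null;       // nullable by assumption
  vehicle_details: string | null;  // nullable by assumption
  suspects_count: number | null;   // nullable by assumption
  modus_operandi: string | null;   // nullable by assumption
  occurred_at: string;
}

// Returns a list of problems; an empty list means the record passes these checks.
function validateExtraction(raw: ExtractedIncident): string[] {
  const problems: string[] = [];
  if (raw.title.length > 50) problems.push('title exceeds 50 chars');
  if (raw.summary.length > 200) problems.push('summary exceeds 200 chars');
  const sev = raw.severity_level;
  if (!Number.isInteger(sev) || sev < 1 || sev > 5) {
    problems.push('severity_level must be an integer from 1 to 5');
  }
  if (raw.location_text.trim() === '') problems.push('location_text is empty');
  if (Number.isNaN(Date.parse(raw.occurred_at))) {
    problems.push('occurred_at is not a parseable timestamp');
  }
  return problems;
}
```

Whether a failing record is discarded, truncated, or sent back to the model for correction is a design choice this sketch leaves open.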

Geocoding

After extraction, the location_text is geocoded using Mapbox to get precise lat/lng coordinates.
import { geocodeAddress } from '@/app/actions/geocoding';

const coords = await geocodeAddress('Sandton City Mall, Johannesburg, South Africa');
// Returns: { latitude: -26.1075, longitude: 28.0567 }
Only locations that can be geocoded are stored. This ensures every incident has a valid “pin” on the map.

Duplicate Handling

Source Deduplication

Before processing any URL, the scout checks if it already exists in crime_intelligence.crime_reports.source_url.
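
A minimal in-memory sketch of this check follows. The URL normalization step (dropping the fragment and `utm_*` tracking parameters) is an assumption added for illustration and is not confirmed by the source, which only describes a lookup against `source_url`. `normalizeSourceUrl` and `filterNewUrls` are hypothetical names.

```typescript
// Strips the fragment and utm_* tracking parameters so that trivially
// different links to the same article compare as equal (assumed behavior).
function normalizeSourceUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.hash = '';
  for (const key of Array.from(url.searchParams.keys())) {
    if (key.toLowerCase().startsWith('utm_')) url.searchParams.delete(key);
  }
  return url.toString();
}

// Keeps only candidate URLs whose normalized form is not already known.
// `knownUrls` stands in for the values of crime_reports.source_url.
function filterNewUrls(candidates: string[], knownUrls: Iterable<string>): string[] {
  const seen = new Set(Array.from(knownUrls, normalizeSourceUrl));
  return candidates.filter((candidate) => {
    const key = normalizeSourceUrl(candidate);
    if (seen.has(key)) return false;
    seen.add(key); // also removes duplicates within the same batch
    return true;
  });
}
```

In production the known URLs would come from a database query rather than an in-memory list; the set-based filter only illustrates the skip decision.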

Spatial Consolidation

Incidents from the last 24 hours that lie within 100 meters of each other (measured by Haversine distance) are merged to avoid redundant pins.
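
The merge test can be sketched as a Haversine distance check combined with a time window. This is an illustration under stated assumptions: `Pin`, `haversineMeters`, and `findMergeTarget` are hypothetical names, and the 24-hour window is applied here to the existing pin's incident_date, which the source does not specify.

```typescript
interface Pin { latitude: number; longitude: number; incident_date: string }

const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters
const MERGE_RADIUS_M = 100;
const MERGE_WINDOW_MS = 24 * 60 * 60 * 1000;

const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance between two points using the Haversine formula.
function haversineMeters(a: Pin, b: Pin): number {
  const dLat = toRad(b.latitude - a.latitude);
  const dLon = toRad(b.longitude - a.longitude);
  const sinLat = Math.sin(dLat / 2);
  const sinLon = Math.sin(dLon / 2);
  const h =
    sinLat * sinLat +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * sinLon * sinLon;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Returns the first recent pin within the merge radius, or undefined if the
// candidate should become a new pin.
function findMergeTarget(candidate: Pin, existing: Pin[], now: Date = new Date()): Pin | undefined {
  const cutoff = now.getTime() - MERGE_WINDOW_MS;
  return existing.find(
    (pin) =>
      Date.parse(pin.incident_date) >= cutoff &&
      haversineMeters(candidate, pin) <= MERGE_RADIUS_M,
  );
}
```

At this scale 0.0005 degrees of latitude is roughly 55 meters, so two reports geocoded to neighboring buildings on the same block would typically merge.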

Database Schema

The intelligence is stored in the crime_intelligence schema:

incidents Table

The “pin” on the map. Core location and metadata.
| Column | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| title | TEXT | Short headline |
| summary | TEXT | Data-rich summary |
| type | TEXT | “Crime”, “Accident”, etc. |
| crime_category | TEXT | Robbery, Murder, Hijacking, etc. |
| incident_date | TIMESTAMP | When it occurred |
| severity_level | INT | 1-5 scale |
| latitude / longitude | FLOAT | Coordinates |
| city | TEXT | City name |
| status | TEXT | Unverified, Confirmed, etc. |
| source_url | TEXT | Original article URL |

crime_reports Table

Detailed intelligence linked to an incident.
| Column | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| incident_id | UUID | FK to incidents |
| description | TEXT | Full narrative |
| category_of_crime | TEXT | Crime category |
| vehicle_details | TEXT | Vehicle information |
| perpetrators_details | TEXT | Suspect descriptions |
| modus_operandi | TEXT | Attack pattern |
| source_url | TEXT | Original article URL |
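
One plausible way an extraction could be split across the two tables is sketched below. The column names come from the tables above, but the mapping itself is an assumption: in particular, defaulting type to "Crime" and status to "Unverified", and deriving perpetrators_details from suspects_count, are illustrative guesses. The row types omit id because it is presumably generated by the database.

```typescript
interface Extraction {
  title: string; summary: string; description: string; category: string;
  severity_level: number; vehicle_details: string | null;
  suspects_count: number | null; modus_operandi: string | null; occurred_at: string;
}

interface IncidentRow {
  title: string; summary: string; type: string; crime_category: string;
  incident_date: string; severity_level: number;
  latitude: number; longitude: number; city: string; status: string; source_url: string;
}

interface CrimeReportRow {
  incident_id: string; description: string; category_of_crime: string;
  vehicle_details: string | null; perpetrators_details: string | null;
  modus_operandi: string | null; source_url: string;
}

// Builds the map "pin" row (hypothetical mapping and defaults).
function toIncidentRow(
  e: Extraction, coords: { latitude: number; longitude: number }, city: string, sourceUrl: string,
): IncidentRow {
  return {
    title: e.title, summary: e.summary, type: 'Crime', crime_category: e.category,
    incident_date: e.occurred_at, severity_level: e.severity_level,
    latitude: coords.latitude, longitude: coords.longitude,
    city, status: 'Unverified', source_url: sourceUrl,
  };
}

// Builds the detailed report row linked to an already inserted incident.
function toCrimeReportRow(e: Extraction, incidentId: string, sourceUrl: string): CrimeReportRow {
  return {
    incident_id: incidentId, description: e.description, category_of_crime: e.category,
    vehicle_details: e.vehicle_details,
    perpetrators_details: e.suspects_count === null ? null : `${e.suspects_count} suspect(s)`,
    modus_operandi: e.modus_operandi, source_url: sourceUrl,
  };
}
```

The incident row must be inserted first so that its generated id can be used as incident_id on the report.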

Scheduled Automation

Hourly Tier 1 Scan (.github/workflows/intel-tier1-hourly.yml)

Runs every hour, targeting high-priority metros:
name: Intel Scout - Tier 1 Metros
on:
  schedule:
    - cron: '0 * * * *' # Every hour
jobs:
  scout:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm install
      - run: npx tsx scripts/run-scout-tier1.ts
        env:
          BRAVE_SEARCH_API_KEY: ${{ secrets.BRAVE_SEARCH_API_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}

Daily News Sweep (.github/workflows/news-intel-daily.yml)

Runs at 06:00 SAST (04:00 UTC), covering all monitored cities:
name: Daily News Scout
on:
  schedule:
    - cron: '0 4 * * *' # 06:00 SAST
  workflow_dispatch: # Manual trigger
jobs:
  scout-daily-news:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install chromium --with-deps
      - run: npx tsx scripts/news-scout-daily.ts
        env:
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
          BRAVE_SEARCH_API_KEY: ${{ secrets.BRAVE_SEARCH_API_KEY }}
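
GitHub Actions evaluates cron schedules in UTC, so a local SAST time has to be shifted back two hours before it goes into the workflow file. The helper below shows the arithmetic; `sastToUtcCron` is a hypothetical function and is not part of the repository.

```typescript
const SAST_OFFSET_HOURS = 2; // SAST is UTC+2 all year (no daylight saving time)

// Converts a daily SAST wall-clock time into a UTC cron expression.
function sastToUtcCron(hour: number, minute: number = 0): string {
  const validHour = Number.isInteger(hour) && hour >= 0 && hour <= 23;
  const validMinute = Number.isInteger(minute) && minute >= 0 && minute <= 59;
  if (!validHour || !validMinute) {
    throw new RangeError(`invalid time ${hour}:${minute}`);
  }
  // Add 24 before the modulo so times before 02:00 SAST wrap to the previous UTC evening.
  const utcHour = (hour - SAST_OFFSET_HOURS + 24) % 24;
  return `${minute} ${utcHour} * * *`;
}

console.log(sastToUtcCron(6)); // 0 4 * * *
```

Times earlier than 02:00 SAST fall on the previous UTC day, which matters only if the schedule also restricts the day of week or day of month.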

Required Secrets

Ensure these are configured in your GitHub repository settings:
| Secret | Purpose |
| --- | --- |
| BRAVE_SEARCH_API_KEY | Brave Search API access |
| OPENROUTER_API_KEY | LLM access via OpenRouter |
| NEXT_PUBLIC_SUPABASE_URL | Supabase project URL |
| SUPABASE_SERVICE_ROLE_KEY | Admin database access |
| NEXT_PUBLIC_MAPBOX_TOKEN | Geocoding API access |
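
A script can fail fast when a secret is missing instead of erroring midway through a scan. The check below is a small sketch: `REQUIRED_SECRETS` mirrors the list above, while `missingSecrets` is a hypothetical helper that is not known to exist in the repository.

```typescript
const REQUIRED_SECRETS = [
  'BRAVE_SEARCH_API_KEY',
  'OPENROUTER_API_KEY',
  'NEXT_PUBLIC_SUPABASE_URL',
  'SUPABASE_SERVICE_ROLE_KEY',
  'NEXT_PUBLIC_MAPBOX_TOKEN',
] as const;

// Returns the names of required variables that are unset or blank.
function missingSecrets(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SECRETS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}
```

A scout script could call `missingSecrets(process.env)` at startup and exit with an error that lists the missing names.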

Manual Scripts

Run Scout for a Single City

npx tsx scripts/test-seed.ts

Backfill Historical Data (Time Machine)

npx tsx scripts/seed-recent-intel.ts
This script iterates through 7 major cities over the past 7 days, using the referenceDate parameter to correctly infer dates from article text.
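
The iteration can be pictured as a list of (city, referenceDate) pairs, one per city per day. The sketch below generates that list; `backfillPlan` is a hypothetical function, and counting the window from yesterday backwards (excluding today) is an assumption, since the source does not say whether the current day is included.

```typescript
interface BackfillJob { city: string; referenceDate: string }

// Builds one job per city per day, newest day first, using UTC calendar dates.
function backfillPlan(cities: string[], days: number, today: Date = new Date()): BackfillJob[] {
  const plan: BackfillJob[] = [];
  for (let offset = 1; offset <= days; offset++) {
    // Date.UTC normalizes day values below 1 into the previous month.
    const day = new Date(
      Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - offset),
    );
    const referenceDate = day.toISOString().slice(0, 10); // YYYY-MM-DD
    for (const city of cities) {
      plan.push({ city, referenceDate });
    }
  }
  return plan;
}
```

Each job would then map onto a call such as `scout.investigateCity(job.city, 'shallow', job.referenceDate)`, matching the three-argument form shown earlier in this document. With 7 cities and 7 days this yields 49 scans.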

Monitoring

  • Console Logs: Each step logs progress with emoji indicators:
    • 🕵️ Scout started
    • ⏭️ Skipping known source
    • 📍 Pinning new incident
    • 🔄 Merging with existing incident
    • 📝 Filing detailed report
    • Error occurred
  • Supabase Dashboard: Monitor record counts in crime_intelligence.incidents and crime_intelligence.crime_reports.

Future Enhancements

  • RSS feed ingestion for major news outlets
  • Community-submitted incident validation
  • Real-time push notifications for high-severity events
  • Integration with SAPS official crime statistics