Data Modeling Principles
This guide covers Visita’s approach to database design, focusing on the distinction between tables, views, and materialized views, and how to optimize for performance while maintaining data integrity.
Postgres is King: All authoritative data lives in Supabase tables. Views and materialized views are derived data for performance and clarity.
Core Principles
1. Tables Store Truth
All authoritative data resides in normalized tables:
Example: Core business table
CREATE TABLE businesses (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
description TEXT,
location GEOGRAPHY(POINT, 4326),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Enable RLS for security
ALTER TABLE businesses ENABLE ROW LEVEL SECURITY;
Never store derived data in tables. Tables should only contain user input or authoritative source data.
2. Views Express Logic
Complex business logic, joins, and aggregations belong in SQL Views:
Example: Business directory view with aggregations
CREATE VIEW business_directory AS
SELECT
b.id,
b.name,
b.description,
b.location,
COUNT(p.id) as product_count,
AVG(r.rating) as avg_rating,
ARRAY_AGG(DISTINCT c.name) as categories
FROM businesses b
LEFT JOIN products p ON b.id = p.business_id
LEFT JOIN reviews r ON b.id = r.business_id
LEFT JOIN business_categories bc ON b.id = bc.business_id
LEFT JOIN categories c ON bc.category_id = c.id
GROUP BY b.id, b.name, b.description, b.location;
Benefits:
- ✅ Centralized business logic
- ✅ Real-time data (always current)
- ✅ Easier to maintain than application-level joins
- ✅ Can be queried like a table
Important: PostgreSQL does NOT support indexes on standard views. Performance must come from the underlying tables or materialized views.
Use materialized views only when real-time calculation is too expensive:
Example: Ward statistics materialized view
CREATE MATERIALIZED VIEW ward_statistics AS
SELECT
w.ward_code,
COUNT(b.id) as business_count,
COUNT(p.id) as project_count,
AVG(s.risk_score) as avg_risk_score,
SUM(d.amount) as total_donations
FROM wards w
LEFT JOIN businesses b ON ST_Contains(w.geom, b.location)
LEFT JOIN action_projects p ON w.ward_code = p.ward_code
LEFT JOIN safety_scores s ON w.ward_code = s.ward_code
LEFT JOIN donations d ON w.ward_code = d.ward_code
GROUP BY w.ward_code;
-- Create index for fast lookups
CREATE INDEX idx_ward_statistics_ward_code ON ward_statistics(ward_code);
When to Use Materialized Views:
- ✅ Complex aggregations across large datasets
- ✅ Data staleness (e.g., 1-hour lag) is acceptable
- ✅ Clear refresh strategy exists (cron job or trigger)
- ✅ Performance testing proves standard view is insufficient
When NOT to Use Materialized Views:
- ❌ Simple joins or small datasets
- ❌ Real-time accuracy is critical
- ❌ No clear refresh strategy
- ❌ Haven’t tested standard view performance
Materialization is a performance optimization, not a default. Always start with a standard view and optimize only when metrics demand it.
Indexing Strategy
Indexes are designed around query patterns, not just columns.
1. Ward-Level Queries
Most queries filter by ward_code:
-- Index for ward lookups
CREATE INDEX idx_businesses_ward_code ON businesses(ward_code);
CREATE INDEX idx_projects_ward_code ON action_projects(ward_code);
CREATE INDEX idx_signals_ward_code ON signals(ward_code);
-- Composite index for ward + status
CREATE INDEX idx_projects_ward_status ON action_projects(ward_code, status);
2. Time-Series Analytics
Queries often filter by date ranges:
-- Index for time-based queries
CREATE INDEX idx_events_created_at ON events(created_at DESC);
CREATE INDEX idx_signals_created_at ON signals(created_at DESC);
-- Partial index for recent data only
CREATE INDEX idx_recent_signals ON signals(created_at DESC)
WHERE created_at > NOW() - INTERVAL '90 days';
3. Spatial Queries
PostGIS indexes for geographic queries:
-- GiST index for spatial queries
CREATE INDEX idx_businesses_location ON businesses USING GIST(location);
CREATE INDEX idx_wards_geom ON wards USING GIST(geom);
-- Index for distance queries
CREATE INDEX idx_businesses_location_gist ON businesses
USING GIST(ll_to_earth(location[0], location[1]));
4. Full-Text Search
Enable text search on content fields:
-- Add search vector column
ALTER TABLE businesses ADD COLUMN search_vector tsvector;
UPDATE businesses SET search_vector =
setweight(to_tsvector('english', name), 'A') ||
setweight(to_tsvector('english', description), 'B');
-- Create GIN index for fast text search
CREATE INDEX idx_businesses_search ON businesses USING GIN(search_vector);
-- Create trigger to keep search_vector updated
CREATE TRIGGER update_businesses_search_vector
BEFORE INSERT OR UPDATE ON businesses
FOR EACH ROW EXECUTE FUNCTION tsvector_update_trigger(
search_vector, 'pg_catalog.english', name, description
);
Search & Discovery Architecture
Supabase as Source of Truth
Supabase is King: All authoritative data lives in PostgreSQL tables.
Data flow: Supabase → Algolia
// 1. Data enters via Server Actions
export async function createBusiness(business: Business) {
const supabase = await createClient();
const { data, error } = await supabase
.from('businesses')
.insert(business)
.select()
.single();
if (error) throw error;
// 2. Trigger Algolia sync (async)
await syncToAlgolia('businesses', data);
return data;
}
Algolia as Discovery Layer
Algolia is a Cache: Search indexes are read-only derivatives optimized for speed and faceting.
lib/search/sync-algolia.ts
export async function syncToAlgolia(indexName: string, record: any) {
const algolia = await createAlgoliaClient();
const index = algolia.initIndex(indexName);
// Transform record for search optimization
const searchRecord = {
objectID: record.id,
name: record.name,
description: record.description,
location: record.location,
ward_code: record.ward_code,
// Add computed fields for faceting
_geoloc: {
lat: record.location.coordinates[1],
lng: record.location.coordinates[0]
}
};
await index.saveObject(searchRecord);
}
Benefits:
- ✅ Fast search with typo tolerance
- ✅ Faceting and filtering
- ✅ Geo-search capabilities
- ✅ Analytics and insights
Never write to Algolia directly from user input. Always sync from the authoritative Supabase source.
Directory & Geography Model
Graph-Based Relationships
Listings ≠ Geography: A business listing is an entity that has a location, but is not owned by that location.
Many-to-many relationship: Businesses and Wards
CREATE TABLE business_service_areas (
business_id UUID REFERENCES businesses(id),
ward_code TEXT REFERENCES wards(ward_code),
service_type TEXT, -- 'primary', 'secondary', 'delivery'
PRIMARY KEY (business_id, ward_code)
);
-- A business can serve multiple wards
INSERT INTO business_service_areas VALUES
('business-uuid-1', 'WARD001', 'primary'),
('business-uuid-1', 'WARD002', 'delivery'),
('business-uuid-1', 'WARD003', 'delivery');
The “Place” Layer
Institutions and businesses unified at the “Place” layer:
CREATE VIEW places AS
SELECT
'business' as type,
id,
name,
description,
location,
ward_code
FROM businesses
UNION ALL
SELECT
'institution' as type,
id,
name,
description,
location,
ward_code
FROM institutions;
Benefits:
- ✅ Consistent geospatial querying
- ✅ Unified search across types
- ✅ Simplified map rendering
Canonical URLs
Prevent duplicate content with canonical URLs:
app/ward/[ward_code]/business/[id]/page.tsx
export default function BusinessPage({ params }) {
// Always redirect to canonical URL
const canonicalUrl = `/business/${params.id}`;
// If accessed via ward-specific URL, redirect
if (params.ward_code) {
redirect(canonicalUrl);
}
// Render business page
return <BusinessProfile id={params.id} />;
}
Best Practices
1. Use Generated Types
Never use as any type casting. Use generated Supabase types for type safety.
// ❌ Bad: Manual type casting
const { data } = await supabase
.from('businesses' as any)
.select('*');
return { data: data as any };
// ✅ Good: Generated types
import type { Tables } from '@/types/supabase';
const { data } = await supabase
.from('businesses')
.select('*')
.returns<Tables<'businesses'>[]>();
return { data };
2. Prefer Views Over Application Logic
Push logic to the database when possible for better performance and consistency.
// ❌ Bad: Application-level joins
const businesses = await supabase.from('businesses').select('*');
const products = await supabase.from('products').select('*');
const enriched = businesses.map(business => ({
...business,
products: products.filter(p => p.business_id === business.id)
}));
// ✅ Good: Database view
const { data } = await supabase
.from('business_directory')
.select('*'); // View already includes products
3. Index for Query Patterns
Index for questions, not columns. Design indexes around your actual query patterns.
-- Identify common query patterns
EXPLAIN ANALYZE SELECT * FROM businesses
WHERE ward_code = 'WARD001'
AND created_at > NOW() - INTERVAL '30 days'
AND status = 'active';
-- Create composite index for this pattern
CREATE INDEX idx_businesses_ward_date_status ON businesses
(ward_code, created_at DESC, status)
WHERE status = 'active';
Use EXPLAIN ANALYZE to identify bottlenecks:
-- Analyze query performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ward_statistics
WHERE ward_code = 'WARD001';
-- Look for:
-- - Seq Scan (bad - needs index)
-- - Index Scan (good)
-- - High buffer usage (needs optimization)
Common Anti-Patterns
❌ Indexing Views
-- ❌ This will fail - views can't have indexes
CREATE INDEX idx_business_directory_name ON business_directory(name);
-- ✅ Solution: Index underlying tables or use materialized view
CREATE INDEX idx_businesses_name ON businesses(name);
❌ Forcing Hierarchy
-- ❌ Bad: Rigid hierarchy
CREATE TABLE locations (
id UUID PRIMARY KEY,
parent_id UUID REFERENCES locations(id), -- Rigid tree
type TEXT CHECK (type IN ('country', 'province', 'city', 'ward'))
);
-- ✅ Good: Graph relationships
CREATE TABLE location_relationships (
parent_id UUID REFERENCES locations(id),
child_id UUID REFERENCES locations(id),
relationship_type TEXT, -- 'contains', 'overlaps', 'serves'
PRIMARY KEY (parent_id, child_id)
);
❌ Duplicating Data for SEO
// ❌ Bad: Multiple URLs for same content
/page/business/123
/ward/WARD001/business/123
/city/johannesburg/business/123
// ✅ Good: Canonical URL + redirects
/page/business/123 // Canonical
/ward/*/business/* // Redirect to canonical
Migration Strategy
1. Start with Tables
-- Initial schema
CREATE TABLE businesses (...);
CREATE TABLE products (...);
CREATE TABLE categories (...);
2. Add Views for Complex Queries
-- Create view for common joins
CREATE VIEW business_directory AS SELECT ...;
3. Optimize with Materialized Views (Only if Needed)
-- After performance testing
CREATE MATERIALIZED VIEW business_stats AS SELECT ...;
CREATE INDEX idx_business_stats_ward ON business_stats(ward_code);
-- Set up refresh strategy
CREATE OR REPLACE FUNCTION refresh_business_stats()
RETURNS void AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY business_stats;
END;
$$ LANGUAGE plpgsql;
-- Schedule refresh (e.g., every hour)
SELECT cron.schedule('refresh-business-stats', '0 * * * *',
'SELECT refresh_business_stats();');
Summary
Key Takeaways:
- Tables store truth - All authoritative data lives here
- Views express logic - Complex joins and business rules
- Materialized views are for performance - Only when needed
- Index for questions - Design around query patterns
- Supabase is source of truth - Algolia is a read-only cache
- Graph over hierarchy - Many-to-many relationships model reality better