Mocking Geospatial Data for Tests

Isolating spatial logic from live production datasets is a prerequisite for deterministic geospatial validation. This guide establishes production-grade patterns aligned with Geospatial QA Fundamentals & Architecture, specifically addressing how to construct memory-safe, reproducible test fixtures that preserve topological integrity, coordinate reference system (CRS) alignment, and geometric precision across vector and raster workloads. Unlike traditional software testing, spatial mocking must account for projection transformations, floating-point coordinate drift, and complex spatial relationships. A pipeline-first design treats mock data as versioned artifacts, provisioned via infrastructure-as-code and validated against strict tolerance configurations before reaching CI/CD execution stages.

Pipeline-First Architecture & Fixture Generation

Live production datasets introduce unacceptable variance into automated validation pipelines due to schema drift, network latency, and unpredictable feature density. Ephemeral test environments must generate synthetic geometries, tile caches, and attribute tables that mirror production schemas without incurring storage overhead or external API dependencies.

A robust architecture treats mock data as immutable, versioned artifacts cached alongside test runners. Memory-safe execution is enforced by bounding fixture sizes, streaming large geometries through Python generators, and explicitly releasing GDAL/OGR contexts to prevent handle leaks in parallel workers. Instead of loading static shapefiles or GeoJSON from disk, engineers should construct in-memory GeoDataFrame objects using seeded coordinate arrays. This eliminates I/O bottlenecks and guarantees byte-level reproducibility across distributed test matrices.

Configuration management must externalize tolerance thresholds, precision models, and geometry validation rules into structured YAML or TOML manifests. A production-ready mock configuration enforces coordinate precision limits, defines acceptable Hausdorff distances for topology checks, and specifies raster resolution bounds. When integrating these configs into pytest, fixtures should parse tolerance parameters at collection time and apply them globally via session-scoped plugins or custom markers. This prevents ad-hoc tolerance overrides that compromise test determinism and ensures consistent behavior across local development, staging, and CI runners.

Validation Rules & Tolerance Enforcement

Spatial validation requires assertion primitives that evaluate geometric relationships rather than exact coordinate equality. Floating-point arithmetic and projection transformations introduce unavoidable micro-drift, which must be bounded by explicit tolerance envelopes. Engineers should implement custom assertion wrappers that delegate to Spatial Assertion Types Explained for standardized topology verification.

Effective tolerance enforcement relies on three core primitives:

  1. Coordinate Precision Bounds: Round-trip validation after CRS transformations must remain within a defined epsilon (e.g., 1e-7 for WGS84, 1e-3 for projected meters).
  2. Topological Distance Metrics: Hausdorff and Fréchet distances quantify shape similarity without requiring vertex-by-vertex equality.
  3. Spatial Relationship Thresholds: Overlap ratios, intersection areas, and buffer containment checks must accept configurable deviation margins.

For authoritative definitions of geometric validity rules, reference the OGC Simple Features specification. Validation pipelines should reject self-intersecting polygons, unclosed rings, and invalid multiparts before executing spatial joins or overlays. Custom pytest markers like @pytest.mark.spatial_tolerance(0.001) allow engineers to scope tolerance expectations per test case while maintaining global defaults.

Test Pyramid Alignment & Scoping Rules

Mocking strategies must scale according to the Understanding the GIS Test Pyramid. Unit tests operate on lightweight, synthetic fixtures generated in memory. Integration tests mock external tile servers, geocoding APIs, or spatial databases using deterministic response payloads. End-to-end tests consume scaled-down production snapshots that preserve schema complexity without the computational cost of full datasets.

Applying Scoping Rules for Map Data Validation prevents combinatorial explosion during spatial operations. Engineers should:

  • Restrict validation to explicit bounding boxes and CRS boundaries.
  • Cap feature density per fixture to maintain sub-second execution times.
  • Isolate raster and vector workflows to prevent cross-modal memory contention.
  • Use spatial indexing (e.g., R-tree or Quadtree) in test fixtures to mirror production query performance without loading full datasets.

By scoping validation to predictable geometric extents and limiting fixture cardinality, teams achieve deterministic execution times and eliminate flaky tests caused by unpredictable spatial index fragmentation.

Security Boundaries & Infrastructure Isolation

Mock environments must never inherit production credentials, network routes, or write permissions. Security Boundaries in Spatial QA require strict isolation between test runners and live spatial infrastructure. Ephemeral containers should provision read-only database roles, enforce network policies that block outbound production endpoints, and rotate mock credentials at pipeline initialization.

For relational spatial engines, follow Best practices for mocking PostGIS connections. Implement connection pooling with explicit timeout configurations, disable autovacuum during test runs to stabilize query plans, and seed databases using transactional rollbacks or Docker volume snapshots. CI/CD pipelines should provision isolated databases via Docker Compose or Kubernetes operators, ensuring that spatial extension versions (e.g., postgis, pgRouting) match production exactly to prevent version-specific geometry function discrepancies.

Deterministic Implementation Patterns

The following patterns demonstrate how to structure spatial mocks for Python-based QA pipelines. These examples prioritize memory safety, CRS normalization, and tolerance-aware assertions.

import pytest
import geopandas as gpd
import numpy as np
from shapely.geometry import Polygon, box
from shapely.validation import make_valid

# Session-scoped fixture for deterministic CRS normalization
@pytest.fixture(scope="session")
def crs_config():
    return {"source": "EPSG:4326", "target": "EPSG:3857", "tolerance": 1e-6}

# Generator-based fixture for memory-safe polygon synthesis
def generate_synthetic_polygons(count: int, seed: int = 42):
    rng = np.random.default_rng(seed)
    for _ in range(count):
        coords = rng.uniform(-10, 10, (5, 2))
        poly = Polygon(coords)
        yield make_valid(poly)

@pytest.fixture
def synthetic_gdf(crs_config):
    geoms = list(generate_synthetic_polygons(100))
    gdf = gpd.GeoDataFrame({"id": range(100), "geometry": geoms}, crs=crs_config["source"])
    return gdf.to_crs(crs_config["target"])

# Tolerance-aware spatial assertion wrapper
def assert_spatial_intersection(gdf_a, gdf_b, min_overlap_ratio=0.85, tolerance=1e-6):
    intersections = gdf_a.overlay(gdf_b, how="intersection")
    overlap_ratios = intersections.area / gdf_a.area
    assert (overlap_ratios >= min_overlap_ratio).all(), "Spatial overlap below tolerance threshold"

Key implementation guidelines:

  • Use shapely.validation.make_valid() immediately after geometry generation to prevent invalid topology from propagating.
  • Apply contextlib or explicit gc.collect() calls when streaming large raster tiles or vector batches.
  • Pin dependency versions in requirements.txt or pyproject.toml to avoid silent CRS or precision model changes across library updates.
  • Validate fixture schemas against production contracts using pydantic or pandera before executing spatial operations.

Pipeline Integration & CI/CD Readiness

To achieve publication-ready reliability, spatial mocks must integrate seamlessly into automated deployment pipelines. Configure pytest to run spatial tests in isolated worker pools using pytest-xdist, ensuring that GDAL/OGR thread safety limits are respected. Cache generated fixtures using GitHub Actions or GitLab CI artifact storage to reduce redundant computation. Implement pre-commit hooks that validate CRS consistency and geometry validity before code merges.

By treating mock data as first-class pipeline artifacts, enforcing strict tolerance configurations, and aligning validation with spatial test hierarchies, engineering teams eliminate environmental variance and achieve deterministic, reproducible geospatial QA.