Testing Coordinate Precision Loss During Conversion


Coordinate precision loss during format, CRS, or serialization conversion is a silent failure mode that routinely corrupts spatial joins, invalidates topology, and introduces drift that breaks the sub-centimetre tolerances of high-accuracy pipelines. The tools at the centre of this problem are concrete: numpy.float32, the PROJ transformation pipeline behind pyproj.Transformer, and the GDAL/OGR COORDINATE_PRECISION layer creation option. This page sits within Attribute & Metadata Checks because coordinate precision is, fundamentally, a metadata contract (the declared dtype and CRS of a coordinate array) that must be asserted, not assumed. For GIS QA engineers, data engineers, and platform/DevOps teams, detecting and preventing this drift means moving beyond naive equality assertions toward tolerance-aware tests aligned with the broader core pipeline architecture and its deterministic predicates.

Root-Cause Framing: Why Precision Bleeds

Precision degradation rarely stems from a single operation. At the engineering level it emerges from the intersection of binary storage width, transformation engines, and implicit serialization defaults — each of which can independently strip significant digits before any validation runs:

Float32 vs Float64 storage. Coordinate arrays written as 32-bit floats, for example float32 columns in a Parquet/Arrow schema or arrays cast down to save memory, keep far less precision than their source. A 32-bit IEEE 754 float carries a 23-bit stored mantissa, retaining roughly 7 significant decimal digits: on the order of 1 metre of positional resolution at the equator, whether the value is a longitude in degrees or a Web Mercator coordinate in metres. High-accuracy survey data, cadastral boundaries, and engineering-grade LiDAR require 64-bit floats (52-bit stored mantissa, ~15–16 digits, nanometre-scale resolution at these magnitudes). The loss is not noise; it is a deterministic rounding of the mantissa.
PROJ transformation pipelines. When a datum shift depends on a grid (e.g., NADCON's conus, NTv2 grids, or ntf_r93) that is not installed locally and network access is disabled, PROJ falls back to a lower-accuracy operation, in the worst case a ballpark transformation, without raising an error. Without explicit grid parameters or +towgs84 overrides, these fallbacks introduce systematic offsets that compound across multi-step conversions. See the official PROJ transformation documentation for pipeline configuration standards.
GDAL/OGR export rounding. Text-based vector drivers round coordinates during serialization. In the GeoJSON driver, the COORDINATE_PRECISION layer creation option defaults to 15 decimal places for GeoJSON 2008 output and to 7 when RFC7946=YES is set. Binary formats such as Shapefile store IEEE doubles and do not round at write time, but downstream parsers, especially JavaScript-based or lightweight ETL tools, frequently re-serialize at lower precision. Refer to the GDAL GeoJSON driver specifications for creation-option overrides.
Implicit ETL type coercion. Coordinates split into plain numeric columns (x/y in pandas, Dask, or PyArrow tables) inherit whatever dtype the schema or upstream code declares. A single float32 schema field, or an astype("float32") added to save memory, strips precision before validation occurs, and the loss carries through later concatenation and partitioning, making the drift invisible until topology checks fail downstream.
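The coercion case is cheapest to catch at the schema boundary, before any geometry is rebuilt. A minimal sketch, assuming coordinates have been split into plain `x`/`y` columns of a pandas DataFrame; `assert_float64_coords` is a hypothetical helper, not a library API, and the coordinate values are illustrative:

```python
import numpy as np
import pandas as pd


def assert_float64_coords(df: pd.DataFrame, cols=("x", "y")) -> None:
    """Hypothetical contract check: fail if any coordinate column is not float64."""
    bad = {c: str(df[c].dtype) for c in cols if df[c].dtype != np.float64}
    if bad:
        raise TypeError(f"coordinate columns not float64: {bad}")


# Illustrative Web Mercator coordinates in metres.
df = pd.DataFrame({"x": [-13627665.271], "y": [4547675.354]})
assert_float64_coords(df)  # passes: pandas infers float64 from Python floats

# A memory optimisation that silently narrows the mantissa.
slim = df.astype({"x": "float32", "y": "float32"})
lost = (df["x"] - slim["x"].astype("float64")).abs().max()
print(f"easting drift after float32 cast: {lost:.3f} m")

try:
    assert_float64_coords(slim)
except TypeError as exc:
    print(exc)  # names the offending columns and their dtypes
```

Checking the dtype rather than the values keeps the test cheap and catches the narrowing even when a particular value happens to survive the cast.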

Because the loss is deterministic, it is also testable: given a known input coordinate and a known conversion, the maximum tolerable deviation can be expressed as a fixed bound and asserted in CI.

Precision Reference and Tolerance Bounds

The table below maps the storage and conversion knobs that govern precision to their effective ground resolution and a recommended assertion threshold. Use it to pick the tolerance constant your pytest checks compare against.

| Mechanism | Controlling parameter | Mantissa / digits | Effective resolution | Recommended test tolerance |
|---|---|---|---|---|
| 32-bit float storage | `numpy.dtype('float32')` | 23-bit / ~7 digits | ~1 m at equator | reject for cadastral; flag if `dtype != float64` |
| 64-bit float storage | `numpy.dtype('float64')` | 52-bit / ~15–16 digits | nanometre-scale | baseline reference |
| GeoJSON serialization | GDAL `COORDINATE_PRECISION` (15 by default; 7 with `RFC7946=YES`) | 7 decimal places with `RFC7946=YES` | ~1.1 cm at equator | `>= 7` for survey-grade |
| Shapefile serialization | none: binary IEEE doubles | full double | same as float64 | parity with source |
| Datum shift omission | PROJ grid / `+towgs84` | n/a | 0.1–2 m systematic | `< 0.05` m for engineering work |
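The resolution figures for decimal-place limits follow from the length of one degree of longitude at the equator, $2\pi \times 6{,}378{,}137\ \text{m} / 360 \approx 111{,}319.49$ m. A small sketch of that arithmetic; the helper name is hypothetical and the WGS 84 semi-major axis is the only constant assumed:

```python
import math

WGS84_A = 6_378_137.0  # WGS 84 semi-major axis in metres
M_PER_DEG_EQUATOR = 2 * math.pi * WGS84_A / 360  # about 111,319.49 m per degree


def decimal_places_to_metres(places: int) -> float:
    """Ground length of one unit in the last written decimal place of a
    longitude, measured along the equator (it shrinks by cos(latitude) elsewhere)."""
    return 10.0 ** -places * M_PER_DEG_EQUATOR


for places in (6, 7, 15):
    print(f"{places:>2} places -> {decimal_places_to_metres(places):.3g} m")
```

Rounding to a given number of places moves a value by at most half this length, so the 7-place setting bounds serialization error at roughly 0.56 cm along the equator.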

For a coordinate value $x$ stored as float64 and rounded to the nearest float32, the absolute drift is bounded by half the machine epsilon of the narrower type (its unit roundoff) scaled by $|x|$:

$$\Delta x = |x - \mathrm{f32}(x)| \le \tfrac{1}{2}\,\varepsilon_{32}\,|x|, \qquad \varepsilon_{32} = 2^{-23} \approx 1.19\times10^{-7}$$

In Web Mercator (EPSG:3857) a mid-latitude easting is on the order of $10^{7}$ metres, so the relative bound multiplies up to roughly $10^{7}\times\tfrac{1}{2}\varepsilon_{32}\approx 0.6$ m of worst-case drift. Geographic coordinates are no safer: adjacent float32 longitudes near 120° are about 0.85 m apart at the equator. What governs the drift is the magnitude of the coordinate expressed in ground units, not its digit count, which is why the verification step below asserts against an absolute metre tolerance.
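The bound can be checked empirically. This sketch rounds synthetic Web Mercator eastings through float32, confirms every sample respects $\tfrac{1}{2}\varepsilon_{32}|x|$, and then prints the float32 spacing at a few representative magnitudes using `numpy.spacing`; the sample range and magnitudes are illustrative choices:

```python
import numpy as np

EPS32 = float(np.finfo(np.float32).eps)  # 2**-23, about 1.19e-7

rng = np.random.default_rng(0)
# Synthetic Web Mercator eastings across the valid range (|x| <= ~2.0e7 m).
x = rng.uniform(-2.0e7, 2.0e7, size=100_000)

# Round-trip through float32 and measure the absolute change in metres.
drift = np.abs(x - x.astype(np.float32).astype(np.float64))
bound = 0.5 * EPS32 * np.abs(x)

assert np.all(drift <= bound)  # round-to-nearest bound holds for every sample
print(f"worst-case drift: {drift.max():.3f} m")

# Spacing between adjacent float32 values grows with magnitude, so the
# achievable ground resolution depends on how large the coordinate is.
for v in (5.0e5, 4.5e6, 1.36e7, 2.0e7):
    print(f"{v:>12,.0f} m -> float32 spacing {np.spacing(np.float32(v)):.5f} m")
```

A UTM easting near 500,000 m keeps a spacing of about 3 cm, while values beyond $2^{24} \approx 1.68\times10^{7}$ m are 2 m apart, which is why a single tolerance cannot be reused blindly across CRSs.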

Step-by-Step Implementation

The following pattern measures the worst-case deviation a conversion introduces and compares it against a tolerance, in a form that integrates directly into a pytest suite. It is written against Shapely 2.x, GeoPandas 0.14+, pyproj 3.6+, and pytest 7+.

Step 1 — Establish a float64 reference geometry. Pin the dtype explicitly so the reference precision is asserted by the test rather than assumed from upstream defaults.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
from pyproj import Transformer

# float64 reference — the accuracy contract anchors here
ref_coords = np.array([[-122.4194155, 37.7749295]], dtype=np.float64)
ref_gdf = gpd.GeoDataFrame(
    geometry=[Point(ref_coords[0])], crs="EPSG:4326"
)
```
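Before any conversion runs, it can be worth asserting that building the geometry left the reference untouched. A minimal sketch using Shapely 2.x's `shapely.get_coordinates`; exact equality is deliberate here because no conversion has happened yet, and tolerances belong only on the converted side:

```python
import numpy as np
import shapely
from shapely.geometry import Point

ref_coords = np.array([[-122.4194155, 37.7749295]], dtype=np.float64)
geom = Point(ref_coords[0])

# Shapely 2.x returns coordinates as a float64 array of shape (n, 2).
extracted = shapely.get_coordinates(geom)

assert ref_coords.dtype == np.float64
assert extracted.dtype == np.float64
# Any difference here means the reference itself was altered.
assert np.array_equal(extracted, ref_coords)
```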

Step 2 — Transform with an explicitly constructed Transformer. Use always_xy=True to avoid axis-order surprises, and capture the full-precision projected coordinate before any storage step.

```python
transformer = Transformer.from_crs(
    "EPSG:4326", "EPSG:3857", always_xy=True
)
x64, y64 = transformer.transform(ref_coords[0, 0], ref_coords[0, 1])
```
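`Transformer.from_crs` lets PROJ choose the operation at runtime. To pin it, one option is to build a second transformer from an explicit pipeline string with `Transformer.from_pipeline` and check that both agree. The pipeline string below is an assumption about what PROJ selects for EPSG:4326 to EPSG:3857 with `always_xy=True`; inspect `definition` in your own environment before committing it to a contract:

```python
from pyproj import Transformer

# Assumed explicit pipeline: degrees -> radians, then Web Mercator on the
# WGS 84 semi-major axis. Input order is (lon, lat).
PINNED_PIPELINE = (
    "+proj=pipeline "
    "+step +proj=unitconvert +xy_in=deg +xy_out=rad "
    "+step +proj=webmerc +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84"
)

pinned = Transformer.from_pipeline(PINNED_PIPELINE)
chosen = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)

lon, lat = -122.4194155, 37.7749295
x_pinned, y_pinned = pinned.transform(lon, lat)
x_chosen, y_chosen = chosen.transform(lon, lat)

print(chosen.definition)  # the operation PROJ actually selected
# Sub-micrometre agreement means the pinned pipeline reproduces the default.
assert abs(x_pinned - x_chosen) < 1e-6 and abs(y_pinned - y_chosen) < 1e-6
```

For this CRS pair no datum shift is involved, so the check mainly guards against silent changes in PROJ's selection; for datum-shifting pairs, pinning the pipeline also pins which grid is used.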

Step 3 — Simulate the lossy storage step. Downcasting to float32 reproduces what a Parquet/Arrow pipeline does when its schema declares float32 coordinate columns.

```python
x32 = np.float32(x64)
y32 = np.float32(y64)
```

Step 4 — Quantify the drift in projection units (metres). Compute the per-axis absolute delta and take the worst case.

```python
drift_x = abs(x64 - float(x32))
drift_y = abs(y64 - float(y32))
max_drift = max(drift_x, drift_y)
```
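Steps 3 and 4 operate on a single point. Because `Transformer.transform` also accepts NumPy arrays, the same computation vectorises over a sample of coordinates. In this sketch the sample reuses the cities from the verification test below; in practice the points would be drawn from the dataset under test:

```python
import numpy as np
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)

# Illustrative lon/lat sample (San Francisco, Sydney, Tokyo).
lonlat = np.array([
    [-122.4194155, 37.7749295],
    [151.2092955, -33.8688197],
    [139.6917060, 35.6894875],
], dtype=np.float64)

# pyproj transforms whole arrays at once and returns float64 arrays.
x64, y64 = transformer.transform(lonlat[:, 0], lonlat[:, 1])
xy64 = np.column_stack([x64, y64])

# Steps 3-4 vectorised: round-trip through float32, then take each point's worst axis.
xy32 = xy64.astype(np.float32).astype(np.float64)
per_point = np.abs(xy64 - xy32).max(axis=1)
worst = int(per_point.argmax())
print(f"max drift {per_point[worst]:.3f} m at sample {worst}")
```

Reporting the index of the worst sample, not only its value, makes a failing CI run point straight at the offending feature.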

Step 5 — Compare against a configured tolerance. Load the threshold from a data contract rather than hard-coding it, mirroring how spatial tolerance thresholds are configured for every geometric assertion on this site.

```python
TOLERANCE_METERS = 0.01  # cadastral-grade; load from YAML/JSON contract in CI
assert max_drift <= TOLERANCE_METERS, (
    f"Precision drift {max_drift:.6f} m exceeds {TOLERANCE_METERS} m"
)
```
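Step 5 leaves the contract format open. One possible shape, sketched with the standard-library `json` module; the key names (`crs`, `units`, `max_coordinate_drift`) and `load_tolerance` are hypothetical, not a published schema:

```python
import json

# Hypothetical data-contract document; in CI this would come from a versioned file.
CONTRACT_JSON = """
{
  "crs": "EPSG:3857",
  "units": "m",
  "max_coordinate_drift": 0.01
}
"""


def load_tolerance(text: str, expected_crs: str) -> float:
    """Return the drift tolerance in metres, refusing contracts for another CRS or unit."""
    contract = json.loads(text)
    if contract["crs"] != expected_crs:
        raise ValueError(f"contract is for {contract['crs']}, not {expected_crs}")
    if contract["units"] != "m":
        raise ValueError(f"expected metres, got {contract['units']}")
    tolerance = float(contract["max_coordinate_drift"])
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return tolerance


TOLERANCE_METERS = load_tolerance(CONTRACT_JSON, "EPSG:3857")
```

Binding the tolerance to a CRS matters because the number is expressed in that CRS's units: 0.01 is a centimetre in EPSG:3857 but roughly a kilometre of longitude at the equator in degree-based EPSG:4326.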

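Running this against mid-latitude points shows why float32 cannot carry a sub-centimetre contract in Web Mercator: eastings around $1.36\times10^{7}$ m sit where adjacent float32 values are 1 m apart, so storage can shift a coordinate by up to 0.5 m (up to 1 m beyond $2^{24}\approx1.68\times10^{7}$ m, where the spacing doubles), and tens of centimetres is typical. An individual point can land close to a representable value by chance, so measure drift over several samples before trusting a pass.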

Verification Pattern

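Wrap the logic as a parameterized test so it runs as a deterministic CI gate. The test below exits non-zero as soon as any sampled coordinate breaches the contract. Against float32 storage at a 1 cm tolerance, most points are expected to fail; a point passes only when its projected value happens to lie within 1 cm of a representable float32 value, which is why several points are sampled: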

```python
import numpy as np
import pytest
from pyproj import Transformer

TRANSFORMER = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
TOLERANCE_METERS = 0.01

@pytest.mark.parametrize("lon,lat", [
    (-122.4194155, 37.7749295),   # San Francisco
    (151.2092955, -33.8688197),   # Sydney
    (139.6917060, 35.6894875),    # Tokyo
])
def test_float32_storage_preserves_tolerance(lon, lat):
    x64, y64 = TRANSFORMER.transform(lon, lat)
    drift = max(abs(x64 - float(np.float32(x64))),
                abs(y64 - float(np.float32(y64))))
    assert drift <= TOLERANCE_METERS, f"{drift:.6f} m > {TOLERANCE_METERS} m"
```
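A gate that only ever fails could be failing for the wrong reason. One way to guard against that, sketched here as an addition rather than part of the original suite, is to pair a float64 positive control with a strict expected-failure float32 case on a point whose drift is easy to verify by hand. The control point on the equator at 100°E is an illustrative choice:

```python
import numpy as np
import pytest
from pyproj import Transformer

TRANSFORMER = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
TOLERANCE_METERS = 0.01

# At 100°E on the equator the easting is 100 * 111,319.4908 m ≈ 11,131,949.079 m.
# float32 values are 1 m apart there, so storage drifts by about 0.079 m.
CONTROL_POINT = (100.0, 0.0)


@pytest.mark.parametrize("storage_dtype", [
    np.float64,  # positive control: must pass, proving the harness adds no drift
    pytest.param(
        np.float32,
        marks=pytest.mark.xfail(strict=True, reason="float32 cannot hold 1 cm near 1e7 m"),
    ),
])
def test_storage_dtype_against_tolerance(storage_dtype):
    x64, y64 = TRANSFORMER.transform(*CONTROL_POINT)
    reference = np.array([x64, y64], dtype=np.float64)
    stored = reference.astype(storage_dtype).astype(np.float64)
    drift = float(np.abs(reference - stored).max())
    assert drift <= TOLERANCE_METERS, f"{drift:.6f} m > {TOLERANCE_METERS} m"
```

With `strict=True`, an unexpected pass of the float32 case is reported as a failure, so the suite notices if a refactor stops the harness from detecting drift at all.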

Run it directly with pytest -q test_precision.py, or as a one-liner smoke check in a pre-commit hook:

```shell
python -c "import numpy as np; from pyproj import Transformer; t=Transformer.from_crs('EPSG:4326','EPSG:3857',always_xy=True); x,y=t.transform(-122.4194155,37.7749295); d=max(abs(x-float(np.float32(x))),abs(y-float(np.float32(y)))); print(d); assert d<=0.01, d"
```

A non-zero exit (or a printed value above your tolerance) is the signal to keep coordinates in float64 end-to-end or to widen the accuracy contract deliberately rather than by accident.

Failure Modes and Edge Cases

Precision testing has spatial corner cases that a naive metre-tolerance check will miss. Cover these explicitly:

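Anti-meridian wrap (±180° longitude). Float32 rounding snaps longitudes within a few microdegrees of the seam onto ±180° exactly (179.9999999 becomes 180.0), and a downstream normalization that maps 180° to -180° then moves the vertex to the opposite side of the globe, turning a sub-metre drift into a globe-spanning artifact once the geometry is reprojected. Test points at 179.9999999 and -179.9999999 and assert the sign is preserved after the full pipeline, not just the magnitude.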
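High northings and large projected magnitudes. Projected coordinates grow large in Web Mercator toward the poles and in UTM northings near $10^{7}$ m (at high northern latitudes, or near the equator in southern-hemisphere zones because of the 10,000,000 m false northing), and the relative-epsilon analysis above means absolute drift scales with them. A test whose sample points all sit at mid-latitudes never exercises these magnitudes, so include points at the extremes of the dataset's extent and parameterize the tolerance per CRS unit and extent.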
Empty and null geometries. An empty Point() or a null geometry slot in a GeoDataFrame contributes no coordinates (and some conversions emit NaN or inf instead), so a drift check that aggregates over its coordinates can pass vacuously. Guard with geom.is_empty and an explicit null check so a missing coordinate is never mistaken for a zero-drift pass.
Mixed Z/M coordinates. Downcasting a 3D or measured geometry can corrupt the Z or M ordinate while leaving X/Y intact, so an X/Y-only drift check reports a false pass. Extend the delta computation across all ordinates returned by shapely.get_coordinates(geom, include_z=True); that call covers Z, so M values need a separate check.
Repeated round-trips. A single conversion may sit within tolerance, but ETL stages that reproject and re-serialize repeatedly accumulate drift. Assert against the original float64 reference after the full pipeline, not against the previous stage, so error compounding is caught. This is the same parity discipline applied when comparing GeoJSON vs Shapefile outputs across formats.
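Several of these edge cases can be folded into one helper. A sketch under stated assumptions: `max_ordinate_drift` and `longitude_sign_preserved` are hypothetical names, the Z corruption is simulated, and the `[-180, 180)` wrap stands in for whatever normalization a real pipeline applies:

```python
import math

import numpy as np
import shapely
from shapely.geometry import Point


def max_ordinate_drift(reference, converted) -> float:
    """Hypothetical helper: worst absolute drift over X, Y and, when present, Z.

    Null, empty, dimension-changing or non-finite results raise instead of
    returning a number, so they can never be reported as a zero-drift pass.
    """
    for geom in (reference, converted):
        if geom is None or geom.is_empty:
            raise ValueError("null or empty geometry cannot be drift-checked")
    include_z = reference.has_z
    if converted.has_z != include_z:
        raise ValueError("Z dimension was added or dropped during conversion")
    ref = shapely.get_coordinates(reference, include_z=include_z)
    out = shapely.get_coordinates(converted, include_z=include_z)
    if ref.shape != out.shape:
        raise ValueError(f"coordinate count changed: {ref.shape} -> {out.shape}")
    delta = np.abs(ref - out)
    if not np.all(np.isfinite(delta)):
        raise ValueError("non-finite coordinate after conversion")
    return float(delta.max())


def longitude_sign_preserved(lon_in: float, lon_out: float) -> bool:
    """Hypothetical helper: False if the vertex ended up on the other side of the seam."""
    return math.copysign(1.0, lon_in) == math.copysign(1.0, lon_out)


# Simulated Z corruption: X/Y identical, so an X/Y-only check sees nothing.
ref3d = Point(500000.0, 4500000.0, 12.5)
out3d = Point(500000.0, 4500000.0, 12.25)
xy_only = float(np.abs(shapely.get_coordinates(ref3d) - shapely.get_coordinates(out3d)).max())
print(xy_only, max_ordinate_drift(ref3d, out3d))  # 0.0 0.25

# Anti-meridian: float32 snaps the longitude onto the seam, then a
# [-180, 180) normalization sends it to the other side of the globe.
lon_in = 179.9999999
lon_stored = float(np.float32(lon_in))            # 180.0
lon_norm = (lon_stored + 180.0) % 360.0 - 180.0   # -180.0
print(longitude_sign_preserved(lon_in, lon_norm))  # False

# Empty geometry: refused rather than scored as zero drift.
try:
    max_ordinate_drift(Point(), Point())
except ValueError as exc:
    print(exc)
```

For repeated round-trips, call the helper with the original float64 geometry as `reference` and the final pipeline output as `converted`, never the output of the previous stage.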

Conclusion

Coordinate precision loss is not a theoretical edge case; it is a deterministic consequence of mantissa width, transformation defaults, and implicit type coercion — and because it is deterministic, it is fully testable with fixed tolerance bounds. By pinning float64, configuring explicit PROJ pipelines, and gating drift in CI, engineering teams eliminate silent spatial degradation before it corrupts analytics or breaks production topology. For the broader contract layer this check belongs to, return to Attribute & Metadata Checks.