81 changes: 81 additions & 0 deletions docs/temporal-covering.md
@@ -0,0 +1,81 @@
# Temporal-covering descriptor

`meta/temporal-covering.json` is the **single codegen source of truth**
(RFC #870 TemporalParquet / #913 Temporal Data Lake) for projecting a MEOS
temporal column into Parquet/Iceberg **covering columns**. The pipeline
folds it into `meos-idl.json` as `temporalCovering`. Every binding/engine
(PyMEOS, JMEOS, MobilityDuck, MobilitySpark, …) generates the **identical**
covering schema from this one mapping, so a temporal table prunes the same
way on every platform — no per-engine covering code to maintain.

## What it is

A temporal value is stored on disk as a canonical MEOS-WKB `BLOB`. Iceberg
and Parquet cannot prune on a `BLOB`. The covering descriptor names, per
temporal-type **class**, the primitive columns to *materialise alongside*
the value — the bounding box and SRID — which Iceberg collects as manifest
statistics and Parquet as row-group min/max. A bbox/time predicate can then
prune whole files and row groups **without a spatial-aware engine**. This is
the GeoParquet 1.1 `covering.bbox` approach; in MVB v3, a scalar `AND` chain
over the materialised covering columns pruned the same row groups as the
`ST_Intersects` path and ran about 10× faster.
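
The pruning test itself needs only scalar comparisons. A minimal sketch, assuming a hypothetical `RowGroupStats` record that holds, for one row group, the minimum of each lower-bound covering column and the maximum of each upper-bound one (a real reader would take these from the Parquet footer or the Iceberg manifest); `may_match` is likewise a hypothetical name:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical row-group statistics over the covering columns: the
    minimum of each lower-bound column and the maximum of each upper-bound
    column, as a Parquet footer would record them."""
    xmin_min: float
    xmax_max: float
    ymin_min: float
    ymax_max: float
    tmin_min: datetime
    tmax_max: datetime


def may_match(stats: RowGroupStats, qxmin: float, qxmax: float,
              qymin: float, qymax: float,
              qtmin: datetime, qtmax: datetime) -> bool:
    """Conservative overlap test: False means no row in the group can
    intersect the query box, so the group can be skipped unread.

    A row box overlaps the query on x iff xmin <= qxmax and xmax >= qxmin.
    Every row has xmin >= xmin_min and xmax <= xmax_max, so testing the
    group-level bounds never rejects a group that holds a matching row.
    """
    return (stats.xmin_min <= qxmax and stats.xmax_max >= qxmin
            and stats.ymin_min <= qymax and stats.ymax_max >= qymin
            and stats.tmin_min <= qtmax and stats.tmax_max >= qtmin)


def utc(hour: int) -> datetime:
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


group = RowGroupStats(0.0, 10.0, 0.0, 10.0, utc(0), utc(6))
print(may_match(group, 5, 15, 5, 15, utc(1), utc(2)))    # True: boxes may overlap
print(may_match(group, 20, 30, 0, 10, utc(1), utc(2)))   # False: x ranges disjoint
print(may_match(group, 0, 10, 0, 10, utc(7), utc(8)))    # False: time ranges disjoint
```

Because every comparison is on a primitive min/max value, any Parquet or Iceberg reader can evaluate it; this is the scalar `AND` chain the descriptor's benchmark note refers to.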

The mapping is keyed by **class**, not by type, so adding a type means adding
one entry to its class's `types` list:

| Class | Box | Types | Covering columns |
|---|---|---|---|
| `spatial` | `STBOX` via `tspatial_to_stbox` | tgeompoint, tgeogpoint, tgeometry, tgeography, tcbuffer, tnpoint, tpose, trgeometry | `xmin xmax ymin ymax [zmin zmax] tmin tmax srid` |
| `number` | `TBOX` via `tnumber_to_tbox` | tint, tfloat, tbigint | `vmin vmax tmin tmax` |
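
Because the table is keyed by class, a per-type view like the catalog's `byType` falls out of a simple inversion. A minimal sketch; the real derivation is done by the pipeline and is not part of this file, and `CLASSES` and `by_type` are hypothetical names:

```python
# Trimmed copy of the `classes` block in meta/temporal-covering.json
# (the `columns` arrays are omitted here for brevity).
CLASSES = {
    "spatial": {
        "box": {"type": "STBOX", "from": "tspatial_to_stbox"},
        "srid": "tspatial_srid",
        "types": ["tgeompoint", "tgeogpoint", "tgeometry", "tgeography",
                  "tcbuffer", "tnpoint", "tpose", "trgeometry"],
    },
    "number": {
        "box": {"type": "TBOX", "from": "tnumber_to_tbox"},
        "srid": None,
        "types": ["tint", "tfloat", "tbigint"],
    },
}


def by_type(classes: dict) -> dict:
    """Invert the class-keyed mapping into a per-type lookup, rejecting a
    type that is listed under two classes."""
    lookup = {}
    for class_name, spec in classes.items():
        for type_name in spec["types"]:
            if type_name in lookup:
                raise ValueError(
                    f"{type_name} is listed in both "
                    f"{lookup[type_name]['class']!r} and {class_name!r}")
            lookup[type_name] = {"class": class_name,
                                 "box": spec["box"],
                                 "srid": spec["srid"]}
    return lookup


lookup = by_type(CLASSES)
print(len(lookup))                        # 11
print(lookup["tfloat"]["box"]["from"])    # tnumber_to_tbox
```

Registering a new type in an existing class is then a one-line change to that class's `types` list; no per-type column list has to be written.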

The canonical value column is unchanged and lossless; covering columns are
denormalised derivations of the value's box. `zmin`/`zmax` are emitted only
for 3D values (`when: hasZ`).
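
Choosing which columns to emit for a given value is a filter on `when`. A minimal sketch, assuming the writer already knows whether the value is 3D (how it finds out is outside the descriptor); `SPATIAL_COLUMNS` and `covering_columns` are hypothetical names:

```python
# The `spatial` class columns from meta/temporal-covering.json, reduced to
# the two fields this sketch needs.
SPATIAL_COLUMNS = [
    {"name": "xmin"}, {"name": "xmax"},
    {"name": "ymin"}, {"name": "ymax"},
    {"name": "zmin", "when": "hasZ"}, {"name": "zmax", "when": "hasZ"},
    {"name": "tmin"}, {"name": "tmax"},
    {"name": "srid"},
]


def covering_columns(columns: list[dict], has_z: bool) -> list[str]:
    """Return the names of the covering columns to emit, dropping any column
    whose `when` condition does not hold for this value."""
    conditions = {"hasZ": has_z}
    return [col["name"] for col in columns
            if "when" not in col or conditions[col["when"]]]


print(covering_columns(SPATIAL_COLUMNS, has_z=False))
# ['xmin', 'xmax', 'ymin', 'ymax', 'tmin', 'tmax', 'srid']
print(covering_columns(SPATIAL_COLUMNS, has_z=True))
# ['xmin', 'xmax', 'ymin', 'ymax', 'zmin', 'zmax', 'tmin', 'tmax', 'srid']
```

The descriptor's notes allow 2D values to either omit `zmin`/`zmax` or store null. A Parquet file has one schema for all its rows, so a writer that may see both 2D and 3D values in one file would presumably keep the columns and write nulls.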

## In the catalog

`temporalCovering` carries the verbatim `classes`, plus derived lookups for
codegen:

```json
"temporalCovering": {
  "valueCodec": { "asHexWkb": "temporal_as_hexwkb",
                  "fromHexWkb": "temporal_from_hexwkb" },
  "byType": { "tgeompoint": { "class": "spatial", "box": {...},
                              "srid": "tspatial_srid", "columns": [...] }, ... },
  "symbols": ["stbox_xmin", "tbox_xmin", "tspatial_to_stbox", ...],
  "count": 11
}
```

- `byType` — `"tgeompoint"` → its class, box converter, SRID accessor, and
covering columns (each with its MEOS bbox accessor and SQL type). A
generator reads this directly; it never re-derives the mapping.
- `symbols` — every MEOS C symbol the descriptor depends on. The covering
parity audit (`tools/covering_parity.py`) checks each is exported by the
catalog and each covered type is a real `MeosType` — a miss is reported as
a worklist (add/export the accessor in MEOS), never a fabricated pass.
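
A minimal sketch of the two checks the audit is described as making. It is not `tools/covering_parity.py` itself (that file is not part of this diff); `collect_symbols`, `parity_worklist` and the sample `exported`/`meos_types` sets are hypothetical, and whether the real `symbols` list also includes the `valueCodec` functions is not shown here:

```python
# A trimmed `classes` block (the `number` class from
# meta/temporal-covering.json) used as sample input.
CLASSES = {
    "number": {
        "box": {"type": "TBOX", "from": "tnumber_to_tbox"},
        "srid": None,
        "types": ["tint", "tfloat", "tbigint"],
        "columns": [
            {"name": "vmin", "accessor": "tbox_xmin", "source": "box"},
            {"name": "vmax", "accessor": "tbox_xmax", "source": "box"},
            {"name": "tmin", "accessor": "tbox_tmin", "source": "box"},
            {"name": "tmax", "accessor": "tbox_tmax", "source": "box"},
        ],
    },
}


def collect_symbols(classes: dict) -> set[str]:
    """Every MEOS C symbol the classes depend on: box converters, SRID
    accessors and per-column accessors."""
    symbols = set()
    for spec in classes.values():
        if spec.get("box"):
            symbols.add(spec["box"]["from"])
        if spec.get("srid"):
            symbols.add(spec["srid"])
        symbols.update(col["accessor"] for col in spec["columns"])
    return symbols


def parity_worklist(classes: dict, exported: set[str],
                    meos_types: set[str]) -> dict[str, list[str]]:
    """Report what is missing instead of passing silently: symbols the
    catalog does not export, and covered types that are not MEOS types."""
    covered = {t for spec in classes.values() for t in spec["types"]}
    return {
        "missingSymbols": sorted(collect_symbols(classes) - exported),
        "unknownTypes": sorted(covered - meos_types),
    }


# Hypothetical catalog contents: `tbox_tmax` is deliberately left out.
exported = {"tnumber_to_tbox", "tbox_xmin", "tbox_xmax", "tbox_tmin"}
meos_types = {"tint", "tfloat", "tbigint", "tbool", "ttext"}
print(parity_worklist(CLASSES, exported, meos_types))
# {'missingSymbols': ['tbox_tmax'], 'unknownTypes': []}
```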

## How a generator uses it

For a column `traj TGEOMPOINT`, emit alongside the WKB value column:

```sql
xmin = stbox_xmin(tspatial_to_stbox(traj)), xmax = stbox_xmax(...),
ymin = stbox_ymin(...), ymax = stbox_ymax(...),
tmin = stbox_tmin(...), tmax = stbox_tmax(...),
srid = tspatial_srid(traj)
```

Each engine renders this in its own idiom (DuckDB generated columns, a Spark
UDF projection, a PyMEOS writer) and also writes the `temporal` and GeoParquet
`geo` / `covering.bbox` file-metadata keys listed in `metadataKeys`.
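
A minimal sketch of the engine-independent part of that step, starting from one entry of the `types` map that `build_covering_projection` in `generator/covering.py` emits. `TGEOMPOINT` and `render_select_list` are hypothetical; the MEOS names are emitted verbatim, which assumes the target engine exposes functions with those names (a real binding would map them onto its own functions or UDFs):

```python
# One entry of the `types` map in output/meos-covering-projection.json,
# i.e. what build_covering_projection() emits for tgeompoint.
BOX = "tspatial_to_stbox(VALUE)"
TGEOMPOINT = {
    "class": "spatial",
    "boxType": "STBOX",
    "columns": [
        {"name": "xmin", "sqlType": "double", "expr": f"stbox_xmin({BOX})"},
        {"name": "xmax", "sqlType": "double", "expr": f"stbox_xmax({BOX})"},
        {"name": "ymin", "sqlType": "double", "expr": f"stbox_ymin({BOX})"},
        {"name": "ymax", "sqlType": "double", "expr": f"stbox_ymax({BOX})"},
        {"name": "zmin", "sqlType": "double", "expr": f"stbox_zmin({BOX})", "when": "hasZ"},
        {"name": "zmax", "sqlType": "double", "expr": f"stbox_zmax({BOX})", "when": "hasZ"},
        {"name": "tmin", "sqlType": "timestamptz", "expr": f"stbox_tmin({BOX})"},
        {"name": "tmax", "sqlType": "timestamptz", "expr": f"stbox_tmax({BOX})"},
        {"name": "srid", "sqlType": "int", "expr": "tspatial_srid(VALUE)"},
    ],
}


def render_select_list(type_entry: dict, column_ref: str, has_z: bool) -> str:
    """Render a SQL select list that derives every covering column from
    `column_ref` by substituting it for the VALUE placeholder."""
    items = []
    for col in type_entry["columns"]:
        if col.get("when") == "hasZ" and not has_z:
            continue  # 2D value: skip zmin/zmax
        # Plain replacement is enough here because MEOS symbol names are
        # lower-case and cannot contain the upper-case placeholder.
        expr = col["expr"].replace("VALUE", column_ref)
        items.append(f"{expr} AS {col['name']}")
    return ",\n".join(items)


print(render_select_list(TGEOMPOINT, "traj", has_z=False))
```

The first line printed is `stbox_xmin(tspatial_to_stbox(traj)) AS xmin,` and the last is `tspatial_srid(traj) AS srid`, matching the expressions listed for `traj TGEOMPOINT`.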

## Not yet covered

- **Time-only** (`tbool`, `ttext`): a `tmin`/`tmax` covering needs a span
lower/upper bound accessor; `temporal_to_tstzspan` is exported but a span
bound accessor is not. Surfaced as a MEOS export gap (close in MEOS C),
not filled binding-side.
- **Point-cloud / cell-index** (`tpcpoint`, `tpcpatch`, `th3index`,
`tquadbin`): fold into the `spatial` class once the catalog confirms a
uniform temporal→`STBOX` converter for these families.
39 changes: 39 additions & 0 deletions generate_covering.py
@@ -0,0 +1,39 @@
"""Generate the temporal-covering projection from the enriched MEOS catalog.

Usage:
    python run.py <path-to-MobilityDB-meos-include>   # produce output/meos-idl.json
    python generate_covering.py                       # read output/meos-idl.json
    python generate_covering.py path.json out.json
"""

import json
import sys
from pathlib import Path

from generator.covering import build_covering_projection

IN_PATH = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("output/meos-idl.json")
OUT_PATH = (Path(sys.argv[2]) if len(sys.argv) > 2
            else Path("output/meos-covering-projection.json"))


def main() -> None:
    if not IN_PATH.exists():
        sys.exit(f"Catalog not found: {IN_PATH} — run `python run.py` first.")

    catalog = json.loads(IN_PATH.read_text())
    if "temporalCovering" not in catalog:
        sys.exit(f"{IN_PATH} has no `temporalCovering` — it is attached by "
                 "run.py; regenerate the catalog.")

    proj = build_covering_projection(catalog)

    OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    OUT_PATH.write_text(json.dumps(proj, indent=2) + "\n")
    print(f" → {OUT_PATH} written", file=sys.stderr)
    print(f"[covering-projection] {proj['count']} temporal types projected "
          f"to covering columns", file=sys.stderr)


if __name__ == "__main__":
    main()
63 changes: 63 additions & 0 deletions generator/covering.py
@@ -0,0 +1,63 @@
"""Temporal-covering projection generator.

Projects the ``temporalCovering`` block of the MEOS catalog
(``meos-idl.json``, produced by ``parser/covering.py``) onto the canonical,
language-agnostic covering-column contract: per temporal type, the ordered
covering columns with the fully-composed MEOS expression that derives each
from the value.

Every binding generator (PyMEOS, JMEOS, MobilityDuck, MobilitySpark, …)
renders this same contract in its own idiom — a DuckDB ``GENERATED`` column,
a Spark UDF projection, a PyMEOS writer — so a temporal table prunes the
same way on every platform (Iceberg manifest + Parquet row-group min/max).
The ``VALUE`` placeholder is the temporal column reference the binding
substitutes.

Pure ``dict`` → ``dict``; no libclang and no MEOS runtime.
"""

from __future__ import annotations


def _column_expr(column: dict, box_from: str | None) -> str:
    """Compose the MEOS expression that derives one covering column from the
    temporal value (``VALUE``). A ``box`` column is read off the value's box;
    a ``value`` column is read off the value directly."""
    if column["source"] == "value":
        return f"{column['accessor']}(VALUE)"
    if box_from is None:
        # A box-sourced column in a class without a box converter would
        # otherwise be rendered as ``accessor(None(VALUE))``.
        raise ValueError(
            f"column {column['name']!r} has source 'box' but its class has no box")
    return f"{column['accessor']}({box_from}(VALUE))"


def build_covering_projection(catalog: dict) -> dict:
    """Project ``temporalCovering`` onto the canonical covering-column contract."""
    cov = catalog.get("temporalCovering")
    if not cov:
        raise ValueError("catalog has no `temporalCovering` — run run.py")

    types = {}
    for tname, spec in cov["byType"].items():
        box = spec.get("box")
        box_from = box["from"] if box else None
        columns = []
        for col in spec["columns"]:
            entry = {
                "name": col["name"],
                "sqlType": col["sqlType"],
                "expr": _column_expr(col, box_from),
            }
            if col.get("when"):
                entry["when"] = col["when"]
            columns.append(entry)
        types[tname] = {
            "class": spec["class"],
            "boxType": box["type"] if box else None,
            "columns": columns,
        }

    return {
        "version": cov["version"],
        "valueCodec": cov["valueCodec"],
        "metadataKeys": cov["metadataKeys"],
        "types": types,
        "count": len(types),
    }
74 changes: 74 additions & 0 deletions meta/temporal-covering.json
@@ -0,0 +1,74 @@
{
  "_comment": "Temporal-covering descriptor — the single codegen source of truth for projecting a MEOS temporal column into Parquet/Iceberg covering columns (GeoParquet 1.1 `covering.bbox`). Every binding/engine generates the IDENTICAL covering schema from this mapping, so a temporal table prunes the same way on every platform (Iceberg manifest pruning + Parquet row-group min/max) with no spatial-aware engine. Curated canonical data keyed by temporal-type FAMILY (a `class`), not per type — adding a type is one entry in its class. The canonical MEOS-WKB value column is unchanged and lossless; the covering columns are denormalised derivations of the value's bounding box. RFC #870 (TemporalParquet) / #913 (Temporal Data Lake).",
  "provenance": {
    "rfc": "MobilityDB RFC #870 (TemporalParquet) + #913 (Temporal Data Lake)",
    "discussion": "MobilityDB#861 (edge-to-cloud SQL portability: one query, three platforms)",
    "geoParquet": "GeoParquet 1.1 covering.bbox (geoparquet.org/releases/v1.1.0)",
    "benchmark": "MVB v3 — the scalar AND-chain on materialised covering columns prunes row groups identically to the spatial-aware path and ~10x faster, with no DuckDB spatial extension"
  },
  "version": "1.0.0",
  "valueCodec": {
    "asHexWkb": "temporal_as_hexwkb",
    "fromHexWkb": "temporal_from_hexwkb",
    "note": "The canonical MEOS-WKB stays the lossless value column (BLOB); covering columns are denormalised and never the source of truth."
  },
  "metadataKeys": {
    "temporal": "temporal",
    "geo": "geo",
    "covering": "bbox"
  },
  "classes": {
    "spatial": {
      "doc": "Spatial temporal types — STBOX covering (x/y[/z] extent + time extent + SRID).",
      "box": {"type": "STBOX", "from": "tspatial_to_stbox"},
      "srid": "tspatial_srid",
      "types": ["tgeompoint", "tgeogpoint", "tgeometry", "tgeography", "tcbuffer", "tnpoint", "tpose", "trgeometry"],
      "columns": [
        {"name": "xmin", "sqlType": "double", "accessor": "stbox_xmin", "source": "box"},
        {"name": "xmax", "sqlType": "double", "accessor": "stbox_xmax", "source": "box"},
        {"name": "ymin", "sqlType": "double", "accessor": "stbox_ymin", "source": "box"},
        {"name": "ymax", "sqlType": "double", "accessor": "stbox_ymax", "source": "box"},
        {"name": "zmin", "sqlType": "double", "accessor": "stbox_zmin", "source": "box", "when": "hasZ"},
        {"name": "zmax", "sqlType": "double", "accessor": "stbox_zmax", "source": "box", "when": "hasZ"},
        {"name": "tmin", "sqlType": "timestamptz", "accessor": "stbox_tmin", "source": "box"},
        {"name": "tmax", "sqlType": "timestamptz", "accessor": "stbox_tmax", "source": "box"},
        {"name": "srid", "sqlType": "int", "accessor": "tspatial_srid", "source": "value"}
      ]
    },
    "number": {
      "doc": "Numeric temporal types — TBOX covering (value range + time extent).",
      "box": {"type": "TBOX", "from": "tnumber_to_tbox"},
      "srid": null,
      "types": ["tint", "tfloat", "tbigint"],
      "columns": [
        {"name": "vmin", "sqlType": "double", "accessor": "tbox_xmin", "source": "box"},
        {"name": "vmax", "sqlType": "double", "accessor": "tbox_xmax", "source": "box"},
        {"name": "tmin", "sqlType": "timestamptz", "accessor": "tbox_tmin", "source": "box"},
        {"name": "tmax", "sqlType": "timestamptz", "accessor": "tbox_tmax", "source": "box"}
      ]
    },
    "timeOnly": {
      "doc": "Time-only temporal types — no spatial box; time extent only.",
      "box": null,
      "srid": null,
      "types": ["tbool", "ttext"],
      "columns": [
        {"name": "tmin", "sqlType": "timestamptz", "accessor": "temporal_start_timestamptz", "source": "value"},
        {"name": "tmax", "sqlType": "timestamptz", "accessor": "temporal_end_timestamptz", "source": "value"}
      ]
    }
  },
  "deferred": {
    "pointcloudCellIndex": {
      "types": ["tpcpoint", "tpcpatch", "th3index", "tquadbin"],
      "reason": "STBOX covering via a type-specific box path (e.g. tpcbox_to_stbox); fold into the `spatial` class once the catalog confirms a uniform temporal->STBOX converter for these families."
    }
  },
  "notes": [
    "The covering columns are a denormalisation of the value's bounding box; the canonical MEOS-WKB BLOB remains the lossless source of truth.",
    "Materialising the covering columns as primitive Parquet columns gives Iceberg manifest-level file pruning and Parquet row-group min/max pruning, with no spatial-aware engine.",
    "zmin/zmax are emitted only for 3D values (`when: hasZ`); 2D values omit them or store null.",
    "`source: box` accessors take the box returned by `class.box.from(value)`; `source: value` accessors take the temporal value directly.",
    "This descriptor is type-agnostic per class exactly as `portable-aliases.json` is type-agnostic per operator family — codegen consumes it identically across every binding."
  ]
}
88 changes: 88 additions & 0 deletions meta/temporal-covering.schema.json
@@ -0,0 +1,88 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://github.com/MobilityDB/MEOS-API/blob/main/meta/temporal-covering.schema.json",
  "title": "Temporal-covering descriptor — canonical SoT",
  "description": "Schema for `meta/temporal-covering.json` (RFC #870/#913). Catches shape regressions earlier than the unit tests; validated as a test step in `tests/test_covering.py`.",
  "type": "object",
  "additionalProperties": true,
  "required": ["provenance", "version", "valueCodec", "metadataKeys", "classes", "notes"],
  "properties": {
    "_comment": {"type": "string"},
    "provenance": {
      "type": "object",
      "additionalProperties": true,
      "required": ["rfc"],
      "properties": {
        "rfc": {"type": "string"},
        "discussion": {"type": "string"},
        "geoParquet": {"type": "string"},
        "benchmark": {"type": "string"}
      }
    },
    "version": {"type": "string"},
    "valueCodec": {
      "type": "object",
      "additionalProperties": true,
      "required": ["asHexWkb", "fromHexWkb"],
      "properties": {
        "asHexWkb": {"type": "string"},
        "fromHexWkb": {"type": "string"},
        "note": {"type": "string"}
      }
    },
    "metadataKeys": {
      "type": "object",
      "additionalProperties": true,
      "required": ["temporal", "covering"],
      "properties": {
        "temporal": {"type": "string"},
        "geo": {"type": "string"},
        "covering": {"type": "string"}
      }
    },
    "classes": {
      "type": "object",
      "minProperties": 1,
      "additionalProperties": {
        "type": "object",
        "additionalProperties": true,
        "required": ["types", "columns"],
        "properties": {
          "doc": {"type": "string"},
          "srid": {"type": ["string", "null"]},
          "box": {
            "type": ["object", "null"],
            "required": ["type", "from"],
            "properties": {
              "type": {"type": "string"},
              "from": {"type": "string"}
            }
          },
          "types": {
            "type": "array",
            "minItems": 1,
            "items": {"type": "string", "pattern": "^t[a-z0-9]+$"}
          },
          "columns": {
            "type": "array",
            "minItems": 1,
            "items": {
              "type": "object",
              "additionalProperties": false,
              "required": ["name", "sqlType", "accessor", "source"],
              "properties": {
                "name": {"type": "string", "pattern": "^[a-z][a-z0-9]*$"},
                "sqlType": {"enum": ["double", "int", "timestamptz"]},
                "accessor": {"type": "string"},
                "source": {"enum": ["box", "value"]},
                "when": {"enum": ["hasZ"]}
              }
            }
          }
        }
      }
    },
    "deferred": {"type": "object"},
    "notes": {"type": "array", "items": {"type": "string"}}
  }
}