jsonstat-validator

A semantic validator for JSON-stat 2.0. It reuses the official JSON Schema 2020-12 definitions for the structural pass and layers a rules engine on top that checks the cross-field cube invariants JSON Schema cannot express.

Designed in DESIGN.md. Architectural companion to the LLM-Wiki JSON-stat knowledge base.


Why

JSON Schema can validate shape (required properties, oneOfs, enums, the IANA link regex) but cannot express relationships like "value array length must equal the product of size". This package implements exactly those checks (the S/D/C catalogue) with a stable, versioned error-code vocabulary.

{
  "version": "2.0",
  "class": "dataset",
  "id": ["sex", "year"],
  "size": [2, 3],
  "dimension": { "sex": { "category": { "index": ["M", "F"] } }, "year": { "category": { "index": { "2022": 0, "2023": 1, "2024": 2 } } } },
  "value": [10, 20, 30, 40, 50]
}

A schema engine says "this is shaped like a dataset". jsonstat-validator tells you the actual problem: value has 5 entries but product(size) is 2 × 3 = 6 — see VALUE_LEN_MISMATCH.
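The length invariant behind that example can be sketched in a few lines of TypeScript. This is an illustration only, not the package's implementation: `DenseDataset` and `checkValueLength` are hypothetical names, and the sketch covers only the dense (array) form of value.

```typescript
// Hypothetical sketch of the dense-value length invariant; not the package's code.
interface DenseDataset {
  size: number[];
  value: unknown[];
}

// Returns null when the invariant holds, otherwise the expected and actual counts.
function checkValueLength(doc: DenseDataset): { expected: number; actual: number } | null {
  // product(size) is the number of cells in the cube (1 for an empty size array).
  const expected = doc.size.reduce((acc, n) => acc * n, 1);
  const actual = doc.value.length;
  return expected === actual ? null : { expected, actual };
}

const mismatch = checkValueLength({ size: [2, 3], value: [10, 20, 30, 40, 50] });
console.log(mismatch); // { expected: 6, actual: 5 }
```

JSON Schema has no way to relate the length of one array to the product of the entries in another, which is why a check like this has to live in a separate rules pass.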


Install

Pick the surface that matches your stack. They all share one rules-manifest.json and one corpus/cases.json, so they produce identical findings on identical input.

| Surface | Package | Install |
| --- | --- | --- |
| TypeScript / Node | @jsonstat-validator/ts | npm i @jsonstat-validator/ts |
| CLI | jsonstat-validate | npx jsonstat-validate (no install) |
| Browser / CDN | <script src> | (see Web / CDN) |
| Rust | jsonstat-validator | cargo add jsonstat-validator |
| Wasm (JS) | @jsonstat-validator/wasm | npm i @jsonstat-validator/wasm |

Quick start — TypeScript / Node

npm install @jsonstat-validator/ts

import { validate } from "@jsonstat-validator/ts";

const result = validate(doc);          // doc may be an object OR a JSON string
console.log(result.valid);             // true / false
console.log(result.summary);           // { errors, warnings, infos, structuralErrors, byCode }
for (const f of result.findings) {     // Finding[] — empty when valid
  console.log(`[${f.severity}] ${f.code}  ${f.path} - ${f.message}`);
}

validate() accepts a parsed object or a raw JSON string (it parses for you). To validate a file on disk, use the async validateFile():

import { validateFile } from "@jsonstat-validator/ts";
const result = await validateFile("./my-cube.json");

What the result looks like

{
  "valid": false,
  "findings": [
    {
      "code": "VALUE_LEN_MISMATCH",
      "ruleId": "S3",
      "severity": "error",
      "path": "/value",
      "message": "Dense 'value' length 5 must equal product(size) = 6.",
      "expected": 6,
      "actual": 5,
      "specRef": "wiki/format-specification.md"
    }
  ],
  "summary": { "errors": 1, "warnings": 0, "infos": 0, "structuralErrors": 0, "byCode": { "VALUE_LEN_MISMATCH": 1 } },
  "options": { /* resolved ValidateOptions */ },
  "meta": { "engineVersion": "0.3.0", "ruleSetVersion": "1.0.0", "schemaVersion": "1.05", "durationMs": 3 }
}

Structural (JSON Schema) violations are normalized into the same shape with code: "STRUCTURAL_VIOLATION" and the offending keyword kept in meta — so consumers only ever handle one finding shape.
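Because structural and semantic findings share one shape, a consumer can process them with a single loop. The sketch below tallies findings per code, similar in spirit to summary.byCode. The `Finding` type is a local approximation of the fields shown above and `countByCode` is a hypothetical helper; neither is taken from the package's exported API.

```typescript
// Local types mirroring the fields shown above; assumptions, not the package's exports.
type Severity = "error" | "warning" | "info";
interface Finding {
  code: string;
  severity: Severity;
  path: string;
  message: string;
}

// Hypothetical helper: tally findings per code, whichever phase produced them.
function countByCode(findings: Finding[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const f of findings) {
    counts[f.code] = (counts[f.code] ?? 0) + 1;
  }
  return counts;
}

const sampleFindings: Finding[] = [
  { code: "VALUE_LEN_MISMATCH", severity: "error", path: "/value", message: "placeholder" },
  { code: "STRUCTURAL_VIOLATION", severity: "error", path: "/id", message: "placeholder" },
  { code: "STRUCTURAL_VIOLATION", severity: "error", path: "/size", message: "placeholder" },
];
console.log(countByCode(sampleFindings));
```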

Tuning a run

validate(doc, {
  mode: "full",              // "full" | "structural" | "semantic"
  minSeverity: "info",       // "error" | "warning" | "info" — filters returned findings
  maxCollectionDepth: 3,     // cap nested-collection recursion
  continueOnStructuralError: true,
  budget: { maxCells: 50_000_000, maxBytes: 200 * 1024 * 1024, maxFindings: 1000 },
  onFinding: (f) => { /* streaming sink, for very large docs */ },
});
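The minSeverity option filters findings at or above a threshold. One way to picture that filter is as a rank comparison, sketched below. `SEVERITY_RANK` and `atOrAbove` are hypothetical names used for illustration, and the package's internal filtering may be implemented differently.

```typescript
type Severity = "error" | "warning" | "info";

// Rank severities so a threshold check becomes a simple numeric comparison.
const SEVERITY_RANK: Record<Severity, number> = { info: 0, warning: 1, error: 2 };

// Hypothetical helper: keep only findings at or above the given severity.
function atOrAbove<T extends { severity: Severity }>(findings: T[], min: Severity): T[] {
  return findings.filter((f) => SEVERITY_RANK[f.severity] >= SEVERITY_RANK[min]);
}

const mixed = [
  { code: "BUNDLE_DEPRECATED", severity: "info" as Severity },
  { code: "METRIC_UNIT_MISSING", severity: "warning" as Severity },
  { code: "VALUE_LEN_MISMATCH", severity: "error" as Severity },
];
console.log(atOrAbove(mixed, "warning").map((f) => f.code));
// [ 'METRIC_UNIT_MISSING', 'VALUE_LEN_MISMATCH' ]
```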

Quick start — CLI

No install needed — run it once with npx:

# validate a file
npx jsonstat-validate my-cube.json

# validate JSON on stdin (note the `-`)
echo '{"version":"2.0","class":"dataset","id":["x"],"size":[2],"dimension":{"x":{"category":{"index":["a","b"]}}},"value":[1]}' \
  | npx jsonstat-validate -

# machine-readable output for CI
npx jsonstat-validate my-cube.json --format json

$ npx jsonstat-validate my-cube.json
valid: false
summary: 1 errors, 0 warnings, 0 infos, 0 structural
  [error] VALUE_LEN_MISMATCH  /value  — Dense 'value' length 1 must equal product(size) = 2.

Exit code is 0 when valid, 1 when invalid — drop it straight into a pipeline or pre-commit hook:

npx jsonstat-validate data/*.json || exit 1
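A pipeline that wants a stricter gate than the built-in exit code, for example failing on warnings as well, could post-process the JSON output. The sketch below assumes that --format json emits an object with the valid and summary fields shown under "What the result looks like", and that a document can be valid while still carrying warnings. Neither assumption is confirmed here, and `gateExitCode` is a hypothetical helper.

```typescript
// Assumed subset of the JSON output; only the fields this gate reads.
interface GateInput {
  valid: boolean;
  summary: { errors: number; warnings: number };
}

// Hypothetical helper: map JSON output to an exit code, optionally failing on warnings.
function gateExitCode(jsonOutput: string, failOnWarnings = false): number {
  const parsed = JSON.parse(jsonOutput) as GateInput;
  if (!parsed.valid || parsed.summary.errors > 0) return 1;
  if (failOnWarnings && parsed.summary.warnings > 0) return 1;
  return 0;
}

const sampleOutput = JSON.stringify({ valid: true, summary: { errors: 0, warnings: 1 } });
console.log(gateExitCode(sampleOutput));       // 0
console.log(gateExitCode(sampleOutput, true)); // 1
```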

Options:

jsonstat-validate <file|-> [options]
  --mode full|structural|semantic   validation phases to run (default: full)
  --format json|text                output format (default: text)
  --min-severity error|warning|info only show findings at/above this severity (default: info)
  --structural-only                 alias for --mode structural
  --semantic-only                   alias for --mode semantic

Quick start — Web / CDN

The TS package is dependency-light and browser-safe, so you can use it straight from a CDN with no build step. Works on esm.sh, jsDelivr, and unpkg.

<script type="module"> (recommended)

<script type="module">
  // esm.sh serves the ESM browser build:
  import { validate } from "https://esm.sh/@jsonstat-validator/ts";
  // or jsDelivr / unpkg:
  // import { validate } from "https://cdn.jsdelivr.net/npm/@jsonstat-validator/ts/+esm";

  const result = validate(myJsonstatDocument);
  console.log(result.valid, result.summary);
</script>

Classic <script src> (global JsonstatValidator)

<!-- Minified IIFE bundle from jsDelivr (also on unpkg) -->
<script src="https://cdn.jsdelivr.net/npm/@jsonstat-validator/ts/dist/browser/jsonstat-validator.min.js"></script>
<script>
  const { validate } = window.JsonstatValidator;
  const result = validate(myJsonstatDocument);
  console.log(result.valid, result.findings);
</script>

The browser build excludes validateFile() (it needs node:fs). Use validate() in the browser and fetch the document yourself.

Bundled in an app (Vite, webpack, Rollup, …)

npm install @jsonstat-validator/ts

import { validate } from "@jsonstat-validator/ts";

The package's exports map points bundlers at the browser build automatically, so the same import works in Node and the browser with no special config.


Quick start — Rust

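cargo add jsonstat-validator

use jsonstat_validator::{validate_from_str, ValidateOptions};

fn main() {
    // Sample document from the CLI quick start: size declares 2 cells, value has 1.
    let json = r#"{"version":"2.0","class":"dataset","id":["x"],"size":[2],"dimension":{"x":{"category":{"index":["a","b"]}}},"value":[1]}"#;
    let result = validate_from_str(json, &ValidateOptions::default());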
    println!("valid: {}", result.valid);
    for f in &result.findings {
        println!("[{}] {} {} — {}", f.severity, f.code, f.path, f.message);
    }
}

Or build and run the example binary directly from a checkout:

cd crates/validator && cargo test                       # corpus parity
echo '{"version":"2.0","class":"dataset","id":["x"],"size":[2],"dimension":{"x":{"category":{"index":["a","b"]}}},"value":[1]}' \
  | cargo run --example validate -- -

Quick start — Wasm

The Wasm surface is a thin JS wrapper over the Rust crate, exposing the same validate() shape as @jsonstat-validator/ts. The crate is compiled to WebAssembly with wasm-pack (--target web).

npm install @jsonstat-validator/wasm

import { init, validate } from "@jsonstat-validator/wasm";

await init();                       // load the .wasm once (the browser fetches it)
const result = validate(doc);       // same ValidationResult shape as @jsonstat-validator/ts
console.log(result.valid, result.summary);

In Node, import from @jsonstat-validator/wasm/node instead — it auto-loads the .wasm from disk and also exports validateFile(). See packages/wasm/README.md for the full surface, and note that options.onFinding / options.budget are not carried across the JS↔wasm boundary (findings are still returned in full).

The Wasm surface emits the same semantic findings as the TS and Rust surfaces — enforced by a TS↔Wasm corpus parity test that runs in CI.


Error codes

See rules-manifest.json for the authoritative, append-only catalogue. Codes include VALUE_LEN_MISMATCH, SPARSE_KEY_OUT_OF_RANGE, STATUS_LEN_MISMATCH, DIM_KEY_ID_MISMATCH, ID_SIZE_LEN_MISMATCH, ROLE_ID_UNKNOWN, INDEX_COUNT_MISMATCH, INDEX_POSITIONS_INVALID, LABEL_KEY_UNKNOWN, UNIT_KEY_UNKNOWN, COORD_KEY_UNKNOWN, NOTE_KEY_UNKNOWN, CHILD_ID_UNKNOWN, CHILD_CYCLE, METRIC_UNIT_MISSING (warning), BUNDLE_DEPRECATED (info), CUBE_SIZE_OVERFLOW, PARSE_ERROR, and STRUCTURAL_VIOLATION. The vocabulary is versioned independently (meta.ruleSetVersion) from the package SemVer — see DESIGN.md §4.4 for the full S/D/C rule table.
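To make one of these codes concrete: JSON-stat allows value to be an object keyed by cell position instead of a dense array, and every key must address a cell inside the cube. The sketch below shows the kind of check that SPARSE_KEY_OUT_OF_RANGE suggests. `outOfRangeSparseKeys` is a hypothetical name, and treating malformed keys as out of range is an assumption of this sketch rather than documented behaviour.

```typescript
// Hypothetical sketch: list sparse value keys that do not address a cell in the cube.
function outOfRangeSparseKeys(size: number[], value: Record<string, unknown>): string[] {
  // Number of addressable cells: product(size).
  const cells = size.reduce((acc, n) => acc * n, 1);
  return Object.keys(value).filter((key) => {
    // Only canonical non-negative integers such as "0" or "17" are positions.
    if (!/^(0|[1-9]\d*)$/.test(key)) return true;
    return Number(key) >= cells;
  });
}

console.log(outOfRangeSparseKeys([2, 3], { "0": 10, "5": 60, "6": 70 })); // [ '6' ]
```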


Building from source

The repo is an npm workspace plus a Cargo crate. From a checkout:

# TypeScript: npm install runs the `prepare` hook, which auto-builds both
# workspaces (so `npx jsonstat-validate` works immediately) and re-links the bin.
npm install
npm test                 # 27/27 corpus + rule tests

# After editing TS/CLI sources, rebuild:
npm run build            # regenerates @jsonstat-validator/ts and jsonstat-validate

# Rust:
cargo test --manifest-path crates/validator/Cargo.toml

How the published package stays self-contained

The manifest and vendored schemas live at the repo root as the single source of truth shared with the Rust surface. They are not read from disk at runtime (that would break npm publish and browser use). Instead packages/ts/tools/gen-assets.mjs compiles them into packages/ts/src/generated/ TypeScript modules at build time, so the published tarball carries everything it needs. The Rust crate does the equivalent with include_str!.
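The build-time embedding idea can be sketched as a function that turns JSON text into the source of a TypeScript module. This is not the contents of gen-assets.mjs; `renderJsonModule` is a hypothetical name, the emitted header comment is invented for the example, and the input below is a neutral placeholder rather than real manifest content.

```typescript
// Hypothetical sketch: turn JSON text into TypeScript module source at build time.
function renderJsonModule(exportName: string, jsonText: string): string {
  // Parse first so that invalid JSON fails the build instead of shipping.
  const data: unknown = JSON.parse(jsonText);
  const literal = JSON.stringify(data, null, 2);
  return `// Generated file. Do not edit.\nexport const ${exportName} = ${literal} as const;\n`;
}

// Placeholder input; a real build script would read the file contents from disk.
console.log(renderJsonModule("rulesManifest", '{"example": true}'));
```

A module generated this way is imported like any other source file, so nothing has to be read from disk at runtime and the same code works in a browser bundle.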

Producing the browser bundle

packages/ts/tools/build-browser.mjs uses esbuild to emit two self-contained files under dist/browser/: jsonstat-validator.min.js (IIFE, global JsonstatValidator) and jsonstat-validator.mjs (ESM).

Publishing

Releases are cut by release.yml, which publishes to two registries when a GitHub Release is created (or on a manual workflow_dispatch run):

  • npm (the @jsonstat-validator/ts, jsonstat-validate, and @jsonstat-validator/wasm packages) uses npm Trusted Publishing (OpenID Connect) — no stored token. Prerequisite: each package is registered as a Trusted Publisher trusting this repo + workflow file on the npm side.
  • crates.io (the jsonstat-validator crate) has no OIDC equivalent, so it authenticates with a long-lived CARGO_REGISTRY_TOKEN stored as an environment secret named exactly that, under a GitHub environment named exactly crates-io (matching environment: crates-io in the workflow). The token used for the first publish needs the publish-new scope (to create the crate) plus publish-update (for every later version); once the crate exists, rotate to a crate-scoped publish-update token.

Recommended repo setup before the first publish:

  1. Settings → Environments → New environment → name it crates-io; optionally enable Required reviewers and limit Deployment branches and tags.
  2. On that environment → Add secret → name CARGO_REGISTRY_TOKEN, paste a crates.io API token.
  3. Dry-run locally: cargo publish --dry-run --manifest-path crates/validator/Cargo.toml.
  4. Tag and push vX.Y.Z, create the GitHub Release; both jobs run (crate job awaits approval if you set a reviewer).

Always commit the root rules-manifest.json bump and the Rust _vendored snapshot together: vendored_parity.rs compares committed blobs, so they must land in the same commit or that test fails.


Status

Three native surfaces (TypeScript, Rust, and Wasm) ship and are kept at parity: they emit identical findings on identical input.

Parity is enforced, not asserted: one shared corpus/cases.json drives the TS and Rust suites and the TS↔Wasm parity test that runs in CI, closing the TS ↔ Rust ↔ Wasm triangle. A committed-snapshot guard (crates/validator/tests/vendored_parity.rs) keeps the Rust _vendored copy byte-identical to the repo-root rules-manifest.json and schemas/vendored/, catching pure-metadata drift the corpus test misses. Releases publish via release.yml: npm through Trusted Publishing (OIDC), and the Rust crate to crates.io through a CARGO_REGISTRY_TOKEN secret.
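A parity check between two surfaces boils down to comparing their findings for the same corpus case. The sketch below compares two finding lists regardless of order. It is not the repository's test code: `fingerprint` and `sameFindings` are hypothetical helpers, and whether the real parity tests also require identical ordering or messages is not stated here.

```typescript
// Fields compared in this sketch; the real tests may compare more.
interface FindingKey {
  code: string;
  path: string;
  severity: string;
}

// Reduce each finding to a sortable string so that order does not matter.
function fingerprint(findings: FindingKey[]): string[] {
  return findings.map((f) => `${f.severity}|${f.code}|${f.path}`).sort();
}

// Hypothetical helper: true when both surfaces reported the same set of findings.
function sameFindings(a: FindingKey[], b: FindingKey[]): boolean {
  const fa = fingerprint(a);
  const fb = fingerprint(b);
  return fa.length === fb.length && fa.every((x, i) => x === fb[i]);
}

const fromTs: FindingKey[] = [
  { code: "VALUE_LEN_MISMATCH", path: "/value", severity: "error" },
  { code: "METRIC_UNIT_MISSING", path: "/dimension", severity: "warning" },
];
const fromRust: FindingKey[] = [fromTs[1], fromTs[0]]; // same findings, different order
console.log(sameFindings(fromTs, fromRust)); // true
```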

Versioning

The package is pre-1.0. engineVersion is 0.3.0; the error-code vocabulary is versioned independently via meta.ruleSetVersion (1.0.0, append-only within a major) so downstream tooling can branch on the vocabulary rather than the package version — see DESIGN.md §4.5. The public validate() surface and result shape are stable in practice, but breaking changes remain allowed before 1.0.0.
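Downstream tooling that branches on the vocabulary might compare only the major component of meta.ruleSetVersion, since codes are append-only within a major. The sketch below assumes a plain MAJOR.MINOR.PATCH string; `ruleSetMajor` and `supportsVocabulary` are hypothetical helpers, not part of the package.

```typescript
// Hypothetical helper: extract the major component of a ruleSetVersion string.
function ruleSetMajor(ruleSetVersion: string): number {
  const [majorText = ""] = ruleSetVersion.split(".");
  const major = Number.parseInt(majorText, 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognised ruleSetVersion: ${ruleSetVersion}`);
  }
  return major;
}

// Hypothetical helper: true when the result's vocabulary has the major this tool was written for.
function supportsVocabulary(meta: { ruleSetVersion: string }, supportedMajor: number): boolean {
  return ruleSetMajor(meta.ruleSetVersion) === supportedMajor;
}

console.log(supportsVocabulary({ ruleSetVersion: "1.0.0" }, 1)); // true
console.log(supportsVocabulary({ ruleSetVersion: "2.1.0" }, 1)); // false
```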

Toward 1.0.0

Remaining work tracked toward the 1.0.0 cut (see CHANGELOG.md):

  • schemas/curated/ de-duplicated schema set, with a curated≡vendored parity test.
  • A curated-parity CI job (pending the curated/ schemas above).
  • A written SemVer / ruleSetVersion stability policy as the commitment artifact for 1.0.0.

The M1–M5 milestone history is recorded in DESIGN.md §11 and the git log.


License

Apache-2.0.
