Architecture

A compiler for hostile, underspecified legal deltas.

LawVM's architecture is not chosen by taste. It is forced by the structural properties of law itself. The essay derives the necessity; this page describes the result.

The compiler model

A jurisdiction frontend is a phased compiler with explicit contracts between phases:

  1. Acquire and archive source artifacts
  2. Parse amendment text into a clause surface
  3. Extract payloads and normalize source-locally
  4. Elaborate against live legal state (snapshot-pure)
  5. Lower to canonical typed operations
  6. Replay over a base state
  7. Materialize point-in-time text
  8. Adjudicate against oracle or witness surfaces

The pipeline separates lowering, target resolution, replay, and divergence accounting so that mismatches are inspectable rather than collapsed into a single opaque failure state.
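As a rough illustration of why phase separation keeps mismatches inspectable, the following minimal sketch runs the eight phases in order and records a report per phase. The short phase names, `PhaseReport`, `PipelineRun`, and `run_pipeline` are hypothetical names invented for this example, not LawVM's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical short names for the eight phases listed above.
PHASES = [
    "acquire", "parse", "extract", "elaborate",
    "lower", "replay", "materialize", "adjudicate",
]


@dataclass
class PhaseReport:
    phase: str
    ok: bool
    detail: str = ""


@dataclass
class PipelineRun:
    reports: list[PhaseReport] = field(default_factory=list)
    output: Any = None

    @property
    def failed_phase(self) -> Optional[str]:
        # Name of the first phase that failed, or None if all succeeded.
        return next((r.phase for r in self.reports if not r.ok), None)


def run_pipeline(source: Any, steps: dict[str, Callable[[Any], Any]]) -> PipelineRun:
    """Feed each phase's output into the next; stop at the first failure."""
    run = PipelineRun()
    value = source
    for name in PHASES:
        try:
            value = steps[name](value)
        except Exception as exc:
            # The failure is attributed to one named phase, not to the run as a whole.
            run.reports.append(PhaseReport(name, False, f"{type(exc).__name__}: {exc}"))
            return run
        run.reports.append(PhaseReport(name, True))
    run.output = value
    return run
```

A caller can then ask which phase failed and why, instead of receiving a single pass/fail flag for the whole compilation.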

Two planes

LawVM operates on two simultaneous planes:

Semantic plane: source artifacts → clause surface → payload surface → elaborated intent → canonical effects → timelines → PIT materialization. This is the path from raw legal text to point-in-time state.

Epistemic plane: parse witnesses → observations → obligations → adjudications → claims → evidence bundle. This is the path that records why the result should be trusted — what was observed, what was inferred, what was recovered, and what remains unresolved.

Both planes run together. A replay result without its epistemic trail is not a LawVM result.
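One way to picture the coupling of the two planes is a result type that cannot be constructed without its evidence. This is a sketch under assumptions: `EvidenceBundle`, `ReplayResult`, and their fields are hypothetical names chosen to mirror the four categories above (observed, inferred, recovered, unresolved).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceBundle:
    # Epistemic plane: why the result should be trusted.
    observations: tuple[str, ...] = ()
    inferences: tuple[str, ...] = ()
    recoveries: tuple[str, ...] = ()
    unresolved: tuple[str, ...] = ()


@dataclass(frozen=True)
class ReplayResult:
    # Semantic plane: the materialized point-in-time text.
    pit_text: str
    evidence: EvidenceBundle

    def __post_init__(self) -> None:
        # A result without its epistemic trail is rejected at construction time.
        if not isinstance(self.evidence, EvidenceBundle):
            raise TypeError("a replay result must carry its evidence bundle")

    @property
    def fully_resolved(self) -> bool:
        return not self.evidence.unresolved
```

Making the evidence a required, validated field is one design choice among several; the point is only that the text and its trail travel as one value.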

Three hard waists

The architecture has three stable interfaces that must not be bypassed:

  1. Clause surface — the first stable representation of amendment meaning. A typed AST for amendment instruction language.
  2. Payload surface — the amendment body after source-local normalization, before live-state-dependent meaning recovery. This is where the source text stops being raw and starts being structured, but meaning recovery against the current statute state has not yet happened.
  3. Canonical execution — replay consumes only typed canonical execution artifacts, not raw amendment XML. No unresolved meaning crosses this boundary.
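The third waist can be sketched as a replay function that accepts only typed operations and rejects anything else. The operation vocabulary here (`Replace`, `Insert`, `Repeal`) and the flat address-to-text state are simplifying assumptions for illustration, not LawVM's real operation set or tree model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Replace:
    address: str
    text: str


@dataclass(frozen=True)
class Insert:
    address: str
    text: str


@dataclass(frozen=True)
class Repeal:
    address: str


CanonicalOp = Union[Replace, Insert, Repeal]


def replay(base: dict[str, str], ops: list[CanonicalOp]) -> dict[str, str]:
    """Apply typed operations to a copy of the base state."""
    state = dict(base)
    for op in ops:
        if isinstance(op, Replace):
            if op.address not in state:
                raise KeyError(f"replace target missing: {op.address}")
            state[op.address] = op.text
        elif isinstance(op, Insert):
            if op.address in state:
                raise KeyError(f"insert target already exists: {op.address}")
            state[op.address] = op.text
        elif isinstance(op, Repeal):
            if op.address not in state:
                raise KeyError(f"repeal target missing: {op.address}")
            del state[op.address]
        else:
            # Raw amendment text or XML never reaches execution.
            raise TypeError(f"not a canonical operation: {op!r}")
    return state
```

Because replay refuses untyped input, any unresolved meaning has to be settled (or reported as a failure) before this boundary rather than inside the executor.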

Strict mode and quirks mode

LawVM serves two worlds:

Quirks mode is for the historical corpus. Real legislative text is full of omitted context, editorial shortcuts, inconsistent numbering, source encoding oddities, and amendments that only make sense against a specific live consolidated witness. Quirks mode uses recovery heuristics — but marks every recovery path with provenance. It never pretends inferred structure was explicit in source.

Strict mode is for a future where law is authored to compile cleanly. In that world every amendment is structurally unambiguous, every target is explicitly addressable, every action is typed, and every temporal effect is explicit. Strict mode forbids target guessing, hidden insertion anchoring, fallback whole-section replacement, ambiguous omission expansion, and silent date estimation.

The endgame is not "replace legal prose with code." It is: law remains human-readable, but official publication also emits canonical machine-readable state/change artifacts alongside the human text. Strict mode is the compilation target for that future. Quirks mode is the recovery compiler for the past.
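The difference between the two modes can be sketched with target resolution, one of the behaviors strict mode forbids guessing about. Everything named here (`Mode`, `ResolvedTarget`, `resolve_target`, the suffix-match heuristic) is a hypothetical illustration, not a description of LawVM's actual recovery logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    STRICT = "strict"
    QUIRKS = "quirks"


class StrictModeViolation(Exception):
    pass


@dataclass(frozen=True)
class ResolvedTarget:
    address: str
    provenance: str  # "explicit" or "recovered:<heuristic name>"


def resolve_target(
    explicit_address: Optional[str],
    live_addresses: list[str],
    hint: str,
    mode: Mode,
) -> ResolvedTarget:
    if explicit_address is not None:
        return ResolvedTarget(explicit_address, "explicit")
    if mode is Mode.STRICT:
        # Strict mode never guesses a target.
        raise StrictModeViolation("target guessing is forbidden in strict mode")
    # Quirks mode: a toy heuristic that accepts only a unique suffix match.
    candidates = [a for a in live_addresses if a.endswith(hint)]
    if len(candidates) != 1:
        raise LookupError(f"cannot recover a unique target for hint {hint!r}")
    # The recovery path is recorded, never presented as explicit in source.
    return ResolvedTarget(candidates[0], "recovered:suffix-match")
```

The same input therefore yields either a provenance-tagged recovery (quirks) or a hard error (strict), which is the behavioral contrast the two modes are meant to express.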

Frontend / kernel boundary

The shared kernel is jurisdiction-agnostic: canonical legal-address and tree model, operation vocabulary, replay execution, timeline semantics, materialization, structural invariants.

Frontends are jurisdiction-local: source acquisition, parsing conventions, drafting idioms, payload extraction, elaboration rules, source pathology, oracle comparison.
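A minimal sketch of this boundary, assuming a structural interface: the kernel depends only on a small `Frontend` protocol and on canonical operations, while drafting idioms stay inside the frontend. `Frontend`, `CanonicalOp`, `ToyFrontend`, `kernel_apply`, and the "replace X with Y" idiom are all invented for this example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CanonicalOp:
    action: str
    address: str
    text: str = ""


class Frontend(Protocol):
    """What the kernel is allowed to know about a jurisdiction."""

    jurisdiction: str

    def lower(self, amendment_source: str) -> list[CanonicalOp]: ...


class ToyFrontend:
    """Hypothetical frontend whose only drafting idiom is 'replace <address> with <text>'."""

    jurisdiction = "xx"

    def lower(self, amendment_source: str) -> list[CanonicalOp]:
        ops = []
        for line in amendment_source.splitlines():
            parts = line.split(" ", 3)
            if len(parts) != 4 or parts[0] != "replace" or parts[2] != "with":
                raise ValueError(f"unrecognized drafting idiom: {line!r}")
            ops.append(CanonicalOp("replace", parts[1], parts[3]))
        return ops


def kernel_apply(base: dict[str, str], frontend: Frontend, amendment_source: str) -> dict[str, str]:
    """Jurisdiction-agnostic: sees canonical operations only, never source idioms."""
    state = dict(base)
    for op in frontend.lower(amendment_source):
        if op.action != "replace" or op.address not in state:
            raise ValueError(f"cannot execute {op!r}")
        state[op.address] = op.text
    return state
```

Adding a jurisdiction under this arrangement means writing another class that satisfies `Frontend`; `kernel_apply` would not change.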

The important design question is never "can we extract something useful?" It is: what is the smallest honest executable claim for this jurisdiction, and what source family makes that claim defensible?

Beyond Layer 0

LawVM is deliberately narrow. It computes what the legal text says at a point in time. It does not compute what the law means, how it is applied in practice, or what it costs. Those are higher layers:

Layer              | Question                            | Scope
L0: LawVM          | What does the text say?             | Text-state compilation, provenance, timelines
L1: Legal views    | Which view to run?                  | Territorial, commencement, transitional overlays
L2: Interpretation | What do authorities say it means?   | Court holdings, guidance, doctrine
L3: Praxis         | How is it actually applied?         | Enforcement, institutional behavior
L4: Reasoning      | What follows for this fact pattern? | Compliance, simulation, argument
L5: Products       | What can users do?                  | Search, Q&A, drafting assistants

Upper layers attach claims to L0 anchors without mutating the text-state kernel. LawVM is designed as a substrate: stable identities, span-level anchoring, explicit provenance, overlay hooks.
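To illustrate how an upper layer might attach a claim without touching L0, the sketch below anchors a claim to a character span of a point-in-time text held in a read-only mapping. `Anchor`, `OverlayClaim`, `anchored_text`, the `(address, date)` key, and the sample statute text are assumptions made for this example only.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Anchor:
    address: str  # stable identity of the provision
    as_of: str    # ISO date of the point-in-time state
    start: int    # span start (inclusive)
    end: int      # span end (exclusive)


@dataclass(frozen=True)
class OverlayClaim:
    layer: str    # e.g. "L2"
    anchor: Anchor
    claim: str
    source: str   # provenance of the claim itself


def anchored_text(l0_state: Mapping[tuple[str, str], str], anchor: Anchor) -> str:
    """Read the span a claim refers to; the L0 state is never written."""
    return l0_state[(anchor.address, anchor.as_of)][anchor.start:anchor.end]


# Invented L0 state, wrapped so overlays cannot mutate it.
L0_STATE = MappingProxyType({("s1", "2020-01-01"): "The fee is 10 euros."})

claim = OverlayClaim(
    layer="L2",
    anchor=Anchor("s1", "2020-01-01", 11, 19),
    claim="hypothetical interpretive note about the amount",
    source="hypothetical guidance document",
)
```

Here `anchored_text(L0_STATE, claim.anchor)` returns `"10 euros"`, and the overlay lives entirely outside the text-state it points at.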

Downstream examples: Lakikartta joins the legal graph to budget data (92k statutes, 500B€ budget weights, PageRank/Katz/DebtRank centrality). MeV mechanism tests analyze whether government bills' mechanisms produce their stated goals. These are separate projects that demonstrate what becomes possible once L0 text-state compilation is reliable.