Getting Started

Prerequisites

  • Python 3.12+
  • uv package manager
  • Git
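
A quick way to confirm the prerequisites before installing is sketched below. The function names `python_ok` and `check_prereqs` are hypothetical helpers, not part of LawVM. The sketch assumes a `python3` executable is on PATH; if uv manages the interpreter for you, that check may be stricter than needed.

```shell
# Hypothetical helper: succeed if a version string such as "3.12.1" is >= 3.12.
python_ok() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # second component
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
}

# Hypothetical helper: report any missing prerequisite; non-zero status on failure.
check_prereqs() {
  status=0
  for tool in uv git python3; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  if command -v python3 >/dev/null 2>&1; then
    # `python3 --version` prints e.g. "Python 3.12.1"; keep the second field.
    ver=$(python3 --version 2>&1 | awk '{print $2}')
    if ! python_ok "$ver"; then
      echo "python3 is $ver, need 3.12 or newer"
      status=1
    fi
  fi
  return "$status"
}
```

Calling `check_prereqs` prints one line per problem and returns a non-zero status if anything is missing.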

Install

git clone https://github.com/eliask/lawvm.git
cd lawvm
uv sync

Import source archives

Replaying Finnish statutes requires locally archived sources. Import the public Finlex archives (about 13 GB of ZIP input, about 5 GB on disk after ingestion):

uv run lawvm import-zip \
  --statute-zip https://www.finlex.fi/api/assets/open-data/archives/statute.zip \
  --consolidated-zip https://www.finlex.fi/api/assets/open-data/archives/statute-consolidated.zip

This downloads the official Finlex open data archives and ingests them into local .farchive files. The import is a one-time operation; subsequent commands read from the local archive.
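
Given the archive sizes quoted above, it may be worth checking free disk space before starting the import. The following is a minimal sketch; `has_free_gb` is a hypothetical helper, and whether the downloaded ZIPs and the ingested archive occupy disk at the same time is an assumption, not something this guide confirms.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB gibibytes available. Uses POSIX `df -Pk` (one line per filesystem,
# sizes in 1024-byte blocks); field 4 of the second line is the available space.
has_free_gb() {
  dir="$1"
  need_gb="$2"
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example, `has_free_gb . 18 || echo "low disk space"` checks the current directory against a conservative 18 GB, which assumes the 13 GB of ZIP input and the 5 GB archive both need room simultaneously.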

First replay

Replay statute 2002/738 (Työturvallisuuslaki / Occupational Safety Act) as of January 1, 2024:

uv run lawvm replay 2002/738 --as-of 2024-01-01

This compiles all amendment acts affecting 2002/738, replays them over the base statute, and materializes the point-in-time text. The output is the complete statute as it stood on that date.
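
To compare the statute at several dates, the replay command can be wrapped in a small loop. This is a sketch under assumptions: it presumes `lawvm replay` writes the materialized text to standard output, and both `replay_snapshots` and the `LAWVM_CMD` override are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper: replay one statute at several dates and save each
# snapshot to its own file.
# Assumption: `lawvm replay` prints the point-in-time text to stdout.
# LAWVM_CMD is a hypothetical override, useful for dry runs (e.g. LAWVM_CMD=echo).
replay_snapshots() {
  statute="$1"
  outdir="$2"
  shift 2
  mkdir -p "$outdir" || return 1
  # File-safe statute id: 2002/738 becomes 2002_738.
  safe=$(printf '%s' "$statute" | tr '/' '_')
  for as_of in "$@"; do
    # LAWVM_CMD is left unquoted on purpose so "uv run lawvm" splits into words.
    ${LAWVM_CMD:-uv run lawvm} replay "$statute" --as-of "$as_of" \
      > "$outdir/${safe}_${as_of}.txt" || return 1
  done
}
```

A call such as `replay_snapshots 2002/738 snapshots 2010-01-01 2024-01-01` would then leave one text file per date in `snapshots/`, which can be compared with an ordinary `diff`.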

First diff

Compare LawVM's replay against the Finlex consolidation:

uv run lawvm diff 2002/738

This shows section-by-section divergences. Green sections match; red sections diverge. Each divergence is a starting point for investigation: the cause may be a replay defect, a source gap, an editorial convention, or a candidate issue in the official consolidation.

First explain

See the amendment chain and operation history:

uv run lawvm explain 2002/738

This shows which amendments affected the statute, what operations they compiled to, and the temporal sequence of changes.

What success looks like

When replay succeeds on a statute with dozens of amendments spanning decades, you get:

  • Point-in-time text that matches the official consolidation character-for-character
  • Full provenance: every provision traced to the amendment that changed it
  • Temporal versioning: query any past date, get the text that was in force

When replay diverges, typed residuals indicate the likely class of cause: replay defect, source pathology, editorial artifact, or oracle staleness.

Run the benchmark

uv run lawvm bench --mode finlex_oracle

This replays the configured Finnish alpha corpus and reports aggregate metrics. See Artifacts for methodology and interpretation.

Explore further

uv run lawvm --help

The CLI includes replay, diff, explain, bench, bisect, diagnose, and many more tools. The --help output documents the full command surface.

Architecture documentation lives in notes/ in the repository. Start with notes/SPEC_INDEX.md.