WORK-409
ID:WORK-409Status:pending

Visual-regression harness

A shared Playwright harness that photographs the generated gallery + layout fixtures and diffs them against a baseline. This closes the AI iteration loop and — critically — gives the v0.23.0 skeleton/skin extraction its "diff must be empty" proof. No screenshot testing exists in the repo today; this is greenfield.

Baselines are ephemeral, not committed. Fixtures and theme CSS churn constantly, so committed golden PNGs would mean endless binary diffs and a re-baseline commit on nearly every PR — noise that trains everyone to ignore the check, plus repo bloat. Instead the baseline is whatever you compare against in the moment (__screenshots__/ is gitignored): capture-then- compare locally, or compare-against-base in CI.

Priority:highComplexity:moderateMilestone:v0.22.0Source:SPEC-094
claude/work-409-visual-regression-harness View source

Criteria completion

Criteria completion: 3 of 5 (60%) checked; tracking started on Jun 12, no incremental history yet0%25%50%75%100%Jun 12Jun 15

Tracking started Jun 12 — check back for trends.

Branches 4
History 4
  1. 6776aa6
    Content editedby bjornolofandersson
  2. 9bbe3d1
    Created (pending)by bjornolofandersson
  3. 406471d
    Content editedby Claude
    WORK-416: bundle behaviors into the gallery so lifecycle runes render
  4. df1ffa9
    Content editedby Claude
    plan: break down SPEC-094 first slice into v0.22.0 milestone

Scope

  • Reusable Playwright config + snapshot test ("load artifact → await document.fonts.ready → settle behaviors → snapshot"), living once as a shared harness package — not copy-pasted per theme. Network/iframe runes (map/sandbox/embed) excluded.
  • Rune gallery: per-data-gallery-cell element clips (a diff localises to the rune). Layout fixtures: whole-page shots per viewport.
  • Lumina wires it in with thin glue only: a sibling spec calling registerGalleryTests. A theme adopts it with a spec + script, no logic copy.
  • Capture-then-compare (local) and prove-inert (capture on base ref → switch branch → diff) workflows.
  • Runs in a pinned container (Playwright's official image) for deterministic anti-aliasing / font hinting.
  • Distribution: opt-in, separately-installed package — Playwright / browser binaries must not enter the core install path.
  • CI (follow-up): a compare-against-base job — render base + head, diff, report. Informational by default (uploads the diff / posts a summary, passes — visual changes are usually legitimate), with an opt-in expect-empty mode where a non-empty diff is a failure (for refactors asserting zero visual change).

Acceptance Criteria

  • A reusable harness screenshots both subjects (rune clips per mode; layout pages per mode × viewport), diffs against a baseline, and runs end-to-end in the pinned container.
  • Baselines are ephemeral (__screenshots__/ gitignored) — no committed golden PNGs; capture-then-compare + prove-inert workflows are documented.
  • A second theme could adopt it with a spec + script only (no logic copy).
  • The harness ships as an opt-in package; the core CLI / runtime install pulls no browser binaries.
  • CI compare-against-base job: informational by default, opt-in expect-empty (follow-up — browser-dependent).

Dependencies

  • Requires WORK-407 and WORK-408 (the subjects to photograph).
  • The gallery now inlines behaviors (WORK-416), so the harness must wait for behaviors to settle before shooting and exclude the network-dependent runes (map tiles, sandbox external iframe/CDN) from deterministic baselines (stub/skip). Synchronous runes (PE runes, diagram, chart, nav) baseline fine after settle.

References

  • SPEC-094 · Playwright · packages/lumina/.