Visual-regression harness
A shared Playwright harness that photographs the generated gallery + layout fixtures and diffs them against a baseline. This closes the AI iteration loop and — critically — gives the v0.23.0 skeleton/skin extraction its "diff must be empty" proof. No screenshot testing exists in the repo today; this is greenfield.
Baselines are ephemeral, not committed. Fixtures and theme CSS churn constantly, so committed golden PNGs would mean endless binary diffs and a re-baseline commit on nearly every PR — noise that trains everyone to ignore the check, plus repo bloat. Instead the baseline is whatever you compare against in the moment (__screenshots__/ is gitignored): capture-then- compare locally, or compare-against-base in CI.