Adapter testing infrastructure and drift prevention

The catalyst for SPEC-058 was that capabilities accreted onto @refrakt-md/sveltekit over the v0.14.x patch line and silently failed to reach the other five adapters. The refrakt documentation site only exercises SvelteKit on every release, so every other adapter is implicitly trusting "looks the same in the code" as a parity guarantee — and the cost of that trust came due as a six-feature gap.

This spec defines the testing infrastructure that prevents the next round of drift. The shape: shrink the per-adapter code surface as far as is reasonable, then catch the residual drift with cheap automation rather than expensive deployments. A live-deployment-per-adapter validation strategy was considered and deferred — the maintenance cost of keeping five hosted sites in sync is the very class of drift this spec is trying to detect.

This is implementation-deferred work. SPEC-058's milestone ships first; this spec captures the test infrastructure that should land after that parity work is in place, when the per-adapter surfaces are at their narrowest and snapshots of them are most stable.


Relationships

SPEC-058 (status: draft): Framework adapter parity with the SvelteKit reference
SPEC-030 (status: accepted): Framework Adapter System


Problem

Three classes of drift go undetected today:

1. Code drift — features added to one adapter, missed in others

The motivating case for SPEC-058. Six SvelteKit-only capabilities (site-tokens CSS, SEO option threading, CSS tree-shaking, pipeline-stats output, security + variables options, content HMR) accumulated over v0.14.0–v0.14.3 with no mechanism flagging the omissions. The omission only became visible when someone audited adapter parity by hand.

2. Output drift — adapters that should produce equivalent output produce subtly different output

Even when every adapter implements a feature, the per-framework rendering paths can diverge in the produced HTML, CSS load order, or SEO tag emission. The rune-tree HTML should be byte-identical across adapters (it's generated by the same identity transform). The CSS imports should match block-for-block (same tree-shake result). The SEO meta tags should be semantically equivalent. None of these invariants is currently checked.

3. Dependency drift — peer deps, framework version bumps, or upstream API changes break an adapter without anyone noticing

Each adapter has its own peer-dep range (Astro ^5.0.0, Next ^14 || ^15, Nuxt ^3.0.0, Eleventy ^3.0.0). A peer-dep mismatch, an upstream breaking change, or a refrakt internal API rename can leave an adapter's build broken on npm install and nobody discovers it until a user reports it on GitHub.

Out of scope

- Live hosted deployments per adapter: a Cloudflare / Vercel / Netlify subdomain for each adapter's example site. Considered and deferred, because the maintenance cost of keeping five hosted sites' configs in sync replicates the very drift problem we're trying to detect. Acceptable as a once-per-minor-release manual sanity check, but not as a per-PR signal.
- Visual regression testing (screenshot diffs via Playwright / Percy): its value lies in detecting CSS / layout regressions, but byte-identical rune-rendered HTML and block-for-block matching CSS imports already cover the structural cases. Visual diffing's incremental value is in CSS authoring drift, which isn't what this spec targets. Revisit if a regression class slips through the snapshot tests.
- Cross-framework benchmarking (build times, bundle sizes, runtime perf): useful for marketing copy, not for parity validation. Different concern.
- Automating the capability matrix updates: Phase 1 keeps the matrix as a human-maintained checked-in doc with a PR-template prompt. Generating it from code is YAGNI until the matrix proves load-bearing.

Solution

Five layers, ordered cheapest-first:

Layer 1 — Shrink the per-adapter code surface

The cheapest form of drift prevention is "less code to drift". SPEC-058 already moves several SvelteKit-only utilities (composeSiteTokensCss, setupContentHmr, formatPipelineSummary, computeUsedCssBlocks) into @refrakt-md/transform/node or @refrakt-md/content, and replaces template-astro/src/setup.ts (~90 lines) with a thin createRefraktLoader wrapper (~20 lines).

This spec ratifies that reduction as a deliberate ongoing strategy: when a feature applies to more than one adapter, the canonical implementation lives in a shared package and adapters call it. Per-adapter code is reserved for the framework-specific bits that genuinely cannot be shared (Astro's integration API, Nuxt's module API, Eleventy's addPassthroughCopy, etc.).

After SPEC-058 ships, the typical adapter package should land in the 200–400 LOC range. Anything that grows past 500 LOC without a clear framework-specific reason is a code smell — the next addition probably belongs in shared infra.
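The size budget above could be checked mechanically. The following is a minimal sketch, not part of the spec: the thresholds are the ones stated in the text, while the function names and verdict labels are hypothetical.

```typescript
// Hypothetical helper: only the 200-400 and 500 LOC thresholds come from the
// spec text; names and verdict labels are illustrative.
type LocVerdict = "within-budget" | "above-typical" | "smell";

function classifyAdapterLoc(loc: number): LocVerdict {
  if (loc > 500) return "smell"; // past 500 LOC: next addition probably belongs in shared infra
  if (loc > 400) return "above-typical"; // over the typical 200-400 range
  return "within-budget";
}

// Returns the adapters (sorted by name) that need a framework-specific
// justification for their size.
function adaptersOverBudget(locByAdapter: Record<string, number>): string[] {
  return Object.entries(locByAdapter)
    .filter(([, loc]) => classifyAdapterLoc(loc) === "smell")
    .map(([name]) => name)
    .sort();
}
```

A check like this only flags candidates for review; whether a large adapter has a legitimate framework-specific reason remains a human judgment.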

Layer 2 — Examples directory with a shared content fixture

Add an examples/ directory at the monorepo root containing one example site per non-SvelteKit adapter (the docs site itself serves as the SvelteKit example):

examples/
  shared-fixture/
    content/                # ~15 .md pages exercising the rune surface
    refrakt.config.json     # canonical config: tokens, presets, tints, SEO, variables, security
  astro-site/
    refrakt.config.json     # → "contentDir": "../shared-fixture/content"
    astro.config.mjs
    package.json
    ...
  nuxt-site/
  next-site/
  eleventy-site/
  html-site/

Each example consumes workspace packages via workspace:* and runs against the live monorepo source — no published-package indirection. Builds via the adapter's standard CLI (astro build, nuxt build, next build, npx @11ty/eleventy, tsx examples/html-site/build.ts).

The shared fixture is the content corpus used for adapter validation everywhere — snapshot tests, CI smoke builds, manual release deploys, doc page screenshots. One source of truth.

Fixture content scope: cover representative members of every major rune family (hint, recipe, palette, hero, bento, accordion, datatable, nav-menubar, code block with shiki, table with svelte override), exercise the layout transform (docs layout with sidebar + breadcrumb + TOC; default layout for simple pages; blog-article layout), include at least one page using variables interpolation, at least one using tint= scoped projection, and at least one with frontmatter SEO overrides. Aim for ~15 pages, not 100 — coverage of distinct mechanisms, not exhaustive enumeration.
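One way to keep the fixture honest about "coverage of distinct mechanisms" is a small coverage check over a description of the pages. This is a sketch under assumptions: the FixturePage shape and the function name are hypothetical, and the rune list covers only the plainly named runes from the scope above (the shiki code block and svelte-override table are left out for brevity).

```typescript
type Layout = "docs" | "default" | "blog-article";

// Hypothetical description of one fixture page; not a refrakt type.
interface FixturePage {
  path: string;
  layout: Layout;
  runes: string[];
  usesVariables?: boolean;
  usesTint?: boolean;
  hasSeoOverrides?: boolean;
}

const REQUIRED_RUNES = [
  "hint", "recipe", "palette", "hero",
  "bento", "accordion", "datatable", "nav-menubar",
];
const REQUIRED_LAYOUTS: Layout[] = ["docs", "default", "blog-article"];

// Lists every mechanism from the fixture scope that no page exercises.
function missingCoverage(pages: FixturePage[]): string[] {
  const runes = new Set<string>();
  const layouts = new Set<string>();
  for (const page of pages) {
    layouts.add(page.layout);
    for (const rune of page.runes) runes.add(rune);
  }
  const missing: string[] = [];
  for (const rune of REQUIRED_RUNES) {
    if (!runes.has(rune)) missing.push(`rune:${rune}`);
  }
  for (const layout of REQUIRED_LAYOUTS) {
    if (!layouts.has(layout)) missing.push(`layout:${layout}`);
  }
  if (!pages.some((p) => p.usesVariables)) missing.push("variables interpolation");
  if (!pages.some((p) => p.usesTint)) missing.push("tint scoped projection");
  if (!pages.some((p) => p.hasSeoOverrides)) missing.push("frontmatter SEO overrides");
  return missing;
}
```

An empty result means every listed mechanism appears on at least one page, which is the stated goal; it says nothing about how many pages there are.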

Layer 3 — CI: build every example on every PR

Add a .github/workflows/adapter-builds.yml workflow that runs npm run build inside each examples/*-site/ directory on every PR: five parallel jobs, roughly 2 minutes of wall time in total.

Failure modes this catches:

- Workspace dep mismatch (adapter imports a symbol that the current @refrakt-md/transform no longer exports)
- Peer-dep break (Astro 5.x bump introduces a breaking integration-API change)
- Missing export (a new feature lands in @refrakt-md/sveltekit and its consumers, but the adapter's index.ts was not updated to export it)
- Framework version drift (the example pins astro@^5.0.0; an astro@5.5 upstream change breaks the integration)

The smoke test is the cheapest signal: it doesn't validate output correctness, but it proves the adapter still runs against the current workspace state.
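The job fan-out for the smoke builds can be pictured as a pure function over the entries of examples/. This is an illustrative sketch only: the BuildJob shape and planSmokeBuilds are hypothetical names, and the actual workflow would express the same thing in its own configuration.

```typescript
// Hypothetical job descriptor; the real workflow file is the source of truth.
interface BuildJob {
  name: string;
  workingDirectory: string;
  command: string;
}

// Selects the example sites (directories ending in "-site") and skips
// everything else, notably the shared fixture, which has nothing to build.
function planSmokeBuilds(exampleEntries: string[]): BuildJob[] {
  return exampleEntries
    .filter((entry) => entry.endsWith("-site"))
    .sort()
    .map((entry) => ({
      name: entry,
      workingDirectory: `examples/${entry}`,
      command: "npm run build",
    }));
}
```

Deriving the job list from the directory names, rather than hard-coding it, means a newly added example site is picked up without a second edit.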

Layer 4 — Cross-adapter snapshot tests

For each example, after the build completes, extract three structured artifacts from the output and snapshot them via vitest:

- Rune-tree HTML: the inner content rendered by the identity transform, stripped of framework-specific shells (Astro page wrapper, Next RSC boundary, Vue compiler artifacts). Extraction selector: <main class="rf-content"> or equivalent, with the locator documented per adapter. Expected: byte-identical across all adapters (same input → same identity transform → same HTML).
- CSS imports list: the ordered list of stylesheet file paths the page loads, resolved from <link rel="stylesheet"> href attributes and inline <style> blocks (for adapters that inline tokens). Expected: block-for-block match between adapters (same tree-shake result). Comparison strips host-framework asset-hash suffixes.
- SEO emission: <meta> tags plus JSON-LD <script> contents from <head>. Expected: semantic equivalence (same set of og:* properties, same JSON-LD @type + payload). Adapters that emit via different mechanisms (Next.js metadata object, Nuxt useHead rendering, etc.) must all produce the same final HTML.
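The hash-stripping step for the CSS imports list could look roughly like the sketch below. It rests on an assumption the spec does not state: that a bundler hash is a "." or "-" separated run of at least eight word characters containing a digit, placed directly before the .css extension. The function names are hypothetical.

```typescript
// Assumed hash shape: separator, then 8+ characters of [A-Za-z0-9_] with at
// least one digit, immediately before ".css". Real bundlers may differ.
const HASH_SUFFIX = /[.-](?=[A-Za-z0-9_]*\d)[A-Za-z0-9_]{8,}(?=\.css$)/;

function normalizeStylesheetHref(href: string): string {
  // Drop any query string or fragment before looking for the hash.
  const withoutQuery = href.replace(/[?#].*$/, "");
  return withoutQuery.replace(HASH_SUFFIX, "");
}

// Order-sensitive comparison of two CSS import lists after normalization.
function sameCssImports(reference: string[], candidate: string[]): boolean {
  if (reference.length !== candidate.length) return false;
  return reference.every(
    (href, i) => normalizeStylesheetHref(href) === normalizeStylesheetHref(candidate[i]),
  );
}
```

Requiring a digit keeps ordinary hyphenated names such as code-highlight.css intact. Framework-specific output directories are deliberately left untouched here, since the text only calls for stripping hash suffixes; a real comparison would likely need a documented rule for those as well.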

The SvelteKit-rendered fixture is the reference snapshot. The other adapters' extracted artifacts diff against it. A drift in any of the three dimensions surfaces as a snapshot diff in the PR.
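The comparison against the reference can be sketched as a function that names which of the three dimensions drifted. The PageArtifacts shape and function names below are hypothetical, and "semantic equivalence" is interpreted here as equality after sorting object keys, which is one possible reading rather than the spec's definition.

```typescript
// Hypothetical container for the three extracted artifacts of one page.
interface PageArtifacts {
  runeTreeHtml: string;
  cssImports: string[]; // assumed to be hash-normalized already
  metaTags: Record<string, string>; // e.g. "og:title" -> content
  jsonLd: unknown[];
}

// Serializes a value with object keys sorted, so key order never counts
// as a difference. Array order is preserved.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns the names of the dimensions in which the candidate differs
// from the reference; an empty array means no drift.
function driftedDimensions(reference: PageArtifacts, candidate: PageArtifacts): string[] {
  const drift: string[] = [];
  if (reference.runeTreeHtml !== candidate.runeTreeHtml) drift.push("rune-tree HTML");
  if (canonical(reference.cssImports) !== canonical(candidate.cssImports)) drift.push("CSS imports");
  const seoMatches =
    canonical(reference.metaTags) === canonical(candidate.metaTags) &&
    canonical(reference.jsonLd) === canonical(candidate.jsonLd);
  if (!seoMatches) drift.push("SEO emission");
  return drift;
}
```

Reporting the dimension by name keeps a failing PR check readable: a reviewer sees "CSS imports" drifted for the Astro example before opening the snapshot diff.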

Implementation note: the extraction can be pure DOM traversal (cheerio or linkedom) — no need for a real browser. Each example's test file runs build, reads the output HTML files, extracts the three artifacts, and compares.

Layer 5 — Capability matrix doc

Add a checked-in docs/adapter-capabilities.md that lists every cross-cutting feature against every adapter, with:

- Feature name + one-line description
- File path + line range that implements it for each adapter (or "shared via @refrakt-md/X")
- Status: parity, partial, not-applicable (with one-line justification for the latter two)

The PR template grows a checkbox: "Did this PR touch adapter behavior? If yes, update docs/adapter-capabilities.md."

Soft enforcement; no CI gate on the doc. The point is making the omission visible during review, not blocking on it. If the doc proves load-bearing — i.e., people actually consult it — Phase 2 can add a script that diffs the matrix against the codebase. Until then, human maintenance is the right cost level.
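If the matrix ever does graduate to a checked structure, its rules are simple enough to state as types plus one validator. This is a speculative sketch: the spec keeps the matrix as a hand-maintained document, and every name below is hypothetical.

```typescript
type CapabilityStatus = "parity" | "partial" | "not-applicable";

// Hypothetical structured form of one matrix cell and row.
interface CapabilityCell {
  status: CapabilityStatus;
  implementation?: string; // file path + line range, or "shared via @refrakt-md/X"
  justification?: string; // expected for partial and not-applicable
}

interface CapabilityRow {
  feature: string;
  description: string;
  adapters: Record<string, CapabilityCell | undefined>;
}

// Reports rows that skip an adapter, and non-parity cells that lack the
// one-line justification the matrix format asks for.
function matrixProblems(rows: CapabilityRow[], adapterNames: string[]): string[] {
  const problems: string[] = [];
  for (const row of rows) {
    for (const name of adapterNames) {
      const cell = row.adapters[name];
      if (!cell) {
        problems.push(`${row.feature}: no entry for ${name}`);
        continue;
      }
      const justified = (cell.justification ?? "").trim().length > 0;
      if (cell.status !== "parity" && !justified) {
        problems.push(`${row.feature}: ${name} is ${cell.status} without a justification`);
      }
    }
  }
  return problems;
}
```

This validates only the matrix's internal consistency; checking it against the codebase is the separate Phase 2 question raised above.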

Release-time manual sanity check

Once per minor release (not per PR), deploy each example to a one-shot preview URL on Cloudflare Pages (free tier, no DNS required — <branch>.<project>.pages.dev subdomains are auto-generated). Browse each preview manually; verify the fixture pages render as expected; screenshot for the release announcement if useful.

This is the only deployment infra this spec touches. No DNS, no production accounts, no per-PR cost.

Implementation phases

Layers 2 through 5 are independently shippable; Layer 1 is the ongoing strategy that SPEC-058 starts. Recommended order:

1. Examples directory + shared fixture: the substrate everything else builds on. Land first.
2. CI smoke builds: wire up the workflow once the examples exist. Immediate signal for dep / version drift.
3. Cross-adapter snapshot tests: layered on top of the CI build pass. Adds output-equivalence validation.
4. Capability matrix doc: independent; can land any time, including in parallel with the others.

Each phase is one work item. Total estimated effort: ~3–5 days of focused work for one engineer.

Validation

The infrastructure itself is validated by retroactively reproducing the SPEC-058 gap as a failing test: on a branch, temporarily revert one of SPEC-058's wiring items (e.g., remove composeSiteTokensCss from the Astro integration's Vite plugin) and confirm that the snapshot test for the Astro example fails, with --rf-color-text reported missing from the extracted CSS imports. If it does not fail, the snapshot extraction is missing something and needs tuning.
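The specific signal in that validation step, a missing --rf-color-text, could be asserted directly on extracted CSS text. The sketch assumes the extracted artifact includes the stylesheet contents (for example inlined token blocks), which the text implies but does not spell out; declaresCustomProperty is a hypothetical helper.

```typescript
// Returns true when the CSS text declares the given custom property, as
// opposed to merely referencing it through var(...).
function declaresCustomProperty(css: string, property: string): boolean {
  // Escape regex metacharacters in the property name.
  const escaped = property.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // A declaration starts the text or follows whitespace, ";" or "{",
  // and is followed by a colon.
  return new RegExp(`(^|[\\s;{])${escaped}\\s*:`).test(css);
}
```

With the wiring item reverted, a check like declaresCustomProperty(extractedCss, "--rf-color-text") would be expected to return false for the Astro example and true for the SvelteKit reference.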

References

SPEC-058 — the parity work whose validation infrastructure this spec defines
SPEC-030 — framework adapter system
packages/sveltekit/ — the reference adapter whose snapshots are the comparison target
site/content/ — pattern for the shared fixture's content shape