Status: accepted (draft recommendations from §9 adopted)
Date: 2026-04-19
Tracked in: TODO.md (Path B: conformance driver contract)
Companion: REGRESSION_LOG.md (parity tests land as a new batch once the first scenarios pass)
ODS is a spec-driven framework with N renderer implementations that must all produce equivalent behavior for any valid spec. Today N=2 (Flutter local + React web); tomorrow could be Swift/SwiftUI, a terminal UI, or 3rd-party implementations once the spec goes public.
The regression suite so far writes separate tests per framework and compares behavior by eye. That scales badly and has already let two cross-framework divergences slip in (REGRESSION_LOG.md bugs #5 and #6).
This contract defines a framework-neutral “driver” interface. Every renderer ships an adapter that implements the interface; one shared scenario library runs against any adapter. Internal parity testing falls out of that for free, and the same contract becomes the basis of a public 3rd-party conformance suite once the spec is ready to be called a “spec.”
Principle: the driver speaks in spec vocabulary. A field is named in the spec; the driver addresses it by that name. A button has a label in the spec; the driver clicks it by that label. No framework concepts (no “widget,” no “component instance,” no “selector”) cross the boundary. A scenario says “fill the email field of form signup,” not “find <input id=email>.” A renderer choosing to map email onto a TextField, an <input>, or a TUI prompt is none of the test’s business.
The surface is split into five groups: Lifecycle, Input, Observation, Auth, Determinism.
interface Lifecycle {
/** Load a spec and reach the ready state (first page rendered). */
mount(spec: OdsSpec): Promise<void>
/** Tear down. Safe to call after any failure. */
unmount(): Promise<void>
/** Clear all app data but keep the spec loaded. Must be faster than
* unmount + mount. Used between scenario steps. */
reset(): Promise<void>
/** Capabilities the driver implements (see §6). Declared at
* construction; scenarios tagged with a missing capability are
* skipped, not failed. */
capabilities: ReadonlySet<Capability>
}
interface Input {
/** Set a value on a form field, addressed by the field's spec `name`.
* When a page contains more than one form, `formId` is
* required; otherwise the single form on the page is implied. */
fillField(fieldName: string, value: FieldValue, formId?: string): Promise<void>
/** Click a button, addressed by its visible label. For duplicate
* labels, the nth occurrence (0-based) is selected. */
clickButton(label: string, occurrence?: number): Promise<void>
/** Click a row-level action in a list. */
clickRowAction(
dataSource: string,
rowId: string,
actionLabel: string,
): Promise<void>
/** Navigate via a menu item (matches ODS menu[].label). */
clickMenuItem(label: string): Promise<void>
}
type FieldValue =
| string // text, email, multiline, date, datetime, select
| number // number
| boolean // checkbox
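The `occurrence` rule in `clickButton` can be resolved entirely in spec vocabulary: a label plus a position, no selectors. A minimal sketch, assuming the driver keeps its buttons in page order; `ButtonRef` and `resolveButton` are hypothetical names, and the handling of hidden and disabled buttons is our assumption rather than something this contract states.

```typescript
// Hypothetical, trimmed view of a rendered button: its spec label plus the
// runtime state a driver already tracks for ButtonSnapshot.
interface ButtonRef {
  label: string
  visible: boolean
  enabled: boolean
}

// Resolve clickButton(label, occurrence) to exactly one button.
// `occurrence` is 0-based and counts only *visible* buttons carrying that
// label, in page order (assumption: hidden buttons can't be clicked, so
// they don't shift the numbering).
function resolveButton(
  buttons: ReadonlyArray<ButtonRef>,
  label: string,
  occurrence = 0,
): ButtonRef {
  const matches = buttons.filter((b) => b.visible && b.label === label)
  const hit = matches[occurrence]
  if (hit === undefined) {
    throw new Error(
      `no visible button "${label}" at occurrence ${occurrence} (found ${matches.length})`,
    )
  }
  // Assumption: clicking a disabled button is a scenario failure, not a no-op.
  if (!hit.enabled) {
    throw new Error(`button "${label}" (occurrence ${occurrence}) is disabled`)
  }
  return hit
}
```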
Note: mount plus fillField/clickButton is enough to exercise every built-in action (navigate, submit, update, delete, showMessage); scenarios drive those indirectly through the button or menu item they’re attached to. The driver does not expose “execute action X directly”; that would let tests cheat past spec-level UI.
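To make the note concrete, here is a toy in-memory driver in which actions exist only as data on a button's `onClick` list and run when that button is clicked. Everything in it is hypothetical (`ToyApp`, the two modeled action shapes, the sequential string `_id`s, clearing the form after submit); a real driver routes the click through its renderer's own action engine.

```typescript
type FieldValue = string | number | boolean
type Level = 'info' | 'success' | 'warning' | 'error'
type Row = Record<string, unknown> & { _id: string }

// Hypothetical subset of ODS onClick actions; only two kinds are modeled.
type Action =
  | { action: 'submit'; dataSource: string; target: string }
  | { action: 'showMessage'; message: string; level: Level }

interface ToyButton {
  label: string
  onClick: Action[]
}

// Toy in-memory "renderer". Tests can only reach actions by clicking the
// button they are attached to; there is no executeAction() entry point.
class ToyApp {
  private readonly buttons: ToyButton[]
  private readonly formId: string
  private form: Record<string, FieldValue> = {}
  private readonly data = new Map<string, Row[]>()
  private message: { text: string; level: Level } | null = null
  private nextId = 1

  constructor(buttons: ToyButton[], formId: string) {
    this.buttons = buttons
    this.formId = formId
  }

  // Fields are addressed by spec name; the page's single form is implied.
  async fillField(name: string, value: FieldValue): Promise<void> {
    this.form[name] = value
  }

  // Run the clicked button's onClick list in order.
  async clickButton(label: string, occurrence = 0): Promise<void> {
    const btn = this.buttons.filter((b) => b.label === label)[occurrence]
    if (btn === undefined) throw new Error(`no button "${label}"`)
    for (const a of btn.onClick) {
      switch (a.action) {
        case 'submit': {
          if (a.target !== this.formId) throw new Error(`unknown form ${a.target}`)
          const rows = this.data.get(a.dataSource) ?? []
          rows.push({ ...this.form, _id: String(this.nextId++) })
          this.data.set(a.dataSource, rows)
          this.form = {} // assumption: the form clears after a successful submit
          break
        }
        case 'showMessage':
          this.message = { text: a.message, level: a.level }
          break
      }
    }
  }

  // Contract: authoritative rows, sorted by `_id` ascending.
  async dataRows(dataSource: string): Promise<Row[]> {
    return [...(this.data.get(dataSource) ?? [])].sort((x, y) =>
      x._id < y._id ? -1 : x._id > y._id ? 1 : 0,
    )
  }

  async lastMessage(): Promise<{ text: string; level: Level } | null> {
    return this.message
  }

  // reset(): clear app data and messages but keep the loaded "spec" (buttons).
  async reset(): Promise<void> {
    this.form = {}
    this.data.clear()
    this.message = null
    this.nextId = 1
  }
}
```

Running the s01 steps against it (fill `title`, click `Save`) yields one `tasks` row and a `success` message, and `reset()` empties both while keeping the buttons.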
interface Observation {
/** Identity of the currently shown page. */
currentPage(): Promise<{ id: string; title: string }>
/** Structured snapshot of everything on the current page. See §4. */
pageContent(): Promise<ComponentSnapshot[]>
/** All rows in a data source, sorted by `_id` asc for determinism.
* Filters/sorts currently applied in a list component are NOT
* reflected here — this is the authoritative data, not UI state. */
dataRows(dataSource: string): Promise<Row[]>
/** Live form field values (what would be submitted if you clicked
* submit right now). */
formValues(formId: string): Promise<Record<string, FieldValue>>
/** The most recent toast / banner / message emitted by an action.
* Returns null if nothing has been emitted since last reset/mount. */
lastMessage(): Promise<Message | null>
}
interface Message {
text: string
level: 'info' | 'success' | 'warning' | 'error'
}
type Row = Record<string, unknown> & { _id: string }
interface Auth {
/** Login with email + password. Returns true on success. */
login(email: string, password: string): Promise<boolean>
/** Logout. Safe to call when already logged out. */
logout(): Promise<void>
/** Create an account (for selfRegistration specs). Returns user id
* on success, null on failure. */
registerUser(params: {
email: string
password: string
displayName?: string
role?: string
}): Promise<string | null>
/** Current authenticated user, or null for a guest session. */
currentUser(): Promise<UserSnapshot | null>
}
interface UserSnapshot {
id: string
email: string
displayName: string
roles: ReadonlyArray<string>
}
interface Determinism {
/** Fix "now" for default-value resolution (CURRENTDATE, NOW, +7d). */
setClock(isoTimestamp: string): Promise<void>
/** Seed the RNG used for generated IDs / slugs. */
setSeed(seed: number): Promise<void>
}
Both methods MUST be honored by every driver; without them, scenarios with date defaults or relative timestamps aren’t reproducible across runs.
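Why the two hooks matter: a default like `+7d` is a function of “now,” and a generated ID is a function of RNG state. A minimal sketch, assuming a driver resolves default tokens against an injected clock and draws IDs from a seeded PRNG. The token grammar handled here (`NOW`, `CURRENTDATE`, `+Nd`), UTC output formats, the 8-character ID, and the choice of the mulberry32 generator are all illustrative assumptions, not ODS-defined behavior; `DeterministicEnv` is a hypothetical name.

```typescript
// Deterministic context a driver could hold after setClock()/setSeed().
class DeterministicEnv {
  private readonly nowMs: number
  private rngState: number

  constructor(isoTimestamp: string, seed: number) {
    this.nowMs = Date.parse(isoTimestamp)
    if (Number.isNaN(this.nowMs)) throw new Error(`bad timestamp: ${isoTimestamp}`)
    this.rngState = seed >>> 0
  }

  // Resolve a default-value token against the pinned clock (UTC).
  resolveDefault(token: string): string {
    if (token === 'NOW') return new Date(this.nowMs).toISOString()
    if (token === 'CURRENTDATE') return new Date(this.nowMs).toISOString().slice(0, 10)
    const rel = /^\+(\d+)d$/.exec(token)
    if (rel) {
      const days = Number(rel[1])
      return new Date(this.nowMs + days * 86_400_000).toISOString().slice(0, 10)
    }
    return token // not a relative token: use it literally
  }

  // mulberry32: a small seeded PRNG returning a float in [0, 1).
  private next(): number {
    this.rngState = (this.rngState + 0x6d2b79f5) >>> 0
    let t = this.rngState
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }

  // Generated ID: 8 base-36 characters drawn from the seeded stream.
  generateId(): string {
    let id = ''
    for (let i = 0; i < 8; i++) id += Math.floor(this.next() * 36).toString(36)
    return id
  }
}
```

With the runner's pinned clock of `2026-01-01T00:00:00Z`, `+7d` resolves to `2026-01-08` on every run, and two environments built with the same seed emit the same ID sequence.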
export interface OdsDriver
extends Lifecycle, Input, Observation, Auth, Determinism {}
The key design question: how do we describe “what the user sees” in a way that is the same across renderers?
Answer: a structural snapshot in spec vocabulary, returned from
pageContent(). Each snapshot element is a member of a discriminated union
keyed by kind, mirroring ODS component types 1:1 plus runtime state.
export type ComponentSnapshot =
| TextSnapshot
| FormSnapshot
| ListSnapshot
| KanbanSnapshot
| ChartSnapshot
| ButtonSnapshot
| SummarySnapshot
| TabsSnapshot
| DetailSnapshot
interface BaseSnapshot {
kind: string
visible: boolean // honors visibleWhen, role gates, etc.
}
interface TextSnapshot extends BaseSnapshot {
kind: 'text'
content: string // formula-resolved
}
interface FormSnapshot extends BaseSnapshot {
kind: 'form'
id: string
fields: Array<{
name: string
type: FieldType
label: string
value: FieldValue | null
required: boolean
error: string | null // validation error attached to this field
}>
}
interface ListSnapshot extends BaseSnapshot {
kind: 'list'
dataSource: string
columnFields: string[]
rowCount: number // rows currently displayed (after filters)
sortField: string | null
sortDir: 'asc' | 'desc' | null
// Row `_id`s in displayed order after the driver applies defaultSort
// (and any future runtime sort/filter state). Distinct from
// `dataRows`, which returns the authoritative rows ordered by `_id`
// and ignores list sort/filter state.
// Empty array when the list has no rows. Added 2026-04-26 alongside
// s26 (`list defaultSort drives displayed row order`).
displayedRowIds: string[]
}
interface KanbanSnapshot extends BaseSnapshot {
kind: 'kanban'
dataSource: string
statusField: string
columns: Array<{ status: string; cardCount: number }>
}
interface ChartSnapshot extends BaseSnapshot {
kind: 'chart'
dataSource: string
chartType: 'bar' | 'line' | 'pie'
title: string | null
seriesCount: number
}
interface ButtonSnapshot extends BaseSnapshot {
kind: 'button'
label: string
enabled: boolean
}
interface SummarySnapshot extends BaseSnapshot {
kind: 'summary'
label: string
value: string // formula-resolved display string
}
interface TabsSnapshot extends BaseSnapshot {
kind: 'tabs'
tabs: Array<{ label: string; active: boolean }>
}
interface DetailSnapshot extends BaseSnapshot {
kind: 'detail'
dataSource: string
fields: Array<{ name: string; label: string; value: unknown }>
}
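The split between `displayedRowIds` and `dataRows` is two orderings of the same rows. A sketch of both, assuming a hypothetical `{ field, dir }` shape for `defaultSort`, nulls sorted last, plain code-unit string comparison, and `_id` as the tie-breaker; none of those rules is fixed by this contract yet, and `authoritativeOrder` / `displayedRowIds` are illustrative helpers.

```typescript
type Row = Record<string, unknown> & { _id: string }

// Hypothetical shape for a list's defaultSort.
interface SortSpec {
  field: string
  dir: 'asc' | 'desc'
}

// Plain code-unit comparison: identical on every renderer, unlike
// locale-aware collation.
function cmpStr(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0
}

// dataRows(): the authoritative view, ordered by `_id` ascending.
function authoritativeOrder(rows: ReadonlyArray<Row>): Row[] {
  return [...rows].sort((a, b) => cmpStr(a._id, b._id))
}

// Compare one column; null/undefined always sort last (assumption).
function compareBy(a: Row, b: Row, sort: SortSpec): number {
  const x = a[sort.field]
  const y = b[sort.field]
  const xMissing = x === null || x === undefined
  const yMissing = y === null || y === undefined
  if (xMissing || yMissing) return xMissing === yMissing ? 0 : xMissing ? 1 : -1
  const sign = sort.dir === 'asc' ? 1 : -1
  if (typeof x === 'number' && typeof y === 'number') return sign * (x - y)
  return sign * cmpStr(String(x), String(y))
}

// ListSnapshot.displayedRowIds: on-screen order. Array.prototype.sort is
// stable, so rows that tie on the sort field keep their `_id` order.
function displayedRowIds(rows: ReadonlyArray<Row>, sort: SortSpec | null): string[] {
  const ordered = authoritativeOrder(rows)
  if (sort !== null) {
    const s: SortSpec = sort
    ordered.sort((a, b) => compareBy(a, b, s))
  }
  return ordered.map((r) => r._id)
}
```

For rows `b` (Milk), `a` (Eggs), `c` (Bread), `dataRows` order is `a, b, c`, while a list sorted by `title asc` displays `c, a, b`.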
Why not just dump the rendered tree? Because that’s a framework idiom (React’s VDOM, Flutter’s element tree, SwiftUI’s opaque body). Structural snapshots are the minimum shared vocabulary.
Why per-component snapshot shape instead of a uniform “props” bag?
Because scenarios should be able to say expect(list.rowCount).toBe(3)
without casting. Type-safe spec-level assertions are the whole point.
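Because `kind` is a discriminant, a scenario can narrow a snapshot and read `rowCount` with no cast. A self-contained sketch using two trimmed snapshot variants; the helper name `firstOfKind` is hypothetical and not part of the contract.

```typescript
// Two trimmed snapshot variants; `kind` is the discriminant.
interface ListSnap {
  kind: 'list'
  visible: boolean
  dataSource: string
  rowCount: number
}
interface ButtonSnap {
  kind: 'button'
  visible: boolean
  label: string
  enabled: boolean
}
type Snap = ListSnap | ButtonSnap

// Return the first snapshot of `kind`, already narrowed to that variant.
function firstOfKind<K extends Snap['kind']>(
  snaps: ReadonlyArray<Snap>,
  kind: K,
): Extract<Snap, { kind: K }> {
  const hit = snaps.find((s): s is Extract<Snap, { kind: K }> => s.kind === kind)
  if (hit === undefined) throw new Error(`no ${kind} snapshot on this page`)
  return hit
}

const page: Snap[] = [
  { kind: 'button', visible: true, label: 'Save', enabled: true },
  { kind: 'list', visible: true, dataSource: 'tasks', rowCount: 3 },
]
// `list` is typed ListSnap, so `.rowCount` needs no cast.
const list = firstOfKind(page, 'list')
const shown: number = list.rowCount
```

A uniform “props” bag would force `(snap.props as any).rowCount` at this point, which is exactly the loss of type safety the per-component shapes avoid.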
What’s deliberately missing from snapshots:
- … dataRows() for that; this avoids coupling render order to test assertions.

Escape hatch. Exactly one framework-specific hole: an optional
raw: unknown on each snapshot, populated only when a driver opts in.
Tests MUST NOT read raw — it exists for debugging and for
renderer-specific follow-up tests, never for conformance scenarios.
A scenario is a named closure that takes a driver and performs actions + assertions.
import { expect } from 'vitest'
import type { OdsDriver, Scenario } from 'ods-conformance'
export const s01_form_submit: Scenario = {
name: 'form submit inserts a row + shows success message',
spec: () => ({
appName: 'Mini Todo',
startPage: 'home',
pages: {
home: {
component: 'page', title: 'Home',
content: [
{ component: 'form', id: 'addForm', dataSource: 'tasks',
fields: [{ name: 'title', type: 'text', label: 'Title', required: true }] },
{ component: 'button', label: 'Save',
onClick: [
{ action: 'submit', dataSource: 'tasks', target: 'addForm' },
{ action: 'showMessage', message: 'Saved!', level: 'success' },
] },
{ component: 'list', dataSource: 'tasks',
columns: [{ field: 'title', label: 'Title' }] },
],
},
},
dataSources: {
tasks: { url: 'local://tasks', method: 'POST',
fields: [{ name: 'title', type: 'text' }] },
},
}),
capabilities: ['core', 'action:submit', 'action:showMessage'], // form + list are part of 'core'
run: async (d: OdsDriver) => {
await d.fillField('title', 'Buy milk')
await d.clickButton('Save')
const rows = await d.dataRows('tasks')
expect(rows).toHaveLength(1)
expect(rows[0].title).toBe('Buy milk')
const msg = await d.lastMessage()
expect(msg?.text).toBe('Saved!')
expect(msg?.level).toBe('success')
},
}
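On the runner side, every scenario gets the same fixed lifecycle: capability check, determinism hooks, mount, run, unmount. A minimal sketch, with the driver surface trimmed to what the runner itself calls; `runOne`, `runAll`, `RunnerDriver`, and the `Outcome` shape are hypothetical, and calling `setSeed`/`setClock` before `mount` is our assumption rather than an ordering this contract fixes.

```typescript
type Capability = string // trimmed: the contract's real union lives in §6

// Just the slice of OdsDriver the runner itself calls.
interface RunnerDriver {
  readonly capabilities: ReadonlySet<Capability>
  mount(spec: unknown): Promise<void>
  unmount(): Promise<void>
  setSeed(seed: number): Promise<void>
  setClock(isoTimestamp: string): Promise<void>
}

interface Scenario<D> {
  name: string
  spec: () => unknown
  capabilities: Capability[]
  run: (d: D) => Promise<void>
}

type Outcome =
  | { name: string; status: 'passed' }
  | { name: string; status: 'skipped'; missing: Capability[] }
  | { name: string; status: 'failed'; error: unknown }

async function runOne<D extends RunnerDriver>(driver: D, s: Scenario<D>): Promise<Outcome> {
  // A missing capability means skipped, never failed.
  const missing = s.capabilities.filter((c) => !driver.capabilities.has(c))
  if (missing.length > 0) return { name: s.name, status: 'skipped', missing }
  try {
    // Assumption: pin RNG and clock before mount so defaults resolved
    // during the first render already see the fixed values.
    await driver.setSeed(0)
    await driver.setClock('2026-01-01T00:00:00Z')
    await driver.mount(s.spec())
    await s.run(driver)
    return { name: s.name, status: 'passed' }
  } catch (error) {
    return { name: s.name, status: 'failed', error }
  } finally {
    // unmount() is specified as safe after any failure, so always call it.
    await driver.unmount()
  }
}

// Sequential on purpose: one driver holds one mounted app at a time.
async function runAll<D extends RunnerDriver>(
  driver: D,
  scenarios: Scenario<D>[],
): Promise<Outcome[]> {
  const outcomes: Outcome[] = []
  for (const s of scenarios) outcomes.push(await runOne(driver, s))
  return outcomes
}
```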
Runner responsibilities (not the scenario’s):
- mount(spec) before run, unmount() after
- skip scenarios whose capabilities aren’t supported by the driver
- setSeed(0) and setClock('2026-01-01T00:00:00Z') before each run

Why closures over JSON?
A flat set of capability tags, versioned alongside the ODS spec.
type Capability =
// Required baseline every conforming driver must support.
| 'core' // pages, text, form, button, list, navigate, submit, showMessage
// Optional feature packs.
| 'kanban'
| 'chart'
| 'tabs'
| 'detail'
| 'summary'
| 'formulas' // computed fields
| 'rowActions' // per-row list actions
| 'cascadeRename'
| 'auth:multiUser'
| 'auth:selfRegistration'
| 'auth:ownership' // row-level security
// Granular action variants.
| 'action:submit'
| 'action:update'
| 'action:delete'
| 'action:navigate'
| 'action:showMessage'
Drivers declare the capabilities they support; scenarios declare what they need. The runner runs a scenario only if its required set is a subset of the driver's declared set, and skips it otherwise.
Spec versioning is separate from capabilities. A driver also
declares supportedSpecVersion (semver range). Scenarios that use a
newer spec feature set a requiresSpecVersion range; the runner skips
them on drivers whose supported range falls outside it.
This keeps the picture clean: the spec version tracks schema, the capability set tracks runtime behavior.
Two modes, same interface.
import { FlutterDriver } from '@ods/driver-flutter'
import { ReactDriver } from '@ods/driver-react'
import { runScenarios, scenarios } from 'ods-conformance'
for (const driver of [new FlutterDriver(), new ReactDriver()]) {
runScenarios(driver, scenarios)
}
In-process is what the JS/TS side does natively. Dart is the awkward case: the Flutter driver will expose a JS/TS-compatible adapter that calls into a Dart runtime. Options:
- … package:flutter_test from a Dart scenario runner.

Decision (draft): each driver package ships the native adapter for its host language. The Flutter adapter’s Dart scenario runner either runs the same scenario closures (transpiled to Dart) or operates via the wire protocol below. This is the biggest open question in the design; see §9.
A JSON-RPC 2.0 interface over a local websocket. Every driver method is a request; snapshots are responses. Deliberately boring:
{"jsonrpc":"2.0","id":1,"method":"mount","params":{"spec":{...}}}
{"jsonrpc":"2.0","id":2,"method":"fillField","params":{"fieldName":"title","value":"Buy milk"}}
{"jsonrpc":"2.0","id":3,"method":"pageContent"}
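Because each driver method maps 1:1 onto a request, the server side is mostly a dispatch table. A minimal, transport-agnostic sketch of that core, assuming frames arrive already JSON-parsed from the websocket; `handleRpc` and `Handlers` are hypothetical names, while the error codes (-32600, -32601, and the -32000 server-error range) and the no-reply rule for notifications come from JSON-RPC 2.0.

```typescript
type RpcId = string | number | null

interface RpcRequest {
  jsonrpc: '2.0'
  id?: RpcId // absent => notification, which never gets a response
  method: string
  params?: Record<string, unknown>
}

type RpcResponse =
  | { jsonrpc: '2.0'; id: RpcId; result: unknown }
  | { jsonrpc: '2.0'; id: RpcId; error: { code: number; message: string } }

// One entry per driver method; each adapts named params to the driver call.
type Handlers = Record<string, (params: Record<string, unknown>) => Promise<unknown>>

async function handleRpc(handlers: Handlers, req: RpcRequest): Promise<RpcResponse | null> {
  // Malformed requests always get an error reply with a null id.
  if (req.jsonrpc !== '2.0' || typeof req.method !== 'string') {
    return { jsonrpc: '2.0', id: null, error: { code: -32600, message: 'Invalid Request' } }
  }
  const isNotification = !('id' in req)
  const id: RpcId = req.id ?? null
  const fail = (code: number, message: string): RpcResponse | null =>
    isNotification ? null : { jsonrpc: '2.0', id, error: { code, message } }

  // Own-property check so names like "toString" don't resolve to Object.prototype.
  const handler = Object.prototype.hasOwnProperty.call(handlers, req.method)
    ? handlers[req.method]
    : undefined
  if (handler === undefined) return fail(-32601, 'Method not found')
  try {
    const result = await handler(req.params ?? {})
    // JSON has no `undefined`: void methods (mount, fillField, ...) answer null.
    return isNotification ? null : { jsonrpc: '2.0', id, result: result ?? null }
  } catch (e) {
    // -32000 sits in JSON-RPC's implementation-defined server-error range.
    return fail(-32000, e instanceof Error ? e.message : String(e))
  }
}
```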
Shipping the wire protocol turns “any renderer” into “any renderer with a 200-line JSON-RPC server.” Not for MVP; documented here so the in-process surface doesn’t accidentally box us out of it.
- docs/conformance-driver-contract.md.
- OdsDriver TypeScript interface in packages/ods-conformance/src/contract.ts.
- ReactDriver that adapts the existing AppEngine state + DataService to the interface. In-process; no IPC.
- FlutterDriver in Dart with the equivalent surface. For Phase A we accept separate Dart and TS scenario runners that share the contract's specification but run locally per language. Revisit unification in Phase B.
- ods-conformance + driver contract docs.

raw escape hatch: ship it from day one, or not? Opinions may split:
present it and tests will (eventually) misuse it; omit it and
debugging conformance failures gets harder.
Draft recommendation: omit from MVP, add under the name
debugInspect() in a separate DebugDriver mix-in once we know
which callers actually need it.
Row identification in clickRowAction — rowId assumes PB-ish
string ids. Some future driver might use integer or composite keys.
Draft recommendation: keep rowId: string for MVP; document that
drivers stringify native ids. Composite keys warrant a new method if
they ever appear.
“Page snapshot order vs rendered order.” Scenarios that say
“the list is below the form” — do we preserve that ordering in
pageContent()? Draft recommendation: yes, snapshots preserve
the spec’s content[] order exactly.
OAuth2 scenarios — the driver surface doesn’t cover them.
Draft recommendation: out of scope for Phase A. Add
loginWithOAuth2(provider, ...) when we have a second renderer
that supports it.
action:submit or submitAction? Colons
read nicely but hyphens are safer for shell / filename round-trips.
Draft recommendation: action:submit in code, never used as a
filename.

To review together, focus on:
Once we agree on §3/§4 and close out §9, the Phase A implementation is mostly mechanical.