This doc explains the test layers in this monorepo, what each one is for, and where the boundaries are. If you’re contributing a fix or feature, the section on “where does my test go” is the practical answer.
From fastest + narrowest to slowest + broadest:
Per-module, no UI, no network, no database (or a fake one). Pure function and class behavior.
- React: `Frameworks/react-web/tests/unit/`
- Flutter: `Frameworks/flutter-local/test/engine/`, `test/models/`, `test/parser/`
- Run via `npm test` (React) and `flutter test test/engine ...` (Flutter).
- Hundreds of tests; should take seconds.
Render a single component in isolation against a fake data layer. Verify the component reacts to state changes, dispatches events, and emits the expected DOM/widget tree.
- React: `Frameworks/react-web/tests/component/` (Testing Library + Vitest jsdom env)
- Flutter: `test/widget/`, 42 tests; in the gate. The harness must use `bootEngineFor(tester, …)` + `disposeAllFor(tester, booted)` so SQLite work runs inside `tester.runAsync`; otherwise sqflite_ffi’s native-bridge timers stall flutter_test’s FakeAsync zone forever. See `_test_harness.dart`.

Multi-module flows against the real AppEngine and a temp-folder SQLite. Verifies end-to-end behavior of the framework: submit inserts, action chains fire, cascade rename propagates, etc.
- `Frameworks/flutter-local/test/integration/` (`batch1_*` through `batch9_*`).
- `@slow`-tagged: perf baselines that flake on Windows I/O timing budgets. Excluded from `publish.sh` and CI by default. Run on demand with `flutter test --tags=slow`.

React doesn’t have a separate “integration” tier; its component tests + the conformance suite + E2E cover that ground.
The contract that pins behavior consistency between renderers. Each scenario runs against every renderer’s driver; a scenario that passes in one but not the other is a parity bug.
- Specs: `Frameworks/conformance/specs/*.json`
- Scenarios: `Frameworks/conformance/src/scenarios.ts` (TS) and `Frameworks/flutter-local/test/conformance/scenarios.dart` (Dart)
- Drivers: `Frameworks/react-web/tests/conformance/react-driver.ts` and `Frameworks/flutter-local/test/conformance/flutter_driver.dart`
- 26 scenarios as of 2026-04-25, pinning ~14 capabilities. The contract itself is documented in `docs/adr/0001-conformance-driver-contract.md`.
This is the contract — write the test first. When you change cross-framework behavior, the scenario goes in before the implementation. Both drivers should be red, then both should be green. A merged feature without a failing-then-passing scenario was not built test-first. See CONTRIBUTING.md → Conformance scenarios for the full workflow.
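As a rough illustration of the parity idea, the sketch below runs each scenario against every driver and flags any scenario where the drivers disagree. The `Driver` interface, the `checkParity` helper, the fake drivers, and the scenario ids are all hypothetical names invented for this example; the real contract is whatever `docs/adr/0001-conformance-driver-contract.md` specifies.

```typescript
// Hypothetical shapes for illustration only; not the repo's actual driver contract.
type Outcome = "pass" | "fail";

interface Driver {
  name: string;
  // Runs one scenario by id and reports whether it passed.
  run(scenarioId: string): Outcome;
}

interface ParityReport {
  scenarioId: string;
  outcomes: Record<string, Outcome>;
  // "parity-bug" means the renderers disagree with each other.
  verdict: "green" | "red" | "parity-bug";
}

function checkParity(scenarioIds: string[], drivers: Driver[]): ParityReport[] {
  return scenarioIds.map((scenarioId) => {
    const outcomes: Record<string, Outcome> = {};
    const distinct = new Set<Outcome>();
    for (const driver of drivers) {
      const outcome = driver.run(scenarioId);
      outcomes[driver.name] = outcome;
      distinct.add(outcome);
    }
    let verdict: ParityReport["verdict"];
    if (distinct.size > 1) {
      verdict = "parity-bug"; // passes in one renderer but not the other
    } else {
      verdict = distinct.has("pass") ? "green" : "red";
    }
    return { scenarioId, outcomes, verdict };
  });
}

// Fake drivers standing in for the React and Flutter drivers.
const fakeReact: Driver = {
  name: "react",
  run: (id) => (id === "cascade-rename" ? "fail" : "pass"),
};
const fakeFlutter: Driver = { name: "flutter", run: () => "pass" };

// Made-up scenario ids; the second one is a parity bug in this fake setup.
const report = checkParity(["submit-insert", "cascade-rename"], [fakeReact, fakeFlutter]);
```

Under the test-first workflow, a new scenario would be expected to report `red` for both drivers before the implementation lands and `green` afterwards; a `parity-bug` verdict at any point means the renderers have drifted apart.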
End-to-end through the real React app + a real PocketBase, driven by Playwright. Slowest, broadest coverage; gated separately in CI.
- Location: `Frameworks/react-web/tests/e2e/`
- Run: `cd Frameworks/react-web && npx playwright test --project=chromium`
- Setup and teardown: `tests/e2e/global-setup.ts` and `global-teardown.ts`. No manual setup needed; the tests bring their own PB binary.

| Folder | Purpose | Examples |
|---|---|---|
| `smoke/` | “Does the app come up?” Checks at startup boundaries | `app-loads.spec.ts`, `navigation.spec.ts` |
| `critical/` | Paths that must work: guards on routing, auth, validation | `admin-guard.spec.ts`, `routing.spec.ts` |
| `regression/` | Pin a previously-found bug or batch finding | `multi-user.spec.ts`, `app-crud.spec.ts` |
| `workflows/` | Multi-step user journeys that touch several screens | (empty, future) |
| `accessibility/` | `@axe-core/playwright` scans on key pages | (empty, future) |

The empty folders are intentional placeholders. Both accessibility
and workflows are tracked in TODO.md as “nice-to-haves.”
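When you add an E2E test, ask which folder’s purpose it matches:
- A startup check (“does the app come up?”) → `smoke/`
- A path that must work (routing, auth, validation) → `critical/`
- A pin for a previously-found bug → `regression/` (with bug id in the test name when possible)
- A multi-step user journey → `workflows/`
- An accessibility scan → `accessibility/`

Stryker rewrites code with small mutations (e.g., `>` → `<`, `true` → `false`, `+` → `-`) and re-runs the test suite. A “surviving mutant” means the code was broken but the tests still passed, i.e., the tests aren’t sensitive enough.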
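To make “surviving mutant” concrete, here is a small hand-rolled sketch of the idea. It does not use Stryker; it hand-writes the mutant that a `>` → `<` swap would produce and checks whether two toy test suites notice. The function `canCheckout` and both suites are made-up examples, not code from this repo.

```typescript
// Original code under test (a made-up example).
function canCheckout(itemCount: number): boolean {
  return itemCount > 0;
}

// The kind of mutant a `>` → `<` swap would produce.
function canCheckoutMutant(itemCount: number): boolean {
  return itemCount < 0;
}

type Impl = (itemCount: number) => boolean;

// A weak suite: only checks the boundary value, where `>` and `<` agree.
function weakSuite(impl: Impl): boolean {
  return impl(0) === false;
}

// A stronger suite: also checks a value on each side of the boundary.
function strongSuite(impl: Impl): boolean {
  return impl(0) === false && impl(3) === true && impl(-1) === false;
}

// A mutant survives when the suite still passes against the mutated code.
function survives(suite: (impl: Impl) => boolean, mutant: Impl): boolean {
  return suite(mutant);
}

const survivesWeak = survives(weakSuite, canCheckoutMutant); // true: the mutant goes unnoticed
const survivesStrong = survives(strongSuite, canCheckoutMutant); // false: the mutant is killed
```

Both suites pass against the original `canCheckout`, so a plain test run cannot tell them apart; only the mutated version reveals that the weak suite is not sensitive enough.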
- Config: `Frameworks/react-web/stryker.config.json`
- Run: `cd Frameworks/react-web && npm run mutation`
- CI: `.github/workflows/mutation.yml` plus on-demand `workflow_dispatch`. NOT in `publish.sh` or the per-PR gate; a mutation run takes minutes per file.

Mutation thresholds in the config (`high: 80`, `low: 60`) are informational for now: `break: 0` means the run never fails the build. Switch to a real break threshold after 2-3 weekly runs settle and we know the realistic baseline.
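The sketch below shows one way to read those three numbers, assuming the usual Stryker semantics: a score at or above `high` is good, a score between `low` and `high` is a warning, a score below `low` is a danger zone, and the run fails only when the score is below `break`. The helper names are hypothetical, and the score formula is a simplification (killed over killed plus survived) rather than Stryker's full calculation.

```typescript
interface Thresholds {
  high: number;
  low: number;
  break: number | null; // null means "never fail the build"
}

type Band = "high" | "warning" | "danger";

// Simplified score: percentage of mutants the tests killed.
// This sketch treats a run with no mutants as a score of 0.
function mutationScore(killed: number, survived: number): number {
  const total = killed + survived;
  return total === 0 ? 0 : (killed / total) * 100;
}

// Informational banding, as used for report colouring.
function band(score: number, t: Thresholds): Band {
  if (score >= t.high) return "high";
  if (score >= t.low) return "warning";
  return "danger";
}

// The only check that affects the exit code.
function failsBuild(score: number, t: Thresholds): boolean {
  return t.break !== null && score < t.break;
}

// The values quoted above.
const configured: Thresholds = { high: 80, low: 60, break: 0 };

const score = mutationScore(45, 15); // 75
const scoreBand = band(score, configured); // "warning"
const failsNow = failsBuild(score, configured); // false: no score is below 0
```

With `break: 0`, `failsBuild` is false for every possible score, which is why the thresholds are informational only. Raising `break` to a realistic baseline later would be a one-line config change.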
Decision tree for new tests:

1. Is it cross-framework behavior? (Same spec should produce the same observable behavior on Flutter + React.)
   → Conformance scenario (`Frameworks/conformance/`).
2. Is it React-only UI behavior?
   → Component test (`tests/component/`) if isolated; E2E (`tests/e2e/`) if it requires the full app + PocketBase.
3. Is it Flutter-only UI behavior?
   → Widget test (`test/widget/`; 42 tests, in the gate as of 2026-04-26) for component-level rendering; integration test (`test/integration/`) for multi-module flows. Widget tests must use `bootEngineFor(tester, …)` from the harness so SQLite work runs inside `tester.runAsync`; otherwise sqflite_ffi’s native-bridge timers stall the FakeAsync zone forever.
4. Is it pure React logic?
   → Unit test (`tests/unit/`).
5. Is it pure Flutter logic?
   → `test/engine/`, `test/models/`, `test/parser/`.
6. Is it a performance baseline?
   → `test/integration/batch9_performance_test.dart` is the home for these (`@slow`-tagged). React doesn’t have a perf gate yet.

`publish.sh` runs the same gate locally that CI runs:

- `flutter test test/engine test/models test/parser test/integration test/conformance --exclude-tags=slow`
- `npm test` (vitest unit + component + conformance)

E2E tests are NOT in `publish.sh`; they’re slower and live in the `e2e` job in `.github/workflows/react.yml`.
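The decision tree for new tests can also be written down as a small lookup, sketched below. The type names, the scope labels, and the function `whereDoesMyTestGo` are hypothetical and exist only for this illustration. React paths are relative to `Frameworks/react-web/` and Flutter paths to `Frameworks/flutter-local/`, as in the rest of this doc.

```typescript
// Scope labels are assumptions made for this sketch, not repo terminology.
type ReactScope = "pure-logic" | "isolated-ui" | "full-app";
type FlutterScope = "pure-logic" | "widget" | "multi-module" | "performance";

type NewTest =
  | { crossFramework: true }
  | { crossFramework: false; framework: "react"; scope: ReactScope }
  | { crossFramework: false; framework: "flutter"; scope: FlutterScope };

const reactHomes: Record<ReactScope, string> = {
  "pure-logic": "tests/unit/",
  "isolated-ui": "tests/component/",
  "full-app": "tests/e2e/", // needs the full app + PocketBase
};

const flutterHomes: Record<FlutterScope, string> = {
  "pure-logic": "test/engine/, test/models/, test/parser/",
  widget: "test/widget/",
  "multi-module": "test/integration/",
  performance: "test/integration/batch9_performance_test.dart",
};

function whereDoesMyTestGo(t: NewTest): string {
  // Cross-framework behavior always wins: it belongs in the conformance suite.
  if (t.crossFramework) return "Frameworks/conformance/";
  return t.framework === "react" ? reactHomes[t.scope] : flutterHomes[t.scope];
}

const home = whereDoesMyTestGo({
  crossFramework: false,
  framework: "flutter",
  scope: "widget",
}); // "test/widget/"
```

Keeping the cross-framework check first mirrors the ordering of the tree: a behavior both renderers must share gets a conformance scenario even if it could also be covered by a framework-specific test.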
Living counts (kept rough):
| Layer | Count |
|---|---|
| Flutter (incl. widget, excl. slow) | ~865 |
| React (unit + component + conformance) | ~1145 |
| Conformance scenarios | 26 (× 2 drivers = 52 parity tests) |
| React E2E (Playwright) | ~50 |
For exact current numbers and the bugs each batch found, see REGRESSION_LOG.md.