Field testing

We dogfood pedestrian routing internally. The frozen unit is an origin–destination pair: one facade call across all three product profiles (Fast / Calm / Safe), one shared time bucket, full per-edge path details. Testers compare the three options and rate the choice, then walk one assigned mode inside a daytime walk window, annotating guidance nodes on the spot with mode-aware issue chips. Managers freeze the pairs, plan the cycle, and review the collected feedback next to street-level imagery. Everything lives behind this page.

I'm walking routes (tester)

Field app: Your assigned pair on the phone. Open the personal link you received (/field?t=<you>&day=<n>), compare the three options, walk your assigned mode, tap nodes, rate the walk.
Tester guide: Two-minute read before your first walk, covering the compare step, issue chips per mode, photos, dictation, and what gets recorded.

I'm running the cycle (manager / reviewer)

Map demo: Build an O/D pair and freeze it with "Save for field". One facade call routes all three product profiles and records geometry, per-edge path details, and the shared time bucket in a single pair-doc. Doubles as the desk-review tool: any node opens in Mapillary / Street View.
Bench: Desk comparison of our pedestrian route against Google's (ours blue, Google amber) on a Google basemap, with a distance / overlap card and Street View inspection. Use it to eyeball where the two routes diverge.
Field plan: Assign pair × mode × day per tester, each with a walk window; generates the personal links.
Field review: Collected feedback as a per-pair overview (walks, ratings, issue counts) and a map view where flagged nodes sit next to street-level imagery, with the triage queue pre-sorted by fix surface.
Manager guide: The full runbook, covering freezing pairs, assignments, collection, photo EXIF-join, and imagery keys.
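
As a rough illustration of what the field plan produces, the sketch below builds a personal link from one assignment. Only the /field?t=<you>&day=<n> shape comes from this page; the Assignment record, its field names, and the base URL are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Assignment:
    """One tester x day x mode assignment (hypothetical record layout)."""
    tester: str        # tester token that ends up in the t= parameter
    day: int           # cycle day that ends up in the day= parameter
    pair_id: str       # frozen O/D pair the tester walks
    mode: str          # "fast", "calm" or "safe"
    window_start: str  # start of the daytime walk window, e.g. "10:00"
    window_end: str    # end of the daytime walk window, e.g. "16:00"


def personal_link(a: Assignment, base: str = "https://example.invalid") -> str:
    """Build the /field?t=<you>&day=<n> link for one assignment.

    The base URL is a placeholder, not the real host.
    """
    query = urlencode({"t": a.tester, "day": a.day})
    return f"{base}/field?{query}"


if __name__ == "__main__":
    demo = Assignment("ana", 2, "pair-017", "calm", "10:00", "16:00")
    print(personal_link(demo))
```

The link carries only tester and day; in this sketch the pair, mode, and walk window would be looked up server-side from the plan, which is an assumption about the design rather than something this page states.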

How the pieces fit

  1. Freeze — build O/D pairs in the demo and save them for field testing. Each frozen pair carries all three product profile routes plus per-edge path details, so every complaint joins back to what the engine believed at that exact spot (frozen snapshots keep all feedback comparable).
  2. Plan — assign tester × day × mode with a walk window on the field plan and send out personal links with the tester guide.
  3. Walk — in the field app, testers first compare the three options and rate choice value, then walk the assigned mode, annotating nodes with mode-aware chips; at the end of the walk they rate correctness and mode alignment. Rows queue offline and sync automatically.
  4. Review — triage on field review, queue pre-sorted by fix surface; verify flagged nodes remotely via street-level imagery and the engine's per-edge beliefs; export the raw feedback.jsonl for analysis.
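
The join between an exported feedback row and its frozen pair can be sketched as follows. This is a minimal illustration, not the actual export schema: the row fields (pair_id, mode, edge_index, chip) and the pair-doc layout (routes, edges, time_bucket) are assumed names, since the real feedback.jsonl format is not shown on this page.

```python
import json


def join_feedback(feedback_lines, pair_docs):
    """Attach frozen edge details and time bucket to each feedback row.

    feedback_lines: iterable of JSON strings, one row per line (JSONL).
    pair_docs: dict mapping pair_id -> frozen pair-doc (hypothetical layout).
    """
    joined = []
    for line in feedback_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        row = json.loads(line)
        pair = pair_docs.get(row["pair_id"])
        if pair is None:
            continue  # row points at a pair that is not in the frozen set
        edges = pair["routes"][row["mode"]]["edges"]
        idx = row["edge_index"]
        # Out-of-range indices join to None instead of raising.
        edge = edges[idx] if 0 <= idx < len(edges) else None
        joined.append({**row, "time_bucket": pair["time_bucket"], "edge": edge})
    return joined


if __name__ == "__main__":
    pairs = {
        "p1": {
            "time_bucket": "weekday-midday",
            "routes": {"calm": {"edges": [{"edge_id": 10}, {"edge_id": 11}]}},
        }
    }
    rows = ['{"pair_id": "p1", "mode": "calm", "edge_index": 1, "chip": "no_sidewalk"}']
    for item in join_feedback(rows, pairs):
        print(item)
```

Because the pair-doc is frozen, running the same join later yields the same edge details, which is what makes a row usable as a regression case.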

Every feedback row joins to its frozen pair's exact geometry, per-edge path details, weights, and time bucket — rows double as replayable regression cases for routing and guidance work, and the pair-docs feed the bench directly (edge overlap, detour ratios, time deltas across the full pair set).
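
Two of the bench metrics named above can be sketched under stated assumptions. Here edge overlap is taken to be the Jaccard ratio of the two routes' edge-id sets, and detour ratio is taken to be route length divided by the great-circle distance between origin and destination. Both definitions and all function names are illustrative; the bench may define them differently.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius used by the haversine formula


def edge_overlap(edges_a, edges_b):
    """Jaccard overlap of two routes' edge ids (assumed definition)."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0  # two empty routes are treated as identical
    return len(a & b) / len(a | b)


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def detour_ratio(route_length_m, origin, destination):
    """Route length relative to the straight-line O/D distance (assumed definition)."""
    direct = haversine_m(origin, destination)
    if direct == 0:
        raise ValueError("origin and destination coincide")
    return route_length_m / direct


if __name__ == "__main__":
    print(edge_overlap([1, 2, 3], [2, 3, 4]))
    print(detour_ratio(1500.0, (52.5200, 13.4050), (52.5250, 13.4150)))
```

Computing both over the full pair set before and after a routing change gives a like-for-like comparison, since the frozen geometry does not move.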

Cycle scope and method

Ratings are descriptive health metrics; no decision hangs on a rating alone. Issue chips and their joins to the frozen pair-docs are what drive fixes. "Ratings tell us how bad it is; chips and joins tell us what to fix; the bench tells us whether we fixed it."