You send failed or ambiguous runs. We do the watching, apply your rubric, and return structured labels, escalation flags, and weekly failure patterns. U.S. time zones, no stack change.
| Mission | Decision | Failure type | Esc. |
|---|---|---|---|
| M-40281 | Fail | Blocked aisle | — |
| M-40284 | Fail | Nav failure | Yes |
| M-40287 | Pass | — | — |
| M-40291 | Fail | Map / site change | Yes |
| M-40293 | Unclear | Needs technical review | Yes |
Cleaning robots generate too many failed or unclear runs. Someone expensive inside your company is watching the videos, guessing at the labels, and losing the pattern.
No new dashboards, no new API. Send us the runs you already flag; we return structured output in the format you already use.
Daily or batched export of failed or ambiguous runs: mission video or replay, basic event logs, task metadata, site and robot ID, and your SOP or review rubric.
Our review team watches the video, reads the logs, and applies structured labels against the agreed rubric. Unclear cases escalate to lead review.
Per-run output: success/failure decision, failure type, severity, escalation flag, reviewer note, root-cause bucket. Returned in spreadsheet, CSV, Airtable, or your format.
Weekly failure summary: top modes, repeat sites, robots with unusual failure rates, operational vs. product patterns, and concise action recommendations.
Every reviewed run returns a consistent row. Every Monday, a one-page failure summary lands in your inbox.
| Mission | Decision | Failure type | Severity | Escalate | Note |
|---|---|---|---|---|---|
| M-40281 | Failure | Blocked aisle | Low | — | Pallet left in zone 4B after 17:40 shift. |
| M-40284 | Failure | Navigation failure | High | Yes | Robot R-12 stuck at same dock pillar 3rd time this week. |
| M-40287 | Success | — | — | — | Initially flagged; review confirms complete clean. |
| M-40291 | Failure | Mapping / site change | Medium | Yes | New rack layout in B-row; map needs refresh. |
| M-40293 | Unclear | Needs technical review | Medium | Yes | LiDAR returns inconsistent; recommend on-site check. |
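For teams that want to load these rows programmatically, here is a minimal sketch of how one reviewed run could be represented and written to CSV. The class name, the column names, and the `root_cause_bucket` field name are assumptions for illustration; the real columns are whatever format you already use.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ReviewedRun:
    """One reviewed run; field names are illustrative, not a fixed schema."""
    mission: str
    decision: str            # "Success", "Failure", or "Unclear"
    failure_type: str
    severity: str
    escalate: bool
    note: str
    root_cause_bucket: str = ""  # hypothetical name; buckets come from the agreed rubric


def to_csv(rows):
    """Serialize reviewed runs to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ReviewedRun)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


rows = [
    ReviewedRun("M-40284", "Failure", "Navigation failure", "High", True,
                "Robot R-12 stuck at same dock pillar 3rd time this week."),
    ReviewedRun("M-40287", "Success", "", "", False,
                "Initially flagged; review confirms complete clean."),
]
csv_text = to_csv(rows)
```

The same rows map directly onto a spreadsheet or Airtable import, since every run carries the same columns.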
Success/failure, failure type, severity, escalation flag, reviewer note, and root-cause bucket — in your format.
High-severity, repeat-mode, and safety-relevant runs surface immediately so your on-call isn't reading every row.
Top failure modes, repeat sites, robots with unusual failure rates, and operational-vs-product signal. One page, Monday.
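As a rough sketch of how the weekly summary can be derived from the per-run rows, the snippet below counts failure modes, finds sites with more than one failure, and lists escalated missions. The dictionary keys and the site IDs (`site-A`, `site-B`) are hypothetical placeholders, not real customer data.

```python
from collections import Counter


def weekly_summary(runs, top_n=3):
    """Summarize a week of reviewed runs (a sketch, not the full report)."""
    failures = [r for r in runs if r["decision"] == "Failure"]
    mode_counts = Counter(r["failure_type"] for r in failures)
    site_counts = Counter(r["site"] for r in failures)
    return {
        "top_failure_modes": mode_counts.most_common(top_n),
        # A "repeat site" here means more than one failure in the week.
        "repeat_sites": sorted(s for s, n in site_counts.items() if n > 1),
        "escalated": [r["mission"] for r in runs if r["escalate"]],
    }


# Hypothetical week of rows; site IDs are made up for illustration.
runs = [
    {"mission": "M-40281", "decision": "Failure", "failure_type": "Blocked aisle",
     "site": "site-A", "escalate": False},
    {"mission": "M-40284", "decision": "Failure", "failure_type": "Navigation failure",
     "site": "site-B", "escalate": True},
    {"mission": "M-40287", "decision": "Success", "failure_type": "",
     "site": "site-A", "escalate": False},
    {"mission": "M-40291", "decision": "Failure", "failure_type": "Mapping / site change",
     "site": "site-B", "escalate": True},
]
summary = weekly_summary(runs)
```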
Agreed before launch, not after the first disagreement.
Canonical examples for each failure type, reviewed together.
A sample of live work is reviewed twice, and the agreement rate is tracked.
30 minutes with your team to resolve edge cases and recalibrate.
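The double-review check above can be tracked as simple percent agreement: of the runs reviewed twice, the share where both reviewers reached the same decision. This is one plausible way to compute it, not necessarily the exact metric used in a given pilot, and the second-pass labels below are invented for illustration.

```python
def agreement_rate(first_pass, second_pass):
    """Fraction of double-reviewed runs where both decisions match."""
    shared = first_pass.keys() & second_pass.keys()
    if not shared:
        raise ValueError("no runs were reviewed twice")
    matches = sum(1 for mission in shared if first_pass[mission] == second_pass[mission])
    return matches / len(shared)


# Decisions keyed by mission ID; the second pass is hypothetical.
first = {"M-40281": "Failure", "M-40284": "Failure",
         "M-40287": "Success", "M-40293": "Unclear"}
second = {"M-40281": "Failure", "M-40284": "Failure",
          "M-40287": "Success", "M-40293": "Failure"}

rate = agreement_rate(first, second)  # 3 of 4 decisions match
```

Runs where the two passes disagree, like M-40293 in this example, are natural candidates for the calibration session.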
One workflow, one SLA, one volume cap. If it doesn't save your team more hours than it costs, we shouldn't expand it.
If review is taking skilled people 20 to 30 minutes per run, the monthly cost adds up quickly.
The point is not to replace your team. It is to stop using senior people for first-pass review.
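To size this for your own team, a back-of-the-envelope estimate is enough. The run volume and hourly rate below are hypothetical inputs; only the 20 to 30 minutes per run comes from the text above. Swap in your own numbers.

```python
def monthly_review_cost(runs_per_month, minutes_per_run, hourly_rate):
    """Return (hours, cost) spent on first-pass review in a month."""
    hours = runs_per_month * minutes_per_run / 60
    cost = runs_per_month * minutes_per_run * hourly_rate / 60
    return hours, cost


# Hypothetical inputs: 300 flagged runs per month, reviewer cost of 80 per hour.
low_hours, low_cost = monthly_review_cost(300, 20, 80.0)
high_hours, high_cost = monthly_review_cost(300, 30, 80.0)

print(f"{low_hours:.0f} to {high_hours:.0f} hours per month")
print(f"{low_cost:,.0f} to {high_cost:,.0f} per month in reviewer time")
```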
Fixed fee. Scope and price are set before kickoff and don't change during the pilot.
30-minute working session: align on the first rubric, confirm data fields, pick a start date. No deck required.