North America Review Operations · 24h SLA

Failed robot runs, reviewed in 24 hours.

You send failed or ambiguous runs. We do the watching, apply your rubric, and return structured labels, escalation flags, and weekly failure patterns. U.S. time zones, no stack change.

Start a 4-week pilot →

See a sample report →

SLA

24 hours

Operations

U.S. time zones

Integration

Zero-change — you send, we return

NDA-ready
No model training
Limited reviewer access
Secure upload
Deletion on request

thresor_week_17.csv · Sample output

Mission	Decision	Failure type	Escalate
M-40281	Fail	Blocked aisle	—
M-40284	Fail	Nav failure	Yes
M-40287	Pass	—	—
M-40291	Fail	Map / site change	Yes
M-40293	Unclear	Needs technical review	Yes

Showing 5 of 127 runs · returned in 19h 42m

The problem

Fleet review is eating your ops team.

Cleaning robots generate too many failed or unclear runs. Someone expensive inside your company is watching the videos, guessing at the labels, and losing the pattern.

× Current state

Engineers and deployment leads review video by hand
Labels are inconsistent run-to-run and week-to-week
Repeat failure modes go unidentified for weeks
Feedback loops to product and ops are slow and noisy
Nobody owns "is this a site issue or a software issue?"

✓ With Thresor

You forward failed runs to us — we do the review
Consistent rubric, gold set, double-reviewed samples
Per-run labels returned within 24 hours
Weekly summary shows repeat modes and site patterns
Your team reads the report; we do the watching

How it works

Four steps. One SLA.

No new dashboards, no new API. Send us the runs you already flag; we return structured output in the format you already use.

01

You send

Daily or batched export of failed or ambiguous runs: mission video or replay, basic event logs, task metadata, site and robot ID, and your SOP or review rubric.

02

We review

Our review team watches the video, reads the logs, and applies structured labels against the agreed rubric. Unclear cases escalate to lead review.

03

You receive

Per-run output: success/failure decision, failure type, severity, escalation flag, reviewer note, root-cause bucket. Returned in spreadsheet, CSV, Airtable, or your format.

04

We summarize

Weekly failure summary: top modes, repeat sites, robots with unusual failure rates, operational vs. product patterns, and concise action recommendations.

Deliverables

Structured data, on your schedule.

Every reviewed run returns a consistent row. Every Monday, a one-page failure summary lands in your inbox.

Sample · per-run output

thresor_week_17.csv

Mission	Decision	Failure type	Severity	Escalate	Note
M-40281	Failure	Blocked aisle	Low	—	Pallet left in zone 4B after 17:40 shift.
M-40284	Failure	Navigation failure	High	Yes	Robot R-12 stuck at same dock pillar 3rd time this week.
M-40287	Success	—	—	—	Initially flagged; review confirms complete clean.
M-40291	Failure	Mapping / site change	Med	Yes	New rack layout in B-row; map needs refresh.
M-40293	Unclear	Needs technical review	Med	Yes	LiDAR returns inconsistent — recommend on-site check.
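
For teams that want to consume the per-run output in code, here is a minimal sketch. It assumes a comma-separated export with the column names shown in the sample above, and it treats the dash placeholder as an empty cell. `RunLabel` and `load_labels` are hypothetical names used for illustration only; the actual delivery format is whatever is agreed for the pilot.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RunLabel:
    """One reviewed run, mirroring the sample columns (hypothetical schema)."""
    mission: str
    decision: str
    failure_type: str
    severity: str
    escalate: bool
    note: str


# Assumption: an em dash (U+2014) marks an empty cell, as in the sample table.
EMPTY = {"", "\u2014"}


def _clean(value):
    value = (value or "").strip()
    return "" if value in EMPTY else value


def load_labels(text):
    """Parse CSV text into RunLabel records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        RunLabel(
            mission=_clean(row["Mission"]),
            decision=_clean(row["Decision"]),
            failure_type=_clean(row["Failure type"]),
            severity=_clean(row["Severity"]),
            escalate=_clean(row["Escalate"]).lower() == "yes",
            note=_clean(row["Note"]),
        )
        for row in reader
    ]


# Three rows copied from the sample above, written as CSV.
SAMPLE = (
    "Mission,Decision,Failure type,Severity,Escalate,Note\n"
    "M-40281,Failure,Blocked aisle,Low,\u2014,Pallet left in zone 4B after 17:40 shift.\n"
    "M-40284,Failure,Navigation failure,High,Yes,Robot R-12 stuck at same dock pillar 3rd time this week.\n"
    "M-40287,Success,\u2014,\u2014,\u2014,Initially flagged; review confirms complete clean.\n"
)

labels = load_labels(SAMPLE)
escalated = [r.mission for r in labels if r.escalate]
print(escalated)
```

Filtering on `escalate` first gives an on-call reader the short list without scanning every row.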

01 · Per-run labels

Consistent, rubric-driven labels

Success/failure, failure type, severity, escalation flag, reviewer note, and root-cause bucket — in your format.

02 · Escalation flags

Know what needs a human today

High-severity, repeat-mode, and safety-relevant runs surface immediately so your on-call engineer isn't reading every row.

03 · Weekly summary

The pattern, not the noise

Top failure modes, repeat sites, robots with unusual failure rates, and operational-vs-product signal. One page, Monday.
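
As an illustration of how the weekly numbers can be derived from per-run labels, here is a rough sketch. The field names (`robot`, `site`, `decision`, `failure_type`), the "two or more failures" threshold for a repeat site, and the sample rows are all assumptions made for this example, not the rubric or report logic used in an actual pilot.

```python
from collections import Counter


def weekly_summary(rows, top_n=3):
    """Aggregate per-run labels into top modes, repeat sites, and robot failure rates."""
    failures = [r for r in rows if r["decision"] == "Failure"]

    modes = Counter(r["failure_type"] for r in failures)
    sites = Counter(r["site"] for r in failures)

    runs_per_robot = Counter(r["robot"] for r in rows)
    fails_per_robot = Counter(r["robot"] for r in failures)
    failure_rate = {
        robot: fails_per_robot[robot] / total
        for robot, total in runs_per_robot.items()
    }

    return {
        "top_modes": modes.most_common(top_n),
        # Assumption: a site counts as "repeat" with two or more failures in the week.
        "repeat_sites": sorted(s for s, n in sites.items() if n >= 2),
        "failure_rate_by_robot": failure_rate,
    }


# Made-up rows for illustration only.
rows = [
    {"robot": "R-12", "site": "S-1", "decision": "Failure", "failure_type": "Navigation failure"},
    {"robot": "R-12", "site": "S-1", "decision": "Failure", "failure_type": "Navigation failure"},
    {"robot": "R-07", "site": "S-2", "decision": "Failure", "failure_type": "Blocked aisle"},
    {"robot": "R-07", "site": "S-2", "decision": "Success", "failure_type": ""},
]

summary = weekly_summary(rows)
print(summary["top_modes"])
print(summary["repeat_sites"])
```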

Quality controls

Why the labels hold up.

Shared rubric

Agreed before launch, not after the first disagreement.

Gold set

Canonical examples for each failure type, reviewed together.

Double review

A sample of live work is reviewed twice, and the agreement rate is tracked.

Weekly calibration

30 minutes with your team to resolve edge cases and recalibrate.
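
One way to track agreement on double-reviewed runs is sketched below. Plain percent agreement is the simplest reading of "agreement rate". Cohen's kappa is included as an optional extra that corrects for chance agreement; it is an assumption of this example, not a metric the page commits to. The function names and sample labels are hypothetical.

```python
from collections import Counter


def agreement_rate(first, second):
    """Share of double-reviewed runs where both reviewers gave the same label."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def cohens_kappa(first, second):
    """Agreement corrected for the agreement expected by chance."""
    n = len(first)
    observed = agreement_rate(first, second)
    counts_1, counts_2 = Counter(first), Counter(second)
    expected = sum(counts_1[label] * counts_2[label] for label in counts_1) / (n * n)
    if expected == 1.0:
        # Both reviewers used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels from two reviewers on the same four runs.
reviewer_a = ["Failure", "Failure", "Success", "Failure"]
reviewer_b = ["Failure", "Success", "Success", "Failure"]

rate = agreement_rate(reviewer_a, reviewer_b)
kappa = cohens_kappa(reviewer_a, reviewer_b)
print(rate, kappa)
```

In this made-up example the reviewers agree on three of four runs (0.75), while kappa is lower (0.5) because some of that agreement would be expected by chance.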

Pricing

Start with a fixed-fee pilot.

One workflow, one SLA, one volume cap. If it doesn't save your team more than it costs, we shouldn't expand it.

What review may already cost

Reclaim the hours hiding in failed-run review.

If review is taking skilled people 20-30 minutes per run, the monthly cost gets big quickly.

500 runs × 20–30 min review time: 167–250 hrs/mo

Example: $90–$110/hr all-in internal cost: ~$15k–$27.5k/mo

Thresor fixed pilot: $7.5k–$10k

The point is not to replace your team. It is to stop using senior people for first-pass review.
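
The arithmetic behind the ranges above can be reproduced with a few lines. This is a sketch using only the example figures from this section (500 runs, 20 to 30 minutes per run, $90 to $110 per hour); `monthly_review_cost` is a hypothetical helper, and your own volumes and rates will differ.

```python
def monthly_review_cost(runs, minutes_per_run, hourly_rate):
    """Return (hours per month, cost per month) for in-house first-pass review."""
    hours = runs * minutes_per_run / 60
    return hours, hours * hourly_rate


# Low end: 500 runs at 20 min each, $90/hr. High end: 30 min each, $110/hr.
low_hours, low_cost = monthly_review_cost(500, 20, 90)
high_hours, high_cost = monthly_review_cost(500, 30, 110)

print(f"{low_hours:.0f}-{high_hours:.0f} hrs/mo")      # 167-250 hrs/mo
print(f"${low_cost:,.0f}-${high_cost:,.0f}/mo")        # $15,000-$27,500/mo
```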

4-week pilot

$7,500 – $10,000

Fixed fee. Scope and price are set before kickoff and don't change during the pilot.

✓ Up to 500 failed or ambiguous runs reviewed
✓ One task family — e.g. warehouse cleaning
✓ 24-hour turnaround on standard queue
✓ Shared rubric, gold set, calibration
✓ Weekly failure summary
✓ Delivery in spreadsheet, CSV, Airtable, or agreed format

Start a pilot →

Post-pilot pricing is set by volume.

Tell us what's breaking.

30-minute working session: align on the first rubric, confirm data fields, pick a start date. No deck required.