The clinical-workflow data layer that telemetry can't capture.
We are building a credentialed, policy-learning-grade corpus of surgical and sterile-processing workflows. The differentiator isn't phase recognition — open corpora already do that. It's the non-telemetry workflow context: interruptions, rework, deviations, and handoffs, captured under credentialed clinical access and labeled to clinical standards.
Credentialed capture of non-telemetry workflow data
Robot telemetry tells you what a machine did. Open robotics corpora (for example NVIDIA's Open-H) give you scale on generic manipulation. Neither contains the layer that makes real clinical work real: the human workflow around the procedure — who handed what to whom, where the count came up short, what was reworked, and which deviations from protocol occurred. That layer is invisible to sensors and absent from scraped video. Capturing it requires credentialed access to the clinical environment and clinical judgment to label it. That is the wedge.
Why us: our founding team includes a surgical RN and a hospital robotics coordinator. Credentialed clinical access — not dataset hosting — is the differentiator.
The four-layer annotation taxonomy
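Four independent label tracks over the same timeline, numbered L1 to L4 in the order below. A buyer can consume any subset. L1–L2 (workflow phase/step and action triplets) are the conventional surface; L3–L4 (instrument/object state and workflow-context events) are where the policy-learning value concentrates.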
Workflow phase / step
A two-level hierarchy over the workflow timeline — phases decomposed into ordered steps. This is the conventional surface that existing surgical-video datasets stop at.
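As a minimal sketch of the two-level hierarchy, each step can carry its parent phase and a start time, and a timestamp lookup returns both levels at once. The phase names, step names, and timings below are fabricated for illustration and are not taken from the published schema.

```python
from bisect import bisect_right

# Hypothetical (phase, step, start_s) rows, ordered by start time.
STEPS = [
    ("inspection", "open_count_sheet", 0.0),
    ("inspection", "inspect_instruments", 30.0),
    ("assembly", "place_instruments", 120.0),
    ("assembly", "final_count", 300.0),
]
EPISODE_END_S = 360.0  # assumed episode length, in seconds


def phase_step_at(t):
    """Return the (phase, step) pair active at time t, in seconds."""
    if not 0.0 <= t < EPISODE_END_S:
        raise ValueError(f"t={t} is outside the episode")
    starts = [start for _, _, start in STEPS]
    # The active step is the last one whose start time is <= t.
    i = bisect_right(starts, t) - 1
    phase, step, _ = STEPS[i]
    return phase, step


print(phase_step_at(45.0))   # ('inspection', 'inspect_instruments')
print(phase_step_at(120.0))  # ('assembly', 'place_instruments')
```

Storing only start times keeps the steps contiguous by construction: each step ends where the next begins.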
Action triplets
<actor-role, verb, object> intervals with start/end timestamps. The first slot is who acts (tech, nurse, assist, robot, surgeon), because in workflow automation who acts is a control variable.
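One way to picture a triplet interval is as a small record with the actor role in the first slot. This is a sketch only: the class name, field names, verbs, and objects are hypothetical, and only the five actor roles come from the description above.

```python
from dataclasses import dataclass

# Actor roles named in the taxonomy description.
ACTOR_ROLES = {"tech", "nurse", "assist", "robot", "surgeon"}


@dataclass(frozen=True)
class ActionTriplet:
    """Hypothetical record for one <actor-role, verb, object> interval."""
    actor_role: str
    verb: str
    obj: str
    start_s: float
    end_s: float

    def __post_init__(self):
        if self.actor_role not in ACTOR_ROLES:
            raise ValueError(f"unknown actor role: {self.actor_role!r}")
        if self.end_s < self.start_s:
            raise ValueError("interval ends before it starts")


def triplets_by_role(triplets, role):
    """Return the triplets performed by one role, ordered by start time."""
    return sorted(
        (t for t in triplets if t.actor_role == role),
        key=lambda t: t.start_s,
    )


# Fabricated example intervals.
triplets = [
    ActionTriplet("nurse", "hand_over", "needle_driver", 12.0, 14.5),
    ActionTriplet("tech", "place", "forceps", 3.0, 5.0),
    ActionTriplet("tech", "inspect", "scissors", 1.0, 2.5),
]
tech_actions = triplets_by_role(triplets, "tech")
print([(t.verb, t.obj) for t in tech_actions])
# [('inspect', 'scissors'), ('place', 'forceps')]
```

Filtering on the first slot is what lets a downstream consumer condition on who acts, for example by training only on the actions a robot is expected to take over.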
Instrument / object state
Per-object tracks that persist across an episode — location zone, sterility state, and count events. This is what turns a video into a manipulable world model a policy can reason over.
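A per-object track can be read as a running state that timestamped events update. The sketch below replays events up to a query time. The event kinds, zone names, and sterility values are assumptions made for illustration, not the published schema's vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectTrack:
    """Hypothetical running state for one tracked object."""
    object_id: str
    zone: str = "unknown"
    sterility: str = "unknown"
    count_events: list = field(default_factory=list)


def apply_event(track, event):
    """Fold one timestamped event into the track's state."""
    kind = event["kind"]
    if kind == "move":
        track.zone = event["zone"]
    elif kind == "sterility":
        track.sterility = event["state"]
    elif kind == "count":
        track.count_events.append((event["t"], event["expected"], event["observed"]))
    else:
        raise ValueError(f"unknown event kind: {kind!r}")
    return track


def state_at(object_id, events, t):
    """Replay all events with timestamp <= t and return the resulting state."""
    track = ObjectTrack(object_id)
    for event in sorted(events, key=lambda e: e["t"]):
        if event["t"] > t:
            break
        apply_event(track, event)
    return track


# Fabricated events for a single object.
events = [
    {"t": 0.0, "kind": "move", "zone": "assembly_table"},
    {"t": 0.0, "kind": "sterility", "state": "unsterile"},
    {"t": 40.0, "kind": "count", "expected": 2, "observed": 1},
    {"t": 95.0, "kind": "move", "zone": "tray"},
]
snapshot = state_at("hemostat_set", events, 60.0)
print(snapshot.zone, snapshot.sterility, snapshot.count_events)
# assembly_table unsterile [(40.0, 2, 1)]
```

Because the state persists between events, a policy can query any timestamp and get a complete picture rather than only the most recent change.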
Workflow-context events
Typed events for the things that make clinical work real: interruptions, corrections / rework, protocol deviations, and handoffs. This is the differentiating layer and the hardest, most valuable supervision to collect.
A policy trained only on clean runs learns the happy path and fails the moment reality diverges. L4 explicitly localizes the recoveries, corrections, and deviations — exactly the transitions a robust policy must imitate, and exactly what a classification corpus structurally cannot provide.
- Negative & recovery examples — the rare, expensive supervision a robust policy needs.
- Demonstration-quality filtering — tell a clean demo from a salvaged one; filter, down-weight, or train recovery on precisely those segments.
- Reward & constraint signal — deviations and sterility-breach events are directly usable as negative reward or constraint-violation labels for safe-policy and offline-RL methods.
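The three uses above can be sketched as a single pass that overlaps L4 event intervals with fixed time windows. Each window gets a penalty, a recovery flag, and the set of event types it touches. The event-type names, penalty weights, and window size are hypothetical; a real pipeline would choose them per use case.

```python
# Hypothetical penalty weights and event-type names, for illustration only.
PENALTY = {"protocol_deviation": -1.0, "sterility_breach": -5.0}
RECOVERY_TYPES = {"rework"}


def annotate_windows(windows, events):
    """Label each (start_s, end_s) window with the events that overlap it."""
    labels = []
    for start, end in windows:
        # Two half-open intervals overlap if each starts before the other ends.
        hits = [e for e in events if e["start_s"] < end and start < e["end_s"]]
        labels.append({
            "reward": sum((PENALTY.get(e["type"], 0.0) for e in hits), 0.0),
            "is_recovery": any(e["type"] in RECOVERY_TYPES for e in hits),
            "event_types": sorted({e["type"] for e in hits}),
        })
    return labels


# Fabricated windows and events.
windows = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0), (30.0, 40.0)]
events = [
    {"type": "protocol_deviation", "start_s": 8.0, "end_s": 12.0},
    {"type": "rework", "start_s": 21.0, "end_s": 27.0},
]
labels = annotate_windows(windows, events)

# Demonstration-quality filtering: windows with no context events at all.
clean_windows = [i for i, lab in enumerate(labels) if not lab["event_types"]]
# Recovery examples: windows that contain rework.
recovery_windows = [i for i, lab in enumerate(labels) if lab["is_recovery"]]

print([lab["reward"] for lab in labels])  # [-1.0, -1.0, 0.0, 0.0]
print(clean_windows, recovery_windows)    # [3] [2]
```

The same labels support all three strategies: drop or down-weight the non-clean windows, train a recovery head on the recovery windows, or pass the reward column to an offline-RL method.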
RLDS / LeRobot-compatible serialization
The corpus serializes natively to the RLDS step/episode model used by Open X-Embodiment and round-trips to LeRobot. Your existing OXE / LeRobot pipeline ingests our episodes without a custom loader.
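For readers unfamiliar with the RLDS step/episode model, the sketch below builds one episode as plain Python dicts. The step keys (`observation`, `action`, `reward`, `discount`, `is_first`, `is_last`, `is_terminal`) are the standard RLDS step fields. The function name, the `episode_metadata` contents, and the observation and action values are assumptions for illustration, not the corpus's actual feature spec.

```python
def to_rlds_episode(observations, actions, rewards, episode_id, terminal=True):
    """Assemble an RLDS-shaped episode dict from parallel per-step lists.

    `terminal` marks whether the final step is a true terminal state
    (as opposed to a truncated episode).
    """
    n = len(observations)
    if n == 0 or not (n == len(actions) == len(rewards)):
        raise ValueError("need non-empty, equal-length step lists")
    steps = []
    for i in range(n):
        last = i == n - 1
        steps.append({
            "observation": observations[i],
            "action": actions[i],
            "reward": float(rewards[i]),
            "discount": 1.0,
            "is_first": i == 0,
            "is_last": last,
            "is_terminal": last and terminal,
        })
    # Hypothetical episode-level metadata; real field names may differ.
    return {"episode_metadata": {"episode_id": episode_id}, "steps": steps}


# Fabricated three-step episode.
episode = to_rlds_episode(
    observations=[{"frame_idx": 0}, {"frame_idx": 1}, {"frame_idx": 2}],
    actions=["noop", "grasp", "place"],
    rewards=[0.0, -1.0, 0.0],
    episode_id="synthetic-0001",
)
print(len(episode["steps"]), episode["steps"][-1]["is_last"])  # 3 True
```

In this shape, the label layers would most naturally ride along as extra per-step or per-episode fields, which is why a loader that already understands RLDS steps needs no custom parsing to iterate the episode.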
Sidecar is the source of truth
Versioned & auditable
Inspect the schema yourself
One fully-worked sample episode — a ~6-minute SPD tray-reassembly with all four label layers (L1 phase/step, L2 action triplets, L3 instrument-state tracks, L4 workflow-context events including a missing-instrument exception and substitution) plus a derived RLDS step excerpt. Load it, validate it, see exactly what an episode looks like.
Synthetic example conforming to schema v0.1 — no real patient or clinical data. Every value is fabricated for illustration; no clinical capture has occurred.
Validate your own capture
We publish the episode format as a machine-readable JSON Schema (draft 2020-12) so you can validate your own captured episode against ours before any data-sharing agreement exists. The schema is permissive by design: an episode that carries a superset of our fields still validates, so the schema interoperates with your format rather than locking you in.
Download JSON Schema (kindly-episode.schema-v0.1.json)

# Node (ajv-cli)
npx ajv-cli@5 validate --spec=draft2020 -c ajv-formats \
  -s kindly-episode.schema-v0.1.json -d your-episode.json

# Python (jsonschema)
import json
from jsonschema import Draft202012Validator

with open("kindly-episode.schema-v0.1.json") as f:
    schema = json.load(f)
with open("your-episode.json") as f:
    episode = json.load(f)

# Raises jsonschema.ValidationError if the episode does not conform.
Draft202012Validator(schema).validate(episode)

The sample episode above conforms to this schema. The schema is v0.1 and will evolve with first-cohort design partners.
Honest status
This schema is v0.1 and in development. No clinical data has been captured against it yet — that is the point of the design-partner program. Our ingestion, labeling, and lineage pipeline is built and running today on non-clinical surrogate data; the clinical corpus is what first-cohort partners shape. We are pre-funding, pre-contract, pre-IRB, and pre-capture, and we'd rather be exactly this honest than impressive on paper.
The first cohort launches Q3 2026, capped at three design partners, each shaping the capture protocols and annotation schema around their model's needs. There is no fee to participate.
For foundation-model labs and surgical-robot OEMs evaluating clinical training data sources. Email taylorm@kindly.fyi with your team, your target use case, and what data you've tried so far.