What is FoodforThought?

FoodforThought is open-source robotics data infrastructure for organizing and sharing robot learning datasets. It supports VLA (Vision-Language-Action) training, diffusion policies, and behavioral cloning, with full data lineage tracking from raw demonstrations to trained models.

Is FoodforThought compatible with LeRobot?

Yes! FoodforThought natively supports LeRobot format import and export. You can import datasets directly from the Hugging Face Hub and export your datasets in LeRobot-compatible format for training with ACT, Diffusion Policy, and other methods.

What robots does FoodforThought support?

FoodforThought supports a wide range of robots including humanoid platforms (Figure, Unitree H1/G1, Boston Dynamics Atlas), manipulators (Franka, UR5, WidowX, ALOHA), and mobile robots (Spot, Fetch, TurtleBot). We also support egocentric and factory data.

What training methods does FoodforThought support?

FoodforThought provides training recipe templates for popular methods including OpenVLA, Octo, RT-1/RT-X, Diffusion Policy, and LeRobot ACT. Datasets include metadata for action space, observation types, and episode statistics.

How do I contribute datasets to FoodforThought?

You can contribute datasets by signing up for a free account and using our upload page. We support multiple formats including HDF5, RLDS, ROS bags, Zarr, and LeRobot format. All contributions are credited to you.

FoodforThought

Clinical Annotation Schema

v0.1 — in development — design-partner program Q3 2026

The clinical-workflow data layer that telemetry can't capture.

We are building a credentialed, policy-learning-grade corpus of surgical and sterile-processing workflows. The differentiator isn't phase recognition — open corpora already do that. It's the non-telemetry workflow context: interruptions, rework, deviations, and handoffs, captured under credentialed clinical access and labeled to clinical standards.

Join the design partner program

Credentialed capture of non-telemetry workflow data

Robot telemetry tells you what a machine did. Open robotics corpora (for example NVIDIA's Open-H) give you scale on generic manipulation. Neither contains the layer that makes real clinical work real: the human workflow around the procedure — who handed what to whom, where the count came up short, what was reworked, and which deviations from protocol occurred. That layer is invisible to sensors and absent from scraped video. Capturing it requires credentialed access to the clinical environment and clinical judgment to label it. That is the wedge.

Why us: our founding team includes a surgical RN and a hospital robotics coordinator. Credentialed clinical access — not dataset hosting — is the differentiator.

The four-layer annotation taxonomy

Four independent label tracks over the same timeline. A buyer can consume any subset. L1–L2 are the conventional surface; L3–L4 are where the policy-learning value concentrates.

L1: Workflow phase / step

A two-level hierarchy over the workflow timeline — phases decomposed into ordered steps. This is the conventional surface that existing surgical-video datasets stop at.

Phase recognition is the known ceiling of open corpora. We treat it as the easiest, lowest layer — necessary, but not where the differentiated value lives.
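As a rough illustration of the two-level phase/step hierarchy described above, the sketch below models phases as ordered lists of timed steps. The class names, field names (`start_s`, `end_s`), and the phase and step labels are assumptions made for this example; they are not taken from the published schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    name: str
    start_s: float  # seconds from episode start (hypothetical field name)
    end_s: float


@dataclass
class Phase:
    name: str
    steps: List[Step] = field(default_factory=list)


def steps_are_ordered(phase: Phase) -> bool:
    """True if every step ends no later than the next step starts."""
    return all(a.end_s <= b.start_s for a, b in zip(phase.steps, phase.steps[1:]))


def locate(phases: List[Phase], t: float) -> Optional[Tuple[str, str]]:
    """Return (phase name, step name) covering time t, or None if t falls in a gap."""
    for phase in phases:
        for step in phase.steps:
            if step.start_s <= t < step.end_s:
                return phase.name, step.name
    return None


# Hypothetical labels, for illustration only.
phases = [
    Phase("inspection", [Step("open_tray", 0.0, 12.0), Step("visual_check", 12.0, 40.0)]),
    Phase("assembly", [Step("place_instruments", 40.0, 95.0)]),
]
```

Looking up a timestamp with `locate(phases, 20.0)` returns the enclosing phase and step, which is the kind of query a per-step export would need.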

L2: Action triplets

<actor-role, verb, object> intervals with start/end timestamps. The first slot is who acts (tech, nurse, assist, robot, surgeon), because in workflow automation who acts is a control variable.

Annotated as intervals rather than per-frame classes, so the full extent of each action is preserved for behavioral cloning and goal-conditioned imitation.
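One way to picture an interval-annotated triplet is the minimal sketch below. The actor roles come from the list above; the class name, field names, verbs, and objects are hypothetical and only stand in for whatever the schema's closed vocabularies define.

```python
from dataclasses import dataclass

# Roles named in the text; the set itself is an assumption about how they might be encoded.
ACTOR_ROLES = {"tech", "nurse", "assist", "robot", "surgeon"}


@dataclass(frozen=True)
class ActionTriplet:
    actor_role: str
    verb: str
    obj: str
    start_s: float
    end_s: float

    def __post_init__(self) -> None:
        if self.actor_role not in ACTOR_ROLES:
            raise ValueError(f"unknown actor role: {self.actor_role}")
        if self.end_s <= self.start_s:
            raise ValueError("interval must have positive duration")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def active_at(triplets, t):
    """All triplets whose half-open interval [start_s, end_s) contains t."""
    return [tr for tr in triplets if tr.start_s <= t < tr.end_s]


# Hypothetical verbs and objects, for illustration only.
triplets = [
    ActionTriplet("tech", "pick", "needle_holder", 3.0, 5.5),
    ActionTriplet("nurse", "inspect", "tray", 4.0, 9.0),
]
```

Because the interval is stored directly, overlapping actions by different actors stay distinct and each action's duration is available without reconstructing it from frame labels.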

L3: Instrument / object state

Per-object tracks that persist across an episode — location zone, sterility state, and count events. This is what turns a video into a manipulable world model a policy can reason over.

State transitions (added, removed, recount, relocated, missing-confirmed) are emitted as discrete events with piecewise-constant state between them — cheap to store, trivial to expand to per-step observations.
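The expansion from discrete events to per-step observations can be sketched as a zero-order hold: the state at any step is whatever the most recent event set it to. The function name, the `(timestamp, state)` tuple layout, and the zone names below are assumptions for illustration, not the schema's actual fields.

```python
from bisect import bisect_right


def expand_state(events, step_times, initial):
    """Expand sorted (timestamp_s, new_state) events into one state per step time.

    The state is held constant between events (piecewise-constant).
    """
    times = [t for t, _ in events]
    out = []
    for t in step_times:
        i = bisect_right(times, t)  # number of events at or before t
        out.append(initial if i == 0 else events[i - 1][1])
    return out


# Hypothetical location-zone track for a single instrument.
zone_events = [(2.0, "back_table"), (7.5, "tray")]
per_step = expand_state(zone_events, [0.0, 2.0, 4.0, 6.0, 8.0], initial="unknown")
# per_step == ["unknown", "back_table", "back_table", "back_table", "tray"]
```

Storing two events instead of five per-step values is the saving the text refers to, and the gap widens as the step rate increases.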

L4: Workflow-context events (the layer nobody else ships)

Typed events for the things that make clinical work real: interruptions, corrections / rework, protocol deviations, and handoffs. This is the differentiating layer and the hardest, most valuable supervision to collect.

A policy trained only on clean runs learns the happy path and fails the moment reality diverges. L4 explicitly localizes the recoveries, corrections, and deviations — exactly the transitions a robust policy must imitate, and exactly what a classification corpus structurally cannot provide.

Negative & recovery examples — the rare, expensive supervision a robust policy needs.
Demonstration-quality filtering — tell a clean demo from a salvaged one; filter, down-weight, or train recovery on precisely those segments.
Reward & constraint signal — deviations and sterility-breach events are directly usable as negative reward or constraint-violation labels for safe-policy and offline-RL methods.
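To make the three uses above concrete, here is a minimal sketch that derives per-step violation flags, recovery flags, and a negative-reward signal from typed events, plus a simple clean-demo check. The event type strings, dictionary keys, and the reward value of -1.0 are all assumptions chosen for the example, not values defined by the schema.

```python
# Hypothetical event type names; the real closed vocabulary may differ.
VIOLATION_TYPES = {"protocol_deviation", "sterility_breach"}
RECOVERY_TYPES = {"rework"}


def label_steps(step_times, events):
    """Per-step labels derived from typed events with [start_s, end_s) extents."""
    labels = []
    for t in step_times:
        covering = [e for e in events if e["start_s"] <= t < e["end_s"]]
        violation = any(e["type"] in VIOLATION_TYPES for e in covering)
        recovery = any(e["type"] in RECOVERY_TYPES for e in covering)
        labels.append({
            "t": t,
            "violation": violation,
            "recovery": recovery,
            "reward": -1.0 if violation else 0.0,  # assumed penalty value
        })
    return labels


def is_clean_demo(events):
    """A demo counts as clean here if it has no violation or rework events."""
    return not any(e["type"] in VIOLATION_TYPES | RECOVERY_TYPES for e in events)


events = [
    {"type": "rework", "start_s": 2.0, "end_s": 4.0},
    {"type": "protocol_deviation", "start_s": 3.0, "end_s": 3.5},
]
labels = label_steps([0.0, 2.0, 3.0, 4.0], events)
```

A training pipeline could then filter on `is_clean_demo`, down-weight steps where `recovery` is set, or pass `reward` to an offline-RL method; which of these is appropriate depends on the policy being trained.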

RLDS / LeRobot-compatible serialization

The corpus serializes natively to the RLDS step/episode model used by Open X-Embodiment and round-trips to LeRobot. Your existing OXE / LeRobot pipeline ingests our episodes without a custom loader.

Sidecar is the source of truth

Interval and event labels (L2 triplets, L3 transitions, all of L4) are preserved losslessly as timestamped intervals in a per-episode JSON sidecar. The per-step RLDS flattening is a derived, regenerable view — no lossy "everything must be per-step" trap.
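The relationship between the sidecar and the per-step view can be sketched as follows: the intervals are the stored data, and the per-step labels are recomputed from them at whatever step rate is needed. The sidecar layout, key names, and label strings here are invented for illustration and should not be read as the real file format.

```python
import json

# Hypothetical sidecar content; the real sidecar layout may differ.
sidecar_json = """
{"episode_id": "demo-0001",
 "intervals": [
   {"layer": "L2", "label": "tech:pick:forceps", "start_s": 0.0, "end_s": 1.0},
   {"layer": "L4", "label": "interruption", "start_s": 0.5, "end_s": 1.5}
 ]}
"""


def flatten(sidecar, step_dt, n_steps):
    """Derive a per-step view: the labels of every interval covering each step time."""
    steps = []
    for i in range(n_steps):
        t = i * step_dt
        active = sorted(
            iv["label"]
            for iv in sidecar["intervals"]
            if iv["start_s"] <= t < iv["end_s"]
        )
        steps.append({"step": i, "t": t, "active_labels": active})
    return steps


sidecar = json.loads(sidecar_json)
fine = flatten(sidecar, step_dt=0.5, n_steps=4)    # 2 Hz view
coarse = flatten(sidecar, step_dt=1.0, n_steps=2)  # 1 Hz view from the same sidecar
```

Both views come from the same intervals, so changing the step rate never requires relabeling; only the derived view is regenerated.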

Versioned & auditable

SemVer per artifact, stamped into every episode. Closed vocabularies are versioned per workflow family. Shipped episodes are immutable. Released episodes carry de-identification status and provenance so nothing ships unmarked.
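A consumer that reads the version stamped into each episode might gate ingestion with a check like the sketch below. The compatibility rule shown is the conventional SemVer reading (same major version, and for 0.x releases the same minor version as well); the source does not state the project's actual compatibility policy, so treat both the rule and the function names as assumptions.

```python
def parse_semver(version):
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into a 3-tuple of ints.

    Missing components are padded with zeros, so 'v0.1' becomes (0, 1, 0).
    """
    parts = [int(p) for p in version.lstrip("v").split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_compatible(episode_version, reader_version):
    """Assumed rule: same major; 0.x also needs same minor; otherwise reader minor >= episode minor."""
    e, r = parse_semver(episode_version), parse_semver(reader_version)
    if e[0] != r[0]:
        return False
    if e[0] == 0:
        return e[1] == r[1]
    return e[1] <= r[1]
```

For example, `is_compatible("v0.1", "0.1.3")` holds under this rule, while an episode stamped `0.2.0` would be rejected by a reader built for `0.1.x`.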

Inspect the schema yourself

One fully worked sample episode: a ~6-minute sterile-processing (SPD) tray reassembly with all four label layers (L1 phase/step, L2 action triplets, L3 instrument-state tracks, L4 workflow-context events including a missing-instrument exception and substitution) plus a derived RLDS step excerpt. Load it, validate it, and see exactly what an episode looks like.

Download sample episode (synthetic JSON)

Synthetic example conforming to schema v0.1 — no real patient or clinical data. Every value is fabricated for illustration; no clinical capture has occurred.

Validate your own capture

We publish the episode format as a machine-readable JSON Schema (draft 2020-12) so you can validate your own captured episode against ours before any data-sharing agreement exists. The schema is permissive by design (a superset of our fields still validates), so it interoperates with your format rather than locking you in.

Download JSON Schema (kindly-episode.schema-v0.1.json)

# Node (ajv-cli)
npx ajv-cli@5 validate --spec=draft2020 -c ajv-formats \
  -s kindly-episode.schema-v0.1.json -d your-episode.json

# Python (jsonschema)
import json
from jsonschema import Draft202012Validator

with open("kindly-episode.schema-v0.1.json") as f:
    schema = json.load(f)
with open("your-episode.json") as f:
    episode = json.load(f)

# Raises jsonschema.ValidationError if the episode does not conform.
Draft202012Validator(schema).validate(episode)

The sample episode above conforms to this schema. The schema is v0.1 and will evolve with first-cohort design partners.

Honest status

This schema is v0.1 and in development. No clinical data has been captured against it yet — that is the point of the design-partner program. Our ingestion, labeling, and lineage pipeline is built and running today on non-clinical surrogate data; the clinical corpus is what first-cohort partners shape. We are pre-funding, pre-contract, pre-IRB, and pre-capture, and we'd rather be exactly this honest than impressive on paper.

The first cohort launches Q3 2026, capped at three design partners, each shaping the capture protocols and annotation schema around their model's needs. There is no fee to participate.

For foundation-model labs and surgical-robot OEMs evaluating clinical training data sources. Email taylorm@kindly.fyi with your team, your target use case, and what data you've tried so far.