01 · Overview · a video data stream built for AI training pipelines
ENDATA's video delivery is not a raw-video dump: every dataset ships with a machine-readable metadata schema, temporal alignment, and coordinate normalization, so customer ingestion scripts can consume it directly. It supports task-family targeting, cross-scene capture, POV control (egocentric / third-person), and configurable resolution and frame rate.
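As an illustration of what "temporal alignment" and "coordinate normalization" mean in practice, the sketch below shows what a single clip's metadata record might look like. All field names here are hypothetical, not ENDATA's actual schema: timestamps locate the clip inside its source video, and bounding boxes are stored as fractions of the frame so they remain valid at any delivered resolution.

```python
# Hypothetical metadata record for one delivered clip (illustrative field
# names only; not ENDATA's actual schema).
clip = {
    "clip_id": "clip_000123",
    "task_family": "manipulation/grasping",
    "pov": "egocentric",
    "resolution": [1280, 720],
    "fps": 30,
    # Temporal alignment: start/end offsets (seconds) into the source video.
    "source_span": {"start_s": 12.4, "end_s": 18.9},
    # Coordinate normalization: boxes stored as fractions of frame size.
    "annotations": [
        {"label": "cup", "bbox_norm": [0.42, 0.55, 0.51, 0.68]},
    ],
}

def bbox_to_pixels(bbox_norm, width, height):
    """Convert a normalized [x0, y0, x1, y1] box to pixel coordinates."""
    x0, y0, x1, y1 = bbox_norm
    return [round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height)]

w, h = clip["resolution"]
print(bbox_to_pixels(clip["annotations"][0]["bbox_norm"], w, h))
```

Because the boxes are normalized, the same record works whether the customer requests 720p or 4K delivery; only the final pixel conversion changes.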
Pre-cut, task-family-tagged video clips
delivered end-to-end to your private cloud
Not "collect, clean, then find a platform" — ENDATA keeps the complexity upstream and ships training-ready samples. From film originals to embodied data, one unified four-dimensional schema.
02 · Data categories · four sources covering the full video-data stack
Film & TV, social, e-commerce, embodied — each has unique training value. From the narrative structure of content creation to causal action reasoning in egocentric POV, the four sources complement rather than overlap.
Film & TV Video
Series, variety shows, movies and documentary clips. With IP tags, plot milestones, character annotations, emotion arcs.
Social Short Video
Short videos, livestream cuts, UGC. With author metadata, engagement data, topic tags, audio alignment.
E-commerce Demos
Livestream selling, product demos, unboxing & reviews. With SKU alignment, price points, selling-point tags, conversion milestones.
Embodied AI Video
Egocentric POV, action trajectories, grasping demos, navigation paths. With joint angles, force feedback, task success markers.
03 · Acquisition & curation · three steps to the training pipeline
Customers arrive with a training objective. ENDATA maps it to underlying sources and discovery filters, curates at high granularity, and delivers in a structured way. From Define to Deliver, every step is customizable.
Define task families
& scene dimensions
Customers arrive with a training objective — task family, scene, POV, duration, language. ENDATA maps these to source data and discovery filters, producing an executable acquisition strategy.
High-granularity curation
& quality scoring
Multi-dimensional filtering on content tags, action semantics, image quality and licensing status — discarding low-value and high-risk samples, keeping only the high-ROI "right data" for training.
Structured delivery
to your private cloud
Pre-cut clips with standardized metadata, exported as RLDS / LeRobot v3 / WebDataset / custom schema — directly pluggable into your training pipeline. Supports incremental updates and versioning.
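The curation step above can be sketched as a simple gate: each candidate sample carries a quality score and a licensing flag, and only cleared, high-scoring samples pass through. The threshold and field names below are illustrative assumptions, not ENDATA's actual pipeline.

```python
# Minimal sketch of the curation gate: discard high-risk (unlicensed) and
# low-value (low-quality) samples, keep the rest. Threshold and field
# names are illustrative assumptions, not ENDATA's actual pipeline.
def curate(samples, min_quality=0.8):
    """Keep only licensed samples whose quality score clears the bar."""
    kept = []
    for s in samples:
        if not s["license_cleared"]:          # high-risk: licensing unclear
            continue
        if s["quality_score"] < min_quality:  # low-value: below quality bar
            continue
        kept.append(s)
    return kept

candidates = [
    {"clip_id": "a", "quality_score": 0.93, "license_cleared": True},
    {"clip_id": "b", "quality_score": 0.71, "license_cleared": True},
    {"clip_id": "c", "quality_score": 0.95, "license_cleared": False},
]
print([s["clip_id"] for s in curate(candidates)])
```

In a real pipeline the score would itself aggregate several of the dimensions named above (content tags, action semantics, image quality), but the gating logic stays this simple.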
04 · Delivery formats · drop-in with mainstream training pipelines
ENDATA supports three standard formats — RLDS (Reinforcement Learning Datasets), LeRobot v3, WebDataset — plus custom schemas, so your ingestion scripts can consume the data directly.
# episode structure
{
  "observation": {
    "image": <video_frame>,
    "state": <joint_angles>,
  },
  "action": <action_vector>,
  "reward": <scalar>,
}
# episode layout
dataset/
├── meta/
│ └── info.json
├── data/
│ └── chunk-000.parquet
└── videos/
└── episode_000.mp4
# tar structure
shard_000.tar:
  00001.mp4
  00001.json
  00002.mp4
  00002.json
# ingestion-friendly
{
  "schema_version": "2.0",
  "fields": ["video", "caption", "task_id", "pov", ...],
  "license_chain": [...]
}
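As a sketch of how an ingestion script might walk a WebDataset-style shard like the one above, the example below pairs each `.mp4` with its sibling `.json` by shared basename, using only the Python standard library. The shard is synthesized in memory so the sketch is self-contained; a real shard would be read from your delivery bucket.

```python
import io
import json
import tarfile

def iter_pairs(tar_bytes):
    """Yield (stem, video_bytes, metadata) from a WebDataset-style shard
    where each sample is an .mp4 plus a .json sharing the same basename."""
    samples = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar:
            stem, _, ext = member.name.rpartition(".")
            samples.setdefault(stem, {})[ext] = tar.extractfile(member).read()
    for stem in sorted(samples):
        parts = samples[stem]
        yield stem, parts["mp4"], json.loads(parts["json"])

# Build a tiny in-memory shard so the demo runs anywhere; real shards
# (e.g. shard_000.tar) would come from the delivered dataset.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for stem in ("00001", "00002"):
        for name, payload in (
            (f"{stem}.mp4", b"\x00fake-video-bytes"),
            (f"{stem}.json", json.dumps({"caption": f"demo {stem}"}).encode()),
        ):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

for stem, video, meta in iter_pairs(buf.getvalue()):
    print(stem, len(video), meta["caption"])
```

Grouping by basename is the core WebDataset convention; in production you would typically use a dedicated loader (e.g. the `webdataset` library) that streams shards sequentially rather than materializing them in memory.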
05 · NEW 2026 · Embodied AI Datasets
Egocentric POV + action-labeled training data. Aligned with the 2026 commercial inflection in embodied intelligence, ENDATA launches four scene-family datasets — kitchen, household, navigation, manipulation — with expanding task-family coverage.
Kitchen Scenes
Grasping, prepping, cooking, cleaning — complete kitchen task-family POV and action annotations.
Household Scenes
Tidying, cleaning, operating appliances, tending plants — diverse samples of everyday household tasks.
Navigation Scenes
Indoor/outdoor navigation, obstacle avoidance, path planning, target tracking — POV video streams.
Manipulation Scenes
Grasping, placement, assembly, tool use — fine manipulation task-family with full annotations.