
Video Feeds for Video-Native AI & VLA
CONTINUOUS · TARGETED · COMPLIANT VIDEO DATA STREAM

Video-native AI and embodied intelligence are rewriting the scarcity curve of multimodal data. Drawing on global-scale video assets across film & TV, social and e-commerce — paired with metadata schemas and multi-task labeling — ENDATA delivers a continuous, compliant, scalable video data stream for video generation, video understanding, world models and VLA training.

Clips on Hand: 2.3B+
Daily Bandwidth: 800 TB+
Task Families: 120+
Formats: RLDS · LeRobot · WebDataset
POV Types: 1st / 3rd / Multi-view

01 · Overview · a video data stream built for AI training pipelines

ENDATA's video delivery isn't a raw-video dump: every dataset ships with a machine-readable metadata schema, temporal alignment, and coordinate normalization, so customer ingestion scripts can consume it directly. The stream supports task-family targeting, cross-scene capture, POV control (egocentric / third-person), and configurable resolution and frame rate.
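As a rough illustration of what such a machine-readable clip record could look like, here is a minimal sketch in Python. All field names here are hypothetical, not ENDATA's actual schema:

```python
# Illustrative example only: field names are hypothetical, not ENDATA's schema.
clip_record = {
    "clip_id": "clip_000042",
    "source": "social",                      # film_tv | social | ecommerce | embodied
    "pov": "egocentric",                     # egocentric | third_person | multi_view
    "resolution": [1920, 1080],
    "fps": 30,
    "temporal_alignment": {"start_s": 12.4, "end_s": 19.1},
    "task_family": "grasping",
    "license_status": "cleared",
}

def duration_s(record: dict) -> float:
    """Clip duration derived from the temporal-alignment fields."""
    ta = record["temporal_alignment"]
    return round(ta["end_s"] - ta["start_s"], 3)

print(duration_s(clip_record))  # 6.7
```

Because each record carries alignment and POV fields explicitly, an ingestion script can filter or bucket clips without decoding any video.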

VIDEO FEEDS · VLA READY

Pre-cut, task-family-tagged video clips
delivered end-to-end to your private cloud

Not "collect, clean, then find a platform" — ENDATA keeps the complexity upstream and ships training-ready samples. From film originals to embodied data, one unified four-dimensional schema.


02 · Data categories · four sources covering the full video-data stack

Film & TV, social, e-commerce, embodied: each source has distinct training value. From the narrative structure of produced content to causal action reasoning in egocentric footage, the four sources complement rather than overlap one another.

FILM · TV

Film & TV Video

Series, variety shows, movies and documentary clips. With IP tags, plot milestones, character annotations, emotion arcs.

Scale: 1.2M+ licensed clips
Narrative tags: Node-level
Use: Video generation
SOCIAL

Social Short Video

Short videos, livestream cuts, UGC. With author metadata, engagement data, topic tags, audio alignment.

Scale: 820M+ clips
Duration: 5s–90s
Use: Video understanding
E-COMMERCE

E-commerce Demos

Livestream selling, product demos, unboxing & reviews. With SKU alignment, price points, selling-point tags, conversion milestones.

Scale: 42M+ clips
SKU linkage: 5B+
Use: E-comm Agent
EMBODIED · NEW

Embodied AI Video

Egocentric POV, action trajectories, grasping demos, navigation paths. With joint angles, force feedback, task success markers.

Scale: Continuously expanding
Task families: 120+
Use: VLA training

03 · Acquisition & curation · three steps to the training pipeline

Customers arrive with a training objective. ENDATA maps it to underlying sources and discovery filters, curates at high granularity, and delivers in structured, pipeline-ready form. Every step from Define to Deliver is customizable.

01
DEFINE

Define task families
& scene dimensions

Customers arrive with a training objective — task family, scene, POV, duration, language. ENDATA maps these to source data and discovery filters, producing an executable acquisition strategy.

Task Family · Scene Filter · POV Control · Duration
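The Define step above, mapping a training objective to discovery filters, can be sketched as a small spec object. The class and field names are illustrative assumptions, not ENDATA's actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch: how a training objective might map to discovery filters.
@dataclass
class AcquisitionSpec:
    task_family: str
    scene: str
    pov: str = "egocentric"
    min_duration_s: float = 2.0
    max_duration_s: float = 60.0
    language: str = "any"

    def to_filters(self) -> dict:
        """Translate the spec into key/value discovery filters."""
        return {
            "task_family": self.task_family,
            "scene": self.scene,
            "pov": self.pov,
            "duration_s": (self.min_duration_s, self.max_duration_s),
            "language": self.language,
        }

spec = AcquisitionSpec(task_family="grasping", scene="kitchen")
print(spec.to_filters()["duration_s"])  # (2.0, 60.0)
```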
02
CURATE

High-granularity curation
& quality scoring

Multi-dimensional filtering on content tags, action semantics, image quality and licensing status — discarding low-value and high-risk samples, keeping only the high-ROI "right data" for training.

Quality Score · Compliance Filter · Semantic Tags · Action Labels
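A curation pass like the one described, discarding low-value and high-risk samples, might look like the following toy filter. Field names and thresholds are assumptions for illustration:

```python
# Hypothetical curation pass: keep only licensed, high-quality samples.
clips = [
    {"id": "a", "quality": 0.92, "licensed": True,  "task": "grasping"},
    {"id": "b", "quality": 0.41, "licensed": True,  "task": "grasping"},
    {"id": "c", "quality": 0.88, "licensed": False, "task": "navigation"},
]

def curate(samples, min_quality=0.8):
    """Discard low-quality and unlicensed clips, keeping high-ROI samples."""
    return [s for s in samples if s["licensed"] and s["quality"] >= min_quality]

kept = curate(clips)
print([s["id"] for s in kept])  # ['a']
```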
03
DELIVER

Structured delivery
to your private cloud

Pre-cut clips with standardized metadata, exported as RLDS / LeRobot v3 / WebDataset / custom schema — directly pluggable into your training pipeline. Supports incremental updates and versioning.

RLDS LeRobot v3 WebDataset Custom Schema

04 · Delivery formats · drop-in compatibility with mainstream training pipelines

Supports three standard formats — RLDS (Reinforcement Learning Datasets), LeRobot v3, WebDataset — plus custom schemas. Your ingestion scripts consume it directly.

TFRecords-Based
RLDS · Reinforcement Learning Datasets
Google-originated RL dataset standard, packaged as TFRecords, natively loadable via TensorFlow Datasets.
# episode structure
{
  "observation": {
    "image": <video_frame>,
    "state": <joint_angles>,
  },
  "action": <action_vector>,
  "reward": <scalar>,
}
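A consumer of the episode structure above can be sketched without TensorFlow by mocking the same shape with plain Python values (toy numbers stand in for real frames and vectors):

```python
# Mock of the RLDS-style episode structure, with toy values standing in
# for real frames and vectors (no TensorFlow dependency in this sketch).
episode = {
    "steps": [
        {
            "observation": {"image": [[0.0]], "state": [0.1, 0.2]},
            "action": [0.05, -0.05],
            "reward": 0.0,
        },
        {
            "observation": {"image": [[1.0]], "state": [0.2, 0.3]},
            "action": [0.0, 0.0],
            "reward": 1.0,
        },
    ]
}

def episode_return(ep: dict) -> float:
    """Sum of per-step rewards, as an RLDS-style consumer would compute it."""
    return sum(step["reward"] for step in ep["steps"])

print(episode_return(episode))  # 1.0
```

In production the same iteration would run over TFRecords loaded via TensorFlow Datasets; the step/observation/action/reward shape is what stays constant.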
Parquet · MP4
LeRobot v3
HuggingFace LeRobot standard — Parquet metadata + MP4 video stored separately, supporting efficient streaming loads.
# episode layout
dataset/
├── meta/
│   └── info.json
├── data/
│   └── chunk-000.parquet
└── videos/
    └── episode_000.mp4
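The layout above can be materialized with nothing but the standard library. This sketch writes placeholder files (not real Parquet or MP4 content) just to show the directory structure:

```python
import json
import tempfile
from pathlib import Path

# Sketch: materialize the LeRobot-style layout on disk (toy content, real structure).
root = Path(tempfile.mkdtemp()) / "dataset"
(root / "meta").mkdir(parents=True)
(root / "data").mkdir()
(root / "videos").mkdir()

(root / "meta" / "info.json").write_text(json.dumps({"codebase_version": "v3.0"}))
(root / "data" / "chunk-000.parquet").touch()   # placeholder, not real Parquet
(root / "videos" / "episode_000.mp4").touch()   # placeholder, not real video

print(sorted(p.name for p in root.rglob("*") if p.is_file()))
```

Keeping metadata (Parquet/JSON) separate from the MP4 streams is what enables the efficient streaming loads mentioned above: loaders can scan metadata without touching video bytes.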
Tar · JSON
WebDataset
The go-to format for large-scale multimodal training — tar-packed with JSON metadata, high-throughput streaming via PyTorch DataLoader.
# tar structure
shard_000.tar:
  00001.mp4
  00001.json
  00002.mp4
  00002.json
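The shard structure above can be built with Python's standard `tarfile` module. This is a minimal in-memory sketch with stand-in bytes for the MP4 payloads:

```python
import io
import json
import tarfile

# Sketch: pack mp4/json pairs into a WebDataset-style shard in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for i in (1, 2):
        key = f"{i:05d}"
        video = b"\x00" * 16                     # stand-in for real mp4 bytes
        meta = json.dumps({"caption": f"clip {key}"}).encode()
        for name, payload in ((f"{key}.mp4", video), (f"{key}.json", meta)):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

# Reopen the shard and list its members, as a loader would.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()
print(names)  # ['00001.mp4', '00001.json', '00002.mp4', '00002.json']
```

WebDataset loaders group members by shared key (`00001.*`), which is why the mp4 and its JSON sidecar must sit adjacent in the tar.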
Flexible · Extensible
Custom Schema
Customize fields, task labels and temporal alignment. ENDATA provides schema-design consulting and validation tooling.
# ingestion-friendly
{
  "schema_version": "2.0",
  "fields": ["video", "caption",
    "task_id", "pov", ...],
  "license_chain": [...]
}
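The validation tooling mentioned above might, at its simplest, check delivered samples against the schema's field list. This validator is a hypothetical sketch, not ENDATA's actual tooling:

```python
# Hypothetical validator: check a sample against the custom schema's field list.
schema = {
    "schema_version": "2.0",
    "fields": ["video", "caption", "task_id", "pov"],
}

def validate(sample: dict, schema: dict) -> list:
    """Return the schema fields missing from a delivered sample."""
    return [f for f in schema["fields"] if f not in sample]

sample = {"video": "episode_000.mp4", "caption": "pick up the cup", "pov": "egocentric"}
print(validate(sample, schema))  # ['task_id']
```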

05 · NEW 2026 · Embodied AI Datasets

Egocentric POV + action-labeled training data. Aligned with the 2026 commercial inflection in embodied intelligence, ENDATA launches four scene-family datasets — kitchen, household, navigation, manipulation — with expanding task-family coverage.

KITCHEN

Kitchen Scenes

Grasping, prepping, cooking, cleaning — complete kitchen task-family POV and action annotations.

HOUSEHOLD

Household Scenes

Tidying, cleaning, operating appliances, tending plants — diverse samples of everyday household tasks.

NAVIGATION

Navigation Scenes

Indoor/outdoor navigation, obstacle avoidance, path planning, target tracking — POV video streams.

MANIPULATION

Manipulation Scenes

Grasping, placement, assembly, tool use — fine manipulation task-family with full annotations.

Ready to start your video data training?

Tell us your training objective and task-family needs, and ENDATA will provide a small-scale sample evaluation plus full schema documentation.