Scenario Design, Data Collection, and Human Feedback for Robot Foundation Models
| Role: AI Analyst @ NVIDIA, GEAR Robotics Lab
| Duration: Mar 2025 - Jul 2025
At NVIDIA, I worked on managing the quantity and quality of multimodal training data for Large Language Models (LLMs), Autonomous Vehicles (AVs), and Vision Language Models (VLMs). Towards the end of my time there, I joined the GEAR robotics lab, where the team was developing NVIDIA's latest robotics foundation model, GR00T N1.5.
I collaborated closely with research scientists on scenario design, teleoperation data collection, and evaluation, ensuring that the data and model accurately represented real-world conditions, behaviors, and edge cases. A strong focus was denoising the data: deciding which factors should be varied or held constant, and which data should be included, excluded, or flagged so that models could learn more robustly and safely from complex environments. I learned to troubleshoot hardware and software issues, thinking critically about which sources to consult and what steps to follow when an arm motor broke, a server crashed, or a log captured incorrect data. I gained exposure to the vast world of hardware, robot embodiment, and world models, and grew a stronger desire to contribute to the development of human-friendly AI systems.
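The include/exclude/flag decision above can be sketched as a simple triage rule. This is a minimal illustration, not NVIDIA's actual pipeline; the `Episode` fields and thresholds are hypothetical stand-ins for the kinds of quality signals a teleoperation log might carry.

```python
from dataclasses import dataclass

# Hypothetical episode record; field names and thresholds are illustrative,
# not any real GEAR lab schema.
@dataclass
class Episode:
    id: str
    duration_s: float      # length of the teleop recording
    dropped_frames: int    # sensor frames lost during logging
    task_completed: bool   # operator-reported success

def triage(ep: Episode) -> str:
    """Decide whether an episode is included, excluded, or flagged for review."""
    if ep.dropped_frames > 30 or ep.duration_s < 2.0:
        return "exclude"   # corrupted or too short to be useful
    if not ep.task_completed:
        return "flag"      # failures can still teach the model; a human decides
    return "include"

episodes = [
    Episode("ep-001", 45.0, 0, True),
    Episode("ep-002", 1.2, 0, True),    # too short: exclude
    Episode("ep-003", 30.0, 5, False),  # clean log, failed task: flag
]
buckets = {ep.id: triage(ep) for ep in episodes}
```

The point of the sketch is the three-way split: hard exclusion for unusable logs, automatic inclusion for clean successes, and a human-in-the-loop queue for everything ambiguous.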
Before working at the GEAR lab, I annotated and QA'ed multimodal datasets that included text, tables, images, and video, while defining labeling standards and resolving edge cases that affected model reliability. Through this process, I helped raise training data accuracy to near-complete consistency, and aligned a distributed team of over 60 annotators through clear documentation and shared guidelines. I also initiated a survey of 200 annotators to collect feedback on the data pipeline, workflow, and tooling, synthesizing the insights into recommendations for directors, team leads, and tooling teams.
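Driving annotation toward consistency usually comes down to measuring agreement and routing disagreements to adjudication. Below is a minimal sketch of that idea, assuming a simple majority-vote scheme; the item names, labels, and three-annotator setup are hypothetical, not the actual QA process used.

```python
from collections import Counter

# Hypothetical labels from three annotators on the same items.
annotations = {
    "item-1": ["table", "table", "table"],
    "item-2": ["chart", "table", "chart"],
}

def consensus(labels):
    """Return the majority label and whether the item needs adjudication."""
    label, votes = Counter(labels).most_common(1)[0]
    needs_review = votes < len(labels)  # any disagreement -> human review
    return label, needs_review

report = {item: consensus(labels) for item, labels in annotations.items()}
```

In practice the adjudicated items feed back into the labeling guidelines, which is how documentation keeps a large annotator pool aligned over time.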
There are two main learnings that stuck with me through this experience:
Training data is essentially the model's knowledge of the world. A model's intelligence is fundamentally shaped by the data it is fed, and curating that data is an act of teaching: defining the world as the model will see and understand it. This experience made me reflect on how we organize the world for AI systems, what signals we prioritize, and which dimensions of human experience — sensory, social, contextual — are represented or underrepresented in the data we give them.
Nothing beats a clear mission and hustle. Working at NVIDIA, I learned the values by which the world's most valuable company operates. NVIDIA is often called the "biggest startup," and fittingly, I experienced its scrappy culture where no resources are wasted, while observing how communication is structured across global stakeholders. I saw firsthand how teams and processes are organized to maximize efficiency and impact, with massive volumes of data flowing across the world every day into intelligent systems built to better our lives.