Photo of Boyuan Chen

About Me

AI/ML Researcher & Robotics Enthusiast

I'm Boyuan Chen (陈博远). I am currently a second-year PhD student at MIT EECS working with Prof. Vincent Sitzmann and Prof. Russ Tedrake. I am interested in AI decision making and robotics. In particular, how can we use both big data and rich structures to build AI-powered robots that match human intelligence? I hope to use my knowledge and passion to tackle the most important challenges in the world and free humanity with breakthrough technologies.

Before joining MIT, I obtained my bachelor’s degree in computer science and math at UC Berkeley, where I spent a significant amount of time doing research at Berkeley Artificial Intelligence Research (BAIR) on deep reinforcement learning and unsupervised learning. I also spent a year studying philosophy during my undergrad. I am a big fan of chess, robots and boba.

My research

DittoGym: Learning to Control Soft Shape-Shifting Robots
Suning Huang, Boyuan Chen, Huazhe Xu, Vincent Sitzmann
ICLR 2024 (International Conference on Learning Representations)

Robot co-design, where the morphology of a robot is optimized jointly with a learned policy to solve a specific task, is an emerging area of research. It holds particular promise for soft robots, which are amenable to novel manufacturing techniques that can realize learned morphologies and actuators. Inspired by nature and recent novel robot designs, we propose to go a step further and explore novel reconfigurable robots, defined as robots that can change their morphology within their lifetime. We formalize control of reconfigurable soft robots as a high-dimensional reinforcement learning (RL) problem. We unify morphology change, locomotion, and environment interaction in the same action space, and introduce an appropriate, coarse-to-fine curriculum that enables us to discover policies that accomplish fine-grained control of the resulting robots. We also introduce DittoGym, a comprehensive RL benchmark for reconfigurable soft robots that require fine-grained morphology changes to accomplish the tasks. Finally, we evaluate our proposed coarse-to-fine algorithm on DittoGym and demonstrate robots that learn to change their morphology several times within a sequence, uniquely enabled by our RL algorithm.

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia
arXiv 2024

Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLMs) have demonstrated remarkable performance in certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships of physical objects like distances or size differences. We hypothesize that VLMs' limited spatial reasoning capability is due to the lack of 3D spatial knowledge in training data and aim to solve this problem by training VLMs with Internet-scale spatial reasoning data. To this end, we present a system to facilitate this approach. We first develop an automatic 3D spatial VQA data generation framework that scales up to 2 billion VQA examples on 10 million real-world images. We then investigate various factors in the training recipe, including data quality, training pipeline, and VLM architecture. Our work features the first Internet-scale 3D spatial reasoning dataset in metric space. By training a VLM on such data, we significantly enhance its ability on both qualitative and quantitative spatial VQA. Finally, we demonstrate that this VLM unlocks novel downstream applications in chain-of-thought spatial reasoning and robotics due to its quantitative estimation capability.

Self-Supervised Reinforcement Learning that Transfers using Random Features
Boyuan Chen, Chuning Zhu, Pulkit Agrawal, Kaiqing Zhang, Abhishek Gupta
NeurIPS 2023 (Conference on Neural Information Processing Systems)

Reinforcement learning (RL) algorithms have the potential not only for synthesizing complex control behaviors, but also for transfer across tasks. Model-free RL excels in solving problems with high-dimensional observations or long horizons, but the learned policies do not transfer across different reward functions. Model-based RL, on the other hand, naturally enables transfer across different reward functions, but struggles in complex environments due to compounding error. In this work, we propose a new method for transferring behaviors across tasks with different rewards, combining the performance of model-free RL with the transferability of model-based RL. In particular, we show how model-free RL using a number of random features as the reward allows for implicit modeling of long-horizon environment dynamics. Model-predictive control using these implicit models enables fast adaptation to problems with new reward functions while avoiding the compounding error from model rollouts. Our method can be trained on offline datasets without reward labels, and quickly deployed on new tasks, making it more widely applicable than typical RL methods. We validate that our proposed method enables transfer across tasks on a variety of manipulation and locomotion domains.
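
The abstract above rests on one linear-algebra observation: if a new task's reward is (approximately) a linear combination of random features, then returns computed for those features recombine linearly into the new task's return. The following is a minimal sketch of that idea only, not the authors' implementation. The cosine features, the toy Gaussian data, `hidden_weights`, and every function name here are assumptions made for illustration.

```python
import math
import random


def make_random_features(num_features, input_dim, rng):
    """Draw fixed random projections; each one defines a pseudo-reward feature."""
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(input_dim)], rng.uniform(0.0, 2.0 * math.pi))
        for _ in range(num_features)
    ]


def featurize(state_action, projections):
    """Map a (state, action) vector to its random cosine features."""
    return [
        math.cos(sum(w * x for w, x in zip(weights, state_action)) + bias)
        for weights, bias in projections
    ]


def fit_reward_weights(features, rewards, ridge=1e-8):
    """Ridge regression of rewards on features via the normal equations."""
    k = len(features[0])
    gram = [
        [sum(f[i] * f[j] for f in features) + (ridge if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(f[i] * r for f, r in zip(features, rewards)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(gram[row][col]))
        gram[col], gram[pivot] = gram[pivot], gram[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, k):
            factor = gram[row][col] / gram[col][col]
            for j in range(col, k):
                gram[row][j] -= factor * gram[col][j]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    weights = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(gram[i][j] * weights[j] for j in range(i + 1, k))
        weights[i] = (rhs[i] - tail) / gram[i][i]
    return weights


def discounted_return(values, gamma):
    """Discounted sum of a sequence of per-step values."""
    return sum((gamma ** t) * v for t, v in enumerate(values))


rng = random.Random(0)
projections = make_random_features(num_features=4, input_dim=3, rng=rng)

# Toy reward-free offline data: 50 random (state, action) vectors (an assumption).
dataset = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
features = [featurize(x, projections) for x in dataset]

# A hypothetical new task whose reward is exactly linear in the random features.
hidden_weights = [0.5, -1.0, 2.0, 0.25]
rewards = [sum(w * f for w, f in zip(hidden_weights, phi)) for phi in features]

fitted = fit_reward_weights(features, rewards)
predicted = [sum(w * f for w, f in zip(fitted, phi)) for phi in features]

# Returns are linear in the reward, so per-feature returns recombine into the task return.
trajectory = features[:10]
gamma = 0.9
feature_returns = [discounted_return([phi[i] for phi in trajectory], gamma) for i in range(4)]
task_return = discounted_return(rewards[:10], gamma)
recombined_return = sum(w * g for w, g in zip(fitted, feature_returns))
```

In this toy setting the per-feature returns are computed directly from a stored trajectory. A learned value function per random feature would play that role in practice, which is what would let a planner score candidate action sequences for a new reward without rolling out a dynamics model.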

Extraneousness-Aware Imitation Learning
Ray Chen Zheng, Kaizhe Hu, Zhecheng Yuan, Boyuan Chen, Huazhe Xu
ICRA 2023 (International Conference on Robotics and Automation)

  doi = {10.48550/ARXIV.2210.01379},
  author = {Zheng, Ray Chen and Hu, Kaizhe and Yuan, Zhecheng and Chen, Boyuan and Xu, Huazhe},
  keywords = {Robotics (cs.RO), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Extraneousness-Aware Imitation Learning},
  publisher = {arXiv},
  year = {2022},
  copyright = {perpetual, non-exclusive license}

Visual imitation learning provides an effective framework to learn skills from demonstrations. However, the quality of the provided demonstrations usually significantly affects the ability of an agent to acquire desired skills. Therefore, the standard visual imitation learning assumes near-optimal demonstrations, which are expensive or sometimes prohibitive to collect. Previous works propose to learn from noisy demonstrations; however, the noise is usually assumed to follow a context-independent distribution such as a uniform or Gaussian distribution. In this paper, we consider another crucial yet underexplored setting: imitation learning with task-irrelevant yet locally consistent segments in the demonstrations (e.g., wiping sweat while cutting potatoes in a cooking tutorial). We argue that such noise is common in real-world data and term such segments “extraneous” segments. To tackle this problem, we introduce Extraneousness-Aware Imitation Learning (EIL), a self-supervised approach that learns visuomotor policies from third-person demonstrations with extraneous subsequences. EIL learns action-conditioned observation embeddings in a self-supervised manner and retrieves task-relevant observations across visual demonstrations while excluding the extraneous ones. Experimental results show that EIL outperforms strong baselines and achieves comparable policies to those trained with perfect demonstrations on both simulated and real-world robot control tasks.

Open-vocabulary Queryable Scene Representations for Real World Planning
Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler
ICRA 2023 (International Conference on Robotics and Automation)

    title={Open-vocabulary Queryable Scene Representations for Real World Planning},
    author={Boyuan Chen and Fei Xia and Brian Ichter and Kanishka Rao and Keerthana Gopalakrishnan and Michael S. Ryoo and Austin Stone and Daniel Kappler},
    booktitle={arXiv preprint arXiv:2209.09874},

Large language models (LLMs) have unlocked new capabilities of task planning from human instructions. However, prior attempts to apply LLMs to real-world robotic tasks are limited by the lack of grounding in the surrounding scene. In this paper, we develop NLMap, an open-vocabulary and queryable scene representation to address this problem. NLMap serves as a framework to gather and integrate contextual information into LLM planners, allowing them to see and query available objects in the scene before generating a context-conditioned plan. NLMap first establishes a natural language queryable scene representation with visual language models (VLMs). An LLM-based object proposal module parses instructions and proposes involved objects to query the scene representation for object availability and location. An LLM planner then plans with such information about the scene. NLMap allows robots to operate without a fixed list of objects or executable options, enabling real robot operation unachievable by previous methods.

Unsupervised Learning of Visual 3D Keypoints for Control
Boyuan Chen, Pieter Abbeel, Deepak Pathak
ICML 2021 (International Conference on Machine Learning)

Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations. Prior works show that structured latent spaces such as visual keypoints often outperform unstructured representations for robotic control. However, most of these representations, whether structured or unstructured, are learned in a 2D space even though the control tasks are usually performed in a 3D environment. In this work, we propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner. The input images are embedded into latent 3D keypoints via a differentiable encoder which is trained to optimize both a multi-view consistency loss and downstream task objective. These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space. The proposed approach outperforms prior state-of-the-art methods across a variety of reinforcement learning benchmarks.

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation
Boyuan Chen*, Huazhe Xu*, Yang Gao and Trevor Darrell
ICRA 2021 (International Conference on Robotics and Automation)

    author    = {Huazhe Xu and
                    Boyuan Chen and
                    Yang Gao and
                    Trevor Darrell},
    title     = {Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions
                    and sparse rewards for zero-shot generalization},
    journal   = {CoRR},
    volume    = {abs/1910.08143},
    year      = {2019},
    eprinttype = {arXiv},
    eprint    = {1910.08143},
    timestamp = {Fri, 27 Nov 2020 15:04:16 +0100},
    bibsource = {dblp computer science bibliography}

It is a long-standing challenge to enable an intelligent agent to learn in one environment and generalize to an unseen environment without further data collection and finetuning. In this paper, we consider a zero-shot generalization problem setup that complies with biological intelligent agents' learning and generalization processes. The agent is first presented with previous experiences in the training environment, along with task description in the form of trajectory-level sparse rewards. Later when it is placed in the new testing environment, it is asked to perform the task without any interaction with the testing environment. We find this setting natural for biological creatures and at the same time, challenging for previous methods. Behavior cloning, state-of-the-art RL, and other zero-shot learning methods perform poorly on this benchmark. Given a set of experiences in the training environment, our method learns a neural function that decomposes the sparse reward into particular regions in a contingency-aware observation as a per-step reward. Based on such decomposed rewards, we further learn a dynamics model and use Model Predictive Control (MPC) to obtain a policy. Since the rewards are decomposed to finer-granularity observations, they are naturally generalizable to new environments that are composed of similar basic elements. We demonstrate our method on a wide range of environments, including a classic video game, Super Mario Bros, as well as a robotic continuous control task. Please refer to the project page for more visualized results.

Discovering Diverse Multi-agent Strategic Behavior via Reward Randomization
Zhenggang Tang, Chao Yu, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Shaolei Du, Yu Wang, Yi Wu
ICLR 2021 (International Conference on Learning Representations)

We propose a simple, general and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games. Combining reward randomization and policy gradient, we derive a new algorithm, Reward-Randomized Policy Gradient (RPG). RPG is able to discover multiple distinctive human-interpretable strategies in challenging temporal trust dilemmas, including grid-world games (MonsterHunt and Escalation) and a real-world web game, where multiple equilibria exist but standard multi-agent policy gradient algorithms always converge to a fixed one with a sub-optimal payoff for every player even using state-of-the-art exploration techniques (including RND, DIAYN, MAVEN). Furthermore, with the set of diverse strategies from RPG, we can (1) achieve higher payoffs by fine-tuning the best policy from the set; and (2) obtain an adaptive agent by using this set of strategies as its training opponents.


  • Robots
  • Cooking
  • Teams
Robomooc Robotics Kit

I designed it with my friend, Kinsky. We sold it as an education kit to schools. You can ride on it!

Robomaster ICRA challenge

DJI robomaster robot for ICRA AI Challenge. During my undergrad, I was the captain of the team, leading the development of autonomous algorithms in the robot shooting challenge.

Autonomous Bogie Rover

My personal robot that can handle a variety of terrains. I did everything from mechanical design and electronics to programming. It uses computer vision to autonomously follow me and avoid obstacles.

FRC 2017 Robot

In 2017, I founded my high school's first FRC team. We had neither the mentorship nor the funding we needed, but the team did amazingly well. I did the majority of the design.

PR2 in RLL

In 2021, I graduated from UC Berkeley, where I spent some amazing time doing research in the robotics learning lab.

Autonomous Drone

An autonomous drone which I built and coded. I installed a camera and a mini railgun on it to track and aim at the target I select.

FTC 2017 Robot

Our FTC competition robot in 2017, when I became the captain of the team. It's my team's first robot designed with CAD. The robot won the East China regional.

My first FTC robot

In 2016, I participated in a robotics competition for the first time. This is a super cool robot which marks the beginning of my robotics journey.

FRC 2018 Robot

After my graduation from high school, I continued mentoring the team. My successor Xinpei designed the robot under my mentorship.

Robomaster Team

In 2019, I was the captain of Berkeley's team in the ICRA Robomaster AI Challenge. I co-founded the team and led 20 students developing autonomous robots.

MIT Chess Club

I became one of the execs at MIT Chess Club in 2022. It was a great time to organize events and hang out with the team!

FRC Team

In 2017, I founded my high school's first FRC team. I worked as both captain and mentor. We won the Rookie All Star Award at CRC 2017.

Chinese New Year 2022

To celebrate Chinese New Year 2022, I made a big dinner with my friend Maohao Shen at MIT. MITCSSA awarded me the title Master Chef MIT for my Peking duck in their cooking competition.

Home Style Noodle with Braised Chicken

I cooked 黄焖鸡 (braised chicken) during the COVID-19 quarantine!

Soy sauce braised pork

I made Dongporou (东坡肉) during Thanksgiving 2023. The best soy sauce braised pork I've ever made!

Chicken Soup with Mushroom

Traditional Chinese chicken soup with dried matsutake mushroom.

Chinese New Year 2023

I cooked 5 dishes for Chinese New Year 2023. All of them are amazing! The dishes are steamed eel, eggplant with minced meat, soy sauce braised pork with bamboo shoots, Chinese chicken soup with bamboo mushroom, and stir-fried Chinese chives.

Birthday Noodle (长寿面)

I made my roommate and longtime friend Haoyuan a bowl of traditional birthday noodles in 2021, when he turned 23.

Potato Braised Beef Brisket

I cooked beef brisket (土豆炖排骨) during the COVID-19 lockdown.

Lamb Croutons

During the COVID-19 pandemic, I tried to make Lamb Croutons following Gordon Ramsay's tutorial.

XO sauce Tofu Stew with mushrooms

Tofu stew cooked with various mushrooms and XO sauce. The umami flavor bursts in your mouth. My friends finished it in 2 minutes.