publications

For an up-to-date list, please see my Google Scholar profile.

2026

ZeroBot: Learning From Scratch in Minutes With Generative Real2Sim

Ivan Kapelyukh, Xiaohan Zhang, Stephen James, Laura Herlant, and Edward Johns

IEEE Robotics and Automation Letters, 2026

A retrieval-augmented framework enabling VLM spatial awareness for object-centric robot manipulation

Kai Chen, Chengkun Li, Chang Tu, Jiahui Pan, Yiyao Ma, Wei Chen, Zhongxiang Zhou, Xuecheng Xu, Stephen James, Chi-Wing Fu, Rong Xiong, Pieter Abbeel, Yun-Hui Liu, and Qi Dou

Science Robotics, 2026


2024

Generative Image as Action Models

Mohit Shridhar, Yat Long Lo, and Stephen James

Conference on Robot Learning, 2024

Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation

Eugene Teoh, Sumit Patidar, Xiao Ma, and Stephen James

arXiv preprint arXiv:2407.07868, 2024

BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

Nikita Chernyadev, Nicholas Backshall, Xiao Ma, Yunfan Lu, Younggyo Seo, and Stephen James

Conference on Robot Learning, 2024

Continuous Control with Coarse-to-fine Reinforcement Learning

Younggyo Seo, Jafar Uruç, and Stephen James

Conference on Robot Learning, 2024

Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning

Vitalis Vosylius, Younggyo Seo, Jafar Uruç, and Stephen James

Robotics: Science and Systems, 2024

Redundancy-aware Action Spaces for Robot Learning

Pietro Mazzaglia, Nicholas Backshall, Xiao Ma, and Stephen James

IEEE Robotics and Automation Letters, 2024

Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation

Xiao Ma, Sumit Patidar, Iain Haughton, and Stephen James

Conference on Computer Vision and Pattern Recognition, 2024

Vision Foundation Model Enables Generalizable Object Pose Estimation

Kai Chen, Yiyao Ma, Xingyu Lin, Stephen James, Jianshu Zhou, Yun-Hui Liu, Pieter Abbeel, and Qi Dou

Conference on Neural Information Processing Systems, 2024


2023

Language-conditioned path planning

Amber Xie, Youngwoon Lee, Pieter Abbeel, and Stephen James

Conference on Robot Learning, 2023

Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, and Yun-Hui Liu

arXiv preprint arXiv:2309.13942, 2023

Multi-view masked world models for visual robotic manipulation

Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, and Pieter Abbeel

International Conference on Machine Learning, 2023

Language reward modulation for pretraining reinforcement learning

Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, and Pieter Abbeel

arXiv preprint arXiv:2308.12270, 2023

Temporally consistent transformers for video generation

Wilson Yan, Danijar Hafner, Stephen James, and Pieter Abbeel

International Conference on Machine Learning, 2023

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

Kai Chen, Stephen James, Congying Sui, Yun-Hui Liu, Pieter Abbeel, and Qi Dou

IEEE International Conference on Robotics and Automation, 2023


2022

Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data

John So, Amber Xie, Sunggoo Jung, Jeffrey Edlund, Rohan Thakker, Ali Agha-mohammadi, Pieter Abbeel, and Stephen James

Conference on Robot Learning, 2022

Real-World Robot Learning with Masked Visual Pre-training

Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, and Trevor Darrell

Conference on Robot Learning, 2022

Masked World Models for Visual Control

Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, and Pieter Abbeel

Conference on Robot Learning, 2022

Reinforcement learning with action-free pre-training from videos

Younggyo Seo, Kimin Lee, Stephen L James, and Pieter Abbeel

International Conference on Machine Learning, 2022

Patch-based Object-centric Transformers for Efficient Video Generation

Wilson Yan, Ryo Okumura, Stephen James, and Pieter Abbeel

arXiv preprint arXiv:2206.04003, 2022

Auto-Lambda: Disentangling Dynamic Task Relationships

Shikun Liu, Stephen James, Andrew J Davison, and Edward Johns

Transactions on Machine Learning Research, 2022

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

Zhao Mandi, Pieter Abbeel, and Stephen James

Conference on Neural Information Processing Systems, 2022

Coarse-to-fine Q-attention with Tree Expansion

Stephen James and Pieter Abbeel

arXiv preprint arXiv:2204.12471, 2022

Coarse-to-Fine Q-attention with Learned Path Ranking

Stephen James and Pieter Abbeel

arXiv preprint arXiv:2204.01571, 2022

ReorientBot: Learning Object Reorientation for Specific-Posed Placement

Kentaro Wada, Stephen James, and Andrew J Davison

IEEE International Conference on Robotics and Automation, 2022

SafePicking: Learning safe object extraction via object-level mapping

Kentaro Wada, Stephen James, and Andrew J Davison

IEEE International Conference on Robotics and Automation, 2022

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking

Kai Chen, Rui Cao, Stephen James, Yichuan Li, Yun-Hui Liu, Pieter Abbeel, and Qi Dou

European Conference on Computer Vision, 2022

HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, and Pieter Abbeel

IEEE International Conference on Image Processing, 2022

Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

Stephen James and Pieter Abbeel

arXiv preprint arXiv:2202.03957, 2022

Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation

Stephen James, Kentaro Wada, Tristan Laidlow, and Andrew J Davison

Conference on Computer Vision and Pattern Recognition, 2022

Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation

Stephen James and Andrew J Davison

IEEE Robotics and Automation Letters, 2022


2021

End-to-End Egospheric Spatial Memory

Daniel James Lenton, Stephen James, Ronald Clark, and Andrew Davison

International Conference on Learning Representations, 2021

SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks

Zoe Landgraf, Raluca Scona, Tristan Laidlow, Stephen James, Stefan Leutenegger, and Andrew J Davison

IEEE International Conference on Computer Vision, 2021

Waypoint Planning Networks

Alexandru-Iosif Toma, Hussein Ali Jaafar, Hao-Ya Hsueh, Stephen James, Daniel Lenton, Ronald Clark, and Sajad Saeedi

Conference on Robots and Vision, 2021


2020

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Kentaro Wada, Edgar Sucar, Stephen James, Daniel Lenton, and Andrew J Davison

Conference on Computer Vision and Pattern Recognition, 2020

RLBench: The Robot Learning Benchmark & Learning Environment

Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison

IEEE Robotics and Automation Letters, 2020

Learning One-Shot Imitation from Humans without Humans

Alessandro Bonardi, Stephen James, and Andrew J Davison

IEEE Robotics and Automation Letters, 2020


2019

PyRep: Bringing V-REP to Deep Robot Learning

Stephen James, Marc Freese, and Andrew J Davison

arXiv preprint arXiv:1906.11176, 2019

Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks

Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, and Konstantinos Bousmalis

Conference on Computer Vision and Pattern Recognition, 2019


2018

Task-Embedded Control Networks for Few-Shot Imitation Learning

Stephen James, Michael Bloesch, and Andrew J Davison

Conference on Robot Learning, 2018

Sim-to-Real Reinforcement Learning for Deformable Object Manipulation

Jan Matas, Stephen James, and Andrew J Davison

Conference on Robot Learning, 2018


2017

Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task

Stephen James, Andrew J Davison, and Edward Johns

Conference on Robot Learning, 2017


2016

3D Simulation for Robot Arm Control with Deep Q-Learning

Stephen James and Edward Johns

NeurIPS 2016 Workshop (Deep Learning for Action and Interaction), 2016
