2026

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

International Journal of Computer Vision (IJCV) 2026

We propose HiPrompt, a new tuning-free solution that tackles object repetition and structural artifacts in higher-resolution generation by introducing hierarchical prompts.

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding

IEEE International Conference on Robotics and Automation (ICRA) 2026

We propose VistaBot, a novel framework that integrates feed-forward geometric models with video diffusion models to achieve view-robust closed-loop manipulation without requiring camera calibration at test time.

2025

Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation

Xiang Li, Yupeng Zheng, Pengfei Li, Yilun Chen, Ya-Qin Zhang, Wenchao Ding

IEEE Robotics and Automation Letters (RA-L) 2025

We pioneer a hierarchical distillation strategy that establishes coordinated knowledge transfer between teacher and student models and progressively incorporates guidance information, specifically designed for sparse query-based occupancy prediction.

Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

Xiang Li, Pengfei Li, Yupeng Zheng, Wei Sun, Yan Wang, Yilun Chen

International Conference on Learning Representations (ICLR) 2025

Our semi-supervised 3D occupancy world model, featuring 2D rendering supervision and an end-to-end architecture, forecasts future occupancy directly from image inputs while taking advantage of 2D labels.

2024

MonoOcc: Digging into Monocular Semantic Occupancy Prediction

Yupeng Zheng*, Xiang Li*, Pengfei Li, Yuhang Zheng, Bu Jin, Chengliang Zhong, Xiaoxiao Long, Hao Zhao, Qichao Zhang (* equal contribution)

IEEE International Conference on Robotics and Automation (ICRA) 2024

We propose a distillation module that transfers temporal information and richer knowledge from a privileged branch to the monocular branch, improving the framework's performance, especially on small and long-tailed objects, while striking a balance between performance and efficiency.
