Tete Xiao

| Google Scholar | GitHub |
| LinkedIn | CV | Twitter |

I co-founded Prompt AI and currently serve as its CEO in the San Francisco Bay Area. See our press release for more information.

I received my Ph.D. from UC Berkeley, advised by Prof. Trevor Darrell. My research interests lie in computer vision, robotics, and machine learning, with a focus on learning scalable representations via deep learning. I was also affiliated with Facebook AI Research (FAIR), where I was fortunate to work with Piotr Dollár and Ross Girshick.

Prior to UC Berkeley, I received a B.S. degree in Intelligence Science, summa cum laude, from Peking University (PKU) in 2019.

  News
  • [Oct. 2023] Our press release about Prompt AI is out.
  • [Oct. 2023] Prompt AI is mentioned in an article in WIRED Magazine.
  • [Oct. 2023] Segment Anything (SAM) won the 2023 ICCV Best Paper Honorable Mention Award.
  • [Apr. 2023] We released our latest project in vision: Segment Anything, one of the very few computer vision systems that just work in the real world.
  • [Mar. 2023] We released our latest project in robotics: Learning Humanoid Locomotion with Transformers, the first humanoid controlled by end-to-end neural networks.
  • [Oct. 2022] We released the project Real-World Robot Learning with Masked Visual Pre-training.
  • [Mar. 2022] We released Masked Visual Pre-training for Motor Control (MVP).
  Publications

Learning Humanoid Locomotion with Transformers
Ilija Radosavovic*, Tete Xiao*, Bike Zhang*, Trevor Darrell,
Jitendra Malik, Koushil Sreenath
arXiv, 2023
*: equal contribution, alphabetical order
| arXiv | project page |

We present a sim-to-real learning-based approach for real-world humanoid locomotion. To the best of our knowledge, this is the first demonstration of a fully learning-based method for real-world full-sized humanoid locomotion.

Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic*, Tete Xiao*, Stephen James, Pieter Abbeel,
Jitendra Malik, Trevor Darrell
Conference on Robot Learning (CoRL), 2022
Oral presentation
*: equal contribution
| arXiv | project page | code |

We explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. We train a large vision transformer on a massive collection of images from the Internet and egocentric videos, and clearly demonstrate the benefits of scaling visual pre-training for robot learning.

Masked Visual Pre-training for Motor Control
Tete Xiao*, Ilija Radosavovic*, Trevor Darrell, Jitendra Malik
Tech report, 2022
*: equal contribution
| arXiv | project page | code |

We show that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

Early Convolutions Help Transformers See Better
Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár*,
Ross Girshick*
Conference on Neural Information Processing Systems (NeurIPS), 2021
*: equal contribution
| arXiv |

We analyze the substandard optimization behavior of ViT and propose a simple fix that dramatically increases optimization stability and also improves peak performance.

Region Similarity Representation Learning
Tete Xiao*, Colorado Reed*, Xiaolong Wang, Kurt Keutzer, Trevor Darrell
International Conference on Computer Vision (ICCV), 2021
*: equal contribution
| arXiv | code |

An approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

What Should Not Be Contrastive in Contrastive Learning
Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell
International Conference on Learning Representations (ICLR), 2021
| arXiv | video |

To contrast, or not to contrast, that is the question.

Learning Cross-domain Correspondence for Control with Dynamics Cycle-consistency
Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
International Conference on Learning Representations (ICLR), 2021
Oral presentation
| project page | arXiv | video |

Learning correspondence across domains differing in representation (vision vs. internal state), physics parameters, and morphology.

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks
Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang†, Trevor Darrell†
Conference on Computer Vision and Pattern Recognition (CVPR), 2020
†: equal advising
| project page | arXiv | dataset |

Spatial-Temporal Interaction Networks (STIN) for compositional action recognition, together with a new annotated dataset, Something-Else.

Reasoning About Human-Object Interactions Through Dual Attention Networks
Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou
International Conference on Computer Vision (ICCV), 2019
| project page | arXiv |

A Dual Attention Network model that reasons about human-object interactions.

Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
International Journal of Computer Vision (IJCV) 127, 302–321 (2019)
| project page | pdf | arXiv | pytorch model | demo |

The ADE20K dataset, with comprehensive analysis and applications.

Unified Perceptual Parsing for Scene Understanding
Tete Xiao*, Yingcheng Liu*, Bolei Zhou*, Yuning Jiang, Jian Sun
European Conference on Computer Vision (ECCV), 2018
*: equal contribution
| arXiv | code |

UPerNet, a pyramid-like parser for the Unified Perceptual Parsing task: recognizing as many visual concepts as possible from a given image.

Acquisition of Localization Confidence for Accurate Object Detection
Borui Jiang*, Ruixuan Luo*, Jiayuan Mao*, Tete Xiao, Yuning Jiang
European Conference on Computer Vision (ECCV), 2018
Oral presentation
*: equal contribution
| arXiv | code |

Dissecting object localization confidence with IoU-Net and Precise RoI Pooling.

Learning Visually-grounded Semantics from Contrastive Adversarial Samples
Haoyue Shi*, Jiayuan Mao*, Tete Xiao*, Yuning Jiang, Jian Sun
International Conference on Computational Linguistics (COLING), 2018
*: equal contribution
| arXiv | code |

Constructing contrastive image-caption pairs for learning visually-grounded semantics.

MegDet: A Large Mini-Batch Object Detector
Chao Peng*, Tete Xiao*, Zeming Li*, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Spotlight presentation
*: equal contribution
| arXiv |

Scaling up the training of object detectors; winner of the MSCOCO Challenge 2017.

Repulsion Loss: Detecting Pedestrians in a Crowd
Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
| arXiv |

A pedestrian detector that works better under crowd occlusion.

What Can Help Pedestrian Detection?
Jiayuan Mao*, Tete Xiao*, Yuning Jiang, Zhimin Cao
Conference on Computer Vision and Pattern Recognition (CVPR), 2017
| arXiv |

  Awards
  • Winner, MSCOCO Challenge, 2017
  • Snap Research Scholarship, 2019
  • China National Scholarship, Peking University
  • Scholarship for the Outstanding Talented, Peking University
  • Schlumberger Scholarship, Peking University
  • Founder Group Scholarship, Peking University
  • Gold Medals, ACM International Collegiate Programming Contest (ACM-ICPC) Asia Regional, 2016 & 2017
  • Bronze Medal, National Olympiad in Informatics (NOI), 2014
  • Champion, Shandong Province Team Selection Contest for NOI, 2014
  Service
Teaching Faculty, Practice in Programming (Spring 2018)

Teaching Faculty, Artificial Intelligence and Computer Vision (Spring 2019)
  Contact

Berkeley Artificial Intelligence Research Lab
Berkeley Way West, 2121 Berkeley Way
Berkeley, CA 94704


Avatar photo: taken in Jerusalem in July 2019 by my good friend Yingcheng Liu.