Chairs and Mugs

-- A Dataset for Object-Centric Scene Understanding and Equivariance

Jiahui Lei¹ Congyue Deng² Karl Schmeckpeper¹ Leonidas Guibas² Kostas Daniilidis¹

University of Pennsylvania¹ Stanford University²

Introduction

This dataset provides 3D scenes containing repeated objects from the same category (chairs or mugs) under a variety of scene configurations, ranging from the simplest case, with all objects standing upright on a ground plane, to the most challenging cases, with diverse object poses and complex backgrounds. Such scenarios are common in many highly interactive real-world environments. The dataset is intended to encourage the development of scene understanding methods that are:

object-centric, leveraging category-level information on repeating instances
robust to scene configuration changes
generalizable to unseen or even out-of-distribution scene configurations

The dataset comprises two subsets: a synthetic set for developing, training, and validating scene understanding methods, and a small real-world scan set for evaluating them across the sim2real domain gap.

Dataset details

Synthetic scenes: Our synthetic dataset is simulated with SAPIEN. For the synthetic tabletop scenes, we place four synthetic depth cameras at the four corners of a table and place the objects in a bin at the center of the table, a common setup for tabletop manipulators. We simulate realistic IR sensor depth patterns with IR ray tracing, and the mesh reconstruction is created by integrating the four depth views via TSDF fusion. For the chair scenes, we use eight static cameras with ideal depth (instead of IR ray tracing) because, unlike tabletop scenes, real-world indoor scenes are usually captured with continuous scans, which yield smoother and better reconstructions.
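
The TSDF fusion step mentioned above can be illustrated with a minimal sketch, reduced to a single ray in one dimension. It uses the standard running weighted-average update with unit weights. The voxel size, truncation distance, depth readings, and the function names `fuse_tsdf` and `zero_crossing` are assumptions made for illustration; the source does not specify the fusion implementation used to build the dataset.

```python
def fuse_tsdf(depths, num_voxels=50, voxel_size=0.01, trunc=0.04):
    """Fuse depth readings along one ray into a 1D TSDF by weighted averaging.

    All parameter values are hypothetical and chosen only for illustration.
    """
    tsdf = [1.0] * num_voxels      # normalised truncated signed distance
    weight = [0.0] * num_voxels    # accumulated observation weight per voxel
    for depth in depths:
        for i in range(num_voxels):
            sdf = depth - i * voxel_size    # positive in front of the surface
            if sdf < -trunc:
                continue                    # far behind the surface: not observed
            d = min(1.0, sdf / trunc)       # truncate and normalise
            # Running weighted average with unit weight per observation.
            tsdf[i] = (weight[i] * tsdf[i] + d) / (weight[i] + 1.0)
            weight[i] += 1.0
    return tsdf, weight


def zero_crossing(tsdf, weight, voxel_size=0.01):
    """Return the first positive-to-negative crossing, linearly interpolated."""
    for i in range(len(tsdf) - 1):
        observed = weight[i] > 0 and weight[i + 1] > 0
        if observed and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return (i + t) * voxel_size
    return None


# Four hypothetical depth readings (in metres) of the same surface point,
# standing in for the four depth views.
depths = [0.251, 0.255, 0.254, 0.252]
tsdf, weight = fuse_tsdf(depths)
surface = zero_crossing(tsdf, weight)
print(surface)
```

Near the surface, where no reading is clamped, the averaged TSDF is linear in the voxel position, so the interpolated zero crossing lands at the mean of the readings. A full 3D pipeline would additionally project each voxel into every camera and extract a mesh from the fused volume.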
Real scans: Our real dataset contains 240 reconstructions of real scenes with challenging configurations and backgrounds. We collected more scans for scene types that have more complex configurations or are harder to create in simulation.

Scene type	Real scans
Mugs Z	10
Mugs SO3	10
Mugs Pile	10
Mugs Tree	50
Mugs Others	50
Mugs Wild	50
Chairs Z	20
Chairs SO3	20
Chairs Pile	20
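
As a quick consistency check, the per-scene-type counts above can be tallied to confirm that they add up to the 240 real reconstructions stated earlier. The short sketch below only restates the numbers from the table; the variable names are illustrative.

```python
# Real-scan counts per scene type, copied from the table above.
real_scan_counts = {
    "Mugs Z": 10, "Mugs SO3": 10, "Mugs Pile": 10,
    "Mugs Tree": 50, "Mugs Others": 50, "Mugs Wild": 50,
    "Chairs Z": 20, "Chairs SO3": 20, "Chairs Pile": 20,
}

# Group by object category, taken as the first word of each scene type.
per_category = {}
for name, count in real_scan_counts.items():
    category = name.split()[0]
    per_category[category] = per_category.get(category, 0) + count

total = sum(real_scan_counts.values())
print(per_category, total)
```

The tally gives 180 mug scans and 60 chair scans, 240 in total.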

Publication

EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
PDF | Code

Data Samples

Synthetic scenes

Mugs Z
Mugs SO3
Mugs Pile
Mugs Tree
Mugs Box
Mugs Shelf

Real scans

Mugs Z
Mugs SO3
Mugs Pile
Mugs Tree
Mugs Wild
Mugs Others

Chairs Z
Chairs SO3
Chairs Pile

Download

To download the synthetic scenes, click here
To download the real scans, click here

Citation

If you use the dataset or code, please cite:

@inproceedings{Lei2023EFEM,
title={EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision},
author={Lei, Jiahui and Deng, Congyue and Schmeckpeper, Karl and Guibas, Leonidas and Daniilidis, Kostas},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
url={https://cis.upenn.edu/~leijh/projects/efem},
year={2023}
}