Learning to Generate Programs for 3D Shape Structure Synthesis

R. Kenny Jones1      Theresa Barton1      Xianghao Xu1      Kai Wang1     
Ellen Jiang1      Paul Guerrero2      Niloy J. Mitra2,3      Daniel Ritchie1     
1Brown University      2Adobe Research      3University College London     

SIGGRAPH Asia 2020

We present a deep generative model that learns to write novel programs in ShapeAssembly, a domain-specific language for modeling 3D shape structures. Executing a ShapeAssembly program produces a shape composed of a hierarchical, connected assembly of part-proxy cuboids. Our method develops a well-formed latent space that supports interpolations between programs. Above, we show one such interpolation, and also visualize the geometry these programs produce when executed. In the last column, we manually edit the continuous parameters of a generated program in order to produce a variant geometric structure with new topology.


Manually authoring 3D shapes is difficult and time-consuming; generative models of 3D shapes offer compelling alternatives. Procedural representations are one such possibility: they offer high-quality and editable results but are difficult to author and often produce outputs with limited diversity. At the other extreme are deep generative models: given enough data, they can learn to generate any class of shape, but their outputs have artifacts and the representation is not editable.

In this paper, we take a step towards achieving the best of both worlds for novel 3D shape synthesis. First, we propose ShapeAssembly, a domain-specific "assembly language" for 3D shape structures. ShapeAssembly programs construct shape structures by declaring cuboid part proxies and attaching them to one another, in a hierarchical and symmetrical fashion. ShapeAssembly functions are parameterized with continuous free variables, so that one program structure is able to capture a family of related shapes. We show how to extract ShapeAssembly programs from existing shape structures in the PartNet dataset. Then we train a deep generative model, a hierarchical sequence VAE, that learns to write novel ShapeAssembly programs. Our approach leverages the strengths of each representation: the program captures the subset of shape variability that is interpretable and editable, and the deep generative model captures variability and correlations across shape collections that are hard to express procedurally.

We evaluate our approach by comparing shapes output by our generated programs to those from other recent shape structure synthesis models. We find that our generated shapes are more plausible and physically valid than those of other methods. Additionally, we assess the latent spaces of these models, and find that ours is better structured and produces smoother interpolations. As an application, we use our generative model and differentiable program interpreter to infer and fit shape programs to unstructured geometry, such as point clouds.

ShapeAssembly DSL

ShapeAssembly is a low-level domain-specific “assembly language” for shape structure. A program consists of Cuboid statements, which instantiate new geometry, and attach statements, which connect these geometries together at specified points on their surfaces. Macro functions (reflect, translate, squeeze) form complex spatial relationships by expanding into multiple Cuboid and attach statements.

Above, we illustrate how the ShapeAssembly interpreter incrementally constructs shapes by imperatively executing program commands. Cuboids are instantiated at the origin and are moved through attachment. Notice how the reflect command in line 6 acts as a macro function, creating a new cuboid and two new attachments.
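The imperative execution described here can be illustrated with a minimal Python sketch. It is not the ShapeAssembly interpreter: the argument layouts of `Cuboid`, `attach`, and `reflect` are assumptions made for illustration, cuboids are kept axis-aligned, and `reflect` returns a mirrored copy directly instead of expanding into a Cuboid statement plus attach statements.

```python
from dataclasses import dataclass, field


@dataclass
class Cuboid:
    """Axis-aligned box (assumed layout): dimensions plus the world position of its center."""
    dims: tuple  # extents along (x, y, z)
    center: list = field(default_factory=lambda: [0.0, 0.0, 0.0])  # instantiated at the origin

    def point(self, u, v, w):
        """Map local coordinates in [0, 1]^3 to a world-space point."""
        return [c + (t - 0.5) * d for c, t, d in zip(self.center, (u, v, w), self.dims)]


def attach(moving, fixed, x1, y1, z1, x2, y2, z2):
    """Translate `moving` so its local point (x1, y1, z1) meets the point (x2, y2, z2) of `fixed`."""
    src = moving.point(x1, y1, z1)
    dst = fixed.point(x2, y2, z2)
    moving.center = [c + (d - s) for c, s, d in zip(moving.center, src, dst)]


def reflect(cuboid, bbox, axis):
    """Simplified macro: mirror `cuboid` across the bounding box's center plane on `axis`."""
    i = "XYZ".index(axis)
    center = list(cuboid.center)
    center[i] = 2.0 * bbox.center[i] - center[i]
    return Cuboid(cuboid.dims, center)


# A tiny program: a seat and one leg attached to the bounding box, then a reflected leg.
bbox = Cuboid((1.0, 1.0, 1.0))
seat = Cuboid((1.0, 0.1, 1.0))
leg = Cuboid((0.1, 0.5, 0.1))
attach(seat, bbox, 0.5, 0.5, 0.5, 0.5, 0.55, 0.5)  # seat center slightly above bbox center
attach(leg, bbox, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)    # leg corner onto bbox corner
leg2 = reflect(leg, bbox, "X")
```

Each statement mutates or adds geometry in order, so the shape can be inspected after any prefix of the program, which is the property the step-by-step illustration relies on.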

Below we show the straightforward extension of this imperative execution to hierarchical programs: we represent hierarchical shapes by treating select non-leaf cuboids as the bounding box of another program.
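One way to picture this hierarchical extension is the short Python sketch below. The data layout (dicts with `min`, `size`, and an optional `sub` key) and the function name `flatten` are hypothetical and chosen only for illustration; real ShapeAssembly sub-programs are themselves sequences of Cuboid and attach statements. The point is only that a non-leaf cuboid supplies the coordinate frame in which its sub-program is executed.

```python
def flatten(program, bbox_min=(0.0, 0.0, 0.0), bbox_size=(1.0, 1.0, 1.0)):
    """Return world-space (min_corner, size) boxes for all leaf parts.

    `program` is a list of parts; each part is a dict with
      "min":  min corner in the parent's normalized [0, 1]^3 bounding-box coordinates,
      "size": extent in the same normalized coordinates,
      "sub":  optional sub-program that treats this part as its bounding box.
    """
    leaves = []
    for part in program:
        world_min = tuple(b + m * s for b, m, s in zip(bbox_min, part["min"], bbox_size))
        world_size = tuple(e * s for e, s in zip(part["size"], bbox_size))
        if part.get("sub"):
            # Non-leaf cuboid: execute the child program inside this cuboid.
            leaves.extend(flatten(part["sub"], world_min, world_size))
        else:
            leaves.append((world_min, world_size))
    return leaves


# A chair whose base is expanded by a sub-program containing two legs.
base = [
    {"min": (0.0, 0.0, 0.0), "size": (0.25, 1.0, 1.0)},
    {"min": (0.75, 0.0, 0.0), "size": (0.25, 1.0, 1.0)},
]
chair = [
    {"min": (0.0, 0.5, 0.0), "size": (1.0, 0.5, 1.0)},               # seat (leaf)
    {"min": (0.0, 0.0, 0.0), "size": (1.0, 0.5, 1.0), "sub": base},  # base (non-leaf)
]
boxes = flatten(chair, bbox_size=(2.0, 2.0, 2.0))
```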

Shape Generation

In the middle row, we show samples from our generative model of ShapeAssembly programs. In the top row, we show the nearest neighbor shape in the training set by Chamfer distance. In the bottom row, we show the nearest neighbor shape in the training set by program edit distance. Our method synthesizes interesting and high-quality structures that go beyond direct structural or geometric memorization.
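For readers unfamiliar with the geometric nearest-neighbor check, here is a minimal Python sketch of a Chamfer distance between two point sets and a nearest-neighbor lookup built on it. Several conventions exist (squared versus unsquared distances, sums versus means); this sketch assumes the symmetric mean of squared distances and does not claim to match the exact variant used in the paper. The function names are illustrative.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance in both directions."""
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def nearest_neighbor(query, training_set):
    """Index of the training point cloud closest to `query` under Chamfer distance."""
    return min(range(len(training_set)),
               key=lambda i: chamfer_distance(query, training_set[i]))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
c = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]
closest = nearest_neighbor(a, [c, b])  # index of the cloud nearest to `a`
```

This brute-force version is quadratic in the number of points, which is adequate for a sketch but would normally be replaced by a spatial index for large point clouds.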

Shape Editing

By virtue of their representational form, programs allow for easy semantic editing of generated output. Each column shows a sample from our model in the top row. In the bottom row, we create a variant with the same structure but different geometry by editing only the continuous parameters of the program.

Shape Interpolation

A qualitative comparison of latent space interpolation between our method and StructureNet on shapes from the validation set. Our method’s interpolations within program space produce sequences that combine smooth continuous variation with discrete structural transitions.
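The mix of smooth variation and discrete transitions can be illustrated with a toy Python sketch of latent interpolation. The linear interpolation is standard; the decoder `toy_decode` is a stand-in invented for this example and is not the paper's hierarchical sequence VAE. It only shows how a continuous path in latent space can yield a program whose continuous parameters drift smoothly while its discrete structure changes at a threshold.

```python
def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent vectors for t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolate(z_a, z_b, steps, decode):
    """Decode `steps` evenly spaced latent codes from z_a to z_b (requires steps >= 2)."""
    return [decode(lerp(z_a, z_b, i / (steps - 1))) for i in range(steps)]


def toy_decode(z):
    """Hypothetical decoder: one discrete structural choice and one continuous parameter."""
    return {"num_legs": 3 if z[0] < 0.5 else 4, "leg_height": z[1]}


sequence = interpolate([0.0, 0.2], [1.0, 0.6], steps=5, decode=toy_decode)
leg_counts = [s["num_legs"] for s in sequence]
leg_heights = [s["leg_height"] for s in sequence]
```

In this toy run the leg count switches once along the path while the leg height increases steadily.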

Synthesis from Unstructured Geometry

Above, we show a qualitative comparison of our method against StructureNet (SN) on synthesis from point clouds. Our method is able to infer good program structures that match well with the unstructured geometry. The continuous parameters of this program structure can be further refined through an optimization procedure in order to better fit the target point cloud without creating artifacts. We show a dynamic version of this optimization in the GIF below.
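The refinement step can be sketched in Python as gradient descent over a program's continuous parameters while its structure stays fixed. This is a toy under stated assumptions: the "program" is a single origin-centered cuboid, its geometry is sampled only at its eight corners, the loss is a symmetric Chamfer distance, and gradients come from central finite differences instead of a differentiable interpreter. All function names are illustrative.

```python
from itertools import product


def corners(dims):
    """Eight corners of an origin-centered, axis-aligned cuboid with the given dimensions."""
    return [tuple(s * d / 2.0 for s, d in zip(signs, dims))
            for signs in product((-1.0, 1.0), repeat=3)]


def chamfer(a, b):
    """Symmetric mean squared nearest-neighbor distance between two point sets."""
    def one_way(src, dst):
        return sum(min(sum((x - y) ** 2 for x, y in zip(p, q)) for q in dst)
                   for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def fit_dims(target_points, dims, steps=60, lr=0.5, eps=1e-4):
    """Refine cuboid dimensions by gradient descent on the Chamfer loss."""
    dims = list(dims)
    for _ in range(steps):
        grad = []
        for i in range(len(dims)):
            # Central finite difference stands in for automatic differentiation.
            hi = dims[:i] + [dims[i] + eps] + dims[i + 1:]
            lo = dims[:i] + [dims[i] - eps] + dims[i + 1:]
            grad.append((chamfer(corners(hi), target_points)
                         - chamfer(corners(lo), target_points)) / (2.0 * eps))
        dims = [d - lr * g for d, g in zip(dims, grad)]
    return dims


target = corners((2.0, 1.0, 0.5))           # stand-in for an observed point cloud
fitted = fit_dims(target, (1.0, 1.0, 1.0))  # start from a unit cube
```

Because only the dimensions are updated, the result remains a valid cuboid at every step, which mirrors the idea of fitting geometry without introducing artifacts.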


  title={ShapeAssembly: Learning to Generate Programs for 3D Shape Structure Synthesis},
  author={Jones, R. Kenny and Barton, Theresa and Xu, Xianghao and Wang, Kai and Jiang, Ellen and Guerrero, Paul and Mitra, Niloy J. and Ritchie, Daniel},
  journal={ACM Transactions on Graphics (TOG), Siggraph Asia 2020},
  pages={Article 234},


We would like to thank the anonymous reviewers for their helpful suggestions. Renderings of part cuboids and point clouds were produced using the Blender Cycles renderer. This research was supported by the National Science Foundation (#1753684, #1941808), a Brown University Presidential Fellowship, gifts from the University College London AI Center and Adobe Research, and by GPU donations from NVIDIA. Daniel Ritchie is an advisor to Geopipe, Inc. and owns equity in the company. Geopipe is a start-up that is developing 3D technology to build immersive virtual copies of the real world with applications in various fields, including games and architecture.