The Neurally-Guided Shape Parser:
Grammar-based Labeling of 3D Shape Regions with Approximate Inference


R. Kenny Jones1      Aalia Habib1      Rana Hanocka2      Daniel Ritchie1     
1Brown University      2University of Chicago     


CVPR 2022





We present the Neurally-Guided Shape Parser (NGSP). NGSP frames 3D shape semantic segmentation as a label assignment problem over shape regions; in this paradigm, we show our approximate inference formulation improves performance over comparison methods that (i) use regions to group per-point predictions, (ii) use regions as a self-supervisory signal or (iii) assign labels to regions under alternative formulations.



Bibtex

@article{jones2022NGSP,
  title={The Neurally-Guided Shape Parser: Grammar-based Labeling of 3D Shape Regions with Approximate Inference},
  author={Jones, R. Kenny and Habib, Aalia and Hanocka, Rana and Ritchie, Daniel},
  journal={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}


Abstract

We propose the Neurally-Guided Shape Parser (NGSP), a method that learns how to assign fine-grained semantic labels to regions of a 3D shape. NGSP solves this problem via MAP inference, modeling the posterior probability of a label assignment conditioned on an input shape with a learned likelihood function. To make this search tractable, NGSP employs a neural guide network that learns to approximate the posterior. NGSP finds high-probability label assignments by first sampling proposals with the guide network and then evaluating each proposal under the full likelihood.

We evaluate NGSP on the task of fine-grained semantic segmentation of manufactured 3D shapes from PartNet, where shapes have been decomposed into regions that correspond to part instance over-segmentations. We find that NGSP delivers significant performance improvements over comparison methods that (i) use regions to group per-point predictions, (ii) use regions as a self-supervisory signal or (iii) assign labels to regions under alternative formulations. Further, we show that NGSP maintains strong performance even with limited labeled data or noisy input shape regions. Finally, we demonstrate that NGSP can be directly applied to CAD shapes found in online repositories and validate its effectiveness with a perceptual study.




Problem Statement





The ability to semantically segment 3D shapes is important for numerous applications, which often demand that the detected parts be fine-grained and hierarchically organized. Producing such segmentations, however, has proved challenging.

Recent work on 3D shape semantic segmentation has mainly focused on end-to-end approaches that operate over shape atoms, the lowest-level geometric entity in the input representation (figure below). These methods achieve impressive performance on many tasks, but often transfer poorly to domains with fine-grained label sets or limited labeled data.

Our method, the Neurally-Guided Shape Parser, attempts to address these limitations by reasoning over shape regions instead of shape atoms (figure above). By learning to label shape regions, NGSP simplifies its learning problem, shrinks the search space, and can reason over region-region relationships.




NGSP





The Neurally-Guided Shape Parser (NGSP) learns to assign fine-grained semantic labels (rightmost) to shape regions (leftmost). A guide network generates a set of proposed label assignments. The label assignments are sent through likelihood modules that evaluate the global coherence of each proposal. These terms are combined into a posterior probability which determines the final label assignment.
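The propose-then-rank step described above can be illustrated with a minimal Python sketch. It is not the released implementation: the function `rank_proposals`, the toy scoring functions, and the choice to combine terms by summing log-probabilities are all assumptions made for illustration.

```python
import math

def rank_proposals(proposals, likelihood_terms):
    """Return the proposal with the highest combined score, and that score.

    Assumption: each term returns a log-probability for a full label
    assignment, and terms are combined by summation (a product of
    probabilities). The actual combination used by NGSP may differ.
    """
    best, best_score = None, -math.inf
    for assignment in proposals:
        score = sum(term(assignment) for term in likelihood_terms)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Hypothetical stand-ins for learned likelihood modules.
def prefers_back_first(assignment):
    return math.log(0.9) if assignment[0] == "back" else math.log(0.1)

def prefers_distinct_labels(assignment):
    distinct = len(set(assignment)) == len(assignment)
    return math.log(0.8) if distinct else math.log(0.2)

# Each proposal assigns one label to each of two regions.
proposals = [("seat", "seat"), ("back", "seat"), ("back", "back")]
best, score = rank_proposals(proposals, [prefers_back_first, prefers_distinct_labels])
print(best, round(math.exp(score), 2))
```

Only the proposals supplied to `rank_proposals` are ever scored, which is what keeps the search tractable compared with enumerating every possible assignment.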

NGSP’s likelihood function is composed of learned modules that reason over either properties of the semantic labels of the grammar or properties of groups of regions implied by a given label assignment.



For each label in the grammar, the semantic label likelihood terms (geometry and layout) reason about different properties of shape regions that were assigned to that label. They both consume a (shape, label assignment) pair as input (left), and a separate PointNet++ binary classification network is learned for each label in the grammar. Each geometry network sees which regions of the input shape have been assigned to its label (e.g. chair back). Each layout network sees which regions of the input shape have been assigned to its child labels (e.g. chair back surface and chair back frame). These networks are trained in a binary classification paradigm, tasked with assessing whether the regions assigned to a given label form a valid instance of that label.
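As a rough illustration of what each network "sees", the sketch below builds per-region indicator masks from a (shape, label assignment) pair. The grammar fragment, label names, and the functions `geometry_mask` and `layout_masks` are hypothetical, and the sketch assumes a two-level hierarchy in which regions carry leaf labels; how the real networks encode this information is not specified here.

```python
# Hypothetical grammar fragment: parent label -> child labels.
GRAMMAR = {
    "chair_back": ["back_surface", "back_frame"],
    "chair_seat": ["seat_surface"],
}

def geometry_mask(label, assignment, grammar):
    """1 for regions assigned to `label` (directly or via a child), else 0."""
    covered = set(grammar.get(label, [])) | {label}
    return [1 if assigned in covered else 0 for assigned in assignment]

def layout_masks(label, assignment, grammar):
    """One mask per child of `label`, marking the regions assigned to that child."""
    return {
        child: [1 if assigned == child else 0 for assigned in assignment]
        for child in grammar.get(label, [])
    }

# One leaf label per region of the input shape.
assignment = ["back_surface", "back_frame", "seat_surface", "back_frame"]

geom = geometry_mask("chair_back", assignment, GRAMMAR)
layout = layout_masks("chair_back", assignment, GRAMMAR)
print(geom)
print(layout)
```

In this toy setup, the geometry input for chair back does not distinguish surface from frame, while the layout input does.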



The region group likelihood term reasons about properties of region groups implicitly formed when a labeling is assigned to an input shape. It also takes a (shape, label assignment) pair as input. For every region group, it creates a fully-connected graph, where nodes correspond to shape regions that are members of the group. The region network is modeled with a GCN and makes two predictions over each group: what is the best label for the group, and what percentage of area within the group belongs to that label.
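The graph construction can be sketched in a few lines of Python. This sketch assumes, for illustration only, that a region group is simply the set of regions sharing a label; the helper names and the `area_fraction` target computation are hypothetical and are not taken from the released code.

```python
from collections import defaultdict
from itertools import combinations

def region_groups(assignment):
    """Group region indices by assigned label (assumed definition of a group)."""
    groups = defaultdict(list)
    for region_id, label in enumerate(assignment):
        groups[label].append(region_id)
    return dict(groups)

def fully_connected_edges(members):
    """Undirected edges between every pair of regions in a group."""
    return list(combinations(members, 2))

def area_fraction(members, areas, true_labels, label):
    """Fraction of the group's surface area that truly belongs to `label`."""
    total = sum(areas[i] for i in members)
    matching = sum(areas[i] for i in members if true_labels[i] == label)
    return matching / total

assignment = ["leg", "leg", "seat", "leg"]      # proposed labels, one per region
true_labels = ["leg", "leg", "seat", "arm"]     # hypothetical ground truth
areas = [2.0, 1.0, 5.0, 1.0]                    # hypothetical region areas

groups = region_groups(assignment)
edges = {label: fully_connected_edges(m) for label, m in groups.items()}
leg_fraction = area_fraction(groups["leg"], areas, true_labels, "leg")
print(groups, edges, leg_fraction)
```

A group with a single region produces no edges, so the graph for "seat" above is one isolated node.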



While the search space over regions is much smaller than the search space over atoms, it is still computationally infeasible to exhaustively evaluate the likelihood function L on all possible label assignments to regions. To guide our search procedure towards good areas of the search space, we learn a guide network to locally approximate the posterior. The neural guide network is modeled with a multi-class classification PointNet++ architecture, and operates over individual shape regions, predicting the label for each region independently.
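Because the guide predicts each region independently, drawing a proposal amounts to sampling one label per region from that region's predicted distribution. The following sketch shows this under stated assumptions: the probabilities are made up, and `sample_proposals` and its deduplication step are illustrative choices rather than the authors' procedure.

```python
import random

def sample_proposals(region_probs, labels, num_samples, seed=0):
    """Sample label assignments, one label per region, independently per region.

    region_probs[i][j] is the (hypothetical) guide probability that
    region i has label labels[j]. Duplicate samples are merged.
    """
    rng = random.Random(seed)
    proposals = set()
    for _ in range(num_samples):
        assignment = tuple(
            rng.choices(labels, weights=probs, k=1)[0] for probs in region_probs
        )
        proposals.add(assignment)
    return sorted(proposals)

labels = ["seat", "back", "leg"]
region_probs = [
    [0.90, 0.05, 0.05],  # region 0: probably "seat"
    [0.10, 0.80, 0.10],  # region 1: probably "back"
    [0.00, 0.00, 1.00],  # region 2: certainly "leg"
]

proposals = sample_proposals(region_probs, labels, num_samples=20)
search_space = len(labels) ** len(region_probs)  # every possible assignment
print(len(proposals), "distinct proposals out of", search_space, "assignments")
```

The exhaustive search space grows as (number of labels) raised to the (number of regions), which is why sampling a limited set of proposals matters for shapes with many regions and fine-grained label sets.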


Fine-grained Semantic Segmentation





We evaluate NGSP's ability to assign fine-grained semantic labels to regions of 3D shapes. Our experiments use CAD manufactured objects from the PartNet dataset. As input, we over-segment each shape using the mesh components for each part instance in PartNet (Input Regions). We compare NGSP against related methods that (i) use regions to aggregate point predictions (PartNet, BAE-NET), (ii) incorporate regions into self-supervised training objectives (LEL), or (iii) assign labels to regions in alternative search-based formulations (LHSS).

Quantitatively, using semantic mean intersection over union as our evaluation metric, NGSP outperforms the comparison methods by a significant margin when labeled data is plentiful. Even when access to labeled data is limited, NGSP outperforms the next-best alternatives. In fact, NGSP’s performance with 10% of the labeled data exceeds that of any comparison method with access to all labeled data by almost 10 absolute percentage points. We share qualitative comparisons across the categories from PartNet we experimented on below, where each semantic label is represented by a unique color.
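For readers unfamiliar with the metric, a minimal sketch of mean intersection over union for one shape follows. It assumes per-point label lists and averages over the labels that appear in either the prediction or the ground truth; the exact averaging convention used in the paper's evaluation is not stated on this page, so treat this as one common variant.

```python
def semantic_mean_iou(pred, truth):
    """Mean IoU over labels present in the prediction or the ground truth."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    ious = []
    for label in set(pred) | set(truth):
        inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
        ious.append(inter / union)  # union >= 1 because the label occurs somewhere
    return sum(ious) / len(ious)

truth = ["seat", "seat", "back", "leg"]
pred = ["seat", "back", "back", "leg"]
miou = semantic_mean_iou(pred, truth)
print(round(miou, 3))
```

Here "seat" and "back" each score 0.5 and "leg" scores 1.0, giving a mean of about 0.667.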

Chair qualitative results:



Lamp qualitative results:



Table qualitative results:



Vase qualitative results:



Knife qualitative results:



Storage qualitative results:




NGSP on Unstructured Data





As NGSP requires a region decomposition as input, it can’t be directly applied to some types of unstructured data without the help of auxiliary methods.

We evaluate NGSP against alternative region labeling methods on unstructured input data, with regions created by an approximate convex decomposition (ACD) procedure (example above), in a limited-data paradigm where only 10 labeled shapes are provided.

Within this paradigm, NGSP outperforms other approaches that either learn to label ACD-generated regions or ignore regions and label shape atoms directly (see the graph below).



As a byproduct of CAD modeling procedures, many ‘in the wild’ 3D shapes come with part instance over-segmentations. NGSP can segment such objects by treating each mesh connected component as a shape region.

To demonstrate this application, using regions sourced from connected components of ShapeNet meshes, we asked participants in a perceptual study to compare semantic segmentations produced by NGSP to those produced by alternative methods. For all conditions, we found that participants had a significant preference for the part labelings generated by NGSP (79.6% preferred over LHSS, and 79.1% preferred over PartNet).
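One way to extract such regions is a union-find pass over the mesh faces, as in the sketch below. It assumes faces are given as tuples of vertex indices and that faces belong to the same component only when they share a vertex index (duplicated, unwelded vertices would split components); `mesh_connected_components` is an illustrative helper, not part of the NGSP code.

```python
def mesh_connected_components(faces):
    """Group face indices into connected components via shared vertex indices."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge all vertices that appear together in a face.
    for face in faces:
        for v in face[1:]:
            union(face[0], v)

    # Collect faces by the root of their first vertex.
    components = {}
    for face_idx, face in enumerate(faces):
        components.setdefault(find(face[0]), []).append(face_idx)
    return sorted(components.values())

# Two quads (two triangles each) and one isolated triangle.
faces = [(0, 1, 2), (2, 3, 0), (4, 5, 6), (6, 7, 4), (8, 9, 10)]
regions = mesh_connected_components(faces)
print(regions)
```

Each returned list of face indices would then serve as one shape region.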




Acknowledgements

We would like to thank the participants in our user study for their contribution to our research. We would also like to thank the anonymous reviewers for their helpful suggestions. Renderings of part cuboids and point clouds were produced using the Blender Cycles renderer. This work was funded in part by NSF award #1941808 and a Brown University Presidential Fellowship. Daniel Ritchie is an advisor to Geopipe and owns equity in the company. Geopipe is a start-up that is developing 3D technology to build immersive virtual copies of the real world with applications in various fields, including games and architecture.