Accepted Papers

Grammatically Recognizing Images with Tree Convolution

Guangrun Wang: Sun Yat-sen University; Guangcong Wang: Sun Yat-Sen University; Keze Wang: University of California, Los Angeles; Xiaodan Liang: Sun Yat-sen University; Liang Lin: Sun Yat-Sen University


Similar to language, understanding an image can be considered as a hierarchical decomposition process from scenes to objects, parts, pixels, and the corresponding spatial/contextual relations. However, the existing convolutional networks concentrate on stacking redundant convolutional layers with a large number of kernels in a hierarchical organization to implicitly approximate this decomposition. This may limit the network to learn the semantic information conveyed in the internal feature maps that may reveal minor yet crucial differences for visual understanding. Attempting to tackle this problem, this paper proposes a simple yet effective tree convolution (TreeConv) operation for deep neural networks. Specifically, inspired by the image grammar techniques[73] that serve as a unified framework of object representation, learning, and recognition, our TreeConv designs a generative image grammar, i.e., tree generation rule, to parse the hierarchy of internal feature maps by generating tree structures and implicitly learning the specific visual grammars for each object category. Extensive experiments on a variety of benchmarks, i.e., classification (ImageNet / CIFAR), detection & segmentation (COCO 2017), and person re-identification (CUHK03), demonstrate the superiority of our TreeConv in both boosting the accuracy and reducing the computational cost. The source code will be available at:

How can we assist you?

We'll be updating the website as information becomes available. If you have a question that requires immediate attention, please feel free to contact us. Thank you!

Please enter the word you see in the image below: