Tsinghua Science and Technology  2019, Vol. 24 Issue (2): 207-215    doi: 10.26599/TST.2018.9010044
CasNet: A Cascade Coarse-to-Fine Network for Semantic Segmentation
Zhenyang Wang, Zhidong Deng*, Shiyao Wang
∙ Zhenyang Wang, Zhidong Deng, and Shiyao Wang are with the Department of Computer Science, Tsinghua University, Beijing 100084, China. E-mail: crazycry2010@gmail.com; sy-wang14@mails.tsinghua.edu.cn.

Abstract

Semantic segmentation is a fundamental topic in computer vision. Because it requires a dense prediction for every pixel of an image, a network can hardly achieve good performance across all kinds of scenes. In this paper, we propose a cascade coarse-to-fine network called CasNet, which focuses on regions whose pixels are difficult to label. CasNet comprises three branches. The first branch produces coarse predictions for easy-to-label pixel regions. The second learns to identify the relatively difficult-to-label pixels within the entire image. The last branch generates the final predictions by combining the coarse and fine prediction results through a weighting coefficient estimated by the second branch. The three branches each focus on their own objective and collaboratively learn to refine predictions from coarse to fine. To evaluate the proposed network, we conduct experiments on two public datasets: SIFT Flow and Stanford Background. We show that the three branches can be trained in an end-to-end manner, and the experimental results demonstrate that the proposed CasNet outperforms existing state-of-the-art models, achieving prediction accuracies of 91.6% and 89.7% on SIFT Flow and Stanford Background, respectively.
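
The combination of coarse and fine predictions through a weighting coefficient can be illustrated with a short sketch. The abstract does not give the exact fusion rule, so the per-pixel convex combination below is an assumption, and the names fuse_predictions, predict_labels, and difficulty are hypothetical; the sketch also takes the coarse and fine class scores as given rather than modelling the branches that produce them.

```python
from typing import List

Scores = List[List[float]]  # one row of class scores per pixel


def fuse_predictions(coarse: Scores, fine: Scores, difficulty: List[float]) -> Scores:
    """Blend coarse and fine class scores per pixel.

    difficulty[i] in [0, 1] is an assumed per-pixel weight:
    0 keeps the coarse scores, 1 keeps the fine scores.
    """
    if not (len(coarse) == len(fine) == len(difficulty)):
        raise ValueError("inputs must cover the same number of pixels")
    fused = []
    for c_row, f_row, w in zip(coarse, fine, difficulty):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        # Assumed rule: convex combination of the two score vectors.
        fused.append([(1.0 - w) * c + w * f for c, f in zip(c_row, f_row)])
    return fused


def predict_labels(scores: Scores) -> List[int]:
    """Return the index of the highest-scoring class for each pixel."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Two pixels, two classes: the first is treated as easy, the second as hard.
coarse = [[0.9, 0.1], [0.6, 0.4]]
fine = [[0.8, 0.2], [0.2, 0.8]]
difficulty = [0.0, 1.0]
labels = predict_labels(fuse_predictions(coarse, fine, difficulty))
```

In this toy input the easy pixel keeps its coarse label, while the hard pixel takes the label favoured by the fine scores, which is the behaviour the weighting coefficient is meant to provide.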

Received: 25 October 2017      Published: 29 April 2019
Corresponding Authors: Zhidong Deng