Weakly Supervised Semantic Segmentation: From Box to Tag and Back

Abstract

We propose an approach for semantic segmentation with weak supervision using bounding box annotations. Most previous work relies on segmenting each bounding box into object and background, with each box segmented independently of the others. We argue that the collection of boxes for the same class naturally provides a dataset from which we can learn a model to segment that object class. The learned model, in turn, leads to a better segmentation of each individual box. Thus, for each class, we propose to train a segmentation CNN on the dataset consisting of the bounding boxes for that class. This step transforms the bounding box weak supervision into several image-tag weak supervision tasks, each on a dataset with a single object class. After we train these single-class CNNs, we apply them back to the training bounding boxes to obtain object/background segmentations and merge them to construct pseudo-ground truth. The obtained pseudo-ground truth is used for training a standard segmentation CNN. We improve the state of the art on the Pascal VOC 2012 benchmark in the bounding box weak supervision setting.
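To make the merging step concrete, the following is a minimal sketch of how per-box object/background masks might be combined into a single pseudo-ground-truth label map. It is an illustration under stated assumptions, not the method used in the paper: the function name `merge_box_masks`, the half-open box format `(class_id, x0, y0, x1, y1)`, and the rule that smaller boxes override larger ones where they overlap are all hypothetical choices, since the abstract does not specify how conflicts are resolved.

```python
def merge_box_masks(height, width, boxes, masks, background=0):
    """Merge per-box binary masks into one label map (hypothetical helper).

    boxes: list of (class_id, x0, y0, x1, y1) with half-open pixel coordinates.
    masks: masks[i] is a (y1 - y0) x (x1 - x0) nested list of 0/1 values,
           e.g. the thresholded output of a single-class CNN for box i.
    """
    # Every pixel starts as background; pixels outside all boxes stay that way.
    label_map = [[background] * width for _ in range(height)]

    def area(i):
        _, x0, y0, x1, y1 = boxes[i]
        return (x1 - x0) * (y1 - y0)

    # Assumption: paint larger boxes first so smaller boxes win in overlaps.
    order = sorted(range(len(boxes)), key=area, reverse=True)

    for i in order:
        class_id, x0, y0, x1, y1 = boxes[i]
        for r in range(y0, y1):
            for c in range(x0, x1):
                # Only foreground pixels of the box receive the class label.
                if masks[i][r - y0][c - x0]:
                    label_map[r][c] = class_id
    return label_map


if __name__ == "__main__":
    boxes = [(1, 0, 0, 3, 3), (2, 2, 2, 4, 4)]
    masks = [
        [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
        [[1, 0], [1, 1]],
    ]
    for row in merge_box_masks(4, 5, boxes, masks):
        print(row)
```

The resulting label map plays the role of the pseudo-ground truth on which a standard segmentation CNN would then be trained. Other overlap rules, such as choosing the class with the higher per-pixel score, would fit the same interface.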

Zongliang (Jerry) Ji
PhD student @ U of Toronto

My research interests include machine learning for healthcare and computational biology.