Scene parsing, or semantic segmentation, aims at labeling every pixel in an image with one of a set of predefined categories of things and stuff. Learning a robust representation for each pixel is crucial for this task. Existing state-of-the-art (SOTA) algorithms employ deep neural networks to learn the representations needed for parsing from raw data. However, these networks discover the desired features only from the given image (its content), ignoring the more generic knowledge contained in the dataset as a whole. To overcome this deficiency, we make the first attempt to mine meaningful supportive knowledge, including general visual concepts (i.e., generic representations for objects and stuff) and their relations, from the whole dataset in order to enhance the representations of a specific scene for better parsing. Specifically, we propose a novel supportive knowledge mining module (SKMM) and a knowledge augmentation operator (KAO), both of which can be easily plugged into modern scene parsing networks. By taking image-specific content and dataset-level supportive knowledge into full consideration, the resulting model, called the knowledge augmented neural network (KANN), can better understand the given scene and provides greater representational power. Experiments are conducted on three challenging scene parsing and semantic segmentation datasets: Cityscapes, Pascal-Context, and ADE20K. The results show that KANN is effective and outperforms existing SOTA methods.
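The abstract does not specify the internals of SKMM or KAO, so the following is only a minimal, hypothetical PyTorch sketch of how such modules might plug into a parsing network. It assumes the dataset-level knowledge is a learnable bank of concept embeddings, that SKMM mines it via pixel-to-concept attention, and that KAO fuses the mined knowledge back into pixel features with a residual 1x1 convolution; the `KANNHead` wrapper, the concept-bank design, and all hyperparameters are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: module names (SKMM, KAO) are from the abstract;
# every internal design choice below is an assumption.
import torch
import torch.nn as nn


class SKMM(nn.Module):
    """Mines supportive knowledge: pixels attend to a learnable concept bank."""

    def __init__(self, channels: int, num_concepts: int = 64):
        super().__init__()
        # Dataset-level visual concepts shared across all images (assumed design).
        self.concepts = nn.Parameter(torch.randn(num_concepts, channels))
        self.query = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feats.shape
        q = self.query(feats).flatten(2).transpose(1, 2)               # (B, HW, C)
        attn = torch.softmax(q @ self.concepts.t() / c ** 0.5, dim=-1)  # (B, HW, K)
        mined = attn @ self.concepts                                    # (B, HW, C)
        return mined.transpose(1, 2).reshape(b, c, h, w)


class KAO(nn.Module):
    """Fuses mined knowledge with image-specific content (residual fusion assumed)."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        return feats + self.fuse(torch.cat([feats, knowledge], dim=1))


class KANNHead(nn.Module):
    """Segmentation head augmented with SKMM + KAO, pluggable after any backbone."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.skmm = SKMM(channels)
        self.kao = KAO(channels)
        self.classify = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.classify(self.kao(feats, self.skmm(feats)))


# Usage: backbone features (B, C, H/8, W/8) -> per-pixel class logits.
head = KANNHead(channels=256, num_classes=19)  # 19 = Cityscapes classes
logits = head(torch.randn(2, 256, 64, 128))
print(logits.shape)  # torch.Size([2, 19, 64, 128])
```

Because the knowledge bank is a shared `nn.Parameter` rather than a per-image feature, it is trained jointly with the network and can, in principle, accumulate dataset-level concepts, which is the general idea the abstract describes.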
DOI: http://dx.doi.org/10.1109/TNNLS.2021.3107194