State-of-the-art models for semantic image segmentation usually combine a convolutional neural network (CNN) with a conditional random field (CRF). As a predictor, a CNN can generate dense predictions, but these often contain obvious errors along object boundaries. As a refinement model, the CRF improves the CNN output by enforcing the consistency of local labels. However, the CRF may cause fragmentation effects around object boundaries. In this paper, we propose using an iterative contraction and merging (ICM) process to aid semantic segmentation. Guided by high-level information from the CNN, the ICM process grows image segments in a bottom-up way and iteratively refines the segmentation. The ICM process faithfully preserves boundary information while maintaining the consistency of local labels. Our experimental results demonstrate that the proposed approach performs comparably to state-of-the-art models while producing more accurate boundaries.
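To make the idea of growing segments bottom-up under CNN guidance concrete, the following is a minimal sketch of generic greedy region merging. It is not the ICM procedure itself, whose details are not given in this abstract. The function name `merge_regions`, the L1 distance between class-probability vectors, the size-weighted averaging, and the `threshold` parameter are all illustrative assumptions.

```python
def merge_regions(probs, sizes, edges, threshold=0.5):
    """Greedily merge adjacent regions whose class scores are similar.

    Hypothetical helper, not the paper's ICM process.
    probs: dict region_id -> list of per-class probabilities
           (assumed to be CNN scores averaged over the region)
    sizes: dict region_id -> pixel count
    edges: iterable of (a, b) pairs of adjacent region ids
    Returns (parent, merged_probs), where parent maps each original
    region id to the id of the region it ended up in.
    """
    probs = {r: list(p) for r, p in probs.items()}
    sizes = dict(sizes)
    adj = {frozenset(e) for e in edges if e[0] != e[1]}
    parent = {r: r for r in probs}

    def dist(a, b):
        # L1 distance between class-probability vectors (an assumed choice).
        return sum(abs(x - y) for x, y in zip(probs[a], probs[b]))

    while adj:
        # Pick the most similar adjacent pair; ties broken by region ids.
        pair = min(adj, key=lambda e: (dist(*sorted(e)), sorted(e)))
        a, b = sorted(pair)
        if dist(a, b) > threshold:
            break  # no remaining pair is similar enough to merge

        # Merge b into a using a size-weighted average of the scores.
        total = sizes[a] + sizes[b]
        probs[a] = [(sizes[a] * x + sizes[b] * y) / total
                    for x, y in zip(probs[a], probs[b])]
        sizes[a] = total
        del probs[b], sizes[b]

        # Redirect b's adjacencies to a and drop the collapsed edge.
        rewired = set()
        for e in adj:
            e2 = frozenset(a if r == b else r for r in e)
            if len(e2) == 2:
                rewired.add(e2)
        adj = rewired

        # Record that everything previously in b now belongs to a.
        for r, p in parent.items():
            if p == b:
                parent[r] = a

    return parent, probs


# Toy example: four regions in a chain, two per class.
toy_probs = {0: [0.9, 0.1], 1: [0.8, 0.2], 2: [0.1, 0.9], 3: [0.2, 0.8]}
toy_sizes = {0: 10, 1: 10, 2: 10, 3: 10}
toy_edges = [(0, 1), (1, 2), (2, 3)]
parent, merged = merge_regions(toy_probs, toy_sizes, toy_edges)
```

In this toy input, regions 0 and 1 merge, regions 2 and 3 merge, and the boundary between the two groups is kept because their scores differ by more than the threshold. Merging only along existing region borders is what lets a bottom-up scheme of this kind keep low-level boundaries intact while still making labels locally consistent.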