In this paper, a novel framework for image-to-image translation, in which contour-consistency networks are used to solve the problem of inconsistency between the contours of generated and original images, is proposed. The objective of this study was to address the lack of an adequate training set. At the generator end, the original map is sampled by an encoder to obtain the encoder feature map; the attention feature map is then obtained using the attention module. Using the attention feature map, the decoder can ascertain where more conversions are required. The mechanism at the discriminator end is similar to that at the generator end. The map is sampled through an encoder to obtain the encoder feature map and then converted into the attention feature map. Finally, the map is classified by the classifier as real or fake. Experimental results demonstrate the effectiveness of the proposed method.