Self-driving cars rely on semantic segmentation to understand urban scenes. However, segmentation labels are costly to collect, so synthetic datasets are often used to train segmentation models. Unfortunately, the synthetic-to-real domain shift causes these models to perform poorly on real-world images. Prior works use adversarial training to align the features of synthetic and real-world images. We observe that background objects tend to be similar across domains, while foreground objects tend to vary more. Using this insight, we propose an adaptation method that uses foreground and background cues and adapts the two separately. We also propose a mask-aware gated discriminator that learns soft masks from the input foreground and background masks, instead of naively performing binary masking, which immediately removes information outside of the predicted masks. We evaluate our method on two different datasets and show that it outperforms several state-of-the-art baselines, which verifies the effectiveness of our approach.
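
To make the contrast between binary masking and soft masking concrete, the following is a minimal illustrative sketch, not the discriminator described above. It assumes a tiny 2x2 feature map and a single sigmoid gate; the names `hard_mask` and `soft_gate` and the fixed `weight` and `bias` values are hypothetical stand-ins for parameters that would be learned in practice.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_mask(features, mask):
    """Binary masking: features outside the mask are zeroed out entirely."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]


def soft_gate(features, mask, weight=2.0, bias=-1.0):
    """Soft masking: the mask is turned into a gate in (0, 1).

    `weight` and `bias` are hypothetical fixed scalars used here only for
    illustration; in a trained model they would be learned.
    """
    return [[f * sigmoid(weight * m + bias) for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]


# Toy 2x2 feature map and a binary foreground mask (assumed values).
features = [[0.5, 1.0], [2.0, 4.0]]
foreground = [[1, 0], [0, 1]]

hard = hard_mask(features, foreground)  # off-mask entries become exactly zero
soft = soft_gate(features, foreground)  # off-mask entries are attenuated, not removed
```

In this sketch, positions outside the mask keep a reduced but non-zero response under the soft gate, whereas binary masking discards them completely.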