Code generation from graphical user interface images is a promising area of research. Recent progress in machine learning has made it possible to transform a user interface image into code, and the encoder–decoder framework is one way to approach this task. Our model implements the encoder–decoder framework with an attention mechanism that lets the decoder focus on a subset of salient image features at each decoding step, which in turn helps it generate token sequences with higher accuracy. Experimental results show that our model outperforms previously proposed models on the pix2code benchmark dataset.
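To make the attention step concrete, the following is a minimal sketch of soft attention over image-region features, not the model's actual implementation. It assumes dot-product scoring and toy two-dimensional feature vectors; the function names `softmax` and `attend`, and the values of `features` and `hidden`, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, hidden):
    """Soft attention over encoder features for one decoding step.

    features: list of image-region feature vectors from the encoder
    hidden:   current decoder state, same dimension as each feature vector
    Returns (weights, context), where context is the weighted feature sum.
    """
    # Assumption: dot-product relevance score between each region and the state.
    scores = [sum(f_d * h_d for f_d, h_d in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    dim = len(hidden)
    # Context vector: attention-weighted average of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy example: three image regions, two-dimensional features (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = attend(features, hidden)
```

In this toy case the first region aligns best with the decoder state, so it receives the largest weight and dominates the context vector that the decoder would consume when predicting the next token.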