This study examined how middle school students constructed their understanding of the processes of mitosis and meiosis at the molecular level through multimedia learning materials presented in different interaction and sensory modality modes. A two (interaction mode: animation/simulation) by two (sensory modality mode: narration/on-screen text) factorial design was employed. The dependent variables were subjects' pre-test, post-test, and retention-test scores, which measured their understanding of the processes of mitosis and meiosis at the molecular level, together with data on subjects' eye-movement behavior. Results showed that the group that received animation with narration allocated more visual attention (number of fixations, total inspection time, and mean fixation duration) than the group that received animation with on-screen text, in both the pictorial area and the area of interest, which is consistent with students' immediate and long-term retained learning of the processes of mitosis and meiosis. The group that received simulation with on-screen text allocated more visual attention than the group that received simulation with narration, consistent with students' immediate and retained learning. The group that received simulation with on-screen text also allocated more visual attention than the group that received animation with on-screen text, again consistent with students' immediate and retained learning. This study adds empirical evidence of a direct correlation between the length of eye fixations and the depth of learning. Moreover, it uses eye-fixation evidence to provide insight into the multimedia effect on students' cognitive processes.