We study the problem of predicting the Field-of-Views (FoVs) of viewers watching 360° videos using commodity Head-Mounted Displays (HMDs). Existing solutions either use the viewer's current orientation to approximate the FoVs in the future, or extrapolate future FoVs using the historical orientations and dead-reckoning algorithms. In this paper, we develop fixation prediction networks that concurrently leverage sensor- and content related features to predict the viewer fixation in the future, which is quite different from the solutions in the literature. The sensor-related features include HMD orientations, while the content-related features include image saliency maps and motion maps. We build a 360° video streaming testbed to HMDs, and recruit twenty-five viewers to watch ten 360° videos. We then train and validate two design alternatives of our proposed networks, which allows us to identify the better-performing design with the optimal parameter settings. Trace-driven simulation results show the merits of our proposed fixation prediction networks compared to the existing solutions, including: (i) lower consumedbandwidth, (ii) shorter initial buffering time, and (iii) short running time.