This paper presents a framework for automatic video region-of-interest determination based on visual attention model. We view this work as a preliminary step towards the solution of high-level semantic video analysis. Facing such a challenging issue, in this work, a set of attempts on using video attention features and knowledge of computational media aesthetics are made. The three types of visual attention features we used are intensity, color, and motion. Referring to aesthetic principles, these features are combined according to camera motion types on the basis of a new proposed video analysis unit, frame-segment. We conduct subjective experiments on several kinds of video data and demonstrate the effectiveness of the proposed framework.