Real-time body pose information is valuable for many human-robot interaction applications. However, because both the human and the robot may be moving, robust body pose recognition is a challenging problem in the design of such systems. This paper aims to first locate the human body in the acquired image plane and then classify six body poses through image recognition. Color-space techniques and connected-component analysis are used to detect elliptical shapes, and these shape patterns are used to locate the human body in the video stream. Furthermore, a neural network has been designed to fuse data from image recognition and inertial sensors, improving the recognition rate under varying environmental conditions. Experimental results show an average recognition rate of 93.5% over the six body poses, compared with 79.23% using image recognition alone and 90.67% using inertial sensors alone.
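The abstract does not specify the color space, thresholds, connectivity, or ellipse criterion used for body localization, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes an HSV hue/saturation threshold (the ranges are hypothetical), 4-connected component labeling, and a swapped-in ellipse test: each component is compared, by intersection-over-union, with the ellipse that has the same second moments. The function names `color_mask`, `connected_components`, `ellipse_score`, and `find_ellipses` are hypothetical.

```python
import colorsys
import math
from collections import deque


def color_mask(rgb_image, hue_range=(0.0, 0.1), min_sat=0.3, min_val=0.2):
    """Binary mask of pixels whose HSV color lies in a (hypothetical) target range."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val)
        mask.append(mask_row)
    return mask


def connected_components(mask):
    """Label 4-connected foreground regions with BFS; return one pixel list per region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def ellipse_score(pixels):
    """IoU between a region and the filled ellipse with the same mean and covariance."""
    n = len(pixels)
    my = sum(p[0] for p in pixels) / n
    mx = sum(p[1] for p in pixels) / n
    syy = sum((p[0] - my) ** 2 for p in pixels) / n
    sxx = sum((p[1] - mx) ** 2 for p in pixels) / n
    sxy = sum((p[0] - my) * (p[1] - mx) for p in pixels) / n
    det = sxx * syy - sxy * sxy
    if det <= 0:
        return 0.0  # degenerate, line-like region
    # A uniform filled ellipse with covariance C is the set {d : d^T C^-1 d <= 4}.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    ry, rx = 2 * math.sqrt(syy), 2 * math.sqrt(sxx)  # ellipse half-extents
    region = set(pixels)
    inside = inter = 0
    for y in range(math.floor(my - ry), math.ceil(my + ry) + 1):
        for x in range(math.floor(mx - rx), math.ceil(mx + rx) + 1):
            dy, dx = y - my, x - mx
            if ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy <= 4:
                inside += 1
                inter += (y, x) in region
    return inter / (inside + n - inter)


def find_ellipses(rgb_image, min_area=50, min_score=0.9):
    """Return (row, col, score) centroids of color regions that look elliptical."""
    found = []
    for pixels in connected_components(color_mask(rgb_image)):
        if len(pixels) < min_area:
            continue  # ignore small noise blobs
        score = ellipse_score(pixels)
        if score >= min_score:
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            found.append((cy, cx, score))
    return found
```

The moment-matched IoU test is used here because it is rotation-invariant and needs no contour fitting: a filled disk scores close to 1, while a square of similar size scores roughly 0.83, so a threshold around 0.9 separates them. In a real pipeline the thresholds would need tuning to the target colors and lighting.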