This study proposes a human-robot collaboration framework for semi-automatic control of quadrotor unmanned aerial vehicles (UAVs) in beyond-visual-line-of-sight (BVLOS) applications. The framework integrates partially teleoperated flight control with partially autonomous task execution through a hierarchical control architecture with multiple levels of instruction. Flight teleoperation is performed via the operator's hand gestures, captured by a camera mounted on a monitoring device. Using MediaPipe, we extract finger abduction and adduction and the hand pointing direction as features for gesture recognition. Equipped with visual processing capability, the quadrotor can also autonomously execute subtasks when given a high-level instruction gesture. We provide a customizable target perception architecture based on ORB-SLAM2 and Detectron2 that estimates the volume and coordinates of 3D objects through instance segmentation. The combination of gesture control and autonomous visual perception offers a loosely engaged operation process, in contrast to tightly coupled joystick control. Experimental results show that the proposed approach improves the task effectiveness of quadrotor UAV space exploration and object inspection in an indoor environment.
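As a minimal illustration of the gesture features mentioned above, the sketch below computes a finger abduction angle from 3D hand landmarks such as those produced by MediaPipe Hands. The landmark indices (MCP and fingertip) follow MediaPipe's 21-point hand layout, but the function names and the example landmark values are illustrative assumptions, not the paper's implementation.

```python
import math

def angle_between(v1, v2):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

def finger_abduction(landmarks, mcp_a, tip_a, mcp_b, tip_b):
    """Abduction angle between two fingers, each defined by an
    MCP-joint landmark index and a fingertip landmark index."""
    va = [t - m for t, m in zip(landmarks[tip_a], landmarks[mcp_a])]
    vb = [t - m for t, m in zip(landmarks[tip_b], landmarks[mcp_b])]
    return angle_between(va, vb)

# Toy example: index finger (MCP=5, tip=8) vs. middle finger (MCP=9, tip=12)
# spread at a right angle in this hypothetical landmark set.
landmarks = {5: (0.0, 0.0, 0.0), 8: (0.0, 1.0, 0.0),
             9: (0.0, 0.0, 0.0), 12: (1.0, 0.0, 0.0)}
spread = finger_abduction(landmarks, 5, 8, 9, 12)  # → 90.0
```

A gesture classifier could threshold such angles (together with the wrist-to-fingertip pointing direction) to map hand poses to flight commands; the thresholds themselves would be tuned per application.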