This paper presents a new language processing method in which an adaptive data augmentation strategy is learned for individual documents, instead of applying one universal policy to the whole dataset. A reinforcement learning method is exploited for document classification, where the document encoder, augmenter, and classifier are jointly optimized. In particular, a new reward function based on consistency loss maximization is presented to ensure the diversity of the generated documents. With this reward, the adaptive augmentation policy is updated immediately for every augmented instance, without waiting for child-model performance metrics. Experiments on various classification tasks with a strong baseline model show that optimizing the augmentation strategy improves the training process by providing meaningful augmented data, which ultimately yields better evaluation performance. Furthermore, extensive studies of the policy's behavior under different settings confirm the diversity of the augmented data obtained by the proposed method.
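To make the per-instance reward concrete, the following is a minimal sketch of how a consistency-based reward could be computed immediately for each augmented document, assuming the consistency loss is measured as the KL divergence between the classifier's predictions on the original and augmented inputs; the function names and probability values are hypothetical illustrations, not the paper's exact formulation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete class distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_reward(p_original, p_augmented):
    """Per-instance reward: the consistency loss between the classifier's
    predictions on the original and the augmented document. Maximizing it
    steers the policy toward diverse augmentations, with no child model
    training needed to obtain a reward signal."""
    return kl_divergence(p_original, p_augmented)

# Hypothetical class-probability outputs from the shared classifier.
p_orig = [0.7, 0.2, 0.1]
p_weak_aug = [0.68, 0.21, 0.11]   # mild edit: predictions barely change
p_strong_aug = [0.3, 0.4, 0.3]    # aggressive edit: predictions shift a lot

r_weak = consistency_reward(p_orig, p_weak_aug)
r_strong = consistency_reward(p_orig, p_strong_aug)
```

Because the reward depends only on a forward pass of the already-trained classifier, it is available for every augmented instance at generation time, which is what allows the encoder, augmenter, and classifier to be optimized jointly rather than in an outer child-model loop.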