The massive data demand of GPGPUs requires expensive memory modules, such as GDDR, to support high data bandwidth. The high cost constrains the total memory capacity available to GPGPUs, so data must be transferred between the host CPUs and the GPGPUs. However, the long latency of these data transfers results in significant performance overhead. To alleviate this issue, modern GPGPUs support non-blocking data transfer, which allows a GPGPU to perform computation while data is being transmitted. This paper proposes a capacity-aware scheduling algorithm that exploits the non-blocking data transfer in modern GPGPUs. Experimental results show that, by effectively taking advantage of non-blocking transfers, the proposed algorithm achieves an average performance improvement of 24.01% over existing approaches that consider only memory capacity.