Next-generation sequencing (NGS) has been widely applied in genetics research and biomedical applications. It achieves high sequencing speed by sequencing short subsequences (called short-reads) in a massively parallel manner. However, the subsequent analysis of these short-reads still takes a couple of days on CPUs and has thus become the bottleneck. Fig. 1(a) shows the NGS data-analysis workflow, which consists of short-read mapping, haplotype & variant calling, and genotype calling. Of these three steps, short-read mapping dominates the execution time. A previously reported CPU-FPGA heterogeneous system accelerates short-read mapping, but its performance improvement is limited. A prior dedicated FPGA accelerator achieves higher throughput, but at the cost of a larger DRAM requirement. Compared to prior art, this work presents a memory-efficient FPGA accelerator that delivers 1.7-to-18.6x higher throughput. Paired-end short-read mapping is exploited to achieve the highest accuracy, 99.3%, on real human DNA.
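Paired-end mapping can improve accuracy because the two reads of a pair (mates) come from the same DNA fragment, so their alignments must agree in strand and distance. The sketch below illustrates this constraint. It is not the accelerator's method. The names `pick_pair`, `read_len`, `min_insert`, and `max_insert`, the fixed read length, the forward/reverse (FR) mate orientation, and the insert-size window are all hypothetical assumptions chosen for the example. Each mate has a list of candidate alignments `(position, strand, score)`. The function keeps the highest-scoring pair whose mates lie on opposite strands and whose implied fragment length falls inside the window.

```python
from itertools import product

def pick_pair(cands1, cands2, read_len=100, min_insert=200, max_insert=600):
    """Hypothetical mate-pairing sketch.

    cands1, cands2: candidate alignments of read 1 and read 2 as
    (leftmost_position, strand, score) tuples, strand in {"+", "-"}.
    Assumes FR orientation and a fixed read length (both assumptions).
    Returns (total_score, (pos1, strand1), (pos2, strand2)) or None.
    """
    best = None
    for (p1, s1, sc1), (p2, s2, sc2) in product(cands1, cands2):
        if s1 == s2:
            continue  # mates of one fragment map to opposite strands
        # The forward-strand mate starts the fragment; the reverse-strand mate ends it.
        fwd_pos, rev_pos = (p1, p2) if s1 == "+" else (p2, p1)
        insert = rev_pos + read_len - fwd_pos  # implied fragment length
        if not (min_insert <= insert <= max_insert):
            continue  # geometry inconsistent with the expected insert size
        total = sc1 + sc2
        if best is None or total > best[0]:
            best = (total, (p1, s1), (p2, s2))
    return best

# Read 1 aligns equally well to two repeat copies; read 2 aligns uniquely.
read1 = [(1000, "+", 90), (50000, "+", 90)]
read2 = [(1300, "-", 95)]
print(pick_pair(read1, read2))  # (185, (1000, '+'), (1300, '-'))
```

In this toy case, read 1 alone is ambiguous between two repeat copies. The mate constraint keeps only the copy at position 1000, which is the intuition behind using paired-end reads to raise mapping accuracy.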