Clustering offers a promising way to build a scalable, reliable, and high-performance WAP gateway architecture. However, it requires an efficient load balancing mechanism that assigns each request to the gateway in the cluster that can offer the best service. Moreover, unpredictable connection times and nonuniform incoming load from different mobile clients are major obstacles to balancing load among the real gateways. In this paper, we propose a load balancing strategy with the following features: (1) it estimates the potential load of the real gateways with little computation time and no communication overhead, (2) an asynchronous alarm is sent when the utilization of a real gateway exceeds a critical threshold, and (3) it is WAP-aware. We also propose a scalable WAP gateway (SWG) architecture consisting of a WAP dispatcher and a cluster of real gateways. The WAP dispatcher is a front-end distributor that implements our load balancing strategy. To keep the WAP dispatcher from becoming a bottleneck, it distributes mobile clients' requests in kernel space and does not process outgoing gateway-to-client responses. Experimental results show that SWG achieves better load balancing, higher throughput, and lower delay than LVS (Linux Virtual Server) and the Kannel gateway.
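The abstract does not say how the dispatcher estimates potential load, so the following is only a minimal user-space sketch of the three features under stated assumptions, not the SWG kernel-space implementation. It assumes: (a) load is estimated purely from the dispatcher's own assignment records (hence no communication with the gateways), treating each assignment as active for a hypothetical fixed `window` of seconds and normalizing by a hypothetical per-gateway `capacity`; (b) `on_alarm` models the asynchronous threshold alarm by marking a gateway as unavailable for new sessions; and (c) WAP-awareness is approximated by session affinity, so a client that already has a session keeps its gateway. All class, method, and parameter names are illustrative.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Gateway:
    name: str
    capacity: float                 # hypothetical relative capacity of the real gateway
    overloaded: bool = False        # set/cleared by an asynchronous alarm
    assignments: list = field(default_factory=list)  # (timestamp, cost) records kept by the dispatcher


class Dispatcher:
    """Hypothetical least-estimated-load dispatcher (illustration only)."""

    def __init__(self, gateways, window=30.0):
        self.gateways = {g.name: g for g in gateways}
        self.window = window        # assumed lifetime of an assignment, in seconds
        self.sessions = {}          # client id -> gateway name (assumed session affinity)

    def estimated_load(self, gw, now):
        # Forget assignments older than the window; the rest are assumed still active.
        # Only local records are used, so no messages are exchanged with the gateway.
        gw.assignments = [(t, c) for (t, c) in gw.assignments if now - t < self.window]
        return sum(c for _, c in gw.assignments) / gw.capacity

    def on_alarm(self, name, overloaded):
        # Called when a real gateway reports that its utilization crossed
        # (overloaded=True) or fell back below (overloaded=False) the threshold.
        self.gateways[name].overloaded = overloaded

    def dispatch(self, client_id, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        name = self.sessions.get(client_id)
        if name is None:
            # New session: skip alarmed gateways unless every gateway is alarmed.
            candidates = [g for g in self.gateways.values() if not g.overloaded]
            candidates = candidates or list(self.gateways.values())
            name = min(candidates, key=lambda g: self.estimated_load(g, now)).name
            self.sessions[client_id] = name
        self.gateways[name].assignments.append((now, cost))
        return name
```

For example, with `Dispatcher([Gateway("gw1", 1.0), Gateway("gw2", 2.0)])`, new clients alternate toward whichever gateway has the lower capacity-normalized estimate; after `on_alarm("gw1", True)`, new clients go to `gw2` while existing `gw1` sessions stay where they are. A fixed window is the simplest way to approximate unknown connection times without feedback; a real design would need a better estimator, which is exactly the difficulty the abstract points out.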