Abstract
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of several approaches for tolerating memory latencies. Prefetching can be hardware-based, software-directed, or a combination of both. Hardware-based prefetching, which requires a support unit connected to the cache, can dynamically handle prefetches at run-time without compiler intervention. Software-directed approaches rely on compiler technology to insert explicit prefetch instructions. Mowry et al.'s software scheme and our hardware approach are two representative schemes. In this paper, we evaluate approximations to these two schemes in the context of a shared-memory multiprocessor environment. Our qualitative comparisons indicate that both schemes are able to reduce cache misses in the domain of linear array references. When complex data access patterns are considered, the software approach has compile-time information to perform sophisticated prefetching, whereas the hardware scheme has the advantage of manipulating dynamic information. The performance results from an instruction-level simulation of four benchmarks confirm these observations. Our simulations show that the hardware scheme introduces more memory traffic into the network and that the software scheme introduces a non-negligible instruction execution overhead. An approach combining software and hardware schemes is proposed; it shows promise in reducing the memory latency with the least overhead.
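To make the software-directed idea concrete, the following is a minimal sketch of what an explicit prefetch on a linear array reference might look like. It is not the scheme evaluated in the paper: the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, the distance value is an arbitrary assumption rather than a tuned figure, and the prefetch is written by hand with the GCC/Clang builtin `__builtin_prefetch` where a compiler-based scheme would insert it automatically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements. A compiler-based
 * scheme would derive this from memory latency and loop-body cost;
 * 16 is an arbitrary assumption for illustration. */
#define PREFETCH_DISTANCE 16

/* Sums a[0..n-1], issuing a non-binding prefetch for an element a fixed
 * distance ahead so the memory access can overlap with computation. */
double sum_with_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Extra instructions executed every iteration: a bounds check
         * and the prefetch itself. */
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = prefetch for read,
             * 1 = low temporal locality. This is only a hint. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        }
        sum += a[i];
    }
    return sum;
}
```

The bounds check and the prefetch run on every iteration, which gives a rough sense of where the instruction execution overhead of a software scheme comes from. A hardware scheme would detect the same sequential pattern at run-time without adding instructions to the loop.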
Original language | English |
---|---|
Pages (from-to) | 223-232 |
Number of pages | 10 |
Journal | Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA |
State | Published - 1994 |
Event | Proceedings of the 21st Annual International Symposium on Computer Architecture, Chicago, IL, USA, 18 Apr 1994 to 21 Apr 1994 |