Tolerating load miss latency by extending effective instruction window with low complexity

Walter Yuan Hwa Li*, Chin Ling Huang, Chung-Ping Chung

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

An execute-ahead processor pre-executes instructions when a load miss would otherwise stall the processor. The typical design has several components that grow with the execute-ahead distance and must be carefully balanced for optimal performance. This paper presents a novel approach that unifies those components; it is therefore easy to implement and avoids the need to balance resource investment among them. When executing ahead, the processor enqueues (preserves) all instructions, along with their known execution results (both register and memory), in a preserving buffer (PB). When the leading load miss is resolved, the processor dequeues the instructions and either restores the known execution results or dispatches the instructions not yet executed. The implementation overhead consists of the PB and a runahead cache for forwarding memory data. Only the PB grows with the execute-ahead distance. The method can be applied to both in-order and out-of-order processors. Our experiments show that a four-way superscalar out-of-order processor with a 1 K-entry PB achieves 15% and 120% speedups over the baseline design for the SPEC INT2000 and SPEC FP2000 benchmark suites, respectively, assuming a 128-entry instruction window and a 300-cycle memory access latency.
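
The enqueue/dequeue behaviour described in the abstract can be illustrated with a toy register-only model. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `execute_ahead` and `drain`, the tuple layout of a PB entry, and the use of a single "missing" register to stand for the pending load are all hypothetical, and memory results and the runahead cache are not modelled.

```python
import operator
from collections import deque

def execute_ahead(regs, instrs, missing_reg):
    """Pre-execute instrs while missing_reg is unavailable; return the PB (hypothetical model)."""
    spec = dict(regs)        # speculative copy of the register file
    unknown = {missing_reg}  # registers whose value depends on the load miss
    pb = deque()
    for dest, fn, srcs in instrs:
        if any(s in unknown for s in srcs):
            # Result cannot be computed yet: preserve the instruction only.
            unknown.add(dest)
            pb.append((dest, fn, srcs, None, False))
        else:
            # Result is known: preserve the instruction together with its result.
            value = fn(*(spec[s] for s in srcs))
            spec[dest] = value
            unknown.discard(dest)
            pb.append((dest, fn, srcs, value, True))
    return pb

def drain(regs, pb, missing_reg, loaded_value):
    """After the miss resolves, dequeue in order: restore known results, dispatch the rest."""
    regs = dict(regs)
    regs[missing_reg] = loaded_value
    while pb:
        dest, fn, srcs, value, known = pb.popleft()
        if known:
            regs[dest] = value                             # restore preserved result
        else:
            regs[dest] = fn(*(regs[s] for s in srcs))      # execute now
    return regs

# Usage: r1 is the destination of the load that missed.
regs = {"r1": 0, "r2": 5, "r3": 0, "r4": 0}
instrs = [
    ("r3", operator.add, ("r2", "r2")),  # independent of the miss
    ("r4", operator.add, ("r1", "r3")),  # depends on the miss
    ("r2", operator.mul, ("r3", "r3")),  # independent of the miss
]
pb = execute_ahead(regs, instrs, "r1")
final = drain(regs, pb, "r1", 7)
```

In this sketch only the buffer `pb` grows with the number of instructions executed ahead, which mirrors the abstract's claim that the PB is the only structure that scales with the execute-ahead distance.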

Original language: English
Title of host publication: Proceedings - 2011 International Conference on Parallel Processing, ICPP 2011
Pages: 83-92
Number of pages: 10
State: Published - 2011
Event: 40th International Conference on Parallel Processing, ICPP 2011 - Taipei City, Taiwan
Duration: 13 Sep 2011 → 16 Sep 2011

Publication series

Name: Proceedings of the International Conference on Parallel Processing
ISSN (Print): 0190-3918

Conference

Conference: 40th International Conference on Parallel Processing, ICPP 2011
Country/Territory: Taiwan
City: Taipei City
Period: 13/09/11 → 16/09/11
