Uppsala Architecture Research Team

Instruction Cache Modeling

The performance loss caused by L1 instruction cache misses varies across architectures and cache sizes. The growing use of low-power multi-threaded CPUs (with shared L1 caches) in general-purpose computing platforms requires new, efficient techniques for analyzing application instruction cache usage. Such insight can be obtained with traditional simulation techniques that model several cache sizes, but simulator overhead may be prohibitive for practical optimization work. In this work we present a statistical method to quickly model application instruction cache performance. Most importantly, we propose a very low-overhead sampling mechanism that collects runtime data from the application's instruction stream. This data is fed to the statistical model, which accurately estimates the instruction cache miss ratio for the sampled execution. Our sampling method is about 10x faster than previously suggested sampling approaches, with average runtime overhead as low as 25% over native execution. The collected data is architecture-independent and is used to accurately model the miss ratio for several cache sizes simultaneously, with an average absolute error of 0.2%. Finally, we show how our tool can be used to identify program phases with a large instruction cache footprint. Such phases can then be targeted by optimizations that reduce code footprint.
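To illustrate why architecture-independent reuse data can yield miss ratios for several cache sizes at once, the following is a minimal sketch, not the method from the paper. It assumes a complete trace of instruction cache-line addresses rather than a sparse sample, a fully associative LRU cache, and exact stack distances instead of a statistical model. The names `stack_distances` and `miss_ratios` are hypothetical and chosen only for this example.

```python
from collections import OrderedDict


def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch).

    The stack distance is the number of distinct cache lines touched since
    the previous access to the same line. It does not depend on cache size.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for line in trace:
        if line in stack:
            keys = list(stack.keys())
            # Lines positioned after `line` were touched more recently.
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)
        else:
            distances.append(None)  # cold miss: line never seen before
            stack[line] = True
    return distances


def miss_ratios(trace, cache_sizes_in_lines):
    """Estimate the miss ratio for several cache sizes from one pass.

    Assumption: fully associative LRU cache. An access misses if it is a
    first touch or if its stack distance is at least the cache size.
    """
    distances = stack_distances(trace)
    if not distances:
        raise ValueError("trace must contain at least one access")
    total = len(distances)
    return {
        size: sum(1 for d in distances if d is None or d >= size) / total
        for size in cache_sizes_in_lines
    }


# Toy trace of cache-line addresses: a 3-line loop followed by new code.
example_trace = [0, 1, 2, 0, 1, 2, 3, 0]
print(miss_ratios(example_trace, [2, 3, 4]))
# prints {2: 1.0, 3: 0.625, 4: 0.5}
```

The distances are computed once and reused for every cache size, which is the property that makes a single profile sufficient for modeling several sizes. A sampled approach would record only a small subset of reuses and estimate the distance distribution statistically rather than tracking every access as this sketch does.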

Instruction cache behavior as a function of phase. Poster

Low Overhead Instruction-Cache Modeling Using Instruction Reuse Profiles. Muneeb Khan, Andreas Sembrant, and Erik Hagersten. In International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'12), pp. 260-269, IEEE Computer Society, 2012.