The identification of the memory gap in terms of the relatively slow memory accesses put a focus on cache performance in the 90s. The introduction of the moderately clocked multicores has shifted this focus from memory latency to memory bandwidth for modern processors. The multicore's limited cache capacity per thread in combination with their current a projected off-chip memory bandwidth limitation makes this the most likely bottleneck of future computer systems.
This paper presents a new and efficient way of estimating the cache performance for an application. The method has several similarities with that of Stack Distance, but instead of counting unique memory objects, as is done for Stack Distance calculations, our schema only requires the number of memory accesses to be counted between two successive accesses to the same data object. This task can be efficiently handled at runtime by existing built-in hardware counters. Furthermore, only a small fraction of the memory accesses have to be monitored for an accurate estimation.
We show how low-overhead runtime data, similar to that of StatCache, is sufficient to feed this model. We evaluate the accuracy of the proposed transformation based on sparse data and compare the results with that of native stack distance based all memory accesses. We show excellent accuracy over a wide range of cache sizes and applications.
Available as PDF (235 kB, no cover)
Download BibTeX entry.