The introduction of general purpose microprocessors running multiple threads will put a focus on methods and tools helping a programmer to write efficient parallel applications. Such tool should be fast enough to meet a software developer's need for short turn-around time, but also accurate and flexible enough to provide trend-correct and intuitive feedback.
This paper describes an efficient and flexible approach for modeling the memory system of a multiprocessor, such as those of chip multiprocessors (CMPs). Sparse data is sampled during a multithreaded execution. The data collected consist of the reuse distance and invalidation distribution for a small subset of the memory accesses. Based on the sampled data from a single run, a new mathematical formula can be used to estimate the miss rate for a memory hierarchy built from caches of arbitrarily size, cacheline size and degree of sharing. The formula further divides the misses into six categories to further aid the software developer.
The method is evaluated using a large number of commercial and technical multithreaded applications. The result produced by our algorithm fed with sparse sampling data is shown to be consistent with results gathered during traditional architecture simulation.
Available as compressed Postscript (1006 kB)
Download BibTeX entry.