Efficient Modeling Technology

Motivation

This work focuses on the development of efficient techniques for capturing relevant performance information at runtime as well as the development of statistical modeling methods to predict and analyze application performance based on the information captured at runtime. Understanding an application behavior is necessary to fully exploit the available hardware. The memory hierarchy, latency hiding techniques, and many other advanced mechanisms in the CPU architecture dictate how fast applications execute. Simulation has been used to study the performance impact of these features, but is typically too slow to analyze large applications or datasets. Furthermore, fast modeling techniques are needed for providing guidance for schedulers, compilers and programmers.
This work seeks to develop new statistical and analytical models to predict and capture significant aspects of application performance and power consumption based on runtime information that can be cheaply captured. It also includes methods to cheaply collect such information.

Long Term Goal

Develop efficient statistical models of critical performance factors based on easily captured data. We will also develop the infrastructure to capture a wide variety of performance and execution information for real applications processing real data sets on real hardware. While the current modeling the memory hierarchy, future work will be expended to also cover other parts of computer systems.

Our Approach

Our work is based on three different core technologies, originally developed in this research group: Statistical reuse distance sampling, statistical modeling and on-line phase detection.

Reuse Distance Sampling

An application´s locality properties (e.g., its temporal and spatial locality) dictates how well it can utilize the cache hierarchy. Measuring an application´s stack-distance counts the number of UNOIQUE data pieces touch between two accesses to the same piece of data. This effectively captures the essential parts of an application´s locality properties and is used by many researchers to feed their analysis models. However, no proposal exists for how to efficiently capture stack distance information. We instead have proposed several new methods for capturing reuse distance information for an application, i.e., to count the number of other memory accesses between two accesses to the same piece of data. We have also implemented several efficient tools for capturing reuse distance information.

Statistical Modeling

stat-sampling-model.gifRuse distance information alone cannot tell much about the properties of a system and will have to be refined to model system properties. We have developed several mathematical models for modeling caches with LRU replacement and random replacement based on reuse distance information. We have also shown that such caches can be effectively modeled even with very sparse reuse distance data.

Phase Detection

The third core technology for efficient modeling is on-line phase detection. The behavior of an application tends to change over time and is often assembled from a handful of different behavior types, known as phases. Knowing the behavior for each such phace, as well as its fraction of the execution time, is sufficient for knowing the application´s overall behavior. Our approach involved techniques for detecting phase changes as well as detecting if the new phase has occurred before.

Achievements

  • Implemented a low-overhead tool for reuse distance sampling on x86 running Linux. Our previous tool only supported UlktrSPARC III running Solaris.
  • StatCache modeling techniques for efficiently analyzing and predicting application cache usage for random replacement caches. The StatCache approach uses a simple statistical model to very rapidly (< 1s) produce accurate estimates of an application's cache miss ratio for arbitrary-sized caches. This approach was commercialized in the Acumem tool suite.
  • Designed a new statistical cache model to model the behavior of caches with LRU replacement.
  • Implemented the first general low-overhead runtime phase detection at tool. It´s average overhead is below 2 percent.
  • Designed a phase-guided reuse distance sampler by combining the x86 sampler and the phase detection tool. It results in on average 6 times lower overhead and 50 percent better accuracy than the basic x86 sampler.

Expected Future Results

  • Modeling of memory level parallelism and its effect on the memory system and performance based on statistically captured runtime data.
  • Modeling of instruction cache behavior, including reuse and prefetching, based on statistically captured runtime data.
  • Integration of data prefetching models into the existing data cache models (StatStack and StatCache) to more accurately reflect modern hardware.
  • Modeling of branch prediction behavior
  • Use interval modeling to model detailed affects in out-of-order processors based on efficiently captured architecturally independent runtime information (such as reuse distance).

Publications

  • Peter Vestberg. Low-Overhead Memory Access Sampler An efficient method for data-locality profiling. M.Sc. Dissertation Uppsala University January 2011, ISSN: 1401-5749, UPTEC IT 11 003.
  • Andreas Sembrant. Low Overhead Online Phase Predictor and Classifier. M.Sc. Dissertation Uppsala University January 2011, ISSN: 1401-5749, UPTEC IT 11 002.

Internal project page

Staff

Senior: Erik Hagersten (Contact), David Black-Schaffer, Stefanos Kaxiras
Ph.D. students: Nikos Nikoleris, Muneeb Khan, David Eklöv, Andreas Sembrant