Resource Sharing Modeling

Motivation

The relatively long latency and limited bandwidth of off-chip memory make application performance highly dependent on how well applications use the resources in the memory hierarchy. In modern chip multiprocessors (CMPs), cores share key memory hierarchy resources, such as on-chip caches and off-chip memory bandwidth. At the hardware level, these resources are typically shared in a free-for-all manner, so how much of each resource an application receives can vary greatly depending on which other applications it happens to be co-running with. At the software level, process schedulers and thread placement algorithms therefore have the potential to greatly improve application performance and scalability by intelligently scheduling and placing threads and applications in a resource-aware manner.
This work will produce models that capture how resource sharing affects performance. With this information, we will be better able to understand how to optimize applications, operating system schedulers, and hardware to maximize the performance of applications that share resources.

Long-Term Goal

Our long-term goal is to develop methods to predict and enhance application performance and scalability. These methods can serve as a basis for developing practical resource-sharing-aware process scheduling algorithms, thread mappings, and application optimizations.

Expected Results

  • Tools to efficiently measure, model and predict application performance and scalability in the presence of resource sharing. In particular, we are examining the impacts of sharing in the memory system through caches, prefetchers, and off-chip bandwidth.
  • Practical resource-aware process scheduling and thread placement algorithms to leverage the performance models for better scalability.
  • Application optimizations based on analysis of shared resource usage, both to improve an application's performance when it shares resources and to minimize its impact on other applications.

Achievements

  • StatCC presented an efficient method to model the performance impacts of cache sharing for multi-program workloads. StatCC uses static program information (instruction mix) and a performance/cache-miss model to predict how applications will affect each other's performance when sharing a cache.
  • We have leveraged the StatStack cache modeling framework to automatically identify memory access instructions that pollute shared caches by installing data that is never reused. Once found, these instructions can then be automatically transformed to non-temporal accesses to keep them from being installed in the cache. This automatic analysis and transformation increases the amount of cache available to other applications, thereby improving multi-application throughput.
  • We have developed a new method that measures, on real hardware, how an application's properties change as the cache capacity available to it changes. The Pirate tool can measure properties such as bandwidth needs, miss rate, miss ratio, and IPC as a function of cache size, with an overall overhead in the range of 5 percent.
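To make the idea of predicting cache-sharing effects concrete, the following is a minimal sketch of a generic fixed-point sharing model. It is not the StatCC implementation. Everything in it is an assumption made for illustration: the `App` record, the linear `linear_miss_curve`, the fixed `MISS_PENALTY`, and the simplification that each application's share of the cache is proportional to the rate at which it misses (and therefore installs new data).

```python
from dataclasses import dataclass
from typing import Callable, List

MISS_PENALTY = 200.0  # assumed cycles lost per cache miss (illustrative value)


@dataclass
class App:
    """Hypothetical per-application inputs to the sketch."""
    name: str
    base_cpi: float                        # CPI with no cache misses
    mem_per_instr: float                   # memory accesses per instruction
    miss_curve: Callable[[float], float]   # cache size (MB) -> miss ratio


def linear_miss_curve(working_set_mb: float) -> Callable[[float], float]:
    """Assumed miss-ratio curve: falls linearly until the working set fits."""
    def curve(cache_mb: float) -> float:
        if cache_mb >= working_set_mb:
            return 0.01
        return 0.01 + 0.2 * (1.0 - cache_mb / working_set_mb)
    return curve


def cpi(app: App, cache_mb: float) -> float:
    """Simple performance model: base CPI plus stall cycles from misses."""
    return app.base_cpi + app.mem_per_instr * app.miss_curve(cache_mb) * MISS_PENALTY


def misses_per_cycle(app: App, cache_mb: float) -> float:
    """Rate at which the application installs new lines in the cache."""
    return app.mem_per_instr * app.miss_curve(cache_mb) / cpi(app, cache_mb)


def solve_sharing(apps: List[App], total_mb: float,
                  iters: int = 200, damping: float = 0.5) -> List[float]:
    """Iterate until cache shares and miss rates are mutually consistent."""
    shares = [total_mb / len(apps)] * len(apps)  # start from an equal split
    for _ in range(iters):
        rates = [misses_per_cycle(a, s) for a, s in zip(apps, shares)]
        total = sum(rates)
        # Assumption: share of the cache is proportional to miss rate.
        target = [total_mb * r / total for r in rates]
        # Damped update keeps the iteration stable; shares still sum to total_mb.
        shares = [(1 - damping) * s + damping * t for s, t in zip(shares, target)]
    return shares
```

Calling `solve_sharing` with two applications and a shared cache size returns the estimated cache share of each, and `cpi(app, share)` then gives the predicted co-run CPI, which can be compared with `cpi(app, total_mb)` for the application running alone.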

Publications

Approach

This work leverages low-overhead hardware performance counters and fast statistical hardware models to rapidly predict the effects of resource sharing.
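As a rough illustration of what hardware performance counters provide, the sketch below derives IPC, miss ratio, miss rate, and an off-chip bandwidth estimate from a single set of raw counter values. The `CounterSample` fields and the 64-byte line size are assumptions for the example, not the project's actual tooling, and the definition of miss rate used here (misses per 1000 instructions) is only one common convention. The bandwidth figure counts demand misses only, so it ignores write-backs and prefetch traffic.

```python
from dataclasses import dataclass
from typing import Dict

CACHE_LINE_BYTES = 64  # assumed cache line size


@dataclass
class CounterSample:
    """Hypothetical raw counter values collected over one interval."""
    cycles: int
    instructions: int
    cache_accesses: int
    cache_misses: int
    seconds: float  # wall-clock length of the interval


def derive_metrics(s: CounterSample) -> Dict[str, float]:
    """Turn raw counts into the metrics discussed above."""
    return {
        # Instructions retired per cycle.
        "ipc": s.instructions / s.cycles,
        # Fraction of cache accesses that miss.
        "miss_ratio": s.cache_misses / s.cache_accesses,
        # Misses per 1000 instructions (assumed definition of miss rate).
        "mpki": 1000.0 * s.cache_misses / s.instructions,
        # Bytes fetched from memory per second, in GB/s (demand misses only).
        "bandwidth_gb_s": s.cache_misses * CACHE_LINE_BYTES / s.seconds / 1e9,
    }
```

Sampling these values at several effective cache sizes would yield each metric as a function of cache size.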

Staff

Senior: Erik Hagersten (Contact), David Black-Schaffer
Ph.D. students: David Eklöv, Andreas Sandberg, Nikos Nikoleris