Department of Information Technology

Resource Sharing Modeling

More information: Uppsala Architecture Research Team | Modeling.

Motivation

The relatively long latency and limited bandwidth of off-chip memory make application performance highly dependent on how well applications utilize the resources in the memory hierarchy. In modern chip multiprocessors (CMPs), cores share key memory hierarchy resources, such as on-chip caches and off-chip memory bandwidth. At the hardware level, these resources are typically shared in a free-for-all manner, so the share of resources an application receives can vary greatly depending on which other applications it happens to be co-running with. At the software level, process schedulers and thread placement algorithms therefore have the potential to greatly improve application performance and scalability by intelligently scheduling and placing threads and applications in a resource-aware manner.
This work focuses on techniques to collect information about resource usage and models that capture how resource sharing affects performance. With this information we can understand how to optimize applications, operating system schedulers, and hardware to maximize the performance of applications sharing resources.

[Figure: resource-sharing.gif]

Long Term Goal

Our long-term goal is to develop methods to predict and enhance application performance and scalability. These methods can be used as a basis for the development of practical resource-sharing-aware process scheduling algorithms, thread mappings, and application optimizations.

  • Tools to efficiently measure, model and predict application performance and scalability in the presence of resource sharing. In particular, we are examining the impacts of sharing in the memory system through caches, prefetchers, and off-chip bandwidth.
  • Practical resource-aware process scheduling and thread placement algorithms to leverage the performance models for better scalability.
  • Application optimizations based on the analysis of shared-resource usage, to improve an application's performance when sharing resources and to minimize its impact on other applications.

On-going Work

  • Leveraging our resource sharing models and profiling data to better schedule runtime tasks (Germán Ceballos).
  • Measuring resource sensitivity on heterogeneous systems (Johan Janzén).

Achievements

  • Initial work on extending statistical cache models to task-based frameworks (StatTask), including profiling of the OpenMP BOTS benchmark suite to identify data reuse properties.
  • StatCC presented an efficient method to model the performance impacts of cache sharing for multi-program workloads. StatCC uses static program information (instruction mix) and a performance/cache-miss model to predict how applications will affect each other's performance when sharing a cache.
  • We have leveraged the StatStack cache modeling framework to automatically identify memory access instructions that pollute shared caches by installing data that is never reused. Once found, these instructions can then be automatically transformed to non-temporal accesses to keep them from being installed in the cache. This automatic analysis and transformation increases the amount of cache available to other applications, thereby improving multi-application throughput.
  • We have developed new methods that measure, on commodity hardware, how an application's properties change as the cache capacity and bandwidth available to it change. Through Cache Pirating and the Bandwidth Bandit we can steal cache capacity and bandwidth to evaluate the sensitivity of an application's performance, bandwidth, and cache behavior as a function of cache size and bandwidth allocation.
  • Using the Cache Pirate and Bandwidth Bandit data we have developed techniques to understand scaling bottlenecks.
  • We can accurately model cache allocation on commodity hardware using Cache Pirate profiles. This allows us to predict performance (CPI) and bandwidth consumption of a set of co-executed applications from data captured for each application individually.
  • We have used our cache allocation model to rapidly (600x faster than native execution) explore the variability in application slowdown due to varying offsets in application executions. This work has shown that two co-executing applications can have very different slowdowns depending on how they overlap in time.
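
To make the idea of predicting co-run behavior from individually captured profiles more concrete, here is a minimal sketch in Python. It is not the model used in this work. It assumes a simplified steady state in which each application's share of the shared cache is proportional to the rate at which it fetches data into the cache (misses per cycle), and it solves for that share by damped fixed-point iteration. The `Profile` class, the `solve_shares` function, and all profile numbers are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Profile:
    """Curves for one application measured in isolation (hypothetical data)."""
    sizes_mb: list  # cache sizes at which the application was profiled
    cpi: list       # cycles per instruction at each size
    mpki: list      # misses per 1000 instructions at each size

    def _interp(self, ys, size):
        # Piecewise-linear interpolation, clamped to the profiled range.
        xs = self.sizes_mb
        if size <= xs[0]:
            return ys[0]
        if size >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, size)
        t = (size - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    def cpi_at(self, size):
        return self._interp(self.cpi, size)

    def misses_per_cycle(self, size):
        # (misses / instruction) divided by (cycles / instruction).
        return (self._interp(self.mpki, size) / 1000.0) / self.cpi_at(size)


def solve_shares(profiles, cache_mb, iters=200, damping=0.5):
    """Assumed steady state: each share is proportional to its fetch rate."""
    shares = [cache_mb / len(profiles)] * len(profiles)
    for _ in range(iters):
        rates = [p.misses_per_cycle(s) for p, s in zip(profiles, shares)]
        total = sum(rates)
        if total == 0:
            break  # nobody misses, so any split is a steady state
        target = [cache_mb * r / total for r in rates]
        # Damping keeps the iteration from oscillating between extremes.
        shares = [(1 - damping) * s + damping * t
                  for s, t in zip(shares, target)]
    return shares


# Made-up profiles: a streaming application that is insensitive to cache
# size, and a cache-sensitive application that speeds up with more cache.
CACHE_MB = 8.0
streaming = Profile([1, 2, 4, 8], [2.0, 2.0, 2.0, 2.0], [20, 20, 20, 20])
sensitive = Profile([1, 2, 4, 8], [3.0, 2.0, 1.2, 1.0], [15, 8, 2, 0.5])

shares = solve_shares([streaming, sensitive], CACHE_MB)
for name, prof, share in zip(["streaming", "sensitive"],
                             [streaming, sensitive], shares):
    print(f"{name}: share = {share:.2f} MB, "
          f"predicted CPI = {prof.cpi_at(share):.2f}")
```

In this toy setup the streaming application crowds the cache-sensitive one out of most of the cache, so the sensitive application's predicted CPI is worse than when it runs alone. A real model would need measured profiles and a replacement-policy assumption that matches the target hardware.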

Publications

Approach

This work leverages low-overhead hardware performance counters, runtime resource stealing, and fast statistical hardware models to rapidly predict the effects of resource sharing.
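
As a small illustration of the kind of metrics that can be derived from hardware performance counters, the following Python sketch converts raw counter deltas from one sampling interval into CPI, misses per 1000 instructions, and an approximate off-chip bandwidth. The `CounterSample` fields, the 64-byte line size, and the sample values are assumptions made for this example, and they do not correspond to any particular counter interface. Estimating bandwidth as misses times line size also ignores prefetch and write-back traffic, so it should be read as a lower bound.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumed line size; check the target machine


@dataclass
class CounterSample:
    """Counter deltas over one sampling interval (hypothetical fields)."""
    cycles: int
    instructions: int
    llc_misses: int   # last-level cache misses
    seconds: float    # wall-clock length of the interval


def cpi(s: CounterSample) -> float:
    """Cycles per instruction."""
    return s.cycles / s.instructions


def mpki(s: CounterSample) -> float:
    """Last-level cache misses per 1000 instructions."""
    return 1000.0 * s.llc_misses / s.instructions


def bandwidth_gb_per_s(s: CounterSample) -> float:
    """Approximate demand-miss bandwidth in GB/s (lower bound)."""
    return s.llc_misses * CACHE_LINE_BYTES / s.seconds / 1e9


# Made-up sample: one second of execution.
sample = CounterSample(cycles=3_000_000_000, instructions=2_000_000_000,
                       llc_misses=10_000_000, seconds=1.0)
print(f"CPI = {cpi(sample):.2f}, MPKI = {mpki(sample):.2f}, "
      f"bandwidth = {bandwidth_gb_per_s(sample):.2f} GB/s")
```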


Updated 2015-03-03 08:18:37 by David Black-Schaffer.