Uppsala Architecture Research Team
Towards more efficient execution: a decoupled access-execute approach
The end of Dennard scaling is expected to shrink the usable range of dynamic voltage and frequency scaling (DVFS) in future technology nodes, limiting the energy savings of this technique. This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling data access from execution allows us to apply the optimal voltage-frequency selection to each phase and therefore improve energy efficiency over standard coupled execution.
The underlying insight of our work is that, by decoupling access and execute, we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency while maintaining good performance. To demonstrate this, we built a task-based parallel execution infrastructure consisting of: (1) a runtime system to orchestrate the execution, (2) power models to predict the optimal voltage-frequency selection at runtime, (3) a modelling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS, and (4) a hardware measurement infrastructure to verify our model’s accuracy.
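The idea of choosing a separate operating point for each phase can be illustrated with a small toy model. The sketch below is not the power model or runtime from the paper: the operating points, the `Phase` fields, the power formula (constant static power plus dynamic power proportional to V²f), and the workload numbers are all hypothetical, and it simply enumerates every candidate point instead of predicting the best one. Like the modelling infrastructure above, it assumes zero-latency frequency switching.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical (frequency in GHz, voltage in V) operating points; illustrative only.
OPERATING_POINTS = [(1.6, 0.95), (2.4, 1.05), (3.4, 1.25)]


@dataclass
class Phase:
    compute_gcycles: float  # work that scales with core frequency (giga-cycles)
    memory_seconds: float   # memory stall time, assumed independent of core frequency


def phase_time(phase, freq_ghz):
    """Execution time in seconds under the toy model."""
    return phase.compute_gcycles / freq_ghz + phase.memory_seconds


def power(freq_ghz, volts, static_watts=1.0):
    """Toy power model: constant static power plus dynamic power ~ V^2 * f."""
    return static_watts + volts ** 2 * freq_ghz


def edp(phases, points):
    """Energy-delay product of running phases[i] at operating point points[i]."""
    total_time = 0.0
    total_energy = 0.0
    for phase, (freq, volts) in zip(phases, points):
        t = phase_time(phase, freq)
        total_time += t
        total_energy += power(freq, volts) * t
    return total_energy * total_time


def best_decoupled(access, execute):
    """Try every (access point, execute point) pair and keep the lowest EDP."""
    return min(product(OPERATING_POINTS, repeat=2),
               key=lambda pair: edp([access, execute], pair))


def best_coupled(access, execute):
    """Coupled baseline: a single operating point shared by both phases."""
    return min(OPERATING_POINTS,
               key=lambda p: edp([access, execute], [p, p]))


# Made-up workload: a memory-bound access phase and a compute-bound execute phase.
access = Phase(compute_gcycles=0.2, memory_seconds=1.0)
execute = Phase(compute_gcycles=3.0, memory_seconds=0.05)
```

Calling `best_decoupled(access, execute)` returns one operating point per phase, while `best_coupled(access, execute)` returns the single best shared point. Because every shared setting is also a valid decoupled pair, the decoupled choice can never have a higher EDP than the coupled one in this model.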
Based on real hardware measurements, we project that the combination of decoupled access-execute and DVFS has the potential to improve the energy-delay product (EDP) by 25% without hurting performance. On memory-bound applications, we significantly improve performance due to increased memory-level parallelism (MLP) in the access phase and instruction-level parallelism (ILP) in the execute phase. Furthermore, we demonstrate that our method achieves high performance both with and without a hardware prefetcher.
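For readers unfamiliar with the metric, the relation below spells out what an EDP improvement means. It uses the standard definition of EDP as energy times execution time; the subscripts and the reading of "improve by 25%" as a 25% reduction in EDP are assumptions made here for illustration, not notation from the paper.

```latex
\[
\mathrm{EDP} = E \cdot T,
\qquad
\frac{\mathrm{EDP}_{\mathrm{DAE}}}{\mathrm{EDP}_{\mathrm{coupled}}}
  = \frac{E_{\mathrm{DAE}}}{E_{\mathrm{coupled}}}
    \cdot \frac{T_{\mathrm{DAE}}}{T_{\mathrm{coupled}}}
\]
% If execution time is unchanged (T_DAE = T_coupled), an EDP ratio of 0.75
% corresponds to E_DAE = 0.75 E_coupled, i.e. 25% less energy.
```

Under this reading, any speedup from decoupling lowers the time ratio, so the same EDP gain would require a smaller energy reduction.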
The impact of decoupling access and execute: blue regions (access) can be run at the lowest frequency without hurting performance.
- Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models. In PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013. (Conference website, fulltext: postprint).
- Towards more efficient execution: a decoupled access-execute approach. In Proc. 27th ACM International Conference on Supercomputing, pp. 253-262, ACM Press, New York, 2013. (DOI, fulltext: print).