Uppsala Architecture Research Team
A compiler-automated decoupled access-execute approach
Traditional Dynamic Voltage and Frequency Scaling (DVFS) policies are limited by DVFS granularity and by shrinking voltage ranges. Under these restrictions, energy savings come only with significant sacrifices in performance, which is antithetical to the goal of scalable, energy-efficient computing. Yet there is significant untapped potential that can be unlocked by closer software-hardware cooperation and co-designed optimizations. We make the case for software decoupled access-execute (DAE), in which the compiler automatically transforms programs into coarse-grain Access (memory-bound) and Execute (compute-bound) phases. The granularity of the phases is adjusted to strike a balance between the memory hierarchy and the hardware DVFS capabilities. Access phases are designed to prefetch data into the cache (at low frequency, to save energy), while Execute phases consume data from the cache and perform computations (at high frequency, to deliver performance). Sophisticated static analyses guide the compiler in generating highly efficient Access phases, which achieve the right balance between the costs and benefits of prefetching. We propose compiler techniques that exploit and adapt to each program's characteristics, making DAE applicable to a wide range of applications, including scientific linear codes, complex and irregular serial programs, and parallel applications. Overall, our techniques preserve peak performance while achieving energy-delay-product (EDP) improvements of over 20% on average (with peak EDP improvements surpassing 70%) across a selection of applications from several benchmark suites.
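To make the Access/Execute split concrete, the following is a minimal hand-written sketch in C of what a DAE-transformed loop might look like. It is an illustration under stated assumptions, not the compiler's actual output: the function names, the chunk-based phase granularity, the 64-byte cache-line size, and the two frequency hooks (`set_low_frequency`, `set_high_frequency`) are all hypothetical, and the hooks are left as no-ops because the DVFS interface is platform-specific. The prefetch itself uses the GCC/Clang builtin `__builtin_prefetch`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DVFS hooks (assumed names). They are no-ops here; a real
 * system would request a frequency change through a platform interface. */
static void set_low_frequency(void)  { }
static void set_high_frequency(void) { }

/* Access phase: memory-bound. Only touches the data the Execute phase
 * will need, issuing one read prefetch per assumed 64-byte cache line
 * (8 doubles). It performs no computation on the values. */
static void access_phase(const double *a, size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i += 8)
        __builtin_prefetch(&a[i], 0, 3);
}

/* Execute phase: compute-bound. Consumes data that the Access phase
 * has (ideally) already brought into the cache. */
static double execute_phase(const double *a, size_t begin, size_t end)
{
    double sum = 0.0;
    for (size_t i = begin; i < end; i++)
        sum += a[i] * a[i];
    return sum;
}

/* Sum of squares over a[0..n), processed in coarse-grain chunks.
 * `chunk` stands in for the phase granularity, which would be tuned
 * against the cache size and the DVFS transition cost. */
double dae_sum_of_squares(const double *a, size_t n, size_t chunk)
{
    assert(chunk > 0);
    double total = 0.0;
    for (size_t begin = 0; begin < n; begin += chunk) {
        size_t end = (n - begin < chunk) ? n : begin + chunk;

        set_low_frequency();            /* Access: save energy */
        access_phase(a, begin, end);

        set_high_frequency();           /* Execute: deliver performance */
        total += execute_phase(a, begin, end);
    }
    return total;
}
```

In this sketch the Access phase is a stripped-down copy of the original loop that keeps only the address computation. Choosing `chunk` so that one chunk's working set fits in the cache is what lets the Execute phase run mostly out of the cache.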