17 January 2018Abstract:
New trends such as the internet-of-things and smart homes push the demands for energy-efficiency. Choosing energy-efficient hardware, however, often comes as a trade-off to high-performance. In order to strike a good balance between the two, we propose software solutions to tackle the performance bottlenecks of small and energy-efficient processors.
One of the main performance bottlenecks of processors is the discrepancy between processor and memory speed, known as the memory wall. While the processor executes instructions at a high pace, the memory is too slow to provide data in a timely manner, if data has not been cached in advance. Load instructions that require an access to memory are thereby referred to as long-latency or delinquent loads. Long latencies caused by delinquent loads are putting a strain on small processors, which have few or no resources to effectively hide the latencies. As a result, the processor may stall.
In this thesis we propose compile-time transformation techniques to mitigate the penalties of delinquent loads on small out-of-order processors, with the ultimate goal to avoid processor stalls as much as possible. Our code transformation is applicable for general-purpose code, including unknown memory dependencies, complex control flow and pointers. We further propose a software-hardware co-design that combines the code transformation technique with lightweight hardware support to hide latencies on a stall-on-use in-order processor.
Available as PDF (3.19 MB)
Download BibTeX entry.