11 October 2017Abstract:
Making computer systems more energy efficient while obtaining the maximum performance possible is key for future developments in engineering, medicine, entertainment, etc. However it has become a difficult task due to the increasing complexity of hardware and software, and their interactions. For example, developers have to deal with deep, multi-level cache hierarchies on modern CPUs, and keep busy thousands of cores in GPUs, which makes the programming process more difficult.
To simplify this task, new abstractions and programming models are becoming popular. Their goal is to make applications more scalable and efficient, while still providing the flexibility and portability of old, widely adopted models. One example of this is task-based programming, where simple independent tasks (functions) are delegated to a runtime system which orchestrates their execution. This approach has been successful because the runtime can automatically distribute work across hardware cores and has the potential to minimize data movement and placement (e.g., being aware of the cache hierarchy).
To build better runtime systems, it is crucial to understand bottlenecks in the performance of current and future multicore systems. In this thesis, we provide fast, accurate and mathematically-sound models and techniques to understand the execution of task-based applications concerning three key aspects: memory behavior (data locality), scheduling, and performance. With these methods, we lay the groundwork for improving runtime system, providing insight into the interplay between the schedule's behavior, data reuse through the cache hierarchy, and the resulting performance.
Available as PDF (33.79 MB)
Download BibTeX entry.