Keynote Speakers

This year, MCC will include three keynotes from top researchers.

Per Stenström - Chalmers University, Göteborg, Sweden

Runtime Guided Management of Multicore Cache Hierarchies

Abstract: As Moore’s Law runs out of steam, managing a compute chip’s resources becomes increasingly important. Over the past few years, we have been investigating how cross-layer cooperation can help manage resources more effectively, in particular the on-chip cache hierarchy of multicore systems. Our approach combines static information from task-based programming models with dynamic information collected as a parallel program runs, so that the runtime system can manage cache resources intelligently. This talk will detail our overall approach and show how collaboration between the runtime system and the architecture can manage inter-task communication efficiently and eliminate dead cache blocks.

Bio: Per Stenström is a professor at Chalmers University of Technology. His research interests are in parallel computer architecture. He has authored or co-authored four textbooks, more than 150 publications and ten patents in this area. He has been program chair of several top-tier IEEE and ACM conferences, including the IEEE/ACM Symposium on Computer Architecture, and serves as Senior Associate Editor of ACM TACO and Associate Editor-in-Chief of JPDC. He is a Fellow of the ACM and the IEEE and a member of Academia Europaea, the Royal Swedish Academy of Engineering Sciences and the Royal Spanish Academy of Engineering Science.

Davide Rossi - University of Bologna, Italy

Introduction to the PULP (Parallel Ultra Low Power) platform

Abstract: The “Internet of Everything” envisions trillions of connected objects equipped with high-bandwidth sensors that require massive amounts of local signal processing, fusion, pattern extraction and classification. From a computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like levels of energy efficiency. CMOS technology can still take us a long way toward this vision. Our recent results with the PULP (Parallel Ultra-Low Power) open computing platform demonstrate that computational efficiency on the order of pJ/op (equivalently, GOPS/mW) is within reach in today’s 28 nm UTBB FD-SOI technology. In this talk, I will describe the evolution of the PULP platform and discuss the main challenges for the next generation of energy-efficient computing systems operating in the mW range.

Bio: Davide Rossi received his PhD from the University of Bologna, Italy, in 2012. He joined the Department of Electrical, Electronic and Information Engineering “Guglielmo Marconi” at the University of Bologna as a postdoctoral researcher in 2015 and currently holds an assistant professor position there. His research focuses on energy-efficient digital architectures for heterogeneous and reconfigurable multi- and many-core systems on chip. This includes architectures, design implementation strategies, and runtime support to address performance, energy efficiency, and reliability issues in both high-end embedded platforms and ultra-low-power computing platforms for the IoT domain. In these fields, he has published more than 60 papers in international peer-reviewed conferences and journals.

Johan Grönqvist - Arm, Lund, Sweden

Hardware-Software Co-design in Arm GPUs

Abstract: We will give an overview of Arm's latest GPU architecture, followed by a brief description of the graphics pipeline as embodied in the graphics APIs. Then, as an example of the co-design process, we describe Index-Driven Position Shading, a new technique introduced in the most recent GPU architecture to reduce memory bandwidth for graphics applications. We focus on the hardware and software changes involved and their effect on memory traffic. To illustrate this, we present examples of cases where the technique is beneficial and show how the software layer can determine when to use it.

Bio: Johan Grönqvist holds a PhD in theoretical physics. He joined Arm six years ago and has since worked in Arm's Media Processing Group on performance analysis of compute workloads, OpenCL driver development, and GPU compiler development.