Coffee break (08:30-09:00) |
Invited Presentation |
09:00-10:00 |
Implications of the Meltdown & Spectre Hardware Security Flaws (abstract)
Two major hardware security design flaws--dubbed Meltdown and
Spectre--were broadly revealed to the public in early January 2018 in
research papers and blog posts that require considerable expertise and effort
to understand. To complement these, this talk seeks to give a general
computer science audience the gist of these security flaws and their
implications. The goal is to enable the audience to either stop there or
to have a framework for learning more. A non-goal is exploring the many details
of flaw exploitation and patch status, in part because the speaker is a
computer architect, not a security expert.
In particular, this talk reviews that Computer Architecture 1.0 (the version
number is new) specifies the timing-independent functional behavior of a
computer, while micro-architecture is the set of implementation techniques
that improve performance by more than 100x. It then asks, “What if a
computer that is completely correct by Architecture 1.0 can be made to leak
protected information via timing, a.k.a. micro-architecture?” The answer
is that this is exactly what is done by the Meltdown and Spectre design flaws.
Meltdown leaks kernel memory, but software & hardware fixes exist. Spectre
leaks memory outside of sandboxes and bounds checks, and it is scary. An
implication is that the definition of Architecture 1.0--the most important
interface between software and hardware--is inadequate to protect
information. It is time for experts from multiple viewpoints to come together
to create Architecture 2.0.
Mark D. Hill |
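As a concrete illustration of the talk's central question, the following sketch (with hypothetical names; not from the talk) shows the classic Spectre-style bounds-check pattern. Architecturally the code is correct: out-of-bounds indices are rejected. Micro-architecturally, a mistrained branch predictor may execute the array accesses speculatively for an out-of-bounds index, leaving a secret-dependent trace in the cache that an attacker can recover by timing loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSize = 4;
uint8_t array1[kSize] = {1, 2, 3, 4};
uint8_t array2[256 * 64];   // 64-byte stride: one cache line per possible value

uint8_t victim(std::size_t x) {
    if (x < kSize) {                      // bounds check that speculation may bypass
        return array2[array1[x] * 64];    // secret-dependent cache-line access
    }
    return 0;
}
```

Nothing here violates Architecture 1.0; the leak exists only in the timing behavior of the micro-architecture, which is exactly why the abstract argues the current interface definition is inadequate.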
About UPMARC |
10:00-10:15 |
Introduction
Johan Tysk (Vice-rector of Science and Technology) |
10:15-10:30 |
UPMARC: Overview
Bengt Jonsson (Director of UPMARC) |
Coffee break (10:40-11:10) |
Session 1 |
11:10-11:30 |
Task parallel programming for Scientific Computing (abstract)
In high-performance scientific computing, an application needs to be able to run efficiently on large-scale distributed, multicore-based, and potentially heterogeneous hardware. Task parallel programming is one way to make this feasible for a broader range of application codes. In this programming paradigm, the software developer takes responsibility for describing the workflow in terms of tasks and their data accesses. The scheduling of tasks in parallel onto the hardware is then handled by a run-time system. During the last decade, extensive research has been performed on how to design these run-time systems. This talk presents some of the contributions developed at UPMARC.
Elisabeth Larsson |
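The task-parallel style described in the abstract can be sketched minimally as follows (this uses the standard library's std::async as a stand-in runtime, not any specific UPMARC system): the programmer expresses work as tasks with disjoint data accesses, and the runtime decides where and when they execute.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

int partial_sum(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    return std::accumulate(v.begin() + lo, v.begin() + hi, 0);
}

int parallel_sum(const std::vector<int>& v) {
    std::size_t mid = v.size() / 2;
    // Two independent tasks; their data accesses are disjoint halves of v,
    // so the runtime is free to schedule them in parallel.
    auto left  = std::async(std::launch::async, partial_sum, std::cref(v), std::size_t{0}, mid);
    auto right = std::async(std::launch::async, partial_sum, std::cref(v), mid, v.size());
    return left.get() + right.get();   // join: a dependency on both tasks
}
```

Real task-parallel runtimes generalize this by building a dependency graph from the declared data accesses rather than explicit futures.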
11:30-11:50 |
Testing and Verifying Concurrent Programs using Stateless Model Checking (abstract)
Ensuring correctness of concurrent programs is difficult since one must consider all the different ways in which threads can interact. A successful technique for finding concurrency bugs and for verifying their absence is stateless model checking (SMC). In this talk, we will present algorithms and tools for SMC that were developed under the aegis of UPMARC and discuss how they have significantly advanced the state-of-the-art in this area. We will also present experiences from applying these tools in testing and verifying key properties of RCU, the Read-Copy-Update component of the Linux kernel, and from a collaboration project with VMware researchers to test and verify chain repair methods for Corfu, a distributed shared log that aims to be scalable and reliable in the presence of failures and asynchrony.
Kostis Sagonas |
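The core idea behind stateless model checking can be sketched in a few lines (a toy exploration, not any of the UPMARC tools): instead of running a racy program once, systematically explore every interleaving of the threads' steps. Here two threads each perform a non-atomic increment (load x; store x+1) on a shared variable, and the exploration discovers the lost-update outcome 1 alongside the intended outcome 2.

```cpp
#include <cassert>
#include <set>

struct Thread { int loaded = 0; int pc = 0; };   // pc: 0 = load, 1 = store, 2 = done

// Depth-first exploration of all interleavings of two 2-step threads.
void explore(int x, Thread a, Thread b, std::set<int>& finals) {
    bool ran = false;
    for (Thread* t : {&a, &b}) {
        if (t->pc == 2) continue;                 // this thread is finished
        ran = true;
        Thread a2 = a, b2 = b;                    // branch on a fresh copy of the state
        Thread* t2 = (t == &a) ? &a2 : &b2;
        int x2 = x;
        if (t2->pc == 0) { t2->loaded = x2; t2->pc = 1; }   // load shared x
        else             { x2 = t2->loaded + 1; t2->pc = 2; } // store loaded + 1
        explore(x2, a2, b2, finals);
    }
    if (!ran) finals.insert(x);                   // both threads done: record final x
}
```

Production SMC tools make this practical with partial-order reduction, so that only interleavings that can actually lead to different behaviors are explored.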
11:50-12:10 |
Verifying Parallel Programs under Weak Memory Models (abstract)
We consider the verification of concurrent programs running under the classical TSO (Total Store Order) memory model. The model allows write-to-read relaxation, corresponding to the addition of an unbounded store buffer (containing pending store operations) between each processor and the main memory. In this talk, we introduce a novel semantics, which we call the dual TSO semantics, that replaces each store buffer with a load buffer containing pending load operations. The flow of information is reversed, i.e., store operations are performed by the processes atomically on the main memory, while values of variables are propagated from the memory to the load buffers of the processes. We show that the two semantics are equivalent in the sense that a program will reach identical sets of states when running under the two semantics. Furthermore, we present a simple and effective reachability algorithm for checking safety properties of programs running under the dual semantics.
Mohamed Faouzi Atig |
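The effect of the store buffers described in the abstract is visible already in the classic store-buffering litmus test, sketched below in a deliberately simplified model (each buffer holds at most the one pending store, and neither thread loads a variable it itself stores, so store forwarding is irrelevant here). Thread 0 runs "x := 1; r0 := y" and thread 1 runs "y := 1; r1 := x"; exploring all schedules shows that r0 == r1 == 0, impossible under sequential consistency, becomes reachable once stores drain from buffers at a nondeterministic later point.

```cpp
#include <cassert>
#include <set>
#include <utility>

struct State {
    int x = 0, y = 0;                 // main memory
    int pc0 = 0, pc1 = 0;             // 0: store, 1: load, 2: done
    bool buf0 = false, buf1 = false;  // pending stores x = 1 / y = 1
    int r0 = -1, r1 = -1;             // registers loaded by each thread
};

void explore(State s, std::set<std::pair<int, int>>& outs) {
    // Thread 0: buffer the store x = 1, then load y from memory.
    if (s.pc0 == 0)      { State n = s; n.buf0 = true; n.pc0 = 1; explore(n, outs); }
    else if (s.pc0 == 1) { State n = s; n.r0 = n.y; n.pc0 = 2; explore(n, outs); }
    // Thread 1: buffer the store y = 1, then load x from memory.
    if (s.pc1 == 0)      { State n = s; n.buf1 = true; n.pc1 = 1; explore(n, outs); }
    else if (s.pc1 == 1) { State n = s; n.r1 = n.x; n.pc1 = 2; explore(n, outs); }
    // Nondeterministic buffer drains to main memory.
    if (s.buf0) { State n = s; n.x = 1; n.buf0 = false; explore(n, outs); }
    if (s.buf1) { State n = s; n.y = 1; n.buf1 = false; explore(n, outs); }
    if (s.pc0 == 2 && s.pc1 == 2) outs.insert({s.r0, s.r1});
}
```

The dual semantics of the talk keeps the same reachable program states while replacing these store buffers with load buffers, which turns out to yield a simpler reachability analysis.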
Lunch Break (12:10-13:30) |
Session 2 |
13:30-13:50 |
Design for Update: Composability, Predictability and Efficiency (abstract)
Today, many industrial products are defined by software and are therefore customizable by installing new applications on demand. Their functionalities are implemented by software and can be modified and extended by software updates. This trend towards customizable products is rapidly expanding into all domains of IT, including embedded systems deployed in cars, robotics, medical devices, etc. The current state of practice in safety-critical systems allows hardly any modifications once they are deployed. This work aims to supply the missing paradigms and technologies for building and updating safety-critical embedded systems after deployment, subject to stringent timing constraints, dynamic workloads, and limited resources on complex platforms. Essentially, there are three key challenges: composability, resource-efficiency, and predictability, which must be addressed to enable modular, incremental, and safe software updates over a system's lifetime in use. We present research directions to address these challenges: (1) open architectures and implementation schemes for building composable systems, (2) fundamental issues in real-time scheduling, aiming at a theory of multi-resource (including multiprocessor) scheduling, and (3) new-generation techniques and tools for fully separated verification of timing and functional properties of real-time systems with significantly improved efficiency and scalability.
Wang Yi |
13:50-14:10 |
Typing Lockfree and Concurrent Programs (abstract)
Unique or linear references simplify reasoning about
ownership and lifetime of resources and are used frequently, with
or without language support, across mainstream programming
languages. The standard way of maintaining uniqueness is to move
references, for example using the move operation on C++ smart
pointers. This talk shows how we can extend these concepts to
lock-free algorithms, which implement protocols that ensure safe,
non-blocking concurrent access to data structures but generally
are not typable with linear references because they rely on
aliasing to achieve lock-freedom.
Tobias Wrigstad |
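The move-based uniqueness discipline the abstract starts from can be sketched with the C++ smart pointers it mentions: std::unique_ptr guarantees a single owner, and ownership transfers via std::move, after which the source no longer refers to the resource.

```cpp
#include <cassert>
#include <memory>
#include <utility>

std::unique_ptr<int> make_resource() {
    return std::make_unique<int>(42);   // a uniquely-owned heap resource
}

std::unique_ptr<int> transfer(std::unique_ptr<int> p) {
    *p += 1;      // the callee is now the unique owner and may mutate freely
    return p;     // ownership moves back to the caller
}
```

For example, after `auto b = transfer(std::move(a));` the moved-from `a` is null while `b` owns the value. Lock-free algorithms break exactly this pattern: concurrently accessed nodes are aliased by multiple threads, which is why typing them requires going beyond plain linear references.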
14:10-14:30 |
Consistency models and their implications on core architecture (abstract)
The memory consistency model has important implications for the microarchitecture of out-of-order cores. In this talk we discuss recent advances, in particular non-speculative reordering of loads and non-speculative store coalescing, and show how such advances can change the way we design cores.
Stefanos Kaxiras |
14:30-14:50 |
Green Cache: Energy-efficient pointer-based multicore caches (abstract)
After 60 years of cache designs, their implementation still looks the same: data are stored together with their address tags (≈30 bits). The location of a requested datum is determined by searching the cache hierarchy for a matching address tag, answering the local question, “Is the datum here?” This search is costly in terms of both energy and latency.
Green Cache relies on small cacheline pointers, CPs (≈5 bits), to answer the global question, “Where is the datum?” A CP is not coupled to its corresponding datum; instead, CPs are stored in a small, separate metadata hierarchy (≈3% of the SRAM), while data are stored in plain SRAM (≈97%) with no tags or comparators. This removes the need for data searches. By introducing delayed acknowledgements in the coherence protocol, the CP information is made deterministic, i.e., the datum remains in the CP location until read, removing many traditional coherence corner cases.
Green Cache also enables important cache topologies and optimisations, such as cache bypassing, private data classification and non-uniform caches. Overall, it reduces chip traffic by an average of 70%, dynamic cache energy (EDP) by 50% and the latency for L1 cache misses by 30%.
Erik Hagersten |
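The contrast between the two questions in the abstract can be sketched with hypothetical structures (sizes and fields are illustrative only, not the actual Green Cache design): a conventional cache answers "Is the datum here?" by comparing address tags in every way of a set, while a pointer-based design answers "Where is the datum?" with a single indexed read of a small metadata entry.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

struct TaggedWay { uint32_t tag; int data; };

// Conventional lookup: search all ways of a set for a matching address tag.
std::optional<int> tag_lookup(const std::array<TaggedWay, 4>& set, uint32_t tag) {
    for (const auto& w : set)           // cost: a comparison per way, every access
        if (w.tag == tag) return w.data;
    return std::nullopt;                // miss: the search must continue below
}

// Pointer-based lookup: a small cacheline pointer names the location directly.
int pointer_lookup(const std::array<int, 16>& data, uint8_t cp) {
    return data[cp];                    // no tags, no comparators, no search
}
```

The savings in the abstract come from performing the second, cheap kind of lookup on plain SRAM, with the small pointers kept in a separate metadata hierarchy.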
Coffee Break (14:50-15:30) |
Poster Session (including industrial projects) |
15:30-16:30 |
Poster Session (including industrial projects) |
Closing |
16:30-17:00 |
Concluding Remarks
Bengt Jonsson |