UPMARC Summer School on Multicore Computing, June 21-24, 2010
|Home||Lectures||Travel and Accommodation||Registration||Program|
Partitioned Fixed-Priority Real-time Scheduling on Multi-cores
Nan Guan, Uppsala university
Liu and Layland discovered the famous utilization bound for fixed-priority scheduling on single-processor systems in the 1970´s. Since then, it has been a long standing open problem to find fixed-priority scheduling algorithms with the same bound for multiprocessor systems. In our recent work, we have developed partitioning-based fixed-priority multiprocessor scheduling algorithms with Liu and Layland´s utilization bound. We first proposed a polynomial algorithm to achieve the desired bound. Then we will introduce a pseudo-polynomial algorithm which not only has this best utilization bound, but also outperforms existing algorithms with respect to average performance.
Transactional Memory: Myths and Limits
The Transaction Memory (TM) paradigm promises a lot. Argued to be as easy to use as coarse-grained locking and as efficient as, hand-crafted, fine-grained locking, it is very appealing to leverage multi-core architectures. Not surprisingly, a large body of work has been dedicated to implementing the TM paradigm and stretching its variants. Yet, little work has been devoted to explore its inherent power and limitations. This tutorial will introduce the TM paradigm and overview its fundamental characteristics beyond specific prototypes and platforms. The tutorial will address questions like what safety property should a TM ensure? what liveness? how to verify these? can a TM scale? how to measure the performance of a TM? The tutorial will also discuss some TM related open problems and research perspectives.
Relaxed Memory Models in 2010
Multiprocessors have become commonplace, but programming them remains very challenging. In particular, they typically do not provide a sequentially consistent memory abstraction, exacerbating the usual problems of concurrent programming. Instead, performance optimisations, in processor hardware and in high-level language compilers, mean that memory accesses can be reordered in various subtle ways. Moreover, despite extensive research, some major processor architectures and languages still do not have well-specified behaviour. In these lectures I will introduce relaxed memory models, focussing on the behaviour of x86 multiprocessors and touching also on those for Power, ARM, Java and C++.
Re-Targetable Data-Parallel Code for Multicore Processors, GPUs and FPGAs
This tutorial will present a technique for writing data-parallel algorithms in a discipline which permits efficient compilation to multicore vector processors (via SSE4 instructions); GPUs (via the DirectX framework); and FPGAs (via a VHDL based compilation flow and Xilinx synthesis and implementation tools). The tutorial aims to emphasize the importance of expressing data movement and reorganization in the original source description which then permits efficient automatic layout and memory access patterns to support different kinds of parallel execution. The tutorial will work through a series of data-parallel examples in a variety of languages including C++, C#, F# and Haskell and will include several demonstrations. We will also emphasise the important role of embedded domain specific languages (eDSLs) for addressing the challenge of programming heterogeneous processors.
Real-Time Computing on Multicore Platforms
This talk will focus on the broad question of how to best support workloads with timing constraints on multicore platforms. UNC's real-time systems group is seeking to address this question by devising practical algorithms for scheduling and synchronizing real-time tasks on multiprocessors, developing operating-system infrastructure in which such algorithms are supported, and conducting experimental evaluations to assess the viability of the different algorithms and implementations produced. This work, and related research done by other groups, will be surveyed. The talk will begin with a high-level overview of relevant background on multiprocessor real-time scheduling and synchronization. Following that, a number of different research efforts will be discussed in detail. These efforts range from investigations involving fundamental real-time scheduling and synchronization problems to the development of a Linux extension in which various real-time multiprocessor scheduling and synchronization policies are supported.
Alias Control Mechanism for Encapsulation, and Concurrent and Parallel Programming
Tobias Wrigstad, Uppsala university
This talk surveys the field of alias control, and its applicability to problems in concurrent and parallel programming. Reasoning about aliasing and the propagation of side-effects in a system is often key for determining atomicity, or a minimal lock set for safe parallel execution of statements that may cause overlapping effects. Starting with the soon 20-year old article The Geneva Convention On The Treatment of Object Aliasing, we will survey aliasing control mechanisms such as various flavours of object ownership, uniqueness, effect systems, and immutable and read-only pointers and exemplify how these have been used to avoid deadlocks and data races, reason about disjointness of effects, deterministic parallelism, etc. in object-oriented languages.
Memory Consistency Models and Amdahl's Law in the Multicore Era
Emerging multicore chips encourage multithreaded programming with shared memory. The behavior of these programs depends how their memory references interact, as defined by a /memory (consistency) model/. The computing world has not settled on a single memory model, because the models that promote fast implementations often make programmer reasoning more difficult and vice versa.
The first two-thirds of this tutorial seeks to aid programmer and hardware implementer understanding for the /whats/ and the /whys/ of memory consistency models. We will:
- review how hardware affects memory models via write buffers and cache coherence
- define sequential consistency (a simple model) and total store order (SPARC and ~x86)
- illustrate SC and TSO hardware implementation issues
- define and discuss more relaxed (weaker) models, such as PowerPC and ARM
- conclude with an overview of memory models for higher-level languages (Java and C++) that promise to provide SC to data-race-free programs
The last one-third of this tutorial includes material from our popular paper "Amdahl´s Law in the Multicore Era" [IEEE Computer, 7/2008]. In particular, we complement Amdahl´s software model with an equally-simple multicore hardware model to illuminate challenges that future hardware might present to programmers.