In computational science (including fields such as astrophysics, quantum chemistry, materials science, and genetics), the application problems of scientific interest require parallel execution because of their extreme volumes of data and computation. Scientific progress in these fields also requires a combination of increasingly powerful computer systems, more advanced models, and more sophisticated parallel algorithms. The universal shift to multicore-based system design offers increased performance in theory, but legacy parallel codes typically execute at a fraction of peak performance and scale poorly, and it is becoming prohibitively costly in terms of man-hours to develop parallel software for real application problems that meets reasonable performance requirements.
The picture to the right shows the cluster Kalkyl at UPPMAX, with 348 computing nodes of 8 cores each, as an example of a multicore-based high-performance computer system.
The multicore revolution also brings out other types of performance-critical applications where parallelization becomes important. One example is computationally heavy algorithms, such as encryption and decryption, executed on battery-operated hand-held mobile devices. In this case, parallelization over the cores is needed to lower the energy consumption.
To tackle the parallel performance issue for multicores, the basic philosophy of parallel programming needs to be adapted. Previously, load balancing and minimizing communication were the main concerns. Now, the flow of data between the processor chip and main memory is a major bottleneck, and hence an area of opportunity for performance speedups, whereas cache-to-cache communication within a multicore is basically free. Furthermore, a core is inexpensive compared to a computer in a multicomputer system, which makes speculative computation more affordable. We will
An important complement to the algorithmic development is the efficient system-level support provided by the UPMARC research direction Efficiency and Predictability.
To increase programmer efficiency, we will develop models or tools to handle (complex) data dependencies, to maximize data locality, and to schedule tasks onto cores and accelerators. This is closely connected with the direction Ease of Programming, which aims to provide better ways to express parallelism. We will
The figure shows the dependency graph for a simple 4x4 block LU-factorization.
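As an illustration of how such a dependency graph can arise, the following minimal sketch derives one from the block accesses of a tiled LU factorization. It is not the tool or model developed within the project. It assumes a right-looking variant without pivoting, and the function names and task labels (`getrf`, `trsm_row`, `trsm_col`, `gemm`) are hypothetical names chosen for this example, so the resulting graph may differ in detail from the one in the figure.

```python
def block_lu_task_graph(n_blocks):
    """Build a task graph for a tiled LU factorization (no pivoting assumed).

    Returns (tasks, deps): the task names in sequential program order, and a
    dict mapping each task to the set of tasks it has to wait for.
    """
    tasks = []
    deps = {}
    last_writer = {}  # block (i, j) -> name of the task that last wrote it

    def submit(name, reads, writes):
        # A task waits for the last writer of every block it reads or writes.
        # Tracking only the last writer suffices in this algorithm, because a
        # block is never overwritten once other tasks have started reading it.
        deps[name] = {last_writer[b] for b in (*reads, *writes) if b in last_writer}
        for b in writes:
            last_writer[b] = name
        tasks.append(name)

    for k in range(n_blocks):
        # Factor the diagonal block.
        submit(f"getrf({k})", reads=[], writes=[(k, k)])
        # Triangular solves for the blocks to the right of the diagonal block.
        for j in range(k + 1, n_blocks):
            submit(f"trsm_row({k},{j})", reads=[(k, k)], writes=[(k, j)])
        # Triangular solves for the blocks below the diagonal block.
        for i in range(k + 1, n_blocks):
            submit(f"trsm_col({i},{k})", reads=[(k, k)], writes=[(i, k)])
        # Update the trailing submatrix.
        for i in range(k + 1, n_blocks):
            for j in range(k + 1, n_blocks):
                submit(f"gemm({i},{j},{k})", reads=[(i, k), (k, j)], writes=[(i, j)])
    return tasks, deps


def critical_path_length(tasks, deps):
    """Number of tasks on the longest dependency chain."""
    depth = {}
    for t in tasks:  # program order is a valid topological order
        depth[t] = 1 + max((depth[d] for d in deps[t]), default=0)
    return max(depth.values(), default=0)


if __name__ == "__main__":
    tasks, deps = block_lu_task_graph(4)
    print(len(tasks), "tasks, critical path of",
          critical_path_length(tasks, deps), "tasks")
```

Under these assumptions, a 4x4 block matrix gives 30 tasks with a longest dependency chain of 10 tasks, so a scheduler that respects the graph has room to run many of the trailing updates concurrently on different cores.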