Department of Information Technology

Parallel Algorithms for Multicore Processors

Motivation and Background

We are looking at various examples of applications of Scientific Computing where we are taking advantage of today's multicore and manycore computers. This currently involves differential equations, N-body problems, event-based models, and inverse formulations. Challenges in this area include balancing memory accesses and computations, understanding synchronization, finding optimal granularity, and balancing cost and accuracy. Users of these results are often other researchers trying to design or comprehend complex physical systems such as air-flow around wind power plants, flow in micro-fluidic systems, chemical reactions, or combustion in engines.

Sub-projects

  • N-body problems involve the dynamics of a population of N particles in a potential field. Every particle interacts with every other particle via the field. Notable examples are charged particles in an electrical field and massive objects in a gravitational field. The challenge lies in reducing the cost of computing the forces between N particles, which scales with N^2. In many cases the force between particles decays with the distance between them, allowing distant interactions to be neglected. The Fast Multipole Method (FMM) operates on this principle and is capable of reducing the effort to O(N log N) or O(N). We are working on a parallel adaptive implementation in three dimensions with the goal of achieving high efficiency at very large N.

Fluid flowing around a rotating cylinder using FMM2D

  • The Finite Element Method (FEM) is an important numerical method used for simulation in application fields such as structural mechanics and fluid dynamics. However, classical matrix-based FEM algorithms access too much data per arithmetic operation to fully utilize modern many- and multicore processors, especially when solving problems in 3D with elements of high order. We instead use a matrix-free approach, which has a significantly smaller memory-access footprint. Additional advantages are that no matrix assembly is needed, and that larger problems can be solved since no matrix needs to be stored. In our research, we develop Deal.II, an open-source C++ library for finite-element computations which is widely used by researchers in application fields such as geoscience.
  • Event-based models are used to model discrete models, that are altered by instantaneous changes in time, so-called events. Due to the possibility to incorporate stochastic dynamics as for example Markov Chains, such models have gained popularity in fields as Computational Systems Biology or Computational Epidemiology. We are studying the parallelizaiton of event-based models for multi-socket processors, that are typically equipped with 32 or more cores. Due to the mathematical properties of the underlying models, the parallelization of discrete event simulations is not trivial and requires a deep understanding of the model sychronization behaviour.

URDME simulating oscillating proteins in E.Coli bacteria.

  • The flash X-ray single particle diffraction Imaging (FXI) is a cutting-edge technology that solves structures of small biological objects without using crystals. Other than current X-ray beam limitation, recontructing a 3D structure is challenging task due to the large mount of diffraction patterns gernerated from FXI experiments and the compute-intensive reconstruction algorithm. Hence, selecting proper diffraction patterns and reconstructe models efficiently and automatically are important, and our current aims.

 A cutaway view of a reconstruction from mimivirus diffraction patterns.

Publications

Refereed

Efficient inter-process synchronization for parallel discrete event simulation on multicores. Pavol Bauer, Jonatan Lindén, Stefan Engblom, and Bengt Jonsson. In Proc. 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp 183-194, ACM Press, New York, 2015. (DOI).
Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters. Tomas Ekeberg, Stefan Engblom, and Jing Liu. In The international journal of high performance computing applications, volume 29, pp 233-243, 2015. (DOI, fulltext).
Stable difference methods for block-oriented adaptive grids. Anna Nissen, Katharina Kormann, Magnus Grandin, and Kristoffer Virta. In Journal of Scientific Computing, volume 65, pp 486-511, 2015. (DOI).
Sensitivity estimation and inverse problems in spatial stochastic models of chemical kinetics. Pavol Bauer and Stefan Engblom. In Numerical Mathematics and Advanced Applications: ENUMATH 2013, volume 103 of Lecture Notes in Computational Science and Engineering, pp 519-527, Springer, 2015. (DOI).
Matrix-free finite-element operator application on graphics processing units. Karl Ljungkvist. In Euro-Par 2014: Parallel Processing Workshops, Part II, volume 8806 of Lecture Notes in Computer Science, pp 450-461, Springer, 2014. (DOI).
On the stability of stochastic jump kinetics. Stefan Engblom. In Applied Mathematics, volume 5, pp 3217-3239, 2014. (DOI).
X-ray laser imaging of biomolecules using multiple GPUs. Stefan Engblom and Jing Liu. In Parallel Processing and Applied Mathematics: Part I, volume 8384 of Lecture Notes in Computer Science, pp 480-489, Springer-Verlag, Berlin, 2014. (DOI).
On the impact of the heterogeneous multicore and many-core platforms on iterative solution methods and preconditioning techniques. Dimitar Lukarski and Maya Neytcheva. In High-Performance Computing on Complex Environments, pp 13-32, Wiley-Blackwell, Hoboken, NJ, 2014. (DOI).
Accelerating COBAYA3 on multi-core CPU and GPU systems using PARALUTION. Nico Trost, Javier Jiménez, Dimitar Lukarski, and Victor Sanchez. In Proc. 2nd Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo, La Société Française d'Energie Nucléaire, Paris, France, 2013.
Accurate surface embedding for higher order finite elements. Stefan Suwelack, Dimitar Lukarski, Vincent Heuveline, Rüdiger Dillmann, and Stefanie Speidel. In Proc. 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp 187-192, ACM Press, New York, 2013. (DOI).
Adaptive fast multipole methods on the GPU. Anders Goude and Stefan Engblom. In Journal of Supercomputing, volume 63, pp 897-918, 2013. (DOI).
Communication-efficient algorithms for numerical quantum dynamics. Magnus Gustafsson, Katharina Kormann, and Sverker Holmgren. In Applied Parallel and Scientific Computing: Part II, volume 7134 of Lecture Notes in Computer Science, pp 368-378, Springer-Verlag, Berlin, 2012. (DOI).
Using hardware transactional memory for high-performance computing. Karl Ljungkvist, Martin Tillenius, David Black-Schaffer, Sverker Holmgren, Martin Karlsson, and Elisabeth Larsson. In Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp 1660-1667, IEEE, Piscataway, NJ, 2011. (DOI).
On well-separated sets and fast multipole methods. Stefan Engblom. In Applied Numerical Mathematics, volume 61, pp 1096-1102, 2011. (DOI).
An implementation framework for solving high-dimensional PDEs on massively parallel computers. Magnus Gustafsson and Sverker Holmgren. In Numerical Mathematics and Advanced Applications: 2009, pp 417-424, Springer-Verlag, Berlin, 2010. (DOI).

Theses

Parallelism and efficiency in discrete-event simulation. Pavol Bauer. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2015-004, Uppsala universitet, 2015. (fulltext).
Techniques for finite element methods on modern processors. Karl Ljungkvist. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2015-001, Uppsala universitet, 2015. (fulltext).
Adaptive Solvers for High-Dimensional PDE Problems on Clusters of Multicore Processors. Magnus Grandin. Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology nr 1199, Acta Universitatis Upsaliensis, Uppsala, 2014. (fulltext, preview image).
Scientific computing on hybrid architectures. Marcus Holm. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2013-002, Uppsala universitet, 2013. (fulltext).
Towards an adaptive solver for high-dimensional PDE problems on clusters of multicore processors. Magnus Gustafsson. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2012-003, Uppsala universitet, 2012. (fulltext).

Other

Parallel data structures and algorithms for high-dimensional structured adaptive mesh refinement. Magnus Grandin and Sverker Holmgren. Technical report / Department of Information Technology, Uppsala University nr 2014-020, 2014. (External link).
Data structures and algorithms for high-dimensional structured adaptive mesh refinement. Magnus Grandin. Technical report / Department of Information Technology, Uppsala University nr 2014-019, 2014. (External link).
GPU-Parallel simulation of rigid fibers in Stokes flow. Ronny Eriksson. Student thesis, supervisor: Stefan Engblom, examiner: Jarmo Rantakokko, Olle Gällmo, IT nr 14 029, 2014. (fulltext).
Numerical evaluation of the Communication-Avoiding Lanczos algorithm. Magnus Gustafsson, James Demmel, and Sverker Holmgren. Technical report / Department of Information Technology, Uppsala University nr 2012-001, 2012. (External link).
Parallelization and performance in simulation of disease spread by animal transfer. Fredrik Pasanen and Magnus Söderling. Student thesis, supervisor: Stefan Engblom, examiner: Jarmo Rantakokko, Olle Gällmo, IT nr 12 002, 2012. (fulltext).
Efficient implementation of a high-dimensional PDE-solver on multicore processors. Magnus Gustafsson and Sverker Holmgren. In Proc. 2nd Swedish Workshop on Multi-Core Computing, pp 64-66, Department of Information Technology, Uppsala University, Uppsala, Sweden, 2009.

Software

  • URDME: a general software framework for modeling and simulation of stochastic reaction-diffusion processes on unstructured, tetrahedral and triangular meshes.
  • SimInf: a flexible and efficient framework for stochastic disease spread simulations in animal populations.
  • FMM2D: a two-dimensional adaptive fast multipole method with a Matlab mex-interface. Currently available in a serial C99-version and a fully ported CUDA/C++-version. Download here.
  • FMM3D: a three-dimensional adaptive fast multipole method with a Matlab mex-interface. Currently under development.
  • deal.II: An open-source C++ library for finite-element computations.

internal project page

Updated  2016-10-28 09:18:15 by Jing Liu.