Parallel Algorithms for Multicore Processors
Motivation and Background
We are looking at various examples of applications of Scientific Computing where we are taking advantage of today's multicore and manycore computers. This currently involves differential equations, N-body problems, event-based models, and inverse formulations. Challenges in this area include balancing memory accesses and computations, understanding synchronization, finding optimal granularity, and balancing cost and accuracy. Users of these results are often other researchers trying to design or comprehend complex physical systems such as air-flow around wind power plants, flow in micro-fluidic systems, chemical reactions, or combustion in engines.
- N-body problems involve the dynamics of a population of N particles in a potential field. Every particle interacts with every other particle via the field. Notable examples are charged particles in an electrical field and massive objects in a gravitational field. The challenge lies in reducing the cost of computing the forces between N particles, which scales with N^2. In many cases the force between particles decays with the distance between them, allowing distant interactions to be neglected. The Fast Multipole Method (FMM) operates on this principle and is capable of reducing the effort to O(N log N) or O(N). We are working on a parallel adaptive implementation in three dimensions with the goal of achieving high efficiency at very large N.
- The Finite Element Method (FEM) is an important numerical method used for simulation in application fields such as structural mechanics and fluid dynamics. However, classical matrix-based FEM algorithms access too much data per arithmetic operation to fully utilize modern many- and multicore processors, especially when solving problems in 3D with elements of high order. We instead use a matrix-free approach, which has a significantly smaller memory-access footprint. Additional advantages are that no matrix assembly is needed, and that larger problems can be solved since no matrix needs to be stored. In our research, we develop Deal.II, an open-source C++ library for finite-element computations which is widely used by researchers in application fields such as geoscience.
- Event-based models are used to model discrete models, that are altered by instantaneous changes in time, so-called events. Due to the possibility to incorporate stochastic dynamics as for example Markov Chains, such models have gained popularity in fields as Computational Systems Biology or Computational Epidemiology. We are studying the parallelizaiton of event-based models for multi-socket processors, that are typically equipped with 32 or more cores. Due to the mathematical properties of the underlying models, the parallelization of discrete event simulations is not trivial and requires a deep understanding of the model sychronization behaviour.
- The flash X-ray single particle diffraction Imaging (FXI) is a cutting-edge technology that solves structures of small biological objects without using crystals. Other than current X-ray beam limitation, recontructing a 3D structure is challenging task due to the large mount of diffraction patterns gernerated from FXI experiments and the compute-intensive reconstruction algorithm. Hence, selecting proper diffraction patterns and reconstructe models efficiently and automatically are important, and our current aims.
- Efficient inter-process synchronization for parallel discrete event simulation on multicores. In Proc. 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp 183-194, ACM Press, New York, 2015. (DOI).Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters. In The international journal of high performance computing applications, volume 29, pp 233-243, 2015. (DOI, fulltext).Stable difference methods for block-oriented adaptive grids. In Journal of Scientific Computing, volume 65, pp 486-511, 2015. (DOI).Sensitivity estimation and inverse problems in spatial stochastic models of chemical kinetics. In Numerical Mathematics and Advanced Applications: ENUMATH 2013, volume 103 of Lecture Notes in Computational Science and Engineering, pp 519-527, Springer, 2015. (DOI).Matrix-free finite-element operator application on graphics processing units. In Euro-Par 2014: Parallel Processing Workshops, Part II, volume 8806 of Lecture Notes in Computer Science, pp 450-461, Springer, 2014. (DOI).On the stability of stochastic jump kinetics. In Applied Mathematics, volume 5, pp 3217-3239, 2014. (DOI).Dynamic autotuning of adaptive fast multipole methods on hybrid multicore CPU and GPU systems. In SIAM Journal on Scientific Computing, volume 36, pp C376-C399, 2014. (DOI).X-ray laser imaging of biomolecules using multiple GPUs. In Parallel Processing and Applied Mathematics: Part I, volume 8384 of Lecture Notes in Computer Science, pp 480-489, Springer-Verlag, Berlin, 2014. (DOI).On the impact of the heterogeneous multicore and many-core platforms on iterative solution methods and preconditioning techniques. In High-Performance Computing on Complex Environments, pp 13-32, Wiley-Blackwell, Hoboken, NJ, 2014. (DOI).Accelerating COBAYA3 on multi-core CPU and GPU systems using PARALUTION. In Proc. 2nd Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo, La Société Française d'Energie Nucléaire, Paris, France, 2013.Accurate surface embedding for higher order finite elements. In Proc. 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp 187-192, ACM Press, New York, 2013. (DOI).Adaptive fast multipole methods on the GPU. In Journal of Supercomputing, volume 63, pp 897-918, 2013. (DOI).URDME: a modular framework for stochastic simulation of reaction-transport processes in complex geometries. In BMC Systems Biology, volume 6, pp 76:1-17, 2012. (DOI).Communication-efficient algorithms for numerical quantum dynamics. In Applied Parallel and Scientific Computing: Part II, volume 7134 of Lecture Notes in Computer Science, pp 368-378, Springer-Verlag, Berlin, 2012. (DOI).Using hardware transactional memory for high-performance computing. In Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp 1660-1667, IEEE, Piscataway, NJ, 2011. (DOI).On well-separated sets and fast multipole methods. In Applied Numerical Mathematics, volume 61, pp 1096-1102, 2011. (DOI).An implementation framework for solving high-dimensional PDEs on massively parallel computers. In Numerical Mathematics and Advanced Applications: 2009, pp 417-424, Springer-Verlag, Berlin, 2010. (DOI).
- Parallelism and efficiency in discrete-event simulation. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2015-004, Uppsala universitet, 2015. (fulltext).Techniques for finite element methods on modern processors. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2015-001, Uppsala universitet, 2015. (fulltext).Adaptive Solvers for High-Dimensional PDE Problems on Clusters of Multicore Processors. Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology nr 1199, Acta Universitatis Upsaliensis, Uppsala, 2014. (fulltext, preview image).Scientific computing on hybrid architectures. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2013-002, Uppsala universitet, 2013. (fulltext).Towards an adaptive solver for high-dimensional PDE problems on clusters of multicore processors. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2012-003, Uppsala universitet, 2012. (fulltext).
- Fast Matlab compatible sparse assembly on multicore computers. In Parallel Computing, volume 56, pp 1-17, 2016. (DOI).Parallel data structures and algorithms for high-dimensional structured adaptive mesh refinement. Technical report / Department of Information Technology, Uppsala University nr 2014-020, 2014. (External link).Data structures and algorithms for high-dimensional structured adaptive mesh refinement. Technical report / Department of Information Technology, Uppsala University nr 2014-019, 2014. (External link).GPU-Parallel simulation of rigid fibers in Stokes flow. Student thesis, supervisor: Stefan Engblom, examiner: Jarmo Rantakokko, Olle Gällmo, IT nr 14 029, 2014. (fulltext).Numerical evaluation of the Communication-Avoiding Lanczos algorithm. Technical report / Department of Information Technology, Uppsala University nr 2012-001, 2012. (External link).Parallelization and performance in simulation of disease spread by animal transfer. Student thesis, supervisor: Stefan Engblom, examiner: Jarmo Rantakokko, Olle Gällmo, IT nr 12 002, 2012. (fulltext).Efficient implementation of a high-dimensional PDE-solver on multicore processors. In Proc. 2nd Swedish Workshop on Multi-Core Computing, pp 64-66, Department of Information Technology, Uppsala University, Uppsala, Sweden, 2009.
- URDME: a general software framework for modeling and simulation of stochastic reaction-diffusion processes on unstructured, tetrahedral and triangular meshes.
- SimInf: a flexible and efficient framework for stochastic disease spread simulations in animal populations.
- FMM2D: a two-dimensional adaptive fast multipole method with a Matlab mex-interface. Currently available in a serial C99-version and a fully ported CUDA/C++-version. Download here.
- FMM3D: a three-dimensional adaptive fast multipole method with a Matlab mex-interface. Currently under development.
- deal.II: An open-source C++ library for finite-element computations.