In this paper, we consider matrix-free finite-element techniques for the efficient numerical solution of partial differential equations on modern manycore processors such as graphics cards. We present a GPU parallelization of a completely matrix-free geometric multigrid iterative solver, with support for general curved and adaptively refined meshes with hanging nodes. Comparing our implementation running on a Pascal P100 GPU to a highly optimized multi-core implementation running on comparable Broadwell CPUs, we demonstrate speedups of roughly a factor of 2 across three different Poisson-based applications and a variety of element degrees in 2D and 3D. We also show that atomic intrinsics are consistently the fastest way to perform shared-memory updates on the GPU, in contrast to previous architectures, and that mixed-precision arithmetic can be used successfully, yielding a speedup of up to 83% over a full double-precision approach.
Note: Updated 2017-04-20.