vLock: Lock Virtualization Mechanism for Exploiting Fine-grained Parallelism in Graph Traversal Algorithms
Description
For graph traversal applications, fine synchronization is required to exploit massive fine parallelism. However, in the conventional solution using fine-grained locks, locks themselves suffer huge memory cost as well as poor locality for inherent irregular access to vertices. In this paper, we propose a novel fine lock solution--vLock. The key idea is lock virtualization that maps the huge logical lock space to a much smaller physical lock space that can reside in cache during the program life cycle. Lock virtualization effectively reduces lock incurred overheads of both memory cost and cache misses. It also achieves high usability in legacy graph programs, as from users's view vLock is the same as lock methods in Pthreads. We implement vLock as a Pthreads-like library and evaluate its performance in four classical graph algorithms (BFS,SSSP,CC,PageRank). Experiments on a SMP system with two Intel Westemere six-core processors show that, compared to conventional fine locks, vLock significantly reduces locks' cache misses and has competitive performance. Particularly, PageRank with vLock has about 20% performance improvement. |
Questions
Question 1:
Why is there a low amount of lock conflicts, using any kind of fine grained locking for concurrent graph traversal?
Question 2:
Why is the performance of vLock-Add so bad for PageRank?
Stavros
Q1: Q2: |
Kjell
Q1: Q2: |
Jonatan
Q1: The computation time is small, resulting in short critical sections, decreasing the probability of conflicts. At the same time there is a large amount of nodes. Q2: "The lost performance of vLock-Add is due to its high cost of hash computation [...]". |
Sofia
Q1: Q2: |
Martin
Q1: Many nodes, few threads. Q2: The hash function is expensive. |
Yunyun
Q1: Fine-grained locks associate with each vertex in a graph. Critical section, which can cause lock conflicts, are operations on vertices and edges. With too many vertices in sparse graphs and too few threads, conflicts can seldom happen. Q2: The hash computation cost in vLock-Add is 70-80 times higher than that in vLock-Vtx, and the PageRank algorithm requires locks much more frequent than others. |
Name
Q1: Q2: |