Efficient Synchronization for Nonuniform Communication Architectures
Zoran Radovic and Erik Hagersten
In Proceedings of Supercomputing 2002 (SC2002), Baltimore, Maryland, November 2002.
Scalable parallel computers are often nonuniform communication architectures (NUCAs), where the access time to other processors' caches varies with their physical location. Still, few attempts to exploit cache-to-cache communication locality have been made. This paper introduces a new kind of synchronization primitive (lock-unlock) that favors neighboring processors when a lock is released. This improves both the lock handover time and the access time to the shared data of the critical region. A critical section guarded by our new RH lock takes less than half the time to execute compared with the same critical section guarded by any other lock on our NUCA hardware. For Raytrace with 28 processors, execution time improved by a factor of 2.23-4.68, while global traffic decreased dramatically compared with all the other locks. Averaged over the seven applications studied, execution time improved by 7-24% and global traffic decreased by 8-28%.
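The core idea above is a release policy that prefers handing the lock to a neighbor. The Python sketch below illustrates only that general policy under stated assumptions. It is a hypothetical two-level lock, not the paper's RH lock algorithm. The class `NodeAwareLock`, the `node` argument, and the `max_local_handoffs` bound are all invented for illustration. It also uses `threading.Lock` in place of the atomic memory operations a real NUCA lock would spin on.

```python
import threading


class NodeAwareLock:
    """Two-level lock that prefers handing the lock to a waiter on the same node.

    Hypothetical sketch: a generic hierarchical lock with a bounded number of
    consecutive same-node handoffs, not the RH lock algorithm from the paper.
    """

    def __init__(self, num_nodes, max_local_handoffs=4):
        self._global = threading.Lock()  # owned by at most one node at a time
        self._local = [threading.Lock() for _ in range(num_nodes)]  # one per node
        self._meta = threading.Lock()  # protects the waiter counts
        self._waiters = [0] * num_nodes  # threads waiting on each node's local lock
        # The fields below are only touched while holding that node's local lock.
        self._global_held = [False] * num_nodes
        self._handoffs = [0] * num_nodes
        self._max_handoffs = max_local_handoffs

    def acquire(self, node):
        with self._meta:
            self._waiters[node] += 1
        self._local[node].acquire()  # first compete only with same-node threads
        with self._meta:
            self._waiters[node] -= 1
        if not self._global_held[node]:
            # This node does not own the global lock yet: compete with other nodes.
            self._global.acquire()
            self._global_held[node] = True
            self._handoffs[node] = 0

    def release(self, node):
        with self._meta:
            local_waiter = self._waiters[node] > 0
        if local_waiter and self._handoffs[node] < self._max_handoffs:
            # Keep the global lock on this node; the next local owner inherits it.
            self._handoffs[node] += 1
        else:
            # No neighbor is waiting, or the fairness bound was reached:
            # give the global lock back so any node can take it.
            self._global_held[node] = False
            self._global.release()
        self._local[node].release()
```

Each thread passes the id of the node it runs on. A counted waiter is guaranteed to take the local lock eventually, so keeping the global lock on the node never strands it. `max_local_handoffs` bounds how long remote nodes can be passed over. Larger values keep more handovers, and the critical section's data, node-local at the cost of fairness.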