Advances in semiconductor technology have pushed shared-memory servers towards multiple cores per die and multiple threads per core. We believe that this technology shift forces a reevaluation of how to interconnect multiple such chips to form larger systems.
This paper argues that minimal processor support for coherence traps, implemented in future chip multiprocessors, will enable large-scale server systems at a much lower cost in terms of engineer-years, verification effort, and time to market than their traditional all-hardware counterparts. In our proposal, the coherence trap hardware is responsible only for the actual permission check, whereas software trap handlers are responsible for obtaining read/write permission.
Detailed full-system simulation shows that a coherence-trap-enabled distributed shared memory system can be competitive in performance with its highly optimized hardware-only counterpart. The evaluated systems use high-end processors with one or two dual-threaded cores per die as processing nodes.
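The division of labor described in the abstract (a hardware permission check, with software trap handlers obtaining read/write permission) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the mechanism evaluated in the paper: the names `Perm`, `Directory`, `Node`, `_check`, and `_trap_handler` are hypothetical, the directory-based invalidation scheme is an assumed stand-in for whatever protocol the handlers actually run, and data movement between nodes is replaced by a single shared dictionary.

```python
from enum import IntEnum


class Perm(IntEnum):
    """Per-line access permission; higher values imply the lower ones."""
    INVALID = 0
    READ = 1
    WRITE = 2


class Directory:
    """Hypothetical global record of which node holds which permission."""

    def __init__(self):
        self.holders = {}  # line -> {node_id: Perm}

    def acquire(self, nodes, node_id, line, wanted):
        holders = self.holders.setdefault(line, {})
        for other_id in list(holders):
            if other_id == node_id:
                continue
            if wanted == Perm.WRITE:
                # A writer needs exclusivity: invalidate every other copy.
                nodes[other_id].perms[line] = Perm.INVALID
                del holders[other_id]
            elif holders[other_id] == Perm.WRITE:
                # A reader forces the current writer down to read-only.
                nodes[other_id].perms[line] = Perm.READ
                holders[other_id] = Perm.READ
        holders[node_id] = wanted


class Node:
    """One processing node with a local permission table (assumed layout)."""

    def __init__(self, node_id, directory, nodes):
        self.node_id = node_id
        self.directory = directory
        self.nodes = nodes
        self.perms = {}  # line -> Perm
        self.traps = 0   # number of coherence traps taken
        nodes[node_id] = self

    def _check(self, line, needed):
        # Models the hardware part: nothing but the permission check.
        return self.perms.get(line, Perm.INVALID) >= needed

    def _trap_handler(self, line, needed):
        # Models the software part: obtain the missing permission.
        self.traps += 1
        self.directory.acquire(self.nodes, self.node_id, line, needed)
        self.perms[line] = needed

    def load(self, memory, line):
        if not self._check(line, Perm.READ):
            self._trap_handler(line, Perm.READ)
        return memory.get(line, 0)

    def store(self, memory, line, value):
        if not self._check(line, Perm.WRITE):
            self._trap_handler(line, Perm.WRITE)
        memory[line] = value
```

In this model an access that already holds sufficient permission never enters the handler, so software cost is paid only on the trap path; for example, two consecutive stores by the same node to the same line take a single trap.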
Note: A revised version is available as Technical Report 2005-041