Actual TSO implementations (x86-TSO, SPARC) relax strict store atomicity by allowing a core to see its own stores while they are in limbo, i.e., executed (and perhaps retired) but not yet inserted in the global memory order. This can break the TSO ordering rules, specifically the load-load order, in unexpected and unpredictable ways. Furthermore, we show that similar effects can be observed in memory models weaker than TSO. Such behaviors seriously compromise the soundness of the memory model. The store-atomicity dilemma that designers face is: clean semantics and a sound model, or performance? To date, enforcing strict store atomicity carries a steep performance penalty. The only known solutions that guarantee store atomicity enforce it across the board, even when a violation would not matter. We make a simple observation. What holds for any other rule in a consistency model also holds for strict store atomicity: it is not a crime to break the rule, unless we get caught. In this work, we detail the different ways in which a store-atomicity violation can be detected through its effect: the breaking of the load-load ordering rule. We then describe an effective and cheap approach that dynamically enforces store atomicity only when a violation would actually be detected. In practice, these cases are rare during the execution of a program. In all other cases (the bulk of the execution of a program), store atomicity can be freely violated without anyone taking notice. The end result is that we provide (the illusion of) clean semantics and a sound store-atomic memory model, but with the performance and cost of a non-store-atomic model.
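To make the effect described above concrete, the following is a minimal sketch of a toy store-buffer machine, not the mechanism proposed in this work. It assumes a two-core litmus test in which each core stores to its own variable, reloads it, and then loads the other core's variable. All names (`PROGRAMS`, `successors`, `outcomes`, the `forwarding` flag) are hypothetical and introduced only for illustration. The sketch enumerates every interleaving and compares the reachable register outcomes with and without a core seeing its own in-limbo stores.

```python
X, Y = 0, 1  # memory locations (hypothetical litmus-test variables)

# Each op is (kind, address, argument): the value for a store,
# the destination register index for a load.
PROGRAMS = (
    (("st", X, 1), ("ld", X, 0), ("ld", Y, 1)),  # core 0: St x; r1 = x; r2 = y
    (("st", Y, 1), ("ld", Y, 2), ("ld", X, 3)),  # core 1: St y; r3 = y; r4 = x
)


def replace(tup, i, value):
    """Return a copy of tuple `tup` with position `i` set to `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def successors(state, forwarding):
    """Yield every state reachable in one step of the toy machine."""
    pcs, bufs, mem, regs = state
    for t in (0, 1):
        buf = bufs[t]
        if buf:
            # Drain the oldest buffered store: it enters the global order.
            addr, val = buf[0]
            yield (pcs, replace(bufs, t, buf[1:]), replace(mem, addr, val), regs)
        if pcs[t] == len(PROGRAMS[t]):
            continue  # this core has finished its program
        op, addr, arg = PROGRAMS[t][pcs[t]]
        next_pcs = replace(pcs, t, pcs[t] + 1)
        if op == "st":
            # Stores go into the core's FIFO store buffer (in limbo).
            yield (next_pcs, replace(bufs, t, buf + ((addr, arg),)), mem, regs)
        else:
            pending = [v for a, v in buf if a == addr]
            if pending and not forwarding:
                continue  # strict store atomicity: wait for own store to drain
            # With forwarding, the youngest matching in-limbo store wins.
            val = pending[-1] if pending else mem[addr]
            yield (next_pcs, bufs, mem, replace(regs, arg, val))


def outcomes(forwarding):
    """Return the set of final (r1, r2, r3, r4) values over all interleavings."""
    start = ((0, 0), ((), ()), (0, 0), (None,) * 4)
    seen, stack, finals = {start}, [start], set()
    while stack:
        state = stack.pop()
        pcs, _, _, regs = state
        if all(pcs[t] == len(PROGRAMS[t]) for t in (0, 1)):
            finals.add(regs)
        for nxt in successors(state, forwarding):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return finals


if __name__ == "__main__":
    suspect = (1, 0, 1, 0)  # each core sees its own store but not the other's
    print("with forwarding:   ", suspect in outcomes(True))
    print("without forwarding:", suspect in outcomes(False))
```

In this toy model, the outcome `(1, 0, 1, 0)` is reachable only when forwarding is enabled. Under strict store atomicity, that outcome would require the two loads on at least one core to take effect out of program order, which is how a store-atomicity violation shows up as an apparent load-load reordering.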
Note: To appear.