Uppsala Architecture Research Team
The Uppsala Architecture Research Team is a multi-disciplinary research group that works on a broad range of challenges in computer architecture, including microarchitecture, memory systems, compilers, security, power efficiency, simulation and modeling, runtime optimizations, co-design, and distributed systems.
Professor Stefanos Kaxiras
(PhD Wisconsin) worked at Bell Labs before coming to Uppsala. His research interests include memory consistency models, coherence, and microarchitecture with an emphasis on security and (reducing) speculation.
Professor David Black-Schaffer
(PhD Stanford) worked at Apple before coming to Uppsala. His research interests include runtime scheduling and memory system design.
Assistant Professor Yuan Yao
(PhD Royal Institute of Technology, Stockholm) has research interests in Network-on-Chip (NoC) and non-von Neumann architectures.
Assistant Professor Chang Hyun Park
(PhD KAIST) conducts research on the virtual memory system from both the architecture and the systems sides.
Professor (Emeritus) Erik Hagersten
(PhD Royal Institute of Technology, Stockholm) was the chief server architect at Sun Microsystems before coming to Uppsala. His research interests include efficient memory system designs and modeling.
Postdocs and Visiting Researchers
Challenge: Making general-purpose processors more efficient.
Results: Offloading instructions to simpler schedulers to reduce scheduling cost (ICCD2018, HPCA2019, DATE2019, HPCA2020); caching in the pipeline (ISCA2019).
Security and Speculation
Challenge: Building processors that are secure by design; reducing our reliance on speculation without losing its performance advantages.
Results: Understanding speculative shadows to reduce the impact of reduced speculation (ISCA2019); hiding speculative effects (CF2019); non-speculative techniques to reorder memory accesses (ISCA2017, IEEE Micro Top Picks 2018, ISCA2018, MICRO2018); compiler-orchestrated software out-of-order execution on in-order cores (PACT2016 SRC-Bronze medal, CGO2017, PLDI2018, Best of CAL 2017, TransOnComputers2018 - Featured article of the month); limited speculation cores (ISCA2015).
Compiling for Power Efficiency
Challenge: Co-designing the hardware and compiler to maximize efficiency.
Results: Decoupling access and execute to improve DVFS (ICS2013, CGO2014, CC2016 Best Paper, HIP3ES2016, HIP3ES2017).
Smart Memory Systems
Challenge: Understanding where and when data is needed to reduce the energy consumed in moving it and the time wasted waiting for it.
Results: Direct-to-data cache designs that avoid searches (MICRO2013, ISCA2014, MICRO2015, HPCA2018); intelligent policies for placing data based on reuse for CPUs (ICCD2016, SBAC-PAD2017, ICS2019) and GPUs (IISWC2017).
Challenge: Matching the heterogeneous behavior of tasks and applications to heterogeneous hardware for performance.
Results: CPU and GPU task analysis and modeling (JParallelComputing2018, ISPASS2018); GPU co-execution (SBAC-PAD).
Challenge: Creating novel coherence protocols to enable highly efficient multi/many-core systems and software shared memory implementations.
Results: Application driven, highly-efficient, VIPS family of protocols (PACT2012, ISCA2013, ISCA2015, HPCA2015); ArgoDSM distributed shared memory system (HPDC2015); Racer TSO: data-race-detection coherence, transparent to software (MICRO2016, IEEE Micro Top Picks 2017 honorable mention); compiler-assisted cache coherence (IPDPS2015, TPDS2016, CGO2017, CCPR2017, TPDS2018).
Challenge: Using low-overhead profile information to quickly model memory system behavior and performance.
Results: Architecturally independent models for memory systems (CGO2012, IISWC2012) and performance (ISPASS2015); resource-sharing performance profiling (CGO2013, PACT2012).
Software Optimization for Memory Systems
Challenge: Automatic software-based cache bypassing and prefetching without hurting co-execution on multicores.
Results: Adaptive software bypassing (HPCA2013) and prefetching (PACT2015).
Eta Scale AB works to commercialize memory coherence technology for both efficient scalable hardware implementations and software distributed shared memory. (Active)
Green Cache AB took the Direct-to-Data memory system technology and worked with clients to investigate the energy-savings potential in their future mobile SoCs. (IP purchased)
Acumem AB developed the StatCache statistical memory modeling technology into the ThreadSpotter turn-key tool to help developers identify and fix memory-system-related issues in their software. (Sold to Rogue Wave)
Alumni (and first job)
Christos Sakalis (PhD 2021, IAR, Sweden)
Mehdi Alipour (PhD 2020, Ericsson, Sweden)
Kim-Anh Tran (PhD 2020, Google, Germany)
Ricardo Alves (PhD 2019, Intel, USA)
Nikos Nikoleris (PhD 2019, ARM, UK)
Germán Ceballos (PhD 2018, Ericsson, Sweden)
Magnus Norgren (Swedish Patent Office)
Andreas Sembrant (PhD 2017, Nvidia, USA)
Mahdad Davari (PhD 2017, Ericsson, Sweden)
Muneeb Khan (PhD 2016, Ericsson, Sweden)
Moncef Mechri (IMC, Netherlands)
Vasileios Spiliopoulos (ZeroPoint, Sweden)
Konstantinos Koukos (PhD 2016, KTH, Sweden)
Andreas Sandberg (PhD 2014, ARM, UK)
David Eklöv (PhD 2011, Samsung, USA)
Håkan Zeffer (PhD 2006, Sun Microsystems, USA)
Henrik Löf (PhD 2006, Stanford University, USA)
Erik Berg (PhD 2005, Xelerated, Sweden)
Martin Karlsson (PhD 2006, Sun Microsystems, USA)
Dan Wallin (PhD 2006, Virtutech, Sweden)
Zoran Radovic (PhD 2005, Sun Microsystems, USA)
Dr. Mihail Popov (Huawei, UK)
Professor Rakesh Kumar (NTNU, Norway)
Dr. Gregory Vaumourin (Atos, France)
Dr. Andra Hugo (DDN Storage, France)
Professor Trevor Carlson (NUS, Singapore)
Professor Magnus Själander (NTNU, Norway)
Professor Alberto Ros (University of Murcia, Spain)
Dr. Nina Shariati (Uppsala University, Sweden)
- Analysing software prefetching opportunities in hardware transactional memory. In Journal of Supercomputing, volume 78, number 1, pp 919-944, Springer Nature, 2022. (DOI).
- Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores. In ACM Transactions on Architecture and Code Optimization (TACO), volume 19, number 2, Association for Computing Machinery (ACM), 2022. (DOI).
- Every Walk's a Hit: Making Page Walks Single-Access Cache Hits. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28 – March 4, 2022, Lausanne, Switzerland, Association for Computing Machinery (ACM), 2022. (DOI, Fulltext, fulltext:postprint, fulltext:print).
- Faster Functional Warming with Cache Merging. 2022. (fulltext).
- Free Atomics: Hardware Atomic Operations without Fences. In Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), Conference Proceedings Annual International Symposium on Computer Architecture, pp 14-26, Association for Computing Machinery (ACM), 2022. (DOI).
- Supporting Dynamic Translation Granularity for Hybrid Memory Systems. In The 40th IEEE International Conference on Computer Design (ICCD), Lake Tahoe, USA, October 23-26, 2022, IEEE, 2022.
- A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006. In ACM Transactions on Architecture and Code Optimization (TACO), volume 18, number 2, Association for Computing Machinery (ACM), 2021. (DOI).
- Do Not Predict – Recompute!: How Value Recomputation Can Truly Boost the Performance of Invisible Speculation. In 2021 International Symposium on Secure and Private Execution Environment Design (SEED), pp 89-100, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- Early Address Prediction: Efficient Pipeline Prefetch and Reuse. In ACM Transactions on Architecture and Code Optimization (TACO), volume 18, number 3, Association for Computing Machinery (ACM), 2021. (DOI, Fulltext, fulltext:print).
- Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions. In IEEE Computer Architecture Letters, volume 20, number 2, pp 162-165, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- Seeds of SEED: Preventing Priority Inversion in Instruction Scheduling to Disrupt Speculative Interference. In 2021 International Symposium on Secure and Private Execution Environment Design (SEED), pp 101-107, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- Splash-4: Improving Scalability with Lock-Free Constructs. In 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2021), pp 235-236, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- TSOPER: Efficient Coherence-Based Strict Persistency. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), International Symposium on High-Performance Computer Architecture : Proceedings, pp 125-138, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- Architecturally-independent and time-based characterization of SPEC CPU 2017. In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 107-109, 2020. (DOI, fulltext:postprint, fulltext:preprint).
- Boosting Store Buffer Efficiency with Store-Prefetch Bursts. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 568-580, Institute of Electrical and Electronics Engineers (IEEE), 2020. (DOI, Fulltext, fulltext:print).
- Clearing the Shadows: Recovering Lost Performance for Invisible Speculative Execution through HW/SW Co-Design. In PACT ’20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, International Conference on Parallel Architectures and Compilation Techniques, pp 241-254, Association for Computing Machinery (ACM), 2020. (DOI, External link).
- Decoupled Address Translation for Heterogeneous Memory Systems. In PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, International Conference on Parallel Architectures and Compilation Techniques, pp 155-156, Association for Computing Machinery (ACM), 2020. (DOI, Fulltext).
- Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), International Symposium on High-Performance Computer Architecture-Proceedings, pp 424-434, 2020. (DOI).
- Efficient temporal and spatial load to load forwarding. In Proc. 26th International Symposium on High-Performance Computer Architecture, IEEE Computer Society, 2020.
- Modeling and Optimizing NUMA Effects and Prefetching with Machine Learning. In ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, 2020. (DOI, Fulltext, fulltext:postprint).
- Perforated Page: Supporting Fragmented Memory Allocation for Large Pages. In Proceedings of the 47th Annual ACM/IEEE International Symposium on Computer Architecture (ISCA), pp 913-925, 2020. (DOI, fulltext:postprint).
- RVSDG: An Intermediate Representation for Optimizing Compilers. In ACM Transactions on Embedded Computing Systems, volume 19, number 6, 2020. (DOI).
- Raw-Data: A Reusable Characterization Of The Memory System Behavior Of SPEC 2017 And SPEC 2006. 2020. (data set).
- Reconciling Time Slice Conflicts of Virtual Machines With Dual Time Slice for Clouds. In IEEE Transactions on Parallel and Distributed Systems, volume 31, number 10, pp 2453-2465, 2020. (DOI).
- Speculative Enforcement of Store Atomicity. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 555-567, Institute of Electrical and Electronics Engineers (IEEE), 2020. (DOI, Fulltext, fulltext:postprint).
- Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services. IEEE, 2020. (DOI).
- Understanding Selective Delay as a Method for Efficient Secure Speculative Execution. In IEEE Transactions on Computers, volume 69, number 11, pp 1584-1595, 2020. (DOI).
Full UART publications list.
Teaching and Recruiting
The Uppsala Architecture Research Team was founded in 1999 when Professor Erik Hagersten (PhD from the Royal Institute of Technology) moved back to Sweden from his position as chief server architect at Sun Microsystems. For the first 10 years, UART did pioneering work in statistical cache modeling, leading to a successful commercialization of the technology. Professor Stefanos Kaxiras (PhD from Wisconsin) joined the group in 2010, moving from the University of Patras in Greece and bringing extensive experience in power efficiency and coherence. Professor David Black-Schaffer (PhD from Stanford) also joined in 2010, bringing heterogeneous runtime experience from his work on OpenCL at Apple. Professors Hagersten, Black-Schaffer, and Kaxiras, together with PhD student Andreas Sembrant, successfully commercialized their work on direct-to-data memory systems in the company Green Cache AB, whose IP was purchased in 2018. Associate Professor Alexandra Jimborean (PhD from the University of Strasbourg) joined in 2012, bringing experience in compile-time and run-time code analysis and optimization. Since then the group has grown to include multiple PhD students and postdocs.