Uppsala Architecture Research Team
The Uppsala Architecture Research Team is a multi-disciplinary research group that works on a broad range of challenges in computer architecture, including microarchitecture, memory systems, compilers, security, power efficiency, simulation and modeling, runtime optimizations, co-design, and distributed systems.
Professor Erik Hagersten
(PhD Royal Institute of Technology, Stockholm) was the chief server architect at Sun Microsystems before coming to Uppsala. His research interests include efficient memory system designs and modeling.
Professor Stefanos Kaxiras
(PhD Wisconsin) worked at Bell Labs before coming to Uppsala. His research interests include memory consistency models, coherence, and microarchitecture with an emphasis on security and (reducing) speculation.
Professor David Black-Schaffer
(PhD Stanford) worked at Apple before coming to Uppsala. His research interests include runtime scheduling and memory system design.
Associate Professor Alexandra Jimborean
(PhD Strasbourg) conducts research on compiler optimizations for efficiency and HW/SW co-design.
Assistant Professor Yuan Yao
(PhD Royal Institute of Technology, Stockholm) has research interests in Network on Chip (NoC) and Non-Von-Neumann architectures.
Assistant Professor Chang Hyun Park
(PhD KAIST) conducts research on the virtual memory system on both the architecture and systems sides.
Postdocs and Visiting Researchers
Associate Professor Magnus Själander
Challenge: Making general purpose processors more efficient.
Results: Offloading instructions to simpler schedulers to reduce scheduling cost (ICCD2018, HPCA2019, DATE2019, HPCA2020); caching in the pipeline (ISCA2019).
Security and Speculation
Challenge: Building processors that are secure by design; reducing our reliance on speculation without losing its performance advantages.
Results: Understanding speculative shadows to reduce the impact of reduced speculation (ISCA2019); hiding speculative effects (CF2019); non-speculative techniques to reorder memory accesses (ISCA2017, IEEE Micro Top Picks 2018, ISCA2018, MICRO2018); compiler-orchestrated software out-of-order execution on in-order cores (PACT2016 SRC-Bronze medal, CGO2017, PLDI2018, Best of CAL 2017, TransOnComputers2018 - Featured article of the month); limited speculation cores (ISCA2015).
Compiling for Power Efficiency
Challenge: Co-designing the hardware and compiler to maximize efficiency.
Results: Decoupling access and execute to improve DVFS (ICS2013, CGO2014, CC2016 Best Paper, HIP3ES2016, HIP3ES2017).
Smart Memory Systems
Challenge: Understanding where and when data is needed to reduce the energy consumed in moving it and the time wasted waiting for it.
Results: Direct-to-data cache designs that avoid searches (MICRO2013, ISCA2014, MICRO2015, HPCA2018); intelligent policies for placing data based on reuse for CPUs (ICCD2016, SBAC-PAD2017, ICS2019) and GPUs (IISWC2017).
Challenge: Matching the heterogeneous behavior of tasks and applications to heterogeneous hardware for performance.
Results: CPU and GPU task analysis and modeling (JParallelComputing2018, ISPASS2018); GPU co-execution (SBAC-PAD).
Challenge: Creating novel coherence protocols to enable highly efficient multi/many-core systems and software shared-memory implementations.
Results: Application driven, highly-efficient, VIPS family of protocols (PACT2012, ISCA2013, ISCA2015, HPCA2015); ArgoDSM distributed shared memory system (HPDC2015); Racer TSO: data-race-detection coherence, transparent to software (MICRO2016, IEEE Micro Top Picks 2017 honorable mention); compiler-assisted cache coherence (IPDPS2015, TPDS2016, CGO2017, CCPR2017, TPDS2018).
Challenge: Using low-overhead profile information to quickly model memory system behavior and performance.
Results: Architecturally independent models for memory systems (CGO2012, IISWC2012) and performance (ISPASS2015); resource-sharing performance profiling (CGO2013, PACT2012).
Software Optimization for Memory Systems
Challenge: Automatic software-based cache bypassing and prefetching without hurting co-execution on multicores.
Results: Adaptive software bypassing (HPCA2013) and prefetching (PACT2015).
Eta Scale AB works to commercialize memory coherence technology for both efficient scalable hardware implementations and software distributed shared memory. (Active)
Green Cache AB took the Direct-to-Data memory system technology and worked with clients to investigate the energy-savings potential in their future mobile SoCs. (IP purchased)
Acumem AB developed the StatCache statistical memory modeling technology into the ThreadSpotter turn-key tool to help developers identify and fix memory-system-related issues in their software. (Sold to Rogue Wave)
Alumni (and first job)
Mehdi Alipour (PhD 2020, Ericsson, Sweden)
Kim-Anh Tran (PhD 2020, Google, Germany)
Ricardo Alves (PhD 2019, Intel, USA)
Nikos Nikoleris (PhD 2019, ARM, UK)
Germán Ceballos (PhD 2018, Ericsson, Sweden)
Magnus Norgren (Swedish Patent Office)
Andreas Sembrant (PhD 2017, Nvidia, USA)
Mahdad Davari (PhD 2017, Ericsson, Sweden)
Muneeb Khan (PhD 2016, Ericsson, Sweden)
Moncef Mechri (IMC, Netherlands)
Vasileios Spiliopoulos (ZeroPoint, Sweden)
Konstantinos Koukos (PhD 2016, KTH, Sweden)
Andreas Sandberg (PhD 2014, ARM, UK)
David Eklöv (PhD 2011, Samsung, USA)
Håkan Zeffer (PhD 2006, Sun Microsystems, USA)
Henrik Löf (PhD 2006, Stanford University, USA)
Erik Berg (PhD 2005, Xelerated, Sweden)
Martin Karlsson (PhD 2006, Sun Microsystems, USA)
Dan Wallin (PhD 2006, Virtutech, Sweden)
Zoran Radovic (PhD 2005, Sun Microsystems, USA)
Dr. Mihail Popov (Huawei, UK)
Professor Rakesh Kumar (NTNU, Norway)
Dr. Gregory Vaumourin (Atos, France)
Dr. Andra Hugo (DDN Storage, France)
Professor Trevor Carlson (NUS, Singapore)
Professor Magnus Själander (NTNU, Norway)
Professor Alberto Ros (University of Murcia, Spain)
Dr. Nina Shariati (Uppsala University, Sweden)
- A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006. In ACM Transactions on Architecture and Code Optimization (TACO), volume 18, number 2, Association for Computing Machinery, 2021. (DOI).
- TSOPER: Efficient Coherence-Based Strict Persistency. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp 125-138, 2021. (DOI, IEEE Xplore).
- Architecturally-independent and time-based characterization of SPEC CPU 2017. In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 107-109, 2020. (DOI, fulltext:postprint, fulltext:preprint).
- Boosting Store Buffer Efficiency with Store-Prefetch Bursts. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 568-580, 2020. (DOI, fulltext:postprint).
- Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), International Symposium on High-Performance Computer Architecture-Proceedings, pp 424-434, 2020. (DOI).
- Modeling and Optimizing NUMA Effects and Prefetching with Machine Learning. In ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, 2020. (DOI, Fulltext, fulltext:postprint).
- Perforated Page: Supporting Fragmented Memory Allocation for Large Pages. In Proceedings of the 47th Annual ACM/IEEE International Symposium on Computer Architecture (ISCA), pp 913-925, 2020. (DOI, fulltext:postprint).
- RVSDG: An Intermediate Representation for Optimizing Compilers. In ACM Transactions on Embedded Computing Systems, volume 19, number 6, 2020. (DOI).
- Raw-Data: A Reusable Characterization Of The Memory System Behavior Of SPEC 2017 And SPEC 2006. 2020. (data set).
- Speculative Enforcement of Store Atomicity. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 555-567, 2020. (DOI, fulltext:postprint).
- Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services. IEEE, 2020. (DOI).
- Understanding Selective Delay as a Method for Efficient Secure Speculative Execution. In IEEE Transactions on Computers, volume 69, number 11, pp 1584-1595, 2020. (DOI).
- Directed Statistical Warming through Time Traveling. In MICRO'52: The 52nd Annual IEEE/ACM International Symposium On Microarchitecture, pp 1037-1049, 2019. (DOI).
- Efficient invisible speculative execution through selective delay and value prediction. In Proc. 46th International Symposium on Computer Architecture, pp 723-735, ACM Press, New York, 2019. (DOI, fulltext:postprint).
- Efficient thread/page/parallelism autotuning for NUMA systems. In ICS '19: Proceedings of the ACM International Conference on Supercomputing, pp 342-353, Association for Computing Machinery (ACM), New York, NY, USA, 2019. (DOI, Fulltext, fulltext:print).
- Evaluating the Potential Applications of Quaternary Logic for Approximate Computing. In ACM Journal on Emerging Technologies in Computing Systems (JETC), volume 16, number 1, New York, NY, USA, 2019. (DOI).
- FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Design Automation and Test in Europe Conference and Exhibition, pp 716-721, IEEE, 2019. (DOI, fulltext:postprint).
- Filter caching for free: The untapped potential of the store-buffer. In Proc. 46th International Symposium on Computer Architecture, pp 436-448, ACM Press, New York, 2019. (DOI, Fulltext, fulltext:print).
- Freeway: Maximizing MLP for Slice-Out-of-Order Execution. In 2019 25th IEEE International Symposium On High Performance Computer Architecture (HPCA), International Symposium on High-Performance Computer Architecture-Proceedings, pp 558-569, IEEE, 2019. (DOI, fulltext:postprint).
- Ghost Loads: What is the cost of invisible speculation?. In Proceedings of the 16th ACM International Conference on Computing Frontiers, pp 153-163, ACM Press, New York, 2019. (DOI, fulltext:postprint).
- Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit. In Journal of Signal Processing Systems, volume 91, number 3-4, pp 379-397, 2019. (DOI, Fulltext, fulltext:print).
- Minimizing Replay under Way-Prediction. Technical report / Department of Information Technology, Uppsala University nr 2019-003, 2019. (fulltext).
- Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing. In ACM Transactions on Reconfigurable Technology and Systems, volume 12, number 3, Association for Computing Machinery, 2019. (DOI).
Full UART publications list.
Teaching and Recruiting
The Uppsala Architecture Research Team was founded in 1999 when Professor Erik Hagersten (PhD from the Royal Institute of Technology) moved back to Sweden from his position as chief server architect at Sun Microsystems. For the first 10 years, UART did pioneering work in statistical cache modeling, leading to a successful commercialization of the technology. Professor Stefanos Kaxiras (PhD from Wisconsin) joined the group in 2010, moving from the University of Patras in Greece and bringing extensive experience in power efficiency and coherency. Professor David Black-Schaffer (PhD from Stanford) also joined in 2010, bringing heterogeneous runtime experience from his work on OpenCL at Apple. Professors Hagersten, Black-Schaffer, and Kaxiras, together with PhD student Andreas Sembrant, successfully commercialized their work in direct-to-data memory systems in the company Green Cache AB, whose IP was purchased in 2018. Associate Professor Alexandra Jimborean (PhD from University of Strasbourg) joined in 2012, bringing experience in compile-time and run-time code analysis and optimization. Since then the group has grown to include multiple PhD students and postdocs.