Uppsala Architecture Research Team
The Uppsala Architecture Research Team is a multi-disciplinary research group that works on a broad range of challenges in computer architecture, including microarchitecture, memory systems, compilers, security, power efficiency, simulation and modeling, runtime optimizations, co-design, and distributed systems.
Professor Erik Hagersten
(PhD Royal Institute of Technology, Stockholm) was the chief server architect at Sun Microsystems before coming to Uppsala. His research interests include efficient memory system designs and modeling.
Professor Stefanos Kaxiras
(PhD Wisconsin) worked at Bell Labs before coming to Uppsala. His research interests include memory consistency models, coherence, and microarchitecture with an emphasis on security and (reducing) speculation.
Professor David Black-Schaffer
(PhD Stanford) worked at Apple before coming to Uppsala. His research interests include runtime scheduling and memory system design.
Associate Professor Alexandra Jimborean
(PhD Strasbourg) conducts research on compiler optimizations for efficiency and HW/SW co-design.
Postdocs and Visiting Researchers
Challenge: Making general-purpose processors more efficient.
Results: Offloading instructions to simpler schedulers to reduce scheduling cost (ICCD2018, HPCA2019, DATE2019, HPCA2020); caching in the pipeline (ISCA2019).
Security and Speculation
Challenge: Building processors that are secure by design; reducing our reliance on speculation without losing its performance advantages.
Results: Understanding speculative shadows to reduce the impact of reduced speculation (ISCA2019); hiding speculative effects (CF2019); non-speculative techniques to reorder memory accesses (ISCA2017, IEEE Micro Top Picks 2018, ISCA2018, MICRO2018); compiler-orchestrated software out-of-order execution on in-order cores (PACT2016 SRC-Bronze medal, CGO2017, PLDI2018, Best of CAL 2017, TransOnComputers2018 - Featured article of the month); limited speculation cores (ISCA2015).
Compiling for Power Efficiency
Challenge: Co-designing the hardware and compiler to maximize efficiency.
Results: Decoupling access and execute to improve DVFS (ICS2013, CGO2014, CC2016 Best Paper, HIP3ES2016, HIP3ES2017).
Smart Memory Systems
Challenge: Understanding where and when data is needed to reduce the energy consumed in moving it and the time wasted waiting for it.
Results: Direct-to-data cache designs that avoid searches (MICRO2013, ISCA2014, MICRO2015, HPCA2018); intelligent policies for placing data based on reuse for CPUs (ICCD2016, SBAC-PAD2017, ICS2019) and GPUs (IISWC2017).
Challenge: Matching the heterogeneous behavior of tasks and applications to heterogeneous hardware for performance.
Results: CPU and GPU task analysis and modeling (JParallelComputing2018, ISPASS2018); GPU co-execution (SBAC-PAD).
Challenge: Creating novel coherence protocols to enable highly efficient multi/many-core systems and software shared memory implementations.
Results: Application driven, highly-efficient, VIPS family of protocols (PACT2012, ISCA2013, ISCA2015, HPCA2015); ArgoDSM distributed shared memory system (HPDC2015); Racer TSO: data-race-detection coherence, transparent to software (MICRO2016, IEEE Micro Top Picks 2017 honorable mention); compiler-assisted cache coherence (IPDPS2015, TPDS2016, CGO2017, CCPR2017, TPDS2018).
Challenge: Using low-overhead profile information to quickly model memory system behavior and performance.
Results: Architecturally independent models for memory systems (CGO2012, IISWC2012) and performance (ISPASS2015); resource-sharing performance profiling (CGO2013, PACT2012).
Software Optimization for Memory Systems
Challenge: Automatic software-based cache bypassing and prefetching without hurting co-execution on multicores.
Results: Adaptive software bypassing (HPCA2013) and prefetching (PACT2015).
Eta Scale AB works to commercialize memory coherence technology for both efficient scalable hardware implementations and software distributed shared memory. (Active)
Green Cache AB took the Direct-to-Data memory system technology and worked with clients to investigate the energy-savings potential in their future mobile SoCs. (IP purchased)
Acumem AB developed the StatCache statistical memory modeling technology into the ThreadSpotter turn-key tool to help developers identify and fix memory system related issues in their software. (Sold to Rogue Wave)
Alumni (and first job)
Nikos Nikoleris (PhD 2019, ARM, UK)
Germán Ceballos (PhD 2018, Ericsson, Sweden)
Magnus Norgren (Swedish Patent Office)
Andreas Sembrant (PhD 2017, Nvidia, USA)
Mahdad Davari (PhD 2017, Ericsson, Sweden)
Moncef Mechri (IMC, Netherlands)
Vasileios Spiliopoulos (ZeroPoint, Sweden)
Konstantinos Koukos (PhD 2016, KTH, Sweden)
Andreas Sandberg (PhD 2014, ARM, UK)
David Eklöv (PhD 2011, Samsung, USA)
Håkan Zeffer (PhD 2006, Sun Microsystems, USA)
Henrik Löf (PhD 2006, Stanford University, USA)
Erik Berg (PhD 2005, Xelerated, Sweden)
Martin Karlsson (PhD 2006, Sun Microsystems, USA)
Dan Wallin (PhD 2006, Virtutech, Sweden)
Zoran Radovic (PhD 2005, Sun Microsystems, USA)
Dr. Mihail Popov (Huawei, UK)
Professor Rakesh Kumar (NTNU, Norway)
Dr. Gregory Vaumourin (Atos, France)
Dr. Andra Hugo (DDN Storage, France)
Professor Trevor Carlson (NUS, Singapore)
Professor Magnus Själander (NTNU, Norway)
Professor Alberto Ros (University of Murcia, Spain)
Dr. Nina Shariati (Uppsala University, Sweden)
- Efficient invisible speculative execution through selective delay and value prediction. In Proc. 46th International Symposium on Computer Architecture, pp 723-735, ACM Press, New York, 2019. (DOI, fulltext:postprint).
- Efficient thread/page/parallelism autotuning for NUMA systems. In International Conference on Supercomputing, Association for Computing Machinery (ACM), New York, NY, USA, 2019. (DOI, Fulltext, External link, fulltext:print).
- Evaluating the Potential Applications of Quaternary Logic for Approximate Computing. In ACM Journal on Emerging Technologies in Computing Systems (JETC), volume 16, number 1, New York, NY, USA, 2019. (DOI).
- FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Design Automation and Test in Europe Conference and Exhibition, pp 716-721, IEEE, 2019. (DOI, fulltext:postprint).
- Filter caching for free: The untapped potential of the store-buffer. In Proc. 46th International Symposium on Computer Architecture, pp 436-448, ACM Press, New York, 2019. (DOI).
- Freeway: Maximizing MLP for Slice-Out-of-Order Execution. In 2019 25th IEEE International Symposium On High Performance Computer Architecture (HPCA), International Symposium on High-Performance Computer Architecture-Proceedings, pp 558-569, IEEE, 2019. (DOI, fulltext:postprint).
- Ghost Loads: What is the cost of invisible speculation?. In Proceedings of the 16th ACM International Conference on Computing Frontiers, pp 153-163, ACM Press, New York, 2019. (DOI, fulltext:postprint).
- Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit. In Journal of Signal Processing Systems, volume 91, number 3-4, pp 379-397, 2019. (DOI, Fulltext, fulltext:print).
- Minimizing Replay under Way-Prediction. Technical report / Department of Information Technology, Uppsala University nr 2019-003, 2019. (fulltext).
- Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing. In ACM Transactions on Reconfigurable Technology and Systems, volume 12, number 3, Association for Computing Machinery (ACM), 2019. (DOI).
- Analyzing performance variation of task schedulers with TaskInsight. In Parallel Computing, volume 75, pp 11-27, 2018. (DOI).
- Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation. In IEEE Transactions on Parallel and Distributed Systems, volume 29, number 3, pp 527-541, IEEE Computer Society, 2018. (DOI).
- Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-based GPUs. In Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2018, pp 1-11, IEEE Computer Society, 2018. (DOI, fulltext:preprint).
- Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation. Technical report / Department of Information Technology, Uppsala University, 2018. (fulltext).
- Dynamically Disabling Way-prediction to Reduce Instruction Replay. In 2018 IEEE 36th International Conference on Computer Design (ICCD), Proceedings IEEE International Conference on Computer Design, pp 140-143, IEEE, 2018. (DOI, External link).
- Mending fences with self-invalidation and self-downgrade. In Logical Methods in Computer Science, volume 14, number 1, 2018. (External link).
- Non-Speculative Load Reordering in Total Store Ordering. In IEEE Micro, volume 38, number 3, pp 48-57, IEEE Computer Society, 2018. (DOI).
- Non-Speculative Store Coalescing in Total Store Order. In Proc. 45th International Symposium on Computer Architecture, pp 221-234, IEEE, 2018. (DOI, fulltext:postprint).
- SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp 328-343, Association for Computing Machinery (ACM), 2018. (DOI, fulltext:print).
- Static instruction scheduling for high performance on limited hardware. In IEEE Transactions on Computers, volume 67, number 4, pp 513-527, 2018. (DOI).
- Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware. In Proc. 16th International Conference on Parallel and Distributed Processing with Applications, pp 55-63, IEEE, 2018. (DOI).
- The Superfluous Load Queue. In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 95-107, IEEE, 2018. (DOI, fulltext:postprint).
- A Taxonomy of Out-of-Order Instruction Commit. In 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 135-136, IEEE Computer Society, Los Alamitos, 2017. (DOI).
- A dedicated private-shared cache design for scalable multiprocessors. In Concurrency and Computation, volume 29, number 2, 2017. (DOI).
- A graphics tracing framework for exploring CPU+GPU memory systems. In Proc. 20th International Symposium on Workload Characterization, pp 54-65, IEEE, 2017. (DOI).
- A split cache hierarchy for enabling data-oriented optimizations. In Proc. 23rd International Symposium on High Performance Computer Architecture, pp 133-144, IEEE Computer Society, 2017. (DOI).
- Adaptive cache warming for faster simulations. In Proc. 9th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, ACM Press, New York, 2017. (DOI, Fulltext, fulltext:print).
- Addressing energy challenges in filter caches. In Proc. 29th International Symposium on Computer Architecture and High Performance Computing, pp 49-56, IEEE Computer Society, 2017. (DOI).
- Analyzing Graphics Workloads on Tile-based GPUs. In Proc. 20th International Symposium on Workload Characterization, pp 108-109, IEEE, 2017. (DOI).
- Automatic detection of extended data-race-free regions. In Proc. 15th International Symposium on Code Generation and Optimization, pp 14-26, IEEE Press, Piscataway, NJ, 2017. (Paper, fulltext:postprint).
- Clairvoyance: Look-ahead compile-time scheduling. In Proc. 15th International Symposium on Code Generation and Optimization, pp 171-184, IEEE Press, Piscataway, NJ, 2017. (fulltext:postprint).
- Decoupled Access-Execute on ARM big.LITTLE. In Proc. 5th Workshop on High Performance Energy Efficient Embedded Systems, 2017. (External link).
- Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics. In IEEE Transactions on Parallel and Distributed Systems, volume 28, number 12, pp 3413-3425, 2017. (DOI).
- Exploring scheduling effects on task performance with TaskInsight. In Supercomputing frontiers and innovations, volume 4, number 3, pp 91-98, 2017. (DOI, Fulltext).
- Exploring the performance limits of out-of-order commit. In Proc. 14th Computing Frontiers Conference, pp 211-220, ACM Press, New York, 2017. (DOI, attachment:print).
- Non-speculative load-load reordering in TSO. In Proc. 44th International Symposium on Computer Architecture, pp 187-200, ACM Press, New York, 2017. (DOI).
- POSTER: Putting the G back into GPU/CPU Systems Research. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), International Conference on Parallel Architectures and Compilation Techniques, pp 130-131, 2017. (DOI).
- Scope-Aware Classification: Taking the hierarchical private/shared data classification to the next level. Technical report / Department of Information Technology, Uppsala University nr 2017-008, 2017. (External link).
- TaskInsight: Understanding task schedules effects on memory and performance. In Proc. 8th International Workshop on Programming Models and Applications for Multicores and Manycores, pp 11-20, ACM Press, New York, 2017. (DOI, Fulltext).
- The best of both works: A hybrid data-race-free cache coherence scheme. 2017.
- Transcending hardware limits with software out-of-order processing. In IEEE Computer Architecture Letters, volume 16, number 2, pp 162-165, 2017. (DOI).
- Understanding the interplay between task scheduling, memory and performance. In Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, pp 21-23, ACM Press, New York, 2017. (DOI).
Full UART publications list.
Teaching and Recruiting
The Uppsala Architecture Research Team was founded in 1999 when Professor Erik Hagersten (PhD from the Royal Institute of Technology) moved back to Sweden from his position as chief server architect at Sun Microsystems. For the first 10 years, UART did pioneering work in statistical cache modeling, leading to a successful commercialization of the technology. Professor Stefanos Kaxiras (PhD from Wisconsin) joined the group in 2010, moving from the University of Patras in Greece and bringing extensive experience in power efficiency and coherence. Professor David Black-Schaffer (PhD from Stanford) also joined in 2010, bringing heterogeneous runtime experience from his work on OpenCL at Apple. Professors Hagersten, Black-Schaffer, and Kaxiras, together with PhD student Andreas Sembrant, successfully commercialized their work on direct-to-data memory systems in the company Green Cache AB, whose IP was purchased in 2018. Associate Professor Alexandra Jimborean (PhD from the University of Strasbourg) joined in 2012, bringing experience in compile-time and run-time code analysis and optimization. Since then, the group has grown to include multiple PhD students and postdocs.