Uppsala Architecture Research Team
The Uppsala Architecture Research Team (UART) undertakes world-leading computer architecture research in measurement, modeling, and hardware and software optimization, with a focus on power and performance. Our approach starts with low-overhead measurement of key application and hardware data (typically on commodity hardware and often in an architecture-independent manner). We then use this data to develop fast models for predicting performance, efficiency, and scalability across a range of systems and configurations. These models give us insight into application and hardware behavior, which allows us to develop targeted optimizations and new techniques to improve power and performance. The Uppsala Architecture Research Team is led by Professors Erik Hagersten and Stefanos Kaxiras, Associate Professor David Black-Schaffer, and Assistant Professor Alexandra Jimborean.
Funding and Collaboration
- A hybrid static–dynamic classification for dual-consistency cache coherence. In IEEE Transactions on Parallel and Distributed Systems, volume 27, 2016. (DOI). Publication status: Epub ahead of print
- Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs. In Proc. 25th International Conference on Compiler Construction, pp 121-131, ACM Press, New York, 2016. (DOI).
- Poster: Approximation: A New Paradigm also for Wireless Sensing. 2016.
- Profiling-Assisted Decoupled Access-Execute. In Proc. 4th International Workshop on High Performance Energy Efficient Embedded Systems, 2016. (External link).
- Spatial and Temporal Cache Sharing Analysis in Tasks. Timisoara, Romania, 2016. (Proceedings).
- Techniques for Modulating Error Resilience in Emerging Multi-Value Technologies. 2016.
- A dual-consistency cache coherence protocol. In Proc. 29th International Parallel and Distributed Processing Symposium, pp 1119-1128, IEEE Computer Society, Los Alamitos, CA, 2015. (DOI, fulltext).
- AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance. In Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, pp 367-378, IEEE Computer Society, 2015. (DOI, fulltext).
- An efficient, self-contained, on-chip directory: DIR<sub>1</sub>-SISD. In Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, pp 317-330, IEEE Computer Society, 2015. (DOI).
- Cost-effective speculative scheduling in high performance processors. In Proc. 42nd International Symposium on Computer Architecture, pp 247-259, ACM Press, New York, 2015. (DOI).
- Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies. In Proc. 21st International Symposium on High Performance Computer Architecture, pp 186-197, IEEE Computer Society Digital Library, 2015. (DOI).
- Improving data access efficiency by using context-aware loads and stores. In Proc. 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pp 27-36, ACM Press, New York, 2015. (DOI).
- Long Term Parking (LTP): Criticality-aware Resource Allocation in OOO Processors. In Proc. 48th International Symposium on Microarchitecture, 2015.
- Optimizing transfers of control in the static pipeline architecture. In Proc. 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pp 7-16, ACM Press, New York, 2015. (DOI).
- Perf-Insight: A Simple, Scalable Approach to Optimal Data Prefetching in Multicores. Technical report / Department of Information Technology, Uppsala University nr 2015-037, 2015. (External link).
- Scheduling instruction effects for a statically pipelined processor. In Proc. International Conference on Compilers, Architectures, and Synthesis for Embedded Systems: CASES 2015, pp 167-176, IEEE Press, Piscataway, NJ, 2015. (DOI).
- StatTask: Reuse distance analysis for task-based applications. In Proc. 7th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, pp 1-7, ACM Press, New York, 2015. (DOI).
- The effects of granularity and adaptivity on private/shared classification for coherence. In ACM Transactions on Architecture and Code Optimization (TACO), volume 12, number 3, 2015. (DOI).
- A case for resource efficient prefetching in multicores. In Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2014, pp 137-138, IEEE Computer Society, 2014. (DOI).
- A case for resource efficient prefetching in multicores. In Proc. 43rd International Conference on Parallel Processing, pp 101-110, IEEE Computer Society, 2014. (DOI).
- A software based profiling method for obtaining speedup stacks on commodity multi-cores. In Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2014, pp 148-157, IEEE Computer Society, 2014. (DOI).
- A tunable cache for approximate computing. In Proc. 10th International Symposium on Nanoscale Architectures, pp 88-89, IEEE, Piscataway, NJ, 2014. (DOI).
- Dynamic and speculative polyhedral parallelization using compiler-generated skeletons. In International journal of parallel programming, volume 42, number 4, pp 529-545, 2014. (DOI).
- Extending statistical cache models to support detailed pipeline simulators. In Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2014, pp 86-95, IEEE Computer Society, 2014. (DOI).
- Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling. In Proc. 12th International Symposium on Code Generation and Optimization, pp 262-272, ACM Press, New York, 2014. (URL, fulltext).
- Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed. Technical report / Department of Information Technology, Uppsala University nr 2014-005, 2014. (External link, fulltext).
- Managing power constraints in a single-core scenario through power tokens. In Journal of Supercomputing, volume 68, number 1, pp 414-442, 2014. (DOI).
- Power-Efficient Computer Architectures: Recent Advances. Morgan & Claypool Publishers, 2014. (DOI).
- Resource conscious prefetching for irregular applications in multicores. In Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), pp 34-43, IEEE, Piscataway, NJ, 2014. (DOI).
- Software-controlled processor stalls for time and energy efficient data locality optimization. In Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), pp 199-206, IEEE, Piscataway, NJ, 2014. (DOI, fulltext).
- Speculative program parallelization with scalable and decentralized runtime verification. In Runtime Verification, volume 8734 of Lecture Notes in Computer Science, pp 124-139, Springer Berlin/Heidelberg, 2014. (DOI).
- The Direct-to-Data (D2D) Cache: Navigating the cache hierarchy with a single lookup. In Proc. 41st International Symposium on Computer Architecture, pp 133-144, IEEE Press, Piscataway, NJ, 2014. (DOI).
Full UART publications list.
The Uppsala Architecture Research Team was founded in 1999 when Professor Erik Hagersten (PhD from the Royal Institute of Technology) moved back to Sweden from his position as chief server architect at Sun Microsystems. For the first 10 years, UART did pioneering work in statistical cache modeling, leading to a successful commercialization of the technology. Professor Stefanos Kaxiras (PhD from Wisconsin) joined the group in 2010, moving from the University of Patras in Greece and bringing extensive experience in power efficiency and coherency. Associate Professor David Black-Schaffer (PhD from Stanford) also joined in 2010, bringing heterogeneous runtime experience from his work on OpenCL at Apple. Assistant Professor Alexandra Jimborean (PhD from the University of Strasbourg) joined in 2012, bringing experience in compile-time and run-time code analysis and optimization. Since then, the group has grown to include 13 PhD students and 3 postdocs.