IT Technical reports
http://www.it.uu.se/research/publications/reports
Technical reports from the Department of Information Technology, Uppsala University, Sweden
20191008T14:40:53Z
Department of Information Technology, Uppsala University, Sweden
Copyright © 2005 Department of Information Technology, Uppsala University, Sweden

Technical report 2019009: Evaluation of Methods Handling Missing Data in PCA on Genotype Data: Applications for Ancient DNA
http://www.it.uu.se/research/publications/reports/2019009
20191001
Kristiina Ausmees, Mattias Jakobsson, and Carl Nettelblad

Technical report 2019008: An Empirical Evaluation of Genotype Imputation of Ancient DNA
http://www.it.uu.se/research/publications/reports/2019008
20191001
Kristiina Ausmees, Federico Sanchez-Quinto, Mattias Jakobsson, and Carl Nettelblad

Technical report 2019007: Performance of an OO Compute Kernel on the JVM: Revisiting Java as a Language for Scientific Computing Applications (Extended Version)
http://www.it.uu.se/research/publications/reports/2019007
20190901
Malin Källén and Tobias Wrigstad

Technical report 2019006: Frequency Domain Identification of FIR Models from Noisy Input-Output Data
http://www.it.uu.se/research/publications/reports/2019006
20190801
Umberto Soverini and Torsten Söderström
<b>Abstract:</b> This paper describes a new approach for identifying FIR models from a finite number of measurements, in the presence of additive and uncorrelated white noise. In particular, two different frequency domain algorithms are proposed. The first algorithm is based on some theoretical results concerning the dynamic Frisch scheme. The second algorithm maps the FIR identification problem into a quadratic eigenvalue problem. In many respects, both methods resemble identification algorithms originally developed in the time domain. The features of the proposed methods are compared with each other and with those of some time domain algorithms by means of Monte Carlo simulations.

Technical report 2019005: Block Generalized Locally Toeplitz Sequences: Theory and Applications in the Multidimensional Case
http://www.it.uu.se/research/publications/reports/2019005
20190701
Giovanni Barbarino, Carlo Garoni, and Stefano Serra-Capizzano
<b>Abstract:</b> In computational mathematics, when dealing with a large linear discrete problem (e.g., a linear system) arising from the numerical discretization of a partial differential equation (PDE), knowledge of the spectral distribution of the associated matrix has proved useful for designing and analyzing appropriate solvers for the problem, especially preconditioned Krylov and multigrid solvers. This spectral information is also of interest in itself whenever the eigenvalues of the matrix represent physical quantities of interest, which is the case for several problems from engineering and applied sciences (e.g., the study of natural vibration frequencies in an elastic material). The theory of multilevel generalized locally Toeplitz (GLT) sequences is a powerful apparatus for computing the asymptotic spectral distribution of matrices A_n arising from virtually any kind of numerical discretization of PDEs. Indeed, when the mesh-fineness parameter n tends to infinity, these matrices A_n give rise to a sequence {A_n}_n, which often turns out to be a multilevel GLT sequence or one of its "relatives", i.e., a multilevel block GLT sequence or a (multilevel) reduced GLT sequence. In particular, multilevel block GLT sequences are encountered in the discretization of systems of PDEs as well as in the higher-order finite element or discontinuous Galerkin approximation of scalar/vectorial PDEs. In this work, we systematically develop the theory of multilevel block GLT sequences as an extension of the theories of (unilevel) GLT sequences [GLTbookI], multilevel GLT sequences [GLTbookII], and block GLT sequences [bg]. We also present several emblematic applications of this theory in the context of PDE discretizations.

Technical report 2019004: Block Generalized Locally Toeplitz Sequences: Theory and Applications in the Unidimensional Case
http://www.it.uu.se/research/publications/reports/2019004
20190701
Giovanni Barbarino, Carlo Garoni, and Stefano Serra-Capizzano
<b>Abstract:</b> In computational mathematics, when dealing with a large linear discrete problem (e.g., a linear system) arising from the numerical discretization of a differential equation (DE), knowledge of the spectral distribution of the associated matrix has proved useful for designing and analyzing appropriate solvers for the problem, especially preconditioned Krylov and multigrid solvers. This spectral information is also of interest in itself whenever the eigenvalues of the matrix represent physical quantities of interest, which is the case for several problems from engineering and applied sciences (e.g., the study of natural vibration frequencies in an elastic material). The theory of generalized locally Toeplitz (GLT) sequences is a powerful apparatus for computing the asymptotic spectral distribution of matrices A_n arising from virtually any kind of numerical discretization of DEs. Indeed, when the mesh-fineness parameter n tends to infinity, these matrices A_n give rise to a sequence {A_n}_n, which often turns out to be a GLT sequence or one of its "relatives", i.e., a block GLT sequence or a reduced GLT sequence. In particular, block GLT sequences are encountered in the discretization of systems of DEs as well as in the higher-order finite element or discontinuous Galerkin approximation of scalar/vectorial DEs. This work is a review, refinement, extension, and systematic exposition of the theory of block GLT sequences. It also includes several emblematic applications of this theory in the context of DE discretizations.
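To make the notion of a spectral distribution concrete, here is a minimal sketch using the standard textbook case rather than anything taken from the report: the unscaled second-difference matrix tridiag(-1, 2, -1) of size n, whose eigenvalues are exact samples of the symbol f(theta) = 2 - 2 cos(theta) on the uniform grid theta_j = j*pi/(n+1). The function names are illustrative assumptions. The check multiplies the matrix by the known sine eigenvectors and measures the residual, so it needs only the standard library.

```python
import math


def symbol(theta):
    """Symbol f(theta) = 2 - 2*cos(theta) of the Toeplitz matrix tridiag(-1, 2, -1)."""
    return 2.0 - 2.0 * math.cos(theta)


def apply_tridiag(v):
    """Multiply tridiag(-1, 2, -1) by the vector v without forming the matrix."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(-left + 2.0 * v[i] - right)
    return out


def max_eigen_residual(n):
    """Largest entry of A v_j - f(theta_j) v_j over j = 1..n, theta_j = j*pi/(n+1)."""
    worst = 0.0
    for j in range(1, n + 1):
        theta = j * math.pi / (n + 1)
        # Candidate eigenvector: samples of sin(k * theta), k = 1..n.
        v = [math.sin(k * theta) for k in range(1, n + 1)]
        av = apply_tridiag(v)
        lam = symbol(theta)
        worst = max(worst, max(abs(a - lam * x) for a, x in zip(av, v)))
    return worst


if __name__ == "__main__":
    # The residual stays at rounding level, so the eigenvalues are samples of the symbol.
    print(max_eigen_residual(50))
```

In this simple case the samples are exact for every n; for the matrix sequences covered by GLT theory the eigenvalues are, in general, only distributed like the symbol asymptotically as n grows.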

Technical report 2019003: Minimizing Replay under Way-Prediction
http://www.it.uu.se/research/publications/reports/2019003
20190501
Ricardo Alves, Stefanos Kaxiras, and David Black-Schaffer
<b>Abstract:</b> Way-predictors are effective at reducing dynamic cache energy by reducing the number of ways accessed, but introduce additional latency for incorrect way-predictions. While previous work has studied the impact of the increased latency for incorrect way-predictions, we show that the latency variability has a far greater effect as it forces replay of in-flight instructions on an incorrect way-prediction. To address the problem, we propose a solution that learns the confidence of the way-prediction and dynamically disables it when it is likely to mispredict. We further improve this approach by biasing the confidence to reduce latency variability further at the cost of reduced way-predictions. Our results show that instruction replay in a way-predictor reduces IPC by 6.9% due to 10% of the instructions being replayed. Our confidence-based way-predictor degrades IPC by only 2.9% by replaying just 3.4% of the instructions, reducing way-predictor cache energy overhead (compared to serial access cache) from 8.5% to 1.9%.
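The idea of gating a way-prediction on learned confidence can be sketched as a toy model. The abstract does not say how confidence is learned, so everything here is an assumption made for illustration: the class and function names, the per-set "last way that hit" predictor, the saturating counter, the threshold, and the reset-to-zero update on a misprediction (one possible way to bias the confidence toward fewer replays).

```python
class ConfidenceGatedWayPredictor:
    """Toy per-set way predictor gated by a saturating confidence counter (hypothetical)."""

    def __init__(self, num_sets, threshold=2, max_conf=3):
        self.pred = [0] * num_sets      # last way that hit in each set
        self.conf = [0] * num_sets      # confidence counter per set
        self.threshold = threshold      # minimum confidence needed to predict
        self.max_conf = max_conf        # saturation limit of the counter

    def predict(self, set_index):
        """Return a predicted way, or None when prediction is disabled for this access."""
        if self.conf[set_index] >= self.threshold:
            return self.pred[set_index]
        return None

    def update(self, set_index, actual_way):
        """Train on the way that actually hit."""
        if self.pred[set_index] == actual_way:
            self.conf[set_index] = min(self.max_conf, self.conf[set_index] + 1)
        else:
            # Assumed bias: drop confidence to zero so prediction stays off
            # until the new way has proven stable again.
            self.conf[set_index] = 0
            self.pred[set_index] = actual_way


def run_trace(predictor, trace):
    """Count predictions, replays (wrong predictions), and skipped predictions."""
    stats = {"predicted": 0, "replays": 0, "skipped": 0}
    for set_index, way in trace:
        guess = predictor.predict(set_index)
        if guess is None:
            stats["skipped"] += 1
        else:
            stats["predicted"] += 1
            if guess != way:
                stats["replays"] += 1
        predictor.update(set_index, way)
    return stats


if __name__ == "__main__":
    trace = [(0, 1), (0, 1), (0, 1), (0, 1), (0, 2), (0, 2), (0, 2)]
    print(run_trace(ConfidenceGatedWayPredictor(num_sets=1), trace))
```

Raising the threshold trades prediction coverage (and so energy savings) for fewer replays, which is the trade-off the abstract describes.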

Technical report 2019002: Block Generalized Locally Toeplitz Sequences: Theory and Applications
http://www.it.uu.se/research/publications/reports/2019002
20190401
C. Garoni and S. Serra-Capizzano
<b>Abstract:</b> When dealing with a large linear system arising from the numerical discretization of a differential equation (DE), knowledge of the spectral distribution of the associated matrix has proved useful for designing and analyzing appropriate solvers for the system, especially preconditioned Krylov and multigrid solvers. The theory of generalized locally Toeplitz (GLT) sequences is a powerful apparatus for computing the asymptotic spectral distribution of matrices A_n arising from virtually any kind of numerical discretization of DEs. Indeed, when the mesh-fineness parameter n tends to infinity, these matrices A_n give rise to a sequence {A_n}, which often turns out to be a GLT sequence or one of its "relatives", i.e., a block GLT sequence or a reduced GLT sequence. In particular, block GLT sequences are encountered in the discretization of systems of DEs as well as in the higher-order finite element or discontinuous Galerkin approximation of scalar/vectorial DEs. This work is a review, refinement, extension, and systematic exposition of the theory of block GLT sequences. It also includes several emblematic applications of this theory in the context of DE discretizations.

Technical report 2019001: The Store Atomicity Dilemma
http://www.it.uu.se/research/publications/reports/2019001
20190301
Alberto Ros and Stefanos Kaxiras
<b>Abstract:</b> Actual TSO implementations (x86-TSO, SPARC) relax strict store atomicity by allowing a core to see its own stores while they are in limbo, i.e., executed (and perhaps retired) but not yet inserted in the global memory order. This can break the TSO ordering rules, specifically the load-load order, in unexpected and unpredictable ways. Furthermore, we show that similar effects can be observed in memory models weaker than TSO. Such behaviors seriously compromise the soundness of the memory model. The store-atomicity dilemma that designers face is: clean semantics and a sound model or performance? As of yet, enforcing strict store atomicity carries a steep performance penalty. The only known solutions to guarantee store atomicity impose a blanket enforcement even when a violation of store atomicity would not matter. We make a simple observation. What holds for any other rule in a consistency model also holds for strict store atomicity: it is not a crime to break the rule, unless we get caught. In this work, we detail the different ways in which a store atomicity violation can be detected via its effect: the breaking of the load-load ordering rule. We then describe an effective and cheap approach to dynamically enforce store atomicity only when the detection of its violation actually occurs. In practice, these cases are rare during the execution of a program. In all other cases (the bulk of the execution of a program) store atomicity can be freely violated without anyone taking notice. The end result is that we provide (the illusion of) clean semantics and a sound store-atomic memory model but with the performance and cost of a non-store-atomic model.
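The effect of a core seeing its own in-limbo stores can be illustrated with a small exhaustive simulator. This is a sketch under stated assumptions, not the report's mechanism: the machine is a toy model with one FIFO store buffer per thread, the litmus test is the commonly used store-forwarding example (it is not taken from the report), and in the "strict" variant a load simply waits until the thread's own pending store to that address has drained. All names are illustrative.

```python
# Two-thread litmus test; x and y start at 0.
P0 = [("st", "x", 1), ("ld", "x", "r1"), ("ld", "y", "r2")]
P1 = [("st", "y", 1), ("ld", "y", "r3"), ("ld", "x", "r4")]


def outcomes(threads, forwarding):
    """Enumerate all final register states of a toy store-buffer machine."""
    start = (tuple(0 for _ in threads),      # program counters
             tuple(() for _ in threads),     # FIFO store buffers
             (("x", 0), ("y", 0)),           # memory, as sorted pairs
             ())                             # registers, as sorted pairs
    stack, seen, results = [start], set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        memd = dict(mem)
        finished = True
        for t, prog in enumerate(threads):
            if bufs[t]:
                # Option A: drain the oldest buffered store to memory.
                finished = False
                (addr, val), rest = bufs[t][0], bufs[t][1:]
                new_mem = dict(memd)
                new_mem[addr] = val
                stack.append((pcs, bufs[:t] + (rest,) + bufs[t + 1:],
                              tuple(sorted(new_mem.items())), regs))
            if pcs[t] < len(prog):
                # Option B: execute the next instruction of thread t.
                finished = False
                op, addr, arg = prog[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "st":
                    new_buf = bufs[t] + ((addr, arg),)
                    stack.append((new_pcs, bufs[:t] + (new_buf,) + bufs[t + 1:],
                                  mem, regs))
                else:
                    pending = [v for a, v in bufs[t] if a == addr]
                    if pending and not forwarding:
                        continue  # strict variant: wait for the own store to drain
                    val = pending[-1] if pending else memd[addr]
                    stack.append((new_pcs, bufs, mem,
                                  tuple(sorted(regs + ((arg, val),)))))
        if finished:
            results.add(regs)
    return results


if __name__ == "__main__":
    odd = (("r1", 1), ("r2", 0), ("r3", 1), ("r4", 0))
    print("with forwarding:", odd in outcomes([P0, P1], forwarding=True))
    print("strict variant: ", odd in outcomes([P0, P1], forwarding=False))
```

With forwarding enabled, the outcome r1=1, r2=0, r3=1, r4=0 is reachable: each thread observes its own store before the other thread can, which no single global order of the two stores explains. In the strict variant of this toy model the outcome disappears.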

Technical report 2018014: Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation
http://www.it.uu.se/research/publications/reports/2018014
20181201
Nikos Nikoleris, Erik Hagersten, and Trevor E. Carlson
<b>Abstract:</b> Current practice for accurate and efficient simulation (e.g., SMARTS and Simpoint) makes use of sampling to significantly reduce the time needed to evaluate new research ideas. By evaluating a small but representative portion of the original application, sampling can allow for both fast and accurate performance analysis. However, as cache sizes of modern architectures grow, simulation time is dominated by warming microarchitectural state and not by detailed simulation, reducing overall simulation efficiency. While checkpoints can significantly reduce cache warming, improving efficiency, they limit the flexibility of the system under evaluation, requiring new checkpoints for software updates (such as changes to the compiler and compiler flags) and many types of hardware modifications. An ideal solution would allow for accurate cache modeling for each simulation run without the need to generate rigid checkpointing data a priori. Enabling this new direction for fast and flexible simulation requires a combination of (1) a methodology that allows for hardware and software flexibility and (2) the ability to quickly and accurately model arbitrarily-sized caches. Current approaches that rely on checkpointing or statistical cache modeling require rigid, up-front state to be collected, which needs to be amortized over a large number of simulation runs. These earlier methodologies are insufficient for our goals for improved flexibility. In contrast, our proposed methodology, Delorean, outlines a unique solution to this problem. The Delorean simulation methodology enables both flexibility and accuracy by quickly generating a targeted cache model for the next detailed region on the fly without the need for up-front simulation or modeling.
More specifically, we propose a new, more accurate statistical cache modeling method that takes advantage of hardware virtualization to precisely determine the memory regions accessed and to minimize the time needed for data collection while maintaining accuracy. Delorean uses a multi-pass approach to understand the memory regions accessed by the next, upcoming detailed region. Our methodology collects the entire set of key memory accesses and, through fast virtualization techniques, progressively scans larger, earlier regions to learn more about these key accesses in an efficient way. Using these techniques, we demonstrate that Delorean allows for the fast evaluation of systems and their software through the generation of accurate cache models on the fly. Delorean outperforms previous proposals by an order of magnitude, with a simulation speed of 150 MIPS and a similar average CPI error (below 4%).

Technical report 2018013: How to Extend the Application Scope of GLT-Sequences
http://www.it.uu.se/research/publications/reports/2018013
20181101
Stanislav Morozov, Stefano Serra-Capizzano, and Eugene Tyrtyshnikov
<b>Abstract:</b> In this paper we address the problem of finding the distribution of eigenvalues and singular values for matrix sequences. The main focus of this paper is the spectral distribution of matrix sequences arising in the discretization of PDEs. Over the last two decades, the theory of GLT-sequences has been developed to address this problem. We investigate the possibility of applying GLT-theory to the discretization of PDEs on non-rectangular domains and show that in many cases the present GLT-theory is insufficient. We also propose a generalization of GLT-sequences that enables one to cope with a wide range of PDE discretization problems defined on polygonal domains.

Technical report 2018012: Eigenvalue Isogeometric Approximations Based on B-splines: Tools and Results
http://www.it.uu.se/research/publications/reports/2018012
20180701
Sven-Erik Ekström and Stefano Serra-Capizzano
<b>Abstract:</b> In this short note we consider the spectral analysis of large matrices coming from the numerical approximation of the eigenvalue problem (a(x)u'(x))'=λ b(x) u(x), x∈ (0,1), where u(0) and u(1) are given, by using isogeometric methods based on B-splines. We give precise estimates for the extremal eigenvalues and global distributional results. The techniques involve dyadic decomposition arguments, the GLT analysis, and basic extrapolation methods.

Technical report 2018011: Nonlinear System Identification of the Dissolved Oxygen to Effluent Ammonium Dynamics in an Activated Sludge Process
http://www.it.uu.se/research/publications/reports/2018011
20180601
Tatiana Chistiakova, Per Mattsson, Bengt Carlsson, and Torbjörn Wigren
<b>Abstract:</b> Aeration of biological reactors in wastewater treatment plants is important for obtaining a high removal of soluble organic matter as well as for nitrification, but it requires a significant amount of energy. It is hence important to control the aeration rate, for example, by ammonium feedback control. The goal of this report is to model the dynamics from the set point of an existing dissolved oxygen controller to effluent ammonium using two types of system identification methods for a Hammerstein model, including a newly developed recursive variant. The models are estimated and evaluated using noise-corrupted data from a complex mechanistic model (Activated Sludge Model No. 1). The performance of the estimated nonlinear models is compared with that of an estimated linear model, and it is shown that the nonlinear models give a significantly better fit to the data. The resulting models may be used for adaptive control (using the recursive Hammerstein variant), gain-scheduling control, L2 stability analysis, and model-based fault detection.
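For readers unfamiliar with the term, a Hammerstein model is a static nonlinearity followed by linear dynamics. The sketch below simulates that structure only; the polynomial coefficients, the first-order linear block, and the function names are hypothetical choices for illustration and are unrelated to the models estimated in the report.

```python
def hammerstein_step(u, y_prev, poly=(0.0, 1.0, 0.5), a=0.9, b=0.1):
    """One step of a toy Hammerstein model.

    Static block: v = poly[0] + poly[1]*u + poly[2]*u**2 + ...
    Linear block: y[k] = a*y[k-1] + b*v[k]   (first order, stable for |a| < 1)
    All coefficients are made up for illustration.
    """
    v = sum(c * u ** k for k, c in enumerate(poly))
    return a * y_prev + b * v


def simulate(inputs, y0=0.0, **params):
    """Run the model over a sequence of inputs and return the outputs."""
    outputs = []
    y = y0
    for u in inputs:
        y = hammerstein_step(u, y, **params)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # Doubling a constant input does not double the steady-state output,
    # which a purely linear model could not reproduce.
    print(simulate([1.0] * 500)[-1])
    print(simulate([2.0] * 500)[-1])
```

With these made-up coefficients the steady-state gain is b/(1-a) = 1, so a constant input u settles at u + 0.5*u**2.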

Technical report 2018010: Preconditioners for Two-by-Two Block Matrices with Square Blocks
http://www.it.uu.se/research/publications/reports/2018010
20180501
Owe Axelsson and Maya Neytcheva
<b>Abstract:</b> Two-by-two block matrices with square blocks arise in the numerical treatment of numerous applications of practical significance, such as optimal control problems constrained by a state equation in the form of partial differential equations, multiphase models, and solving complex linear systems in real arithmetic, to name a few. Such problems lead to algebraic systems of equations with matrices of a certain two-by-two block form. For such matrices, a number of preconditioners have been proposed, some of them with tight eigenvalue bounds. In this paper it is shown that one of them in particular, referred to as PRESB, is very efficient, not only giving robust, favourable properties of the spectrum but also enabling an efficient implementation with low computational complexity. Various applications and generalizations of this preconditioning technique, such as in time-harmonic parabolic and Stokes equations, eddy current electromagnetic problems, and problems with additional box-constraints, i.e., upper and/or lower bounds of the solution, are also discussed. The method is based on the use of coupled inner-outer iterations, where the inner iteration can be performed to various relative accuracies. This leads to variable preconditioners; thus, a flexible version of a Krylov subspace iteration method must be used. Alternatively, some version of a defect-correction iterative method can be applied.
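The abstract does not spell out the block form, so the following is a sketch under an explicit assumption: that the matrix is M = [[A, -B], [B, A]] and the PRESB preconditioner is P = [[A, -B], [B, A + 2B]], which is how the method is usually stated in the related literature. To stay self-contained, the blocks are reduced to scalars a > 0 and b >= 0, where the eigenvalues of P^{-1}M can be worked out by hand as 1 and (a^2 + b^2)/(a + b)^2, both in [1/2, 1]. The function name is illustrative.

```python
import math


def presb_eigenvalues(a, b):
    """Eigenvalues of P^{-1} M for scalar blocks (assumed PRESB form).

    M = [[a, -b], [b, a]],  P = [[a, -b], [b, a + 2*b]],  with a > 0, b >= 0.
    """
    det_p = a * (a + 2 * b) + b * b          # equals (a + b)**2
    # Inverse of P via the 2x2 adjugate formula.
    inv_p = [[(a + 2 * b) / det_p, b / det_p],
             [-b / det_p, a / det_p]]
    m = [[a, -b], [b, a]]
    t = [[sum(inv_p[i][k] * m[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = t[0][0] + t[1][1]
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    # Clamp the discriminant to guard against rounding when both eigenvalues coincide.
    disc = math.sqrt(max(0.0, trace * trace - 4.0 * det))
    return ((trace - disc) / 2.0, (trace + disc) / 2.0)


if __name__ == "__main__":
    for a, b in [(1.0, 0.0), (1.0, 1.0), (1.0, 100.0), (3.0, 0.01)]:
        print((a, b), presb_eigenvalues(a, b))
```

In this scalar setting the spectrum stays inside [1/2, 1] no matter how a and b are scaled relative to each other, which gives a small-scale picture of the robustness the abstract refers to.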