Licentiate thesis 2020-004

Machine Learning for Spatially Varying Data

Muhammad Osama

22 April 2020

Abstract:

Many physical quantities around us vary across space or space-time. An example of a spatial quantity is the temperature across Sweden on a given day; an example of a spatio-temporal quantity is the number of coronavirus cases across the globe over time. Spatial and spatio-temporal data make it possible to answer many important questions. For example, what will the weather be like tomorrow, or where will the risk of a disease occurring be highest over the next few days? Answering such questions requires formulating and learning statistical models.

One of the challenges with spatial and spatio-temporal data is that datasets can be extremely large, which makes learning a model computationally costly. This problem can be mitigated in several ways, for instance through matrix manipulations and approximations. In Paper I, we propose a solution in which the model is learned in a streaming fashion, i.e., as the data arrive point by point. This also allows the learned model to be updated efficiently as new data arrive, which is highly pertinent to spatio-temporal data.
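To make "learning point by point" concrete, the following is a minimal sketch of a generic streaming estimator, not the method of Paper I: ridge regression on random Fourier features of the spatial coordinates, updated by recursive least squares. The class name `StreamingRFFRegressor`, the Gaussian-kernel feature map, and the parameters `n_features`, `lengthscale` and `reg` are all hypothetical choices made for illustration.

```python
import numpy as np


class StreamingRFFRegressor:
    """Ridge regression on random Fourier features, updated one point at a time.

    Hypothetical illustration: each update uses recursive least squares (RLS),
    so the cost per new observation is O(D^2) for D features, independent of
    how many observations have already been processed.
    """

    def __init__(self, dim_in, n_features=100, lengthscale=0.2, reg=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Random frequencies and phases approximating a Gaussian (RBF) kernel.
        self.W = rng.normal(scale=1.0 / lengthscale, size=(n_features, dim_in))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        # P tracks the inverse of the regularized Gram matrix (Phi^T Phi + reg*I)^-1.
        self.P = np.eye(n_features) / reg
        self.theta = np.zeros(n_features)

    def features(self, s):
        """Map a spatial location s (length dim_in) to D random Fourier features."""
        return self.scale * np.cos(self.W @ np.asarray(s, dtype=float) + self.b)

    def update(self, s, y):
        """Absorb one new observation y at location s."""
        phi = self.features(s)
        P_phi = self.P @ phi
        gain = P_phi / (1.0 + phi @ P_phi)
        # Correct the weights in proportion to the prediction error on the new point.
        self.theta = self.theta + gain * (y - phi @ self.theta)
        # Sherman-Morrison rank-one downdate of the inverse Gram matrix.
        self.P = self.P - np.outer(gain, P_phi)

    def predict(self, s):
        """Predicted value at location s under the current weights."""
        return float(self.features(s) @ self.theta)
```

In this linear-in-features setting, after all points have been streamed the weights agree (up to rounding) with the batch ridge solution (Phi^T Phi + reg I)^-1 Phi^T y, so nothing is lost by processing the data one point at a time, and a newly arriving observation costs a single `update` call.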

Another interesting problem in the spatial context is to study the causal effect that an exposure variable has on a response variable. For instance, policymakers might want to know whether increasing the number of police in a district has the desired effect of reducing crime there. The challenge here is spatial confounding. A spatial map of the number of police plotted against a spatial map of the number of crimes in different districts might show a clear association between these two quantities. However, a third, unobserved confounding variable might cause both quantities to rise and fall together. In Paper II, we propose a solution for estimating causal effects in the presence of such a confounding variable.
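The confounding problem can be illustrated with a small simulation. In this hypothetical setup (all names, functional forms and coefficients are assumptions, not data or methods from Paper II), a smooth unobserved spatial field `u(s)` raises both the exposure `x` (police) and the response `y` (crime), while the true effect of `x` on `y` is `beta = -0.5`. The sketch only shows why naive regression misleads; it does not implement Paper II's estimator, which targets the realistic case where `u` is unobserved.

```python
import numpy as np


def simulate_confounded(n=5000, beta=-0.5, gamma=2.0, seed=0):
    """Hypothetical data: a smooth spatial field u(s) drives both x and y."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, 1.0, size=n)                # district locations on [0, 1]
    u = np.sin(2.0 * np.pi * s)                      # unobserved spatial confounder
    x = u + rng.normal(scale=np.sqrt(0.5), size=n)   # exposure, e.g. number of police
    y = beta * x + gamma * u + rng.normal(scale=0.5, size=n)  # response, e.g. crimes
    return s, u, x, y


def naive_slope(x, y):
    """Least-squares slope of y on x, ignoring the confounder."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def oracle_slope(x, u, y):
    """Slope on x after also adjusting for u (only possible if u were observed)."""
    design = np.column_stack([np.ones_like(x), x, u])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])
```

With these assumed values, Var(u) = 0.5 and Var(x) = 1, so the naive slope converges to beta + gamma Cov(u, x) / Var(x) = -0.5 + 2(0.5)/1 = 0.5. The association therefore even has the wrong sign: more police appear to go with more crime, although the simulated effect reduces crime. Adjusting for `u` recovers roughly -0.5, but in practice `u` is not available.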

Another common type of spatial data is point or event data, i.e., the occurrence of events across space. An event could, for example, be a reported case of a disease or a crime, and one may be interested in predicting the number of events in a given region. A fundamental challenge here is to quantify the uncertainty of the predicted counts robustly. In Paper III, we propose a regularized criterion for learning a predictive model of event counts across spatial regions. The regularization yields tighter prediction intervals around the predicted counts, and these intervals have valid coverage irrespective of the degree of model misspecification.
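For intuition about coverage that holds under misspecification, a minimal sketch of split conformal prediction for counts follows. It is a standard technique that assumes only exchangeable data, swapped in here for illustration; it is not the regularized criterion of Paper III. The function name `split_conformal_interval` and the miscoverage level `alpha` are hypothetical.

```python
import numpy as np


def split_conformal_interval(pred_cal, y_cal, pred_new, alpha=0.1):
    """Intervals with marginal coverage >= 1 - alpha under exchangeability.

    pred_cal, y_cal: predicted and observed counts on held-out calibration regions.
    pred_new: predicted counts for new regions.
    """
    pred_cal = np.asarray(pred_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    pred_new = np.asarray(pred_new, dtype=float)
    n = len(y_cal)
    scores = np.abs(y_cal - pred_cal)            # nonconformity score per region
    k = int(np.ceil((n + 1) * (1 - alpha)))      # conformal rank of the quantile
    # With too few calibration points the valid interval is unbounded.
    q = np.inf if k > n else np.sort(scores)[k - 1]
    lower = np.maximum(pred_new - q, 0.0)        # counts cannot be negative
    upper = pred_new + q
    return lower, upper
```

The guarantee does not depend on how good the predictor is: a poor (misspecified) model still attains the nominal coverage, but only by producing wide intervals. Obtaining tighter intervals while retaining valid coverage is what a better-learned model buys.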
