# The Uppsala University Information Laboratory

## Available thesis projects

Our lab hosts a limited number of thesis projects. Students working on these projects become temporary members of the lab. They are expected to complete the project within the agreed time frame (typically 2.5 months for bachelor's and 5 months for master's theses) and to actively participate in lab activities, so that they contribute to information sharing and knowledge development. Bachelor's and master's students get a desk in our lab and are expected to be present (and work :) ) full time, that is, around 40 hours per week; we typically work with at most three students at a time. A high degree of independence, a good level of ambition and good English skills are necessary, as all projects are part of the research activities of the lab and are expected to contribute new knowledge, algorithms, code, etc. Each student is also expected to moderate at least one of our fika meetings. Most projects require knowledge of C++, and master's projects (as well as some bachelor's projects) also require knowledge of data mining/machine learning.

If you are interested, please send an email to matteo.magnani@it.uu.se with your transcript and a short CV or motivation.

### Null models and network rewiring in temporal network clustering

Level: Bachelor

The objective of this project is to implement network randomisation methods and use them to test existing algorithms for temporal network clustering. The underlying idea is that a clustering algorithm should identify patterns in the data that are not merely the outcome of randomness. One way to test a clustering algorithm is therefore to randomise the data with various methods, each removing some aspect of the data (a typical example: shuffling the times at which people meet in a temporal social network), and to observe whether this affects the algorithm's ability to detect patterns and/or the patterns it detects. Knowledge of C++ is required.
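As an illustration of the timestamp example above, here is a minimal C++ sketch of one such null model, assuming a temporal network stored as a flat list of timestamped contacts. `TemporalEdge` and `shuffle_timestamps` are hypothetical names, not part of the lab's library. The sketch keeps who meets whom fixed and randomly permutes the times of the contacts, which preserves the static topology and the overall distribution of timestamps while destroying temporal correlations such as bursts or the order of meetings.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// A timestamped contact between two nodes (hypothetical representation).
struct TemporalEdge {
    int u;
    int v;
    long t;  // time of the interaction
};

// Null model: keep the node pairs fixed but randomly permute the timestamps
// across all contacts. The static topology and the multiset of times are
// preserved; temporal correlations (bursts, ordering) are destroyed.
std::vector<TemporalEdge> shuffle_timestamps(std::vector<TemporalEdge> edges,
                                             std::mt19937& rng) {
    std::vector<long> times;
    times.reserve(edges.size());
    for (const auto& e : edges) times.push_back(e.t);
    std::shuffle(times.begin(), times.end(), rng);
    for (std::size_t i = 0; i < edges.size(); ++i) edges[i].t = times[i];
    return edges;
}
```

Running a clustering algorithm on the original network and on several shuffled copies, and comparing the results, gives a rough indication of how much the detected clusters depend on timing.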

### Measuring text networks

Level: Bachelor or Master

Several methods exist to analyse text, and separately to analyse the structure of social/communication networks. The objective of this thesis is to define, implement and experimentally evaluate measures for text networks, that is, data containing information both on the communication network and on the text content exchanged through it. Online conversations are a typical example of text networks. The project includes the development and testing of part of a data analysis library (C++) that combines functionality to efficiently model interconnected graphs with existing text mining tools. The student will learn about graph and text manipulation. The project concludes by testing the developed software on empirical data collected from real online social media (e.g., Twitter, FriendFeed) and comparing the results with theoretical models.

### Network embedding for multiplex/text networks

Level: Bachelor or Master

Network embedding is a process that turns each node in a graph into a point in a multidimensional space, so that traditional machine learning and data mining algorithms can then be applied. This thesis concerns the implementation, testing and (for the master version) extension of network embedding algorithms for (multi)graphs extended with additional information, e.g., edge types, time and text. Knowledge of C++ is required.
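Many embedding methods, such as DeepWalk and node2vec, first sample random walks on the graph and then feed the resulting node sequences to a word-embedding model (skip-gram), which learns one vector per node. The sketch below shows only that first step, assuming a single-layer, unweighted, undirected graph stored as adjacency lists; the function names are hypothetical, and edge types, time and text are deliberately left out.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Adjacency-list representation of an undirected, unweighted graph.
using Graph = std::vector<std::vector<int>>;

// Sample one truncated random walk of at most `length` nodes (length >= 1)
// starting at `start`. The walk stops early at a node with no neighbours.
std::vector<int> random_walk(const Graph& g, int start, std::size_t length,
                             std::mt19937& rng) {
    std::vector<int> walk{start};
    while (walk.size() < length) {
        const auto& nbrs = g[walk.back()];
        if (nbrs.empty()) break;
        std::uniform_int_distribution<std::size_t> pick(0, nbrs.size() - 1);
        walk.push_back(nbrs[pick(rng)]);
    }
    return walk;
}

// Generate `walks_per_node` walks from every node. These sequences would then
// be passed to a skip-gram model (not shown) to learn the node vectors.
std::vector<std::vector<int>> generate_walks(const Graph& g, std::size_t walks_per_node,
                                             std::size_t length, std::mt19937& rng) {
    std::vector<std::vector<int>> walks;
    for (std::size_t r = 0; r < walks_per_node; ++r)
        for (int v = 0; v < static_cast<int>(g.size()); ++v)
            walks.push_back(random_walk(g, v, length, rng));
    return walks;
}
```

Extending this to multiplex or temporal graphs mainly changes how the next step of a walk is chosen, for example by restricting it to a layer or to later timestamps.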

### Stochastic blockmodeling for (temporal) text networks

Level: Bachelor or Master

The stochastic block model is a generative model for random graphs, commonly used to detect patterns (such as communities) in networks. This thesis concerns extending existing methods to combine textual information with structural information about the graph. For the master's version, a qualitative comparison with other methods is also an objective. The candidate must be fluent in C++ programming, able to digest non-trivial mathematics, motivated, independent and creative.
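To make the generative view concrete, here is a minimal sketch of sampling a graph from a basic (undirected, unweighted) stochastic block model: each node i is assigned to a block z[i], and each pair of nodes is connected independently with a probability P[z[i]][z[j]] that depends only on their blocks. The names `sample_sbm`, `z` and `P` are illustrative assumptions; text and time are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Sample an undirected graph from a stochastic block model:
// node i belongs to block z[i], and each pair (i, j) with i < j is connected
// independently with probability P[z[i]][z[j]].
std::vector<std::pair<int, int>> sample_sbm(const std::vector<int>& z,
                                            const std::vector<std::vector<double>>& P,
                                            std::mt19937& rng) {
    std::vector<std::pair<int, int>> edges;
    const int n = static_cast<int>(z.size());
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            const double p = P[z[i]][z[j]];
            assert(p >= 0.0 && p <= 1.0);  // must be a valid probability
            std::bernoulli_distribution edge(p);
            if (edge(rng)) edges.emplace_back(i, j);
        }
    }
    return edges;
}
```

Pattern detection works in the opposite direction: given an observed network, infer the block assignment z and the matrix P that best explain it. That inference step is not shown here.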

### Modelling topic divergence in (temporal) text networks

Level: Master

As online social media (e.g., Twitter, FriendFeed) become ever more prevalent, users' awareness of the impact of their online actions has also been increasing. Our hypothesis is that, as a consequence, users express themselves differently when they address individuals than when they address a broader audience (e.g., when they write a message to multiple people at once). To verify this hypothesis, we need a new algorithm that detects these communication motifs using network structure, text and time. The project includes the development and testing of part of a data analysis library (C++) with functionality to efficiently model these motifs. The student will learn about graph and text manipulation. The project concludes with the design, implementation and execution of a natural experiment, analysing data collected from real online social media or email communications to either verify or falsify our initial hypothesis.
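One simple way to operationalise the hypothesis, given here only as an illustrative assumption and not as the method the project will develop, is to split messages by their number of recipients and compare the word distributions of the two groups with the Jensen-Shannon divergence. The sketch uses naive whitespace tokenisation and ignores network structure and time; `Message` and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// A message sent by one user to one or more recipients (hypothetical representation).
struct Message {
    int sender;
    std::vector<int> recipients;
    std::string text;
};

using WordDist = std::map<std::string, double>;

// Relative word frequencies over a set of messages (whitespace tokenisation only).
WordDist word_distribution(const std::vector<Message>& msgs) {
    WordDist dist;
    double total = 0.0;
    for (const auto& m : msgs) {
        std::istringstream in(m.text);
        std::string w;
        while (in >> w) { dist[w] += 1.0; total += 1.0; }
    }
    for (auto& kv : dist) kv.second /= total;
    return dist;
}

// Jensen-Shannon divergence with base-2 logarithms, so the result lies in [0, 1].
double js_divergence(const WordDist& p, const WordDist& q) {
    WordDist m;  // mixture distribution (p + q) / 2
    for (const auto& kv : p) m[kv.first] += 0.5 * kv.second;
    for (const auto& kv : q) m[kv.first] += 0.5 * kv.second;
    double jsd = 0.0;
    for (const auto& kv : p) jsd += 0.5 * kv.second * std::log2(kv.second / m.at(kv.first));
    for (const auto& kv : q) jsd += 0.5 * kv.second * std::log2(kv.second / m.at(kv.first));
    return jsd;
}

// Compare one-to-one messages with one-to-many messages (messages without
// recipients are ignored).
double audience_divergence(const std::vector<Message>& msgs) {
    std::vector<Message> one_to_one, one_to_many;
    for (const auto& m : msgs) {
        if (m.recipients.size() == 1) one_to_one.push_back(m);
        else if (m.recipients.size() > 1) one_to_many.push_back(m);
    }
    return js_divergence(word_distribution(one_to_one), word_distribution(one_to_many));
}
```

A value near 0 means the two groups use words in similar proportions, while 1 means their vocabularies are disjoint.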

### Sparsification of uncertain (probabilistic) networks

Level: Bachelor

The objective of this project is to develop and test sparsification methods and algorithms for probabilistic (uncertain) networks. The algorithms will be implemented as part of an existing library and tested on both real and synthetic data. A possible further step is to extend the existing state-of-the-art method. The candidate must have knowledge of C++ programming and be motivated and independent.

Definition (sparsified graph): Given a graph G=(V,E) and a sparsification ratio a (0 < a < 1), a sparsified graph is a graph G'=(V,E') with |E'| = a|E| that preserves selected structural properties of G. The structural properties to be preserved, e.g., the expected degree of each node, are chosen by the student and the supervisor.
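For uncertain graphs, a natural property to preserve is the expected degree of each node, which by linearity of expectation equals the sum of the existence probabilities of its incident edges. The sketch below assumes E' ⊆ E with unchanged edge probabilities and implements only a naive baseline (keep the round(a|E|) most probable edges) together with the total expected-degree discrepancy used to evaluate it. It is not the state-of-the-art method mentioned above, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// An uncertain edge: endpoints plus the probability that the edge exists.
struct UncertainEdge {
    int u;
    int v;
    double p;
};

// Expected degree of each node: the sum of the probabilities of its incident edges.
std::vector<double> expected_degrees(int n, const std::vector<UncertainEdge>& edges) {
    std::vector<double> deg(n, 0.0);
    for (const auto& e : edges) {
        deg[e.u] += e.p;
        deg[e.v] += e.p;
    }
    return deg;
}

// Naive baseline: keep the round(a * |E|) most probable edges, with a in (0, 1].
std::vector<UncertainEdge> sparsify_top_probability(std::vector<UncertainEdge> edges,
                                                    double a) {
    assert(a > 0.0 && a <= 1.0);
    const auto k = static_cast<std::size_t>(std::lround(a * edges.size()));
    std::stable_sort(edges.begin(), edges.end(),
                     [](const UncertainEdge& x, const UncertainEdge& y) { return x.p > y.p; });
    edges.resize(k);
    return edges;
}

// Total absolute expected-degree discrepancy between the original and sparsified graph.
double degree_discrepancy(int n, const std::vector<UncertainEdge>& original,
                          const std::vector<UncertainEdge>& sparse) {
    const auto d0 = expected_degrees(n, original);
    const auto d1 = expected_degrees(n, sparse);
    double total = 0.0;
    for (int v = 0; v < n; ++v) total += std::fabs(d0[v] - d1[v]);
    return total;
}
```

Because removing edges can only lower expected degrees when the remaining probabilities are left unchanged, this baseline systematically underestimates them, which motivates methods that also adjust the probabilities of the retained edges.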