Department of Information Technology

The Uppsala University Information Laboratory

Available thesis projects

At our lab we host a limited number of thesis projects. Students working on these projects become temporary members of the lab, are expected to complete the project within the agreed time constraints (typically 2.5 months for Bachelor's projects and 5 months for Master's projects) and to actively participate in the lab activities, so that they can contribute to information sharing and knowledge development. Bachelor/Master students get a desk in our lab and are expected to be present (and work :) ) full time, that is, around 40 hours per week; we typically work with at most three students in parallel. A high degree of independence, a good level of ambition and good English skills are necessary, as all projects are part of the research activities of the lab and are expected to contribute new knowledge, algorithms, code, etc. to it. Each student is also expected to moderate at least one of our fika meetings. For most projects knowledge of C++ is expected; for Master's projects (and some of the Bachelor's projects) knowledge of data mining/machine learning is also expected.

If you are interested, please send an email to matteo.magnani@it.uu.se with your transcript and a short CV or motivation letter.

Text-to-features for Swedish text

Level: Bachelor

The aim of this project is to develop a tool to translate text documents (including social media posts) into feature vectors. Many traditional machine learning and data mining algorithms do not work directly on text data, but accept input where each data object is represented as a vector of features. Therefore, to be able to analyze text data we can express each text document as a numerical vector whose fields represent grammatical features (such as the number of personal pronouns, the number of adjectives, …) and word categories (such as the number of positive words, the number of action verbs, …). Part of the project will consist of the creation of a dictionary mapping words to categories, to be produced using crowdsourcing.
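
As a rough sketch of the kind of mapping such a tool would perform, the C++ fragment below counts the words of a document that belong to a small hand-written category dictionary and reports one count per category, i.e., one field of the feature vector. The dictionary entries and category names are invented for illustration only; in the project the dictionary would be produced via crowdsourcing, and grammatical features would additionally require part-of-speech information.

    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>

    // Hypothetical word-to-category dictionary; the real one would be crowdsourced.
    const std::map<std::string, std::string> kCategories = {
        {"bra", "positive"},     // "good"
        {"arg", "negative"},     // "angry"
        {"springer", "action"},  // "runs"
        {"jag", "pronoun"}       // "I"
    };

    // Turn a whitespace-tokenized document into one count per category,
    // i.e., the fields of the (word-category part of the) feature vector.
    std::map<std::string, int> toFeatures(const std::string& document) {
        std::map<std::string, int> features;
        std::istringstream tokens(document);
        std::string word;
        while (tokens >> word) {
            auto it = kCategories.find(word);
            if (it != kCategories.end()) {
                ++features[it->second];
            }
        }
        return features;
    }

    int main() {
        for (const auto& [category, count] : toFeatures("jag springer bra bra")) {
            std::cout << category << ": " << count << "\n";
        }
        return 0;
    }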

Design and development of a GDPR-compliant online data collection tool

Level: Bachelor

The aim of this project is to extend existing software for the collection of Twitter data, making it compliant with the recently enforced EU General Data Protection Regulation (GDPR). While excellent programming skills are required, a significant part of the project concerns the study of the GDPR and the (non-trivial) design of the corresponding software solutions. Knowledge of PHP is required; knowledge of the Twitter API is a merit.

Network embedding for multiplex/text networks

Level: Bachelor or Master

Network embedding is a process that turns each vertex in a graph into a point in a multidimensional space, so that traditional machine learning and data mining algorithms can then be applied. This thesis concerns the implementation, testing and (for the Master version) extension of network embedding algorithms for multigraphs extended with additional information, e.g., edge types, time and text. Knowledge of C++ and machine learning is required.
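
To give an idea of what an embedding produces, the C++ sketch below maps each vertex of a small undirected graph to a point whose coordinates are its degree-normalized adjacency-matrix row. This placeholder is chosen only for brevity and is not one of the algorithms to be implemented in the project, which would instead learn low-dimensional representations and exploit edge types, time and text; the function name and interface are invented for illustration.

    #include <iostream>
    #include <utility>
    #include <vector>

    // Placeholder "embedding": each vertex of an undirected graph becomes a point
    // in R^n given by its degree-normalized adjacency-matrix row. A real network
    // embedding algorithm would learn a much lower-dimensional representation.
    std::vector<std::vector<double>> embed(int numVertices,
                                           const std::vector<std::pair<int, int>>& edges) {
        std::vector<std::vector<double>> points(numVertices,
                                                std::vector<double>(numVertices, 0.0));
        std::vector<int> degree(numVertices, 0);
        for (const auto& [u, v] : edges) {
            points[u][v] += 1.0;
            points[v][u] += 1.0;
            ++degree[u];
            ++degree[v];
        }
        for (int u = 0; u < numVertices; ++u) {
            if (degree[u] > 0) {
                for (double& x : points[u]) {
                    x /= degree[u];
                }
            }
        }
        return points;
    }

    int main() {
        // A triangle (0-1-2) with a pendant vertex 3 attached to vertex 2.
        const auto points = embed(4, {{0, 1}, {1, 2}, {2, 0}, {2, 3}});
        for (std::size_t u = 0; u < points.size(); ++u) {
            std::cout << "vertex " << u << ":";
            for (double x : points[u]) {
                std::cout << " " << x;
            }
            std::cout << "\n";
        }
        return 0;
    }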

Parallel processing of graph data

Level: Bachelor or Master

Graph mining algorithms are often time-consuming to run because of the nature of the data (non-locality) or because of its representation (e.g., adjacency matrices). The aim of this project is to study one or two existing graph clustering algorithms (currently implemented in C++), analyze them to identify computational bottlenecks and extend them to enable a more efficient parallel execution. Knowledge of C++, including knowledge of concurrent programming in C++, is required.
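
As a small illustration of the kind of parallelization involved (the actual clustering code already exists in the lab and is not shown here), the sketch below splits a toy per-vertex computation on an adjacency list across a pool of std::thread workers; each thread writes only its own entries of the result vector, so no locking is needed.

    #include <algorithm>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        // Toy adjacency list: adj[u] holds the neighbours of vertex u.
        const std::vector<std::vector<int>> adj = {{1, 2}, {0, 2, 3}, {0, 1}, {1}};
        const int n = static_cast<int>(adj.size());
        std::vector<long long> result(n, 0);

        const int numThreads =
            static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
        std::vector<std::thread> workers;
        for (int t = 0; t < numThreads; ++t) {
            workers.emplace_back([&, t] {
                // Thread t handles vertices t, t + numThreads, t + 2 * numThreads, ...
                // It only writes result[u] for its own vertices, so no lock is needed.
                for (int u = t; u < n; u += numThreads) {
                    long long count = 0;
                    for (int v : adj[u]) {
                        if (v > u) {  // toy per-vertex work: neighbours with a higher id
                            ++count;
                        }
                    }
                    result[u] = count;
                }
            });
        }
        for (std::thread& w : workers) {
            w.join();
        }

        for (int u = 0; u < n; ++u) {
            std::cout << "vertex " << u << ": " << result[u] << "\n";
        }
        return 0;
    }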

Analysis of the online Swedish political discourse

Level: Master

The objective of this thesis is to design and implement a data mining process to analyze the online political discourse in Sweden. The project includes the formulation of interesting questions (e.g., how different parties are portrayed in different sources, or how different genders contribute to the discussion), the selection of appropriate data sources (e.g., online versions of traditional newspapers vs. social media platforms), data collection, the application of data mining and machine learning algorithms, and the interpretation of the results.

Updated 2018-11-07 16:32:48 by Matteo Magnani.