Skip to main content
Department of Information Technology

Multigraph clustering via graph embedding

Description:
Graphs are a common mathematical abstractions to represent systems of interconnected entities, such as social networks, transport networks and brain networks. Therefore, a large number of machine learning and data mining algorithms have been developed to analyse graphs. Among these, graph clustering algorithms are very popular, because groups of densely connected nodes in a graph often correspond to important concepts (such as communities in social networks) and influence processes on graphs (such as the propagation of information in social networks).

A common way to cluster graphs has been (and still is) to apply clustering algorithms specifically developed for graphs. However, recently a number of methods known as graph embedding have been proposed to encode each node into an N-dimensional vector. In this way, one can first embed the graph into an N-dimensional space, then apply a “traditional” clustering algorithm, such as k-means.

This project is about the implementation of recent embedding methods for multi-graphs and the experimental comparison of the performance of this approach (embedding + traditional clustering) when compared with clustering algorithms specifically developed for multi-graphs.

Practical details:
The work shall be done in C++ using the multinet library, developed at our department. The embedding methods are described in this article: https://arxiv.org/pdf/1709.03551.pdf They are based on node2vec, that is itself based on word2vec (if you didn’t hear about these approaches you can check them on Wikipedia. Regarding the coding, it is possible to integrate existing code for word2vec as long as this can be freely redistributed (the library you are going to modify is on CRAN and your work is expected to be released with it in its next version, so no code with restrictive licenses can be used inside it), and there are various implementations of node2vec that can be used as an inspiration, while the new algorithms will need to be implemented from scratch. For the experimental comparison the library already contains several multi-graph clustering algorithms, so there is no need to re-implement them. We also already have data for the tests.

Learning outcomes: From this project you will learn about word2vec and node2vec, that are extremely popular recent methods to handle respectively text and graphs, you will have a chance to look at a very recent research direction (embedding for multi-graphs), and you will also contribute answering a general research question that is still open, about whether these approaches reducing the clustering problem to a setting that can be handled using traditional algorithms are actually able to replace more specialised methods.

Supervisor: Matteo Magnani, InfoLab, Uppsala University (http://infolab.it.uu.se)

Updated  2020-10-03 14:16:07 by Maya Neytcheva.