- Sverker Holmgren (Professor in Scientific Computing)
- Carl Nettelblad (Assistant Professor in Scientific Computing)
- Kristiina Ausmees (PhD student in Scientific Computing, Principal advisor: C. Nettelblad. Co-advisor: M. Jakobsson, Department of Organism Biology, UU)
- Ebba Bergman (Master student, supervisor: C. Nettelblad)
- Salman Toor (PhD in Scientific Computing)
- Lars Rönnegård (Professor, Dalarna University, Section of Statistics)
Originally, our group focused on the search problem of finding specific gene positions explaining variation in quantitative traits (i.e. the genes "controlling" properties such as body height or propensity for disease). However, we have continually moved towards the more basic and computationally demanding steps in pre-processing genomic data for any kind of bioinformatic or statistical analysis. These problems include imputation and phasing.
Imputation is the process of filling in missing genotypes. If one individual is tested with a low-cost method that leaves some genotypes undetermined, data from other reference individuals can be used to infer them. Phasing is a common step in imputation. In this context, phasing, or haplotype inference, is the process of sorting out the two copies of a chromosome that each individual carries. Most testing methods only tell "you have one copy of variant 1, and one copy of variant 2". Through phasing, one can say "the copy you have of variant 1 was inherited from the same parent as the copy of variant A in another gene".
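To make the ambiguity that phasing resolves concrete, the following is a minimal Python sketch, not code from our software. It enumerates every pair of haplotypes that is consistent with an unphased genotype. The function name `consistent_haplotype_pairs` and the representation of a genotype as a list of allele pairs are assumptions made for illustration only.

```python
from itertools import product


def consistent_haplotype_pairs(genotype):
    """Enumerate haplotype pairs compatible with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per marker, where the
    order within each tuple carries no information (illustrative format).
    """
    # Only heterozygous markers create phase ambiguity.
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # Fix the orientation of the first heterozygous marker so that mirrored
    # pairs (hap1, hap2) and (hap2, hap1) are not counted twice.
    for flips in product([False, True], repeat=max(len(het_sites) - 1, 0)):
        flipped = dict(zip(het_sites[1:], flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flipped.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pairs.add((tuple(hap1), tuple(hap2)))
    return sorted(pairs)


# Two markers: variants "1"/"2" at the first, "A"/"B" at the second.
for hap1, hap2 in consistent_haplotype_pairs([("1", "2"), ("A", "B")]):
    print(hap1, hap2)
```

With k heterozygous markers there are 2^(k-1) candidate pairs, which is why phasing in practice relies on additional information such as reference individuals or pedigrees rather than on enumeration.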
The methods for phasing differ slightly depending on whether or not a known pedigree for the individuals exists. We are actively investigating methods for both cases, focusing on situations where genotype information can be noisy or fuzzy. When known pedigrees exist, we have pioneered using methods that parametrize the uncertain information and then use iterative optimization, moving over different "focus pedigrees", where each individual can appear in multiple places. The parameters are shared between all common pedigrees.
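As a simple illustration of why a known pedigree helps, the sketch below phases a single marker in a child from the genotypes of both parents. It is the textbook deterministic case with error-free genotypes, not the parametrized iterative method described above, and the function name `phase_child_site` and the data layout are hypothetical.

```python
def phase_child_site(child, mother, father):
    """Return (maternal_allele, paternal_allele) for one marker.

    Each argument is an unordered pair of alleles. Returns None when the
    parental origin is ambiguous or the genotypes violate Mendelian
    inheritance. Assumes error-free genotypes (illustration only).
    """
    a, b = child
    options = set()
    # Try both assignments of the child's alleles to the two parents.
    for from_mother, from_father in ((a, b), (b, a)):
        if from_mother in mother and from_father in father:
            options.add((from_mother, from_father))
    if len(options) == 1:
        return options.pop()
    return None


# The mother can only transmit "1", so the child's "2" came from the father.
print(phase_child_site(("1", "2"), ("1", "1"), ("1", "2")))
# Both parents are heterozygous: this marker alone cannot be phased.
print(phase_child_site(("1", "2"), ("1", "2"), ("1", "2")))
```

Markers that return `None` here, together with genotyping noise, are the cases where a probabilistic treatment becomes necessary.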
In all our specific applications, we have paid attention to parallelization and distributed computing, as appropriate, to realize high performance in an accessible manner in practice.
- Behrang Mahjani (PhD thesis 2016, principal advisor: S. Holmgren. Co-advisor: C. Nettelblad, L. Rönnegård)
- Mahen Jayawardena (PhD thesis 2010, also at University of Colombo School of Computing, Sri Lanka. Principal advisor: S. Holmgren. Co-advisors: Ö. Carlborg, SLU and R. Weerasinghe, UCSC)
- Kateryna Mishchenko (PhD thesis 2008, Mälardalen University College. Principal advisor: S. Holmgren. Co-advisors: L. Rönnegård, LCB and D. Silvestrov, MdH).
- Kajsa Harling (née Ljungberg) (PhD thesis 2005, Principal advisor: S. Holmgren. Co-advisor: Örjan Carlborg)
A full list of publications, including conference contributions, can be found here.
- cnF2freq: cnF2freq has been used in several ways to compute probabilities for genotypes and haplotypes in various populations (including one release under the name PlantImpute). The most recent versions of the code are available as different branches on GitHub. Please refer to the more detailed cnF2freq codebase page.
- DIRECT: The early work of the project most importantly resulted in using DIRECT for numerical searches. Some of the resulting code was included in the permutation testing code in GridQTL. Some code is available at http://user.it.uu.se/~kl/qtl_software.html.
- MapFastR: a generalized R package for outbred population analysis, including a fork of cnF2freq.