# Numerical methods for genetic analysis of complex traits

### Participants

- Sverker Holmgren (Professor in Scientific Computing)
- Behrang Mahjani (PhD student. Principal advisor: S. Holmgren.)
- Carl Nettelblad (PhD in Scientific Computing, thesis 2012, Uppsala University, Principal advisor: S. Holmgren. Co-advisor: J. Alvarez Castro, SLU)
- Salman Toor (PhD in Scientific Computing, Postdoctoral Scholar)
- Lars Rönnegård (Professor, Dalarna University, Section of Statistics)

### Alumni

- Mahen Jayawardena (PhD thesis 2010, Also at University of Colombo School of Computing, Sri Lanka. Principal advisor: S. Holmgren. Co-advisors: Ö. Carlborg, SLU and R. Weerasinghe, UCSC)
- Kateryna Mishchenko (PhD thesis 2008, Mälardalen University College. Principal advisor: S. Holmgren. Co-advisors: L. Rönnegård, LCB and D. Silvestrov, MdH)
- Kajsa Harling (f. Ljungberg) (PhD thesis 2005, Principal advisor: S. Holmgren. Co-advisor: Örjan Carlborg)

The group collaborates closely with Örjan Carlborg's group at the Swedish University of Agricultural Sciences (SLU) and with Lars Rönnegård in the Statistics group at Dalarna University. The group also has contacts with the research groups of Leif Andersson and Dietrich von Rosen at SLU.

### Research

**Background and early work:** Most of the important traits in humans, animals and plants are quantitative traits, that is, traits that exhibit a continuous phenotype distribution. Both genetic composition and environmental factors affect these traits. In other words, carrying the relevant genes does not guarantee that an individual expresses a given trait (or phenotype); environmental factors also play an important role. The regions of the genome that describe the genetic architecture of a quantitative trait are called quantitative trait loci (QTL). Finding a QTL position requires a "suitable" statistical model that captures the effects of both genetic and environmental factors. The goal in this area of research is to define and explore such "suitable" models and then locate the QTL.

In a QTL search, the statistical model is evaluated repeatedly for a large set of candidate positions in the genome in order to determine the QTL locations that fit the model best. Mathematically, this corresponds to solving a global optimization problem using some optimization scheme.
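The simplest optimization scheme is an exhaustive scan. The sketch below is a minimal illustration, not the group's implementation: it assumes a single-QTL linear model, uses the residual sum of squares as the fit criterion, and treats each candidate position as a column of genotype codes. The function names and the toy data are hypothetical.

```python
def rss_single_position(genotypes, phenotypes):
    """Residual sum of squares for y = mu + a*x + e, fitted by least squares."""
    n = len(phenotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0.0:
        # Uninformative position: only the mean can be fitted.
        return syy
    return syy - sxy ** 2 / sxx


def scan_positions(genotype_columns, phenotypes):
    """Evaluate every candidate position; return (best index, all RSS values)."""
    rss = [rss_single_position(col, phenotypes) for col in genotype_columns]
    best = min(range(len(rss)), key=rss.__getitem__)
    return best, rss


# Toy data (made up): three candidate positions, six individuals.
genotype_columns = [
    [0, 1, 2, 0, 1, 2],
    [0, 0, 1, 1, 2, 2],
    [2, 1, 0, 2, 0, 1],
]
phenotypes = [1.0, 1.1, 2.0, 2.1, 3.0, 3.1]

best, rss = scan_positions(genotype_columns, phenotypes)
print("best position index:", best)
print("RSS per position:", rss)
```

With several QTL in the model, the number of position combinations grows combinatorially, which is why an exhaustive scan quickly becomes impractical and more refined global optimization schemes are of interest.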

Some of the early work in our research group:

*Numerical methods for computing the QTL model fit:* Different types of statistical models are used for evaluating the fit of a given set of QTL positions. In the most straightforward scheme, a linear model is used and the residual sum of squares is computed by solving a least-squares problem. In alternative settings, a weighted least-squares problem or a non-linear maximum-likelihood problem is solved. Introducing orthogonal models may facilitate model selection, i.e. not only selecting the most likely QTL positions, but also determining, through automated means, the total number of QTL and where significant interactions can be found. The use of variance component models is rapidly increasing in the field of QTL analysis, and in this case a rather demanding non-linear optimization problem must be solved for each set of QTL positions. By exploiting the structure of the QTL analysis problems, efficient algorithms for different types of model fit computations and model selection settings are derived, analyzed and implemented.
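As a sketch of the linear case, written in notation assumed here rather than taken from the group's papers: let $y$ be the vector of phenotypes, $\theta$ a candidate set of QTL positions, $X(\theta)$ the corresponding design matrix and $\beta$ the vector of effects. The model fit for $\theta$ is then the residual sum of squares of a least-squares problem, and the QTL search minimizes it over $\theta$:

$$
\mathrm{RSS}(\theta) = \min_{\beta} \, \lVert y - X(\theta)\,\beta \rVert_2^2,
\qquad
\hat{\theta} = \arg\min_{\theta} \, \mathrm{RSS}(\theta).
$$

In the weighted least-squares setting, with an assumed symmetric positive definite weight matrix $W$, the inner problem becomes

$$
\mathrm{RSS}_W(\theta) = \min_{\beta} \, \bigl(y - X(\theta)\,\beta\bigr)^{T} W \,\bigl(y - X(\theta)\,\beta\bigr).
$$

The inner minimization is a standard linear algebra problem, while the outer minimization over $\theta$ is the global optimization problem of the QTL search.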

*High Performance Implementations:* The computations are demanding for models with multiple QTL, even when efficient algorithms are used. Parallel implementations on different architectures have been developed and studied.

*HMMs for genotype probabilities:* We have presented the tool cnF2freq to provide genotype and haplotype probabilities based on Hidden Markov Models.
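To illustrate the general idea of HMM-based genotype probabilities, the following is a minimal forward-backward sketch. It is not the cnF2freq algorithm: it assumes a two-state chain (as in a simple backcross), a uniform prior at the first marker, known recombination fractions between adjacent markers and a symmetric genotyping error rate. All names and parameter values are hypothetical.

```python
def genotype_posteriors(observations, recomb, error_rate=0.01):
    """Posterior state probabilities at each marker via forward-backward.

    observations: observed genotype per marker, 0, 1 or None (missing).
    recomb: recombination fractions between adjacent markers (length n - 1).
    """
    states = (0, 1)
    n = len(observations)

    def emit(state, obs):
        # A missing observation carries no information about the state.
        if obs is None:
            return 1.0
        return 1.0 - error_rate if obs == state else error_rate

    def trans(i, a, b):
        # Transition between marker i and marker i + 1.
        return 1.0 - recomb[i] if a == b else recomb[i]

    # Forward pass, rescaled at each step to avoid underflow.
    fwd = [[0.5 * emit(s, observations[0]) for s in states]]
    for t in range(1, n):
        prev = fwd[-1]
        cur = [emit(b, observations[t])
               * sum(prev[a] * trans(t - 1, a, b) for a in states)
               for b in states]
        scale = sum(cur)
        fwd.append([c / scale for c in cur])

    # Backward pass, rescaled in the same way.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans(t, a, b) * emit(b, observations[t + 1]) * bwd[t + 1][b]
                   for b in states)
               for a in states]
        scale = sum(cur)
        bwd[t] = [c / scale for c in cur]

    # Combine both passes and normalize per marker.
    posteriors = []
    for t in range(n):
        joint = [fwd[t][s] * bwd[t][s] for s in states]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors


# Toy example: the middle marker is missing and flanked by two observed 0s.
posteriors = genotype_posteriors([0, None, 0], recomb=[0.1, 0.1])
for marker, probs in enumerate(posteriors):
    print(marker, probs)
```

The flanking markers inform the missing one: the posterior at the middle marker strongly favours state 0, because a different state there would require a recombination on both sides.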

**Active projects:**

- Optimization methods for simultaneous search of multiple QTL
- Sparse matrix techniques in statistical genetics
- Map-Reduce programming model for QTL applications

### Publications

A full list of publications and conference contributions can be found here.

### Software

- cnF2freq: The numerical methods developed in the project are implemented in C and Matlab. Some of the codes have been incorporated into the publicly available and widely used WebQTL and R/qtl packages. Some code is available at the Software page. We also have a specific page with information and code related to the cnF2freq codebase in different versions.

- MapFastR