Licentiate thesis 2010-002

Using Markov Models and a Stochastic Lipschitz Condition for Genetic Analyses

Carl Nettelblad

19 March 2010


A proper understanding of biological processes requires an understanding of genetics and evolutionary mechanisms. The vast amounts of genetic information that can routinely be extracted with modern technology have so far not been accompanied by an equally extended understanding of the corresponding processes.

The relationship between a single gene and the resulting properties, the phenotype, of an individual is rarely clear. This thesis addresses several computational challenges in identifying and assessing the effects of quantitative trait loci (QTL), genomic positions where variation affects a trait. The genetic information available for each individual is rarely complete, so the unknown genotype at the modelled loci also needs to be addressed. This thesis presents new tools that make maximal use of the available information by using hidden Markov models (HMMs), changing the algorithm runtime complexity from exponential to log-linear in the number of markers. It also proposes the introduction of inferred haplotypes to further increase the power to assess these unknown variables for pedigrees of related, genetically diverse individuals. The modelling consequences of partial genetic information are also treated.

Furthermore, genes do not affect traits directly, but are rather expressed in the environment of, and in concordance with, other genes. Therefore, significant interactions can be expected within genes, where some combination of genetic variation gives a pronounced, or even opposite, effect compared to when the variants occur separately. This thesis addresses how to perform efficient scans for multiple interacting loci, as well as how to derive highly accurate empirical significance tests in these settings. This is done by analyzing the mathematical properties of the objective function describing the quality of model fits, and reformulating it through a simple transformation. Combined with the presented prototype of a problem-solving environment, these developments can make multi-dimensional searches for QTL routine, allowing the pursuit of new biological insight.

Available as PDF (627 kB)
