Department of Information Technology

cnF2freq

cnF2freq (originally same-Chromosome N-loci F2 FREQuencies) is our experimental codebase for analyzing genotype data with a known pedigree structure in different ways. The code has been used to compute line-origin probabilities in outbred and inbred F2 lines, determine the phasing (haplotypes) of markers in different pedigree structures, (re)compute sex-specific marker distances based on the Haldane mapping function, and compute the most probable genotype assignments for missing markers as a form of genotype imputation in pedigrees.
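As a reminder of what the Haldane mapping function does, here is a minimal Python sketch of the standard formula, r = (1 - exp(-2d)) / 2 with d in Morgans, and its inverse. The function names and the sex-specific example distances are made up for illustration and are not taken from the cnF2freq code.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Haldane mapping function: r = (1 - exp(-2d)) / 2, with d in Morgans."""
    if distance_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haldane_map_distance(recombination_fraction: float) -> float:
    """Inverse mapping: d = -ln(1 - 2r) / 2, valid for 0 <= r < 0.5."""
    if not 0.0 <= recombination_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * recombination_fraction)


# Hypothetical sex-specific map distances (Morgans) for one marker interval.
sex_specific_distance = {"male": 0.08, "female": 0.12}
sex_specific_r = {
    sex: haldane_recombination_fraction(d)
    for sex, d in sex_specific_distance.items()
}
```

Under this model the recombination fraction approaches, but never reaches, 0.5 as the map distance grows, which is why the inverse is only defined for r below 0.5.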

All versions of the code are available under a BSD-style license, so they can be used freely for commercial as well as non-commercial purposes. We do expect use in academic contexts to be accompanied by proper references to the original work.

Unique features

The most distinctive feature of cnF2freq is that a large pedigree is split into separate "focus pedigrees", each consisting of a single individual and one or two generations of its ancestors.
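To make the idea of a focus pedigree concrete, the following Python sketch collects one individual together with its ancestors up to a given number of generations. The pedigree representation (a dictionary from individual to sire and dam), the function name and the example identifiers are all assumptions for illustration; the actual data structures in cnF2freq may look quite different.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: individual -> (sire, dam); None marks an unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def focus_pedigree(pedigree: Pedigree, individual: str, generations: int = 2) -> Set[str]:
    """Return the individual plus its ancestors up to `generations` generations back."""
    members = {individual}
    frontier = [individual]
    for _ in range(generations):
        next_frontier = []
        for ind in frontier:
            # Founders are not keys in the dictionary, so they have no known parents.
            for parent in pedigree.get(ind, (None, None)):
                if parent is not None:
                    members.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return members


# A made-up three-generation pedigree with two F2 full sibs.
pedigree = {
    "F2_1": ("F1_a", "F1_b"),
    "F2_2": ("F1_a", "F1_b"),
    "F1_a": ("F0_a", "F0_b"),
    "F1_b": ("F0_c", "F0_d"),
}

# One focus pedigree per individual; ancestors appear in several of them.
focus = {ind: focus_pedigree(pedigree, ind) for ind in pedigree}
```

In this example the same F1 and F0 individuals appear in the focus pedigrees of both F2 sibs, which illustrates why parameters tied to an individual need to be shared across focus pedigrees.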

All per-marker and per-individual parameters are shared between focus pedigrees. The main parameters are the skewness (phase) and sureness (probability of allele error). Other phasing schemes tend to treat phase as a binary variable and use some kind of Markov sampling to explore different assignments. In our approach, we initialize the phase to 0.5 and then iteratively update it using a modified Baum-Welch algorithm, a standard expectation-maximization approach for hidden Markov models. This has been shown to give superior results in complex pedigrees.
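The following toy Python sketch illustrates the general idea of treating phase as a continuous parameter that starts at 0.5 and is refined by expectation-maximization. It is not the modified Baum-Welch algorithm used in cnF2freq: there is no hidden Markov model, only a single pair of markers in one doubly heterozygous parent, a known recombination fraction r strictly between 0 and 0.5, and one observed gamete per offspring. All names and the example data are hypothetical.

```python
from typing import Sequence


def gamete_likelihood(gamete_is_cis_type: bool, phase_is_cis: bool, r: float) -> float:
    """P(gamete | parental phase): non-recombinant with probability 1 - r, recombinant with r."""
    return 1.0 - r if gamete_is_cis_type == phase_is_cis else r


def em_step(skew: float, gametes: Sequence[bool], r: float) -> float:
    """One EM update of the shared skewness (current weight given to the cis phase)."""
    posteriors = []
    for gamete in gametes:
        # E-step: posterior weight of the cis phase given this offspring's gamete.
        cis = skew * gamete_likelihood(gamete, True, r)
        trans = (1.0 - skew) * gamete_likelihood(gamete, False, r)
        posteriors.append(cis / (cis + trans))
    # M-step: the shared parameter becomes the average posterior weight.
    return sum(posteriors) / len(posteriors)


def estimate_skew(gametes: Sequence[bool], r: float,
                  iterations: int = 50, initial: float = 0.5) -> float:
    """Start from an uninformative skewness of 0.5 and iterate the EM update."""
    skew = initial
    for _ in range(iterations):
        skew = em_step(skew, gametes, r)
    return skew


# Made-up data: 9 gametes consistent with the cis phase, 1 apparent recombinant.
gametes = [True] * 9 + [False]
skew = estimate_skew(gametes, r=0.1)
```

With these data the estimate moves from 0.5 towards 1, that is, towards a confident cis phase, without ever sampling discrete phase assignments.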

[Figure: qtlmas15ped2.png]

Available versions of the code

The code exists in several editions tailored for different datasets and experiments. A specific fork for only computing line genotype probabilities is in the process of being released as an R software package. The actively maintained branches are available on GitHub. The suggested branch for current use is plantimpute_modern.

The edition adapted for the 14th QTL-MAS workshop is available here. It performs haplotyping with known genotypes in a multi-generational pedigree. Support for recomputing marker maps has been tested but is disabled in this specific version. The code is parallelized with OpenMP as well as MPI.

The edition adapted for the 15th QTL-MAS workshop is available here. It performs haplotyping and genotype reconstruction with parental genotype data purposefully removed, and includes code for comparing results against those from Merlin. Parallelization with MPI is not enabled in this edition.

A recent release of the Boost library and a fairly recent C++ compiler are required. More recent branches of the code require C++14 support. The Intel C++ compiler is our main platform for large runs (for performance reasons), so that compiler tends to work. Please contact Carl Nettelblad regarding specific use cases or any issues. Again, plantimpute_modern is in general the current stable branch. Future work (including great speedups) is found in the experimental Chaplink branch.

Publications

The following publications relate directly to this codebase:

Updated  2017-02-05 11:42:05 by Kurt Otto.