Mapping IR absorbance in wheat seeds to crucial properties using machine learning
From Lantmännen SW Seed, we have access to data on infrared (IR) absorbance from wheat seeds. Measuring IR absorbance is a relatively straightforward way to analyze samples compared to other forms of chemical analysis, e.g. for determining protein content, or to grinding a large number of samples into flour in order to assess the baking properties of a specific variety.
Thus, being able to predict these more complex properties directly from IR spectra would benefit the breeding process. Only a small number of seeds would be needed, and the analysis method is cost-effective.
In total, we have access to 288 spectra, each covering 100 wavelengths, along with roughly 20 more complex quantities of interest that one would ideally want to predict from the spectra.
Two main avenues are available:
a) Dimensionality reduction of spectra using unsupervised methods.
The spectra contain a large number of data points, but one would expect strong correlations between them. Projecting the spectra onto a lower-dimensional structure would help in exploring these correlations.
b) Direct prediction using supervised methods.
Try machine learning approaches for directly predicting the quantities of interest from the underlying data.
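As an illustration of avenue (a), the following is a minimal sketch of a linear projection of the spectra using principal component analysis, computed with a singular value decomposition in NumPy. The data are synthetic placeholders that only mimic the stated shape (288 spectra, 100 wavelengths); the number of latent factors, the noise level, and all variable names are assumptions made for the example, not properties of the actual measurements.

```python
import numpy as np

# Synthetic placeholder data (NOT real measurements): 288 spectra x 100
# wavelengths, generated from 3 hidden factors plus a little noise so that
# neighbouring columns are strongly correlated. All sizes besides 288 and 100
# are assumptions for illustration.
rng = np.random.default_rng(0)
n_spectra, n_wavelengths, n_latent = 288, 100, 3
latent = rng.normal(size=(n_spectra, n_latent))
loadings = rng.normal(size=(n_latent, n_wavelengths))
noise = 0.05 * rng.normal(size=(n_spectra, n_wavelengths))
spectra = latent @ loadings + noise


def pca(X, n_components):
    """Project the rows of X onto the leading principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    return scores, components, explained[:n_components]


scores, components, explained_ratio = pca(spectra, n_components=5)
print("reduced shape:", scores.shape)
print("explained variance ratio:", explained_ratio.round(4))
```

On real spectra, the explained variance ratios would indicate how many components are worth keeping, and the scores could be plotted against the quantities of interest to look for structure.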
Various techniques may be applicable, ranging from support vector machines (SVM) and principal component analysis (PCA) to more recent deep learning approaches. Access can be provided to run deep learning experiments on the national infrastructure Alvis (https://www.c3se.chalmers.se/about/Alvis/).
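For avenue (b), a simple linear baseline is a useful reference point before moving to SVMs or deep learning. The sketch below fits a multi-output ridge regression in closed form with NumPy and evaluates it on a held-out subset. Ridge regression is chosen here only as an illustrative baseline and is not one of the methods named above. The data are again synthetic placeholders with the stated shape (288 spectra, 100 wavelengths, 20 targets); the linear relationship, the noise level, the split size, and the regularization strength `alpha` are all assumptions.

```python
import numpy as np

# Synthetic placeholder data (NOT real measurements): targets are an assumed
# linear function of the spectra plus noise.
rng = np.random.default_rng(1)
n_spectra, n_wavelengths, n_targets = 288, 100, 20
spectra = rng.normal(size=(n_spectra, n_wavelengths))
true_weights = rng.normal(size=(n_wavelengths, n_targets))
targets = spectra @ true_weights + 0.1 * rng.normal(size=(n_spectra, n_targets))

# Hold out roughly 20% of the samples for evaluation (assumed split).
perm = rng.permutation(n_spectra)
train_idx, test_idx = perm[:230], perm[230:]


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ Yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def r2_per_target(Y_true, Y_pred):
    """Coefficient of determination, computed separately for each target."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


weights, intercept = fit_ridge(spectra[train_idx], targets[train_idx], alpha=1.0)
predictions = spectra[test_idx] @ weights + intercept
r2 = r2_per_target(targets[test_idx], predictions)
print("held-out R^2 per target:", r2.round(3))
```

With only 288 samples, a single train/test split gives a noisy estimate; cross-validation would likely be a more reliable way to compare this baseline against SVMs or neural networks.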