Wheat breeding

Ingeborg Gullikstad Hem ([email protected])

library(makemyprior)

Model

We use a model from Hem et al. (2021); a genomic wheat breeding model with three genetic effects: additive, dominance and epistasis The model is: \[ y_i = \mu + a_i + d_i + x_i + \varepsilon_i, \ i = 1, \dots, 100, \]

where * \(\mu\) is an intercept with a \(\mathcal{N}(0, 1000^2)\) prior, * \(\mathbf{a} = (a_1, \dots, a_{100}) \sim \mathcal{N}_{100}(\mathbf{0}, \sigma_{\mathrm{a}}^2 \mathbf{A})\) is the additive effect, * \(\mathbf{d} = (d_1, \dots, d_{100}) \sim \mathcal{N}_{100}(\mathbf{0}, \sigma_{\mathrm{a}}^2 \mathbf{D})\) is the dominance effect, * \(\mathbf{x} = (x_1, \dots, x_{100}) \sim \mathcal{N}_{100}(\mathbf{0}, \sigma_{\mathrm{a}}^2 \mathbf{X})\) is the epistasis effect, and * \(\varepsilon_i\) is the residual effect with variance \(\sigma_{\varepsilon}^2\).

The covariance matrices \(\mathbf{A}\), \(\mathbf{D}\) and \(\mathbf{X}\) are computed from the single nucleotide polymorphism (SNP) matrix with thousands of genetic markers.

We use the dataset wheat_breeding in makemyprior, which consists of indexes for the effects and precision matrices for each of the genetic effects, and present three priors.

We scale the precision matrices so the corresponding covariance matrices have typical variance equal to 1 with scale_precmat.

wheat_data_scaled <- wheat_data
wheat_data_scaled$Q_a <- scale_precmat(wheat_data$Q_a)
wheat_data_scaled$Q_d <- scale_precmat(wheat_data$Q_d)
wheat_data_scaled$Q_x <- scale_precmat(wheat_data$Q_x)

formula <- y ~
  mc(a, model = "generic0", Cmatrix = Q_a, constr = TRUE) +
  mc(d, model = "generic0", Cmatrix = Q_d, constr = TRUE) +
  mc(x, model = "generic0", Cmatrix = Q_x, constr = TRUE)

We do not carry out inference, as it takes time and will slow down the compilation of the vignettes by a lot, but include code so the user can run the inference themselves.

Prior 1

We do not want to say anything about the total (also called phenotypic) variance. We have expert knowledge saying that we want shrinkage towards the residual effect to avoid overfitting, and the heritability, which is the amount of phenotypic variance attributed to the genetic effects, is around 0.25. The additive, dominance and epistasis effects get about (85, 10, 5)% of the genetic variance according to the expert. The expert is quite sure about these choices, and we use a concentration parameter \(c\), saying how much of the density mass of the prior such that \(\mathrm{Prob}(\mathrm{logit}(1/4) < \mathrm{logit}(\omega_*) - \mathrm{logit}(m) < \mathrm{logit}(3/4)) = c\) for a variance proportion \(\omega_*\).

This information can be implemented as a prior as follows:

prior1 <- make_prior(formula, wheat_data_scaled, prior = list(
  tree = "s1 = (d, x); s2 = (a, s1); s3 = (s2, eps)",
  w = list(s1 = list(prior = "pcM", param = c(0.67, 0.8)),
           s2 = list(prior = "pcM", param = c(0.85, 0.8)),
           s3 = list(prior = "pc0", param = 0.25))))

prior1
#> Model: y ~ mc(a, model = "generic0", Cmatrix = Q_a, constr = TRUE) + 
#>     mc(d, model = "generic0", Cmatrix = Q_d, constr = TRUE) + 
#>     mc(x, model = "generic0", Cmatrix = Q_x, constr = TRUE)
#> Tree structure: d_x = (d,x); a_d_x = (a,d_x); eps_a_d_x = (eps,a_d_x)
#> 
#> Weight priors:
#>  w[d/d_x] ~ PCM(0.67, 0.8)
#>  w[a/a_d_x] ~ PCM(0.85, 0.8)
#>  w[eps/eps_a_d_x] ~ PC1(0.75)
#> Total variance priors:
#>  V[eps_a_d_x] ~ Jeffreys'
plot_prior(prior1) # or plot(prior1)

plot_tree_structure(prior1)

Note that we do not fit the model in this vignette, as it takes some time.

posterior1 <- inference_stan(prior1, iter = 15000, warmup = 5000,
                            chains = 1, seed = 1)

plot_posterior_stan(posterior1, param = "prior", prior = TRUE)

For inference with INLA:

posterior1_inla <- inference_inla(prior1)
plot_posterior_stdev(posterior1_inla)

Prior 2

We have a good intuition on the absolute magnitude of the residual and genetic effects, but since the genetic effects are confounded, we want to use a prior saying that about 80% of the genetic variance is additive effect, and shrinkage towards the additive, and and be ignorant about the division of dominance and epistasis. The residual and genetic effects are both assumed to not be much larger than 3.

prior2 <- make_prior(formula, wheat_data_scaled, prior = list(
  tree = "s1 = (d, x); s2 = (a, s1); (eps)",
  w = list(s1 = list(prior = "dirichlet"),
           s2 = list(prior = "pc1", param = c(0.8))),
  V = list(s2 = list(prior = "pc", param = c(3, 0.05)),
           eps = list(prior = "pc", param = c(3, 0.05)))))

prior2
#> Model: y ~ mc(a, model = "generic0", Cmatrix = Q_a, constr = TRUE) + 
#>     mc(d, model = "generic0", Cmatrix = Q_d, constr = TRUE) + 
#>     mc(x, model = "generic0", Cmatrix = Q_x, constr = TRUE)
#> Tree structure: d_x = (d,x); a_d_x = (a,d_x); (eps)
#> 
#> Weight priors:
#>  w[d/d_x] ~ Dirichlet(2)
#>  w[a/a_d_x] ~ PC1(0.8)
#> Total variance priors:
#>  sqrt(V)[a_d_x] ~ PC0(3, 0.05)
#> Independent variance priors:
#>  sigma[eps] ~ PC0(3, 0.05)
plot_prior(prior2)

plot_tree_structure(prior2)

Again we do not fit the model in this vignette, as it takes some time.

posterior2 <- inference_stan(prior2, iter = 15000, warmup = 5000,
                            chains = 1, seed = 1)

plot_posterior_stan(posterior2, param = "prior", prior = TRUE)
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.1.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Oslo
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] makemyprior_1.2.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] Matrix_1.6-1.1    gtable_0.3.4      jsonlite_1.8.8    highr_0.10       
#>  [5] dplyr_1.1.4       compiler_4.3.2    promises_1.2.1    tidyselect_1.2.0 
#>  [9] Rcpp_1.0.12       later_1.3.2       jquerylib_0.1.4   splines_4.3.2    
#> [13] scales_1.3.0      yaml_2.3.8        fastmap_1.1.1     mime_0.12        
#> [17] lattice_0.21-9    ggplot2_3.4.4     R6_2.5.1          labeling_0.4.3   
#> [21] shinyjs_2.1.0     generics_0.1.3    knitr_1.45        htmlwidgets_1.6.4
#> [25] visNetwork_2.1.2  MASS_7.3-60       tibble_3.2.1      munsell_0.5.0    
#> [29] shiny_1.8.0       bslib_0.6.1       pillar_1.9.0      rlang_1.1.3      
#> [33] utf8_1.2.4        cachem_1.0.8      httpuv_1.6.14     xfun_0.42        
#> [37] sass_0.4.8        cli_3.6.2         withr_3.0.0       magrittr_2.0.3   
#> [41] digest_0.6.34     grid_4.3.2        xtable_1.8-4      rstudioapi_0.14  
#> [45] lifecycle_1.0.4   vctrs_0.6.5       evaluate_0.23     glue_1.7.0       
#> [49] farver_2.1.1      fansi_1.0.6       colorspace_2.1-0  rmarkdown_2.25   
#> [53] ellipsis_0.3.2    tools_4.3.2       pkgconfig_2.0.3   htmltools_0.5.7

References

Hem, Ingeborg Gullikstad, Maria Lie Selle, Gregor Gorjanc, Geir-Arne Fuglstad, and Andrea Riebler. 2021. Robust modeling of additive and nonadditive variation with intuitive inclusion of expert knowledge.” Genetics, January.