Cross-validation for BLUP prediction.
This function provides a cross-validation procedure for mixed models using replicate-based data. By default, complete blocks are randomly selected within each environment. In each iteration, the original dataset is split into two datasets: training and validation data. The 'training' set has all combinations (genotype x environment) with R - 1 replications. The 'validation' set has the remaining replication. The values predicted from the training set are compared with the 'validation' data and the Root Mean Square Prediction Difference (Olivoto et al. 2019) is computed. After the last resample, a list is returned.
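The split and the prediction difference described above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, the column names (ENV, GEN, REP, Y) are hypothetical, and plain genotype-by-environment means of the training set stand in for the BLUPs that the function obtains from a mixed model.

```r
set.seed(1)

# Simulated balanced trial: 2 environments x 4 genotypes x 3 replicates
toy <- expand.grid(ENV = c("E1", "E2"),
                   GEN = paste0("G", 1:4),
                   REP = 1:3,
                   stringsAsFactors = FALSE)
toy$Y <- rnorm(nrow(toy), mean = 10)

# Randomly pick one complete replicate per environment for validation
held_out <- vapply(split(toy$REP, toy$ENV),
                   function(r) sample(unique(r), 1),
                   integer(1))

# Rows whose replicate matches the one held out in their environment
is_valid   <- toy$REP == held_out[toy$ENV]
training   <- toy[!is_valid, ]   # R - 1 replicates per combination
validation <- toy[is_valid, ]    # the remaining replicate

# Stand-in predictor (assumption): training means, not BLUPs
pred <- aggregate(Y ~ ENV + GEN, data = training, FUN = mean)
cmp  <- merge(validation, pred, by = c("ENV", "GEN"),
              suffixes = c("_obs", "_pred"))

# Root mean square prediction difference for this single resample
rmspd <- sqrt(mean((cmp$Y_pred - cmp$Y_obs)^2))
rmspd
```

Repeating the split nboot times and collecting rmspd each time would give a vector analogous to the RMSPD item described under Value.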
Usage
cv_blup(
.data,
env,
gen,
rep,
resp,
block = NULL,
nboot = 200,
random = "gen",
verbose = TRUE
)
Arguments
- .data
The dataset containing the columns related to Environments, Genotypes, replication/block and response variable(s).
- env
The name of the column that contains the levels of the environments.
- gen
The name of the column that contains the levels of the genotypes.
- rep
The name of the column that contains the levels of the replications/blocks. AT LEAST THREE REPLICATES ARE REQUIRED TO PERFORM THE CROSS-VALIDATION.
- resp
The response variable.
- block
Defaults to NULL. In this case, a randomized complete block design is considered. If block is provided, a resolvable alpha-lattice design (Patterson and Williams, 1976) is employed. See the section Details for how fixed and random effects are considered.
- nboot
The number of resamples to be used in the cross-validation. Defaults to 200.
- random
The effects of the model assumed to be random. See Details for more information.
- verbose
A logical argument to define if a progress bar is shown. Defaults to TRUE.
Value
An object of class cv_blup with the following items:
- RMSPD: A vector with nboot estimates of the root mean square prediction difference between predicted and validation data.
- RMSPDmean: The mean of the RMSPD estimates.
Details
Six models may be fitted depending on the values of the block and random arguments.
Model 1: block = NULL and random = "gen" (the default option). This model considers a Randomized Complete Block Design in each environment, assuming genotype and genotype-environment interaction as random effects. Environments and blocks nested within environments are assumed to be fixed factors.
Model 2: block = NULL and random = "env". This model considers a Randomized Complete Block Design in each environment, treating environment, genotype-environment interaction, and blocks nested within environments as random factors. Genotypes are assumed to be fixed factors.
Model 3: block = NULL and random = "all". This model considers a Randomized Complete Block Design in each environment assuming a random-effect model, i.e., all effects (genotypes, environments, genotype-environment interaction, and blocks nested within environments) are assumed to be random factors.
Model 4: block is not NULL and random = "gen". This model considers an alpha-lattice design in each environment, assuming genotype, genotype-environment interaction, and incomplete blocks nested within complete replicates as random to make use of inter-block information (Mohring et al., 2015). Complete replicates nested within environments and environments are assumed to be fixed factors.
Model 5: block is not NULL and random = "env". This model considers an alpha-lattice design in each environment, assuming genotype as fixed. All other sources of variation (environment, genotype-environment interaction, complete replicates nested within environments, and incomplete blocks nested within replicates) are assumed to be random factors.
Model 6: block is not NULL and random = "all". This model considers an alpha-lattice design in each environment, assuming all effects, except the intercept, as random factors.
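To make the six combinations easier to compare, the sketch below writes each one as an lme4-style formula. It is an illustration only: model_formula() is a hypothetical helper, the column names (Y, ENV, GEN, REP, BLOCK) are assumptions, and the formulas are one plausible reading of the descriptions above, not necessarily what the package builds internally.

```r
# Hypothetical helper: maps (block given?, random) to an lme4-style formula.
# Terms written as (1 | X) are random; all other terms are fixed.
model_formula <- function(has_block = FALSE,
                          random = c("gen", "env", "all")) {
  random <- match.arg(random)

  # Fixed part, following the descriptions of Models 1 to 6
  fixed <- switch(random,
                  gen = c("ENV", "ENV:REP"),  # environments, replicates in env.
                  env = "GEN",                # genotypes
                  all = "1")                  # intercept only

  # Random part
  rand <- switch(random,
                 gen = c("GEN", "GEN:ENV"),
                 env = c("ENV", "GEN:ENV", "ENV:REP"),
                 all = c("GEN", "ENV", "GEN:ENV", "ENV:REP"))

  # Alpha-lattice: incomplete blocks within replicates are always random
  if (has_block) rand <- c(rand, "ENV:REP:BLOCK")

  rhs <- c(fixed, sprintf("(1 | %s)", rand))
  as.formula(paste("Y ~", paste(rhs, collapse = " + ")))
}

model_formula(FALSE, "gen")  # Model 1
model_formula(TRUE,  "env")  # Model 5
```

A formula of this kind could be passed to a mixed-model fitter such as lme4::lmer(), assuming that package is installed.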
IMPORTANT: An error is returned if any combination of genotype-environment has a different number of replications than observed in the trial.
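Before calling the function, it may help to check the data for the two requirements stated in this page: balanced replication and at least three replicates. The base R sketch below is one way to do that; check_balance() is a hypothetical helper and not part of the package, and it assumes that "observed in the trial" means the number of distinct replicate levels.

```r
# Hypothetical helper: checks balance and the minimum number of replicates
check_balance <- function(data, env, gen, rep) {
  # Number of observations for each environment x genotype combination
  counts <- table(data[[env]], data[[gen]])
  # Number of distinct replicate levels in the trial
  n_rep <- length(unique(data[[rep]]))
  list(balanced    = all(counts == n_rep),
       enough_reps = n_rep >= 3)
}

# Simulated layout: 2 environments x 3 genotypes x 3 replicates
toy <- expand.grid(ENV = c("E1", "E2"),
                   GEN = c("G1", "G2", "G3"),
                   REP = 1:3)

check_balance(toy, "ENV", "GEN", "REP")        # complete layout
check_balance(toy[-1, ], "ENV", "GEN", "REP")  # one plot missing
```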
References
Olivoto, T., A.D.C. Lúcio, J.A.G. da Silva, V.S. Marchioro, V.Q. de Souza, and E. Jost. 2019. Mean performance and stability in multi-environment trials I: Combining features of AMMI and BLUP techniques. Agron. J. 111:2949-2960. doi:10.2134/agronj2019.03.0220
Patterson, H.D., and E.R. Williams. 1976. A new class of resolvable incomplete block designs. Biometrika 63:83-92.
Mohring, J., E. Williams, and H.-P. Piepho. 2015. Inter-block information: to recover or not to recover it? TAG. Theor. Appl. Genet. 128:1541-54. doi:10.1007/s00122-015-2530-0
Author
Tiago Olivoto tiagoolivoto@gmail.com
Examples
# \donttest{
library(metan)
model <- cv_blup(data_ge,
                 env = ENV,
                 gen = GEN,
                 rep = REP,
                 resp = GY,
                 nboot = 5)
#> Validating 1 of 5 sets |======== | 20% 00:00:00
#> Validating 2 of 5 sets |================ | 40% 00:00:01
#> Validating 3 of 5 sets |========================= | 60% 00:00:01
#> Validating 4 of 5 sets |================================= | 80% 00:00:02
#> Validating 5 of 5 sets |=========================================| 100% 00:00:03
# }