Cross-validation for estimation of AMMI models
The original dataset is split into two datasets: a training set and a validation
set. The training set has all genotype x environment combinations with
N-1 replications. The validation set has the remaining replication. How the
dataset is split into training and validation sets depends on the
design declared. For a Randomized Complete Block Design (the default) and an
alpha-lattice design (declared through the block argument), complete replicates
are selected within environments, and the remaining replicate serves as validation
data. If design = 'CRD' is declared, observations are sampled completely at random
within each genotype-by-environment combination (Olivoto et al. 2019). The
values estimated using naxis Interaction Principal Component
Axes are compared with the validation data, and the Root Mean Square Prediction
Difference (RMSPD) is computed. After all nboot resampling rounds, a list is returned.
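The splitting and scoring steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the internal code of cv_ammi: the toy data, the helper name rmspd, and the use of training cell means as a stand-in for the AMMI predictions are all hypothetical.

```r
# Toy balanced trial: 2 environments x 3 genotypes x 3 replicates (hypothetical data)
set.seed(1)
trial <- expand.grid(ENV = c("E1", "E2"),
                     GEN = c("G1", "G2", "G3"),
                     REP = 1:3,
                     stringsAsFactors = FALSE)
trial$GY <- rnorm(nrow(trial), mean = 3, sd = 0.5)

# RCBD-style split: within each environment, hold out one complete replicate
held_out <- sapply(unique(trial$ENV), function(e) sample(unique(trial$REP), 1))
is_valid <- trial$REP == held_out[trial$ENV]
training   <- trial[!is_valid, ]
validation <- trial[is_valid, ]

# CRD-style split: hold out one random observation per genotype-by-environment cell
cell <- interaction(trial$ENV, trial$GEN, drop = TRUE)
crd_idx <- unlist(lapply(split(seq_len(nrow(trial)), cell),
                         function(i) i[sample.int(length(i), 1)]))

# Root mean square prediction difference (hypothetical helper)
rmspd <- function(predicted, observed) sqrt(mean((predicted - observed)^2))

# Stand-in predictions: training cell means (the AMMI fit is not reproduced here)
cell_means <- aggregate(GY ~ ENV + GEN, data = training, FUN = mean)
cmp <- merge(validation, cell_means, by = c("ENV", "GEN"),
             suffixes = c("_obs", "_pred"))
rmspd(cmp$GY_pred, cmp$GY_obs)
```

In the function itself this cycle is repeated nboot times, each with a fresh random split, which is why the result carries one RMSPD estimate per resample.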
IMPORTANT: If the data set is unbalanced (i.e., any genotype is missing in any environment), the function returns an error. An error also occurs if any genotype-environment combination has a different number of replications than the rest of the trial.
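A quick way to check this condition before calling the function is to tabulate the genotype-by-environment cells. The helper below, is_balanced, is hypothetical and written in base R; the check performed inside cv_ammi may differ.

```r
# Hypothetical helper: TRUE when every environment x genotype cell is present
# and all cells hold the same number of observations
is_balanced <- function(data, env, gen) {
  counts <- table(data[[env]], data[[gen]])
  all(counts > 0) && length(unique(as.vector(counts))) == 1
}

# Toy data (hypothetical): a full 2 x 2 x 3 grid
toy <- expand.grid(ENV = c("E1", "E2"), GEN = c("G1", "G2"), REP = 1:3)

is_balanced(toy, "ENV", "GEN")        # complete grid
is_balanced(toy[-1, ], "ENV", "GEN")  # one plot dropped
```

The first call covers the complete grid and the second a data set with one plot removed, which is the kind of input expected to trigger the error.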
Usage
cv_ammi(
  .data,
  env,
  gen,
  rep,
  resp,
  block = NULL,
  naxis = 2,
  nboot = 200,
  design = "RCBD",
  verbose = TRUE
)
Arguments
- .data
The dataset containing the columns related to Environments, Genotypes, replication/block and response variable(s).
- env
The name of the column that contains the levels of the environments.
- gen
The name of the column that contains the levels of the genotypes.
- rep
The name of the column that contains the levels of the replications/blocks. AT LEAST THREE REPLICATES ARE REQUIRED TO PERFORM THE CROSS-VALIDATION.
- resp
The response variable.
- block
Defaults to NULL. In this case, a randomized complete block design is considered. If block is informed, then a resolvable alpha-lattice design (Patterson and Williams, 1976) is employed. All effects, except the error, are assumed to be fixed.
- naxis
The number of axes to be considered for estimation of GE effects.
- nboot
The number of resamples to be used in the cross-validation. Defaults to 200.
- design
The experimental design. Defaults to RCBD (Randomized Complete Block Design). For Completely Randomized Designs, inform design = 'CRD'.
- verbose
A logical argument to define if a progress bar is shown. Default is TRUE.
Value
An object of class cv_ammi with the following items:
- RMSPD: A vector with nboot estimates of the Root Mean Square Prediction Difference between predicted and validating data.
- RMSPDmean: The mean of the RMSPD estimates.
- Estimated: A data frame that contains the values (predicted, observed, validation) of the last loop.
- Modeling: The dataset used as modeling data in the last loop.
- Testing: The dataset used as testing data in the last loop.
References
Olivoto, T., A.D.C. Lúcio, J.A.G. da Silva, V.S. Marchioro, V.Q. de Souza, and E. Jost. 2019. Mean performance and stability in multi-environment trials I: Combining features of AMMI and BLUP techniques. Agron. J. 111:2949-2960. doi:10.2134/agronj2019.03.0220
Patterson, H.D., and E.R. Williams. 1976. A new class of resolvable incomplete block designs. Biometrika 63:83-92.
Author
Tiago Olivoto tiagoolivoto@gmail.com
Examples
# \donttest{
library(metan)
model <- cv_ammi(data_ge,
                 env = ENV,
                 gen = GEN,
                 rep = REP,
                 resp = GY,
                 nboot = 5,
                 naxis = 2)
#> Validating 1 of 5 sets |========                                 | 20% 00:00:00
#> Validating 2 of 5 sets |================                         | 40% 00:00:01
#> Validating 3 of 5 sets |=========================                | 60% 00:00:01
#> Validating 4 of 5 sets |=================================        | 80% 00:00:02
#> Validating 5 of 5 sets |=========================================| 100% 00:00:03
# }