Multitrait Genotype-Ideotype Distance Index

Computes the multi-trait genotype-ideotype distance index, MGIDI, (Olivoto and Nardino, 2020), used to select genotypes in plant breeding programs based on multiple traits.The MGIDI index is computed as follows: MGIDI_i = _j = 1^f(F_ij - F_j)^2

where MGIDI_i is the multi-trait genotype-ideotype distance index for the ith genotype; F_ij is the score of the ith genotype in the jth factor (i = 1, 2, ..., g; j = 1, 2, ..., f), being g and f the number of genotypes and factors, respectively, and F_j is the jth score of the ideotype. The genotype with the lowest MGIDI is then closer to the ideotype and therefore should presents desired values for all the analyzed traits.

Usage

mgidi(
  .data,
  use_data = "blup",
  SI = 15,
  mineval = 1,
  ideotype = NULL,
  weights = NULL,
  use = "complete.obs",
  verbose = TRUE
)

Arguments

.data: An object fitted with the function gafem(), gamem() or a two-way table with BLUPs for genotypes in each trait (genotypes in rows and traits in columns). In the last case, the first column is assumed to have the genotype's name.
use_data: Define which data to use if .data is an object of class gamem. Defaults to "blup" (the BLUPs for genotypes). Use "pheno" to use phenotypic means instead BLUPs for computing the index.
SI: An integer (0-100). The selection intensity in percentage of the total number of genotypes.
mineval: The minimum value so that an eigenvector is retained in the factor analysis.
ideotype: A vector of length nvar where nvar is the number of traits used to plan the ideotype. Use 'h' to indicate the traits in which higher values are desired or 'l' to indicate the traits in which lower values are desired. For example, ideotype = c("h, h, h, h, l") will consider that the ideotype has higher values for the first four traits and lower values for the last trait. ALternatively, one can use a mixed vector, indicating both h/l values and a numeric value for the target trait(s), eg., ideotype = c("120, h, 30, h, l"). In this scenario, a numeric value to define the ideotype is declared for the first and third traits. For this traits, the absolute difference between the observed value and the numeric ideotype will be computed, and after the rescaling procedure, the genotype with the smallest difference will have 100. If .datais a model fitted with the functions gafem() or gamem(), the order of the traits will be the declared in the argument resp in those functions.
weights: Optional weights to assign for each trait in the selection process. It must be a numeric vector of length equal to the number of traits in .data. By default (NULL) a numeric vector of weights equal to 1 is used, i.e., all traits have the same weight in the selection process. It is suggested weights ranging from 0 to 1. The weights will then shrink the ideotype vector toward 0. This is useful, for example, to prioritize grain yield rather than a plant-related trait in the selection process.
use: The method for computing covariances in the presence of missing values. Defaults to complete.obs, i.e., missing values are handled by casewise deletion.
verbose: If verbose = TRUE (Default) then some results are shown in the console.

Value

An object of class mgidi with the following items:

data The data used to compute the factor analysis.
cormat The correlation matrix among the environments.
PCA The eigenvalues and explained variance.
FA The factor analysis.
KMO The result for the Kaiser-Meyer-Olkin test.
MSA The measure of sampling adequacy for individual variable.
communalities The communalities.
communalities_mean The communalities' mean.
initial_loadings The initial loadings.
finish_loadings The final loadings after varimax rotation.
canonical_loadings The canonical loadings.
scores_gen The scores for genotypes in all retained factors.
scores_ide The scores for the ideotype in all retained factors.
gen_ide The distance between the scores of each genotype with the ideotype.
MGIDI The multi-trait genotype-ideotype distance index.
contri_fac The relative contribution of each factor on the MGIDI value. The lower the contribution of a factor, the close of the ideotype the variables in such factor are.
contri_fac_rank, contri_fac_rank_sel The rank for the contribution of each factor for all genotypes and selected genotypes, respectively.
complementarity The complementarity matrix, which is the Euclidean distance between selected genotypes based on the contribution of each factor on the MGIDI index (waiting reference).
sel_dif The selection differential for the variables.
stat_gain A descriptive statistic for the selection gains. The minimum, mean, confidence interval, standard deviation, maximum, and sum of selection gain values are computed. If traits have negative and positive desired gains, the statistics are computed for by strata.
sel_gen The selected genotypes.

References

Olivoto, T., and Nardino, M. (2020). MGIDI: toward an effective multivariate selection in biological experiments. Bioinformatics. doi:10.1093/bioinformatics/btaa981

Author

Tiago Olivoto tiagoolivoto@gmail.com

Examples

# \donttest{
library(metan)

# simulate a data set
# 10 genotypes
# 5 replications
# 4 traits
df <-
 g_simula(ngen = 10,
          nrep = 5,
          nvars = 4,
          gen_eff = 35,
          seed = c(1, 2, 3, 4))
#> Warning: 'gen_eff = 35' recycled for all the 4 traits.
#> Warning: 'rep_eff = 5' recycled for all the 4 traits.
#> Warning: 'res_eff = 5' recycled for all the 4 traits.
#> Warning: 'intercept = 100' recycled for all the 4 traits.

# run a mixed-effect model (genotype as random effect)
mod <-
  gamem(df,
        gen = GEN,
        rep = REP,
        resp = everything())
#> Evaluating trait V1 |===========                                 | 25% 00:00:00 
Evaluating trait V2 |======================                      | 50% 00:00:00 
Evaluating trait V3 |=================================           | 75% 00:00:00 
Evaluating trait V4 |============================================| 100% 00:00:00 

#> Method: REML/BLUP
#> Random effects: GEN
#> Fixed effects: REP
#> Denominador DF: Satterthwaite's method
#> ---------------------------------------------------------------------------
#> P-values for Likelihood Ratio Test of the analyzed traits
#> ---------------------------------------------------------------------------
#>     model       V1       V2       V3       V4
#>  Complete       NA       NA       NA       NA
#>  Genotype 2.06e-24 5.97e-25 4.33e-17 1.75e-21
#> ---------------------------------------------------------------------------
#> All variables with significant (p < 0.05) genotype effect
# BLUPs for genotypes
gmd(mod, "blupg")
#> Class of the model: gamem
#> Variable extracted: blupg
#> # A tibble: 10 × 5
#>    GEN      V1    V2    V3    V4
#>    <chr> <dbl> <dbl> <dbl> <dbl>
#>  1 H1     84.4  75.0  75.6 109. 
#>  2 H10    68.9 103.  106.   70.5
#>  3 H2     89.0 112.  124.   67.7
#>  4 H3    104.  102.   91.9  83.9
#>  5 H4    129.   76.8  89.7  86.2
#>  6 H5     76.3 134.  105.  121. 
#>  7 H6    129.  127.  109.   85.6
#>  8 H7    133.   75.7  73.0 117. 
#>  9 H8    107.  123.   89.5 129. 
#> 10 H9    111.   95.5 107.  131. 

# Compute the MGIDI index
# Default options (all traits with positive desired gains)
# Equal weights for all traits
mgidi_ind <- mgidi(mod)
#> 
#> -------------------------------------------------------------------------------
#> Principal Component Analysis
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 4
#>   PC    Eigenvalues `Variance (%)` `Cum. variance (%)`
#>   <chr>       <dbl>          <dbl>               <dbl>
#> 1 PC1          1.99          49.8                 49.8
#> 2 PC2          1.01          25.2                 75.0
#> 3 PC3          0.78          19.5                 94.5
#> 4 PC4          0.22           5.51               100  
#> -------------------------------------------------------------------------------
#> Factor Analysis - factorial loadings after rotation-
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 5
#>   VAR     FA1   FA2 Communality Uniquenesses
#>   <chr> <dbl> <dbl>       <dbl>        <dbl>
#> 1 V1     0.57  0.19        0.37         0.63
#> 2 V2    -0.91  0.18        0.86         0.14
#> 3 V3    -0.81 -0.41        0.82         0.18
#> 4 V4     0.1   0.97        0.96         0.04
#> -------------------------------------------------------------------------------
#> Comunalit Mean: 0.7502531 
#> -------------------------------------------------------------------------------
#> Selection differential 
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 11
#>   VAR   Factor    Xo    Xs      SD  SDperc    h2      SG  SGperc sense     goal
#>   <chr> <chr>  <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>   <dbl> <chr>    <dbl>
#> 1 V1    FA1    103.   91.8 -11.5   -11.1   0.992 -11.4   -11.1   increase     0
#> 2 V2    FA1    102.  128.   26.2    25.6   0.993  26.0    25.4   increase   100
#> 3 V3    FA1     97.1  97.3   0.204   0.211 0.979   0.200   0.206 increase   100
#> 4 V4    FA2    100.  125.   24.8    24.8   0.989  24.5    24.5   increase   100
#> ------------------------------------------------------------------------------
#> Selected genotypes
#> -------------------------------------------------------------------------------
#> H5 H8
#> -------------------------------------------------------------------------------
gmd(mgidi_ind, "MGIDI")
#> Class of the model: mgidi
#> Variable extracted: MGIDI
#> # A tibble: 10 × 2
#>    Genotype MGIDI
#>    <chr>    <dbl>
#>  1 H5       0.323
#>  2 H8       0.861
#>  3 H9       1.24 
#>  4 H6       1.71 
#>  5 H3       2.37 
#>  6 H1       2.50 
#>  7 H10      2.69 
#>  8 H2       2.75 
#>  9 H7       2.95 
#> 10 H4       3.24 

# Higher weight for traits V1 and V4
# This will increase the probability of selecting H7 and H9
# 30% selection pressure
mgidi_ind2 <-
   mgidi(mod,
         weights = c(1, .2, .2, 1),
         SI = 30)
#> 
#> -------------------------------------------------------------------------------
#> Principal Component Analysis
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 4
#>   PC    Eigenvalues `Variance (%)` `Cum. variance (%)`
#>   <chr>       <dbl>          <dbl>               <dbl>
#> 1 PC1          1.99          49.8                 49.8
#> 2 PC2          1.01          25.2                 75.0
#> 3 PC3          0.78          19.5                 94.5
#> 4 PC4          0.22           5.51               100  
#> -------------------------------------------------------------------------------
#> Factor Analysis - factorial loadings after rotation-
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 5
#>   VAR     FA1   FA2 Communality Uniquenesses
#>   <chr> <dbl> <dbl>       <dbl>        <dbl>
#> 1 V1     0.57  0.19        0.37         0.63
#> 2 V2    -0.91  0.18        0.86         0.14
#> 3 V3    -0.81 -0.41        0.82         0.18
#> 4 V4     0.1   0.97        0.96         0.04
#> -------------------------------------------------------------------------------
#> Comunalit Mean: 0.7502531 
#> -------------------------------------------------------------------------------
#> Selection differential 
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 11
#>   VAR   Factor    Xo    Xs     SD SDperc    h2     SG SGperc sense     goal
#>   <chr> <chr>  <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl> <chr>    <dbl>
#> 1 V1    FA1    103.  110.    6.37   6.17 0.992   6.32   6.12 increase   100
#> 2 V2    FA1    102.   82.1 -20.3  -19.8  0.993 -20.1  -19.7  increase     0
#> 3 V3    FA1     97.1  85.0 -12.0  -12.4  0.979 -11.8  -12.1  increase     0
#> 4 V4    FA2    100.  119.   18.9   18.9  0.989  18.6   18.6  increase   100
#> ------------------------------------------------------------------------------
#> Selected genotypes
#> -------------------------------------------------------------------------------
#> H7 H1 H9
#> -------------------------------------------------------------------------------
gmd(mgidi_ind2, "MGIDI")
#> Class of the model: mgidi
#> Variable extracted: MGIDI
#> # A tibble: 10 × 2
#>    Genotype MGIDI
#>    <chr>    <dbl>
#>  1 H7       0.811
#>  2 H1       0.978
#>  3 H9       1.12 
#>  4 H8       1.44 
#>  5 H3       1.82 
#>  6 H4       1.88 
#>  7 H6       2.04 
#>  8 H5       2.48 
#>  9 H10      2.87 
#> 10 H2       3.22 

# plot the contribution of each factor on the MGIDI index
p1 <- plot(mgidi_ind, type = "contribution")
p2 <- plot(mgidi_ind2, type = "contribution")
p1 + p2


# Negative desired gains for V1
# Positive desired gains for V2, V3 and V4
mgidi_ind3 <-
  mgidi(mod,
       ideotype = c("h, h, h, l"))
#> 
#> -------------------------------------------------------------------------------
#> Principal Component Analysis
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 4
#>   PC    Eigenvalues `Variance (%)` `Cum. variance (%)`
#>   <chr>       <dbl>          <dbl>               <dbl>
#> 1 PC1          1.99          49.8                 49.8
#> 2 PC2          1.01          25.2                 75.0
#> 3 PC3          0.78          19.5                 94.5
#> 4 PC4          0.22           5.51               100  
#> -------------------------------------------------------------------------------
#> Factor Analysis - factorial loadings after rotation-
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 5
#>   VAR     FA1   FA2 Communality Uniquenesses
#>   <chr> <dbl> <dbl>       <dbl>        <dbl>
#> 1 V1     0.57  0.19        0.37         0.63
#> 2 V2    -0.91  0.18        0.86         0.14
#> 3 V3    -0.81 -0.41        0.82         0.18
#> 4 V4    -0.1  -0.97        0.96         0.04
#> -------------------------------------------------------------------------------
#> Comunalit Mean: 0.7502531 
#> -------------------------------------------------------------------------------
#> Selection differential 
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 11
#>   VAR   Factor    Xo    Xs     SD SDperc    h2     SG SGperc sense     goal
#>   <chr> <chr>  <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl> <chr>    <dbl>
#> 1 V1    FA1    103.   78.9 -24.3  -23.6  0.992 -24.2  -23.4  increase     0
#> 2 V2    FA1    102.  107.    4.98   4.87 0.993   4.94   4.83 increase   100
#> 3 V3    FA1     97.1 115.   18.2   18.8  0.979  17.9   18.4  increase   100
#> 4 V4    FA2    100.   69.1 -30.9  -30.9  0.989 -30.6  -30.6  decrease   100
#> ------------------------------------------------------------------------------
#> Selected genotypes
#> -------------------------------------------------------------------------------
#> H2 H10
#> -------------------------------------------------------------------------------


# Extract the BLUPs for each genotype
(blupsg <- gmd(mod, "blupg"))
#> Class of the model: gamem
#> Variable extracted: blupg
#> # A tibble: 10 × 5
#>    GEN      V1    V2    V3    V4
#>    <chr> <dbl> <dbl> <dbl> <dbl>
#>  1 H1     84.4  75.0  75.6 109. 
#>  2 H10    68.9 103.  106.   70.5
#>  3 H2     89.0 112.  124.   67.7
#>  4 H3    104.  102.   91.9  83.9
#>  5 H4    129.   76.8  89.7  86.2
#>  6 H5     76.3 134.  105.  121. 
#>  7 H6    129.  127.  109.   85.6
#>  8 H7    133.   75.7  73.0 117. 
#>  9 H8    107.  123.   89.5 129. 
#> 10 H9    111.   95.5 107.  131. 

# Consider the following ideotype that will be close to H4
# Define a numeric ideotype for the first three traits, and the lower values
# for the last trait
ideotype <- c("129.46, 76.8, 89.7, l")

mgidi_ind4 <-
  mgidi(mod,
       ideotype = ideotype)
#> 
#> -------------------------------------------------------------------------------
#> Principal Component Analysis
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 4
#>   PC    Eigenvalues `Variance (%)` `Cum. variance (%)`
#>   <chr>       <dbl>          <dbl>               <dbl>
#> 1 PC1          1.62           40.5                40.5
#> 2 PC2          1.06           26.4                67.0
#> 3 PC3          0.68           17.1                84.1
#> 4 PC4          0.64           15.9               100  
#> -------------------------------------------------------------------------------
#> Factor Analysis - factorial loadings after rotation-
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 5
#>   VAR     FA1   FA2 Communality Uniquenesses
#>   <chr> <dbl> <dbl>       <dbl>        <dbl>
#> 1 V1    -0.68 -0.34        0.57         0.43
#> 2 V2    -0.85  0.14        0.73         0.27
#> 3 V3    -0.41 -0.67        0.61         0.39
#> 4 V4    -0.12  0.87        0.77         0.23
#> -------------------------------------------------------------------------------
#> Comunalit Mean: 0.6698633 
#> -------------------------------------------------------------------------------
#> Selection differential 
#> -------------------------------------------------------------------------------
#> # A tibble: 4 × 11
#>   VAR   Factor    Xo    Xs     SD SDperc    h2     SG SGperc sense     goal
#>   <chr> <chr>  <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl> <chr>    <dbl>
#> 1 V1    FA1    103.  131.   28.0   27.2  0.992  27.8   26.9  decrease     0
#> 2 V2    FA1    102.   76.3 -26.1  -25.5  0.993 -25.9  -25.3  decrease   100
#> 3 V3    FA2     97.1  81.4 -15.7  -16.2  0.979 -15.4  -15.9  decrease   100
#> 4 V4    FA2    100.  101.    1.41   1.41 0.989   1.39   1.39 decrease     0
#> ------------------------------------------------------------------------------
#> Selected genotypes
#> -------------------------------------------------------------------------------
#> H4 H7
#> -------------------------------------------------------------------------------

# Note how the strenghts of H4 are related to FA1 (V1 and V2)
plot(mgidi_ind4, type = "contribution", genotypes = "all")


# }