Utilities for data organization — utils_data

Useful functions for organizing data before statistical analysis.

add_seq_block(): Add a column with sequential block numbering in multi-environment data sets.
recode_factor(): Recode a factor column. Each level is identified by a sequential number, with an optional prefix.
df_to_selegen_54(): Given a multi-environment data set with environment, genotype, and replication columns, format the data for use in the Selegen software (model 54).
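
As a rough illustration of what sequential block numbering means, the following base R sketch sorts a data set by environment and replication and then numbers each environment-by-replication combination in order. It is not the implementation used by add_seq_block(); the toy data frame and the names `toy` and `key` are assumptions made for illustration.

```r
# Hypothetical toy data: 3 environments, 2 genotypes, 2 replications
toy <- data.frame(
  ENV = rep(c("CACIQUE", "FREDERICO", "SANTA_MARIA"), each = 4),
  GEN = rep(rep(c("H1", "H2"), each = 2), times = 3),
  REP = rep(c("B1", "B2"), times = 6),
  stringsAsFactors = FALSE
)

# Arrange the rows by environment, then by replication
toy <- toy[order(toy$ENV, toy$REP), ]

# Build one key per ENV x REP combination and number the keys
# in order of first appearance, adding the prefix "B"
key <- paste(toy$ENV, toy$REP, sep = "_")
toy$BLOCK <- paste0("B", match(key, unique(key)))

print(toy)
```

With this toy data the two replications of the first environment become B1 and B2, those of the second B3 and B4, and those of the third B5 and B6, so every block label is unique across environments.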

Usage

add_seq_block(data, env, rep, new_factor = BLOCK, prefix = "", verbose = TRUE)

recode_factor(data, factor, new_factor = CODE, prefix = "", verbose = TRUE)

df_to_selegen_54(data, env, gen, rep, verbose = TRUE)

Arguments

data: A data frame.
env: The name of the column that contains the levels of the environments.
rep: The name of the column that contains the levels of the replications/blocks.
new_factor: The name of the new column created.
prefix: An optional prefix prepended to the sequential number of each level in the new factor.
verbose: Logical argument. If verbose = FALSE, the code runs silently.
factor: A column to recode.
gen: The name of the column that contains the levels of the genotypes, which will be treated as a random effect.

References

Resende, M.D.V. 2016. Software Selegen-REML/BLUP: a useful tool for plant breeding. Crop Breed. Appl. Biotechnol. 16(4): 330–339. doi:10.1590/1984-70332016v16n4a49

Author

Tiago Olivoto tiagoolivoto@gmail.com

Examples

# \donttest{
library(metan)
df_ge <- ge_simula(ngen = 2,
                   nenv = 3,
                   nrep = 2) %>%
         add_cols(ENV = c(rep("CACIQUE", 4),
                          rep("FREDERICO", 4),
                          rep("SANTA_MARIA", 4)))
df_ge
#> # A tibble: 12 × 4
#>    ENV         GEN   REP      V1
#>    <chr>       <fct> <fct> <dbl>
#>  1 CACIQUE     H1    B1     91.6
#>  2 CACIQUE     H1    B2     79.5
#>  3 CACIQUE     H2    B1    128. 
#>  4 CACIQUE     H2    B2    123. 
#>  5 FREDERICO   H1    B1     83.3
#>  6 FREDERICO   H1    B2     84.4
#>  7 FREDERICO   H2    B1    113. 
#>  8 FREDERICO   H2    B2     95.2
#>  9 SANTA_MARIA H1    B1     98.3
#> 10 SANTA_MARIA H1    B2     94.7
#> 11 SANTA_MARIA H2    B1    116. 
#> 12 SANTA_MARIA H2    B2    117. 

# Add sequential block numbering over environments
add_seq_block(df_ge, ENV, REP, prefix = "B")
#> The data `df_ge` has been arranged according to the `ENV` and `REP` columns.
#> # A tibble: 12 × 5
#>    ENV         GEN   REP   BLOCK    V1
#>    <chr>       <fct> <fct> <chr> <dbl>
#>  1 CACIQUE     H1    B1    B1     91.6
#>  2 CACIQUE     H2    B1    B1    128. 
#>  3 CACIQUE     H1    B2    B2     79.5
#>  4 CACIQUE     H2    B2    B2    123. 
#>  5 FREDERICO   H1    B1    B3     83.3
#>  6 FREDERICO   H2    B1    B3    113. 
#>  7 FREDERICO   H1    B2    B4     84.4
#>  8 FREDERICO   H2    B2    B4     95.2
#>  9 SANTA_MARIA H1    B1    B5     98.3
#> 10 SANTA_MARIA H2    B1    B5    116. 
#> 11 SANTA_MARIA H1    B2    B6     94.7
#> 12 SANTA_MARIA H2    B2    B6    117. 

# Recode the 'ENV' column to "ENV1", "ENV2", and so on.
recode_factor(df_ge,
              factor = ENV,
              prefix = "ENV",
              new_factor = ENV_CODE)
#> Error: object 'ENV' not found

# Format the data to be used in the Selegen software (model 54)
df <- df_to_selegen_54(df_ge, ENV, GEN, REP) %>%
  recode_factor(ENV, prefix = "E", new_factor = ENV)
#> Error: object 'ENV' not found
# }
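
The idea behind recoding, a sequential number with an optional prefix for each level, can be sketched in base R as follows. This is an illustration only, not the code behind recode_factor(); the vector `env` is hypothetical, and numbering the levels in order of first appearance is an assumption.

```r
# Hypothetical environment column with three levels
env <- rep(c("CACIQUE", "FREDERICO", "SANTA_MARIA"), each = 4)

# Number each level by order of first appearance and add a prefix
env_code <- paste0("ENV", match(env, unique(env)))

# Show how original levels map to the new codes
print(unique(data.frame(ENV = env, ENV_CODE = env_code)))
```

Here the three environment names map to ENV1, ENV2, and ENV3. If the levels should instead be numbered alphabetically, `unique(env)` could be replaced with `sort(unique(env))`; for this toy vector both give the same result.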