Useful functions for data organization before statistical analysis
add_seq_block()
: Add a column with sequential block numbering to multi-environment data sets.
recode_factor()
: Recode a factor column. Each level is identified by a sequential number, optionally preceded by a prefix.
df_to_selegen_54()
: Given multi-environment data with environment, genotype, and replication columns, format the data for use in the Selegen software (model 54).
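The idea behind sequential block numbering can be sketched in base R. This is only an illustration of the concept, not the implementation used by add_seq_block(); the toy data frame `toy` and its level names are assumptions made for the example.

```r
# Toy multi-environment layout: 3 environments, 2 genotypes, 2 replications
# (hypothetical data, used only to illustrate the numbering)
toy <- data.frame(
  ENV = rep(c("E1", "E2", "E3"), each = 4),
  GEN = rep(c("H1", "H1", "H2", "H2"), times = 3),
  REP = rep(c("B1", "B2"), times = 6)
)

# Arrange the rows by environment, then by replication
toy <- toy[order(toy$ENV, toy$REP), ]

# Every distinct ENV x REP combination receives the next integer,
# so block labels never repeat across environments
key <- paste(toy$ENV, toy$REP, sep = "_")
toy$BLOCK <- paste0("B", match(key, unique(key)))

toy
```

With this layout, the two replications of the first environment become blocks B1 and B2, those of the second environment become B3 and B4, and so on.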
Usage
add_seq_block(data, env, rep, new_factor = BLOCK, prefix = "", verbose = TRUE)
recode_factor(data, factor, new_factor = CODE, prefix = "", verbose = TRUE)
df_to_selegen_54(data, env, gen, rep, verbose = TRUE)
Arguments
- data
A data frame.
- env
The name of the column that contains the levels of the environments.
- rep
The name of the column that contains the levels of the replications/blocks.
- new_factor
The name of the new column created.
- prefix
An optional prefix to prepend to the levels of the new factor.
- verbose
Logical argument. If verbose = FALSE, the code will run silently.
- factor
A column to recode.
- gen
The name of the column that contains the levels of the genotypes, which are treated as a random effect.
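How a prefix combines with a sequential level number can be sketched in base R. This is a minimal illustration rather than the code behind recode_factor(); the vector `env` is made up, and numbering the levels in sorted order is an assumption of this sketch.

```r
# Hypothetical environment labels
env <- c("CACIQUE", "CACIQUE", "FREDERICO", "SANTA_MARIA", "FREDERICO")

# factor() sorts the distinct labels; as.integer() returns each
# observation's position in that sorted set of levels
level_id <- as.integer(factor(env))

# Bind the prefix to the sequential number
env_code <- paste0("ENV", level_id)

data.frame(ENV = env, ENV_CODE = env_code)
```

An empty prefix (the default shown under Usage) would leave only the sequential numbers.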
References
Resende, M.D.V. 2016. Software Selegen-REML/BLUP: a useful tool for plant breeding. Crop Breed. Appl. Biotechnol. 16(4): 330–339. doi:10.1590/1984-70332016v16n4a49
Author
Tiago Olivoto tiagoolivoto@gmail.com
Examples
# \donttest{
library(metan)
df_ge <- ge_simula(ngen = 2,
                   nenv = 3,
                   nrep = 2) %>%
  add_cols(ENV = c(rep("CACIQUE", 4),
                   rep("FREDERICO", 4),
                   rep("SANTA_MARIA", 4)))
df_ge
#> # A tibble: 12 × 4
#>    ENV         GEN   REP      V1
#>    <chr>       <fct> <fct> <dbl>
#>  1 CACIQUE     H1    B1     91.6
#>  2 CACIQUE     H1    B2     79.5
#>  3 CACIQUE     H2    B1    128.
#>  4 CACIQUE     H2    B2    123.
#>  5 FREDERICO   H1    B1     83.3
#>  6 FREDERICO   H1    B2     84.4
#>  7 FREDERICO   H2    B1    113.
#>  8 FREDERICO   H2    B2     95.2
#>  9 SANTA_MARIA H1    B1     98.3
#> 10 SANTA_MARIA H1    B2     94.7
#> 11 SANTA_MARIA H2    B1    116.
#> 12 SANTA_MARIA H2    B2    117.
# Add sequential block numbering over environments
add_seq_block(df_ge, ENV, REP, prefix = "B")
#> The data `df_ge` has been arranged according to the `ENV` and `REP` columns.
#> # A tibble: 12 × 5
#>    ENV         GEN   REP   BLOCK    V1
#>    <chr>       <fct> <fct> <chr> <dbl>
#>  1 CACIQUE     H1    B1    B1     91.6
#>  2 CACIQUE     H2    B1    B1    128.
#>  3 CACIQUE     H1    B2    B2     79.5
#>  4 CACIQUE     H2    B2    B2    123.
#>  5 FREDERICO   H1    B1    B3     83.3
#>  6 FREDERICO   H2    B1    B3    113.
#>  7 FREDERICO   H1    B2    B4     84.4
#>  8 FREDERICO   H2    B2    B4     95.2
#>  9 SANTA_MARIA H1    B1    B5     98.3
#> 10 SANTA_MARIA H2    B1    B5    116.
#> 11 SANTA_MARIA H1    B2    B6     94.7
#> 12 SANTA_MARIA H2    B2    B6    117.
# Recode the 'ENV' column to "ENV1", "ENV2", and so on.
recode_factor(df_ge,
              factor = ENV,
              prefix = "ENV",
              new_factor = ENV_CODE)
#> Error: object 'ENV' not found
# Format the data to be used in the Selegen software (model 54)
df <- df_to_selegen_54(df_ge, ENV, GEN, REP) %>%
  recode_factor(ENV, prefix = "E", new_factor = ENV)
#> Error: object 'ENV' not found
# }