Random Sampling — utils_samples • metan

sample_random() performs simple random sampling or, when the data are grouped, stratified random sampling.
sample_systematic() performs systematic sampling. A regular interval of size k = floor(N/n) is computed from the population size (N) and the desired sample size (n). The starting member (r) is then chosen at random from 1 to k, and the subsequent members are r + k, r + 2k, and so on, until n members are selected.
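The interval rule above can be sketched in base R as follows. This is a minimal illustration of the index arithmetic only: `systematic_indices()` is a hypothetical helper, not part of metan, and metan's own implementation may differ in details such as the order in which the selected rows are returned.

```r
# Minimal sketch of systematic sampling indices.
# systematic_indices() is a hypothetical helper, not part of metan.
systematic_indices <- function(N, n, r = NULL) {
  k <- floor(N / n)                    # sampling interval
  if (k < 1) stop("n must not be larger than N")
  if (is.null(r)) {
    r <- sample.int(k, 1)              # random start in 1..k
  }
  if (r < 1 || r > k) stop("r must be between 1 and k")
  r + k * (0:(n - 1))                  # r, r + k, r + 2k, ... (n indices)
}

# Example: N = 39 and n = 6 give k = 6; start at r = 2
idx <- systematic_indices(N = 39, n = 6, r = 2)
idx
#> [1]  2  8 14 20 26 32
```

Because the last index is r + k(n - 1) <= kn <= N, every index stays within the data, so the rows can be taken with data[idx, ].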

Usage

sample_random(data, n, prop, by = NULL, weight = NULL)

sample_systematic(data, n, r = NULL, by = NULL)

Arguments

data: A data frame. If data is a grouped_df, the operation will be performed on each group (stratified).
n, prop: Provide either n, the number of rows, or prop, the proportion of rows to select. If neither is supplied, n = 1 is used.
by: A categorical variable by which to compute the sample. It is a shortcut to dplyr::group_by() for grouping the data by a single categorical variable. To group by more than one variable, group the data with dplyr::group_by() before passing it to the function.
weight: Sampling weights. This must evaluate to a vector of non-negative numbers with one value per row of data. Weights are automatically standardised to sum to 1.
r: The starting element (sample_systematic() only). By default, r is randomly selected from 1:k.
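The semantics of n versus prop, per-group (stratified) sampling, and standardised weights can be illustrated with a short base-R sketch. It assumes that prop is converted to a row count by rounding down and that rows are drawn without replacement; `srs_rows()` and `stratified_sample()` are hypothetical helpers, not metan code, and unlike sample_random() the `by` argument here is a column name given as a string.

```r
# Minimal sketch of simple and stratified random sampling in base R.
# srs_rows() and stratified_sample() are hypothetical helpers, not metan code.

# Draw row positions from 1..n_rows without replacement.
srs_rows <- function(n_rows, n = NULL, prop = NULL, weight = NULL) {
  if (is.null(n) && is.null(prop)) n <- 1     # default: a single row
  if (is.null(n)) n <- floor(prop * n_rows)   # assumption: prop rounds down
  if (!is.null(weight)) {
    stopifnot(length(weight) == n_rows, all(weight >= 0))
    weight <- weight / sum(weight)            # standardise weights to sum to 1
  }
  sample.int(n_rows, size = n, replace = FALSE, prob = weight)
}

# Apply srs_rows() separately within each level of the column `by`.
stratified_sample <- function(data, by, n = NULL, prop = NULL, weight = NULL) {
  strata <- split(seq_len(nrow(data)), data[[by]])   # row numbers per stratum
  picked <- lapply(strata, function(rows) {
    w <- if (is.null(weight)) NULL else weight[rows] # weights of this stratum
    rows[srs_rows(length(rows), n = n, prop = prop, weight = w)]
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Usage: three rows from each Species in the built-in iris data
set.seed(1)
stratified_sample(iris, by = "Species", n = 3)
```

Sampling row positions with sample.int() and then indexing avoids the base-R pitfall where sample(x) on a length-one vector samples from 1:x instead of returning x.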

Value

An object of the same type as data.

Examples

library(metan)
sample_random(data_ge, n = 5)
#> # A tibble: 5 × 5
#>   ENV   GEN   REP      GY    HM
#>   <fct> <fct> <fct> <dbl> <dbl>
#> 1 E8    G8    2      2.80  49.2
#> 2 E7    G4    1      1.41  48  
#> 3 E11   G9    1      1.01  53.7
#> 4 E13   G3    3      3.47  45.4
#> 5 E1    G5    2      2.12  51  
sample_random(data_ge,
              n = 3,
              by = ENV)
#> # A tibble: 42 × 5
#>    ENV   GEN   REP      GY    HM
#>    <fct> <fct> <fct> <dbl> <dbl>
#>  1 E1    G9    2     2.21   46.7
#>  2 E1    G4    2     3.02   48.0
#>  3 E1    G4    3     2.38   49  
#>  4 E10   G4    1     2.25   42  
#>  5 E10   G3    2     2.4    45  
#>  6 E10   G1    1     2.55   44  
#>  7 E11   G6    3     1.14   57  
#>  8 E11   G1    3     1.39   55  
#>  9 E11   G9    3     0.928  53  
#> 10 E12   G8    2     2.32   47  
#> # ℹ 32 more rows

sample_systematic(data_g, n = 6)
#> k = 6
#> # A tibble: 6 × 18
#>     .id GEN   REP      PH    EH    EP    EL    ED    CL    CD    CW    KW    NR
#>   <int> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    32 H7    2      2.14  1.05 0.489  13.8  46.2  27.8  14.3  23.0  135.  14.4
#> 2    18 H2    3      2.31  1.16 0.501  15.5  51.2  30.2  16.6  21.0  160.  15.2
#> 3    15 H13   3      2.69  1.52 0.566  14.8  50.1  27.0  15.4  21.4  172.  17.2
#> 4    36 H8    3      2.05  1.12 0.545  14.1  44.3  27.4  16    18.3  127.  14  
#> 5    14 H13   2      2.58  1.32 0.511  15.2  50.3  26.7  15.9  19.3  174.  20.4
#> 6     2 H1    2      2.20  1.09 0.492  13.7  49.2  30.5  14.7  22.3  130.  16.4
#> # ℹ 5 more variables: NKR <dbl>, CDED <dbl>, PERK <dbl>, TKW <dbl>, NKE <dbl>