add_shuffle() adds a shuffling step to a transformation pipeline. When run as a transformation, the values of each specified variable are randomly permuted (sampled without replacement), so that summary metrics on a single variable are unchanged but inter-variable metrics are rendered spurious.
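
The effect described above can be sketched in base R. This is an illustration only, not the package's implementation: the data frame `toy` and its columns `hours` and `pay` are hypothetical, and `sample.int()` stands in for whatever permutation the package performs internally.

```r
# Illustrative sketch in base R; not the deident implementation.
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical data in which pay depends strongly on hours.
hours <- rnorm(200, mean = 8, sd = 1)
toy <- data.frame(hours = hours, pay = 20 * hours + rnorm(200))

# Shuffle one column: a random permutation, i.e. sampling without replacement.
shuffled <- toy
shuffled$pay <- toy$pay[sample.int(nrow(toy))]

# Single-variable summaries are unchanged because the same values are present.
identical(sort(toy$pay), sort(shuffled$pay))

# The relationship between the two variables no longer reflects the data.
cor(toy$hours, toy$pay)
cor(toy$hours, shuffled$pay)
```

Comparing the two correlations shows why inter-variable metrics computed on shuffled data should not be interpreted.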

add_shuffle(object, ..., limit = 0)

Arguments

object

Either a data.frame, tibble, or existing DeidentList pipeline.

...

variables to be transformed.

limit

integer - the minimum number of observations a variable needs to have for shuffling to be performed. If the variable has fewer than limit observations, its values are replaced with NAs.
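
The limit rule can be pictured with a small base R helper. `shuffle_with_limit()` is a hypothetical function written for this illustration, not part of the package, and it assumes that "length" refers to the number of values in the vector being shuffled.

```r
# Hypothetical helper illustrating the documented `limit` rule;
# not the deident implementation.
shuffle_with_limit <- function(x, limit = 0) {
  if (length(x) < limit) {
    # Too few observations: return NAs of the same type and length as x.
    return(x[rep(NA_integer_, length(x))])
  }
  # Otherwise return a random permutation of x.
  x[sample.int(length(x))]
}

# Three observations against a limit of five: every value becomes NA.
shuffle_with_limit(c("a", "b", "c"), limit = 5)

# Three observations against a limit of two: the values are shuffled.
shuffle_with_limit(c("a", "b", "c"), limit = 2)
```

With the default limit = 0, no variable is ever too short, so shuffling is always performed.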

Value

A 'DeidentList' representing the untrained transformation pipeline. The object contains fields:

  • deident_methods a list of each step in the pipeline (consisting of variables and method)

and methods:

  • mutate apply the pipeline to a new data set

  • to_yaml serialize the pipeline to a '.yml' file

See also

add_group() for usage under aggregation

Examples


# Basic usage
pipe.shuffle <- add_shuffle(ShiftsWorked, Employee)
pipe.shuffle$mutate(ShiftsWorked)
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee   Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>      <date>     <chr> <chr>         <chr>             <dbl>
#>  1           1 Joseph Va… 2015-01-01 Night 17:01         00:01              78.1
#>  2           2 Elizabeth… 2015-01-01 Day   08:01         16:01             155. 
#>  3           3 Michelle … 2015-01-01 Day   08:01         16:01              77.8
#>  4           4 Joseph Cr… 2015-01-01 Day   08:01         15:01             203. 
#>  5           5 Edward Jo… 2015-01-01 Night 16:01         23:01             211. 
#>  6           6 Samuel St… 2015-01-01 Night 17:01         00:01             142. 
#>  7           7 John Bell  2015-01-01 Rest  NA            NA                  0  
#>  8           8 Christina… 2015-01-01 Night 17:01         00:01             213. 
#>  9           9 Jason Hill 2015-01-01 Night 16:01         00:01             219. 
#> 10          10 Brian Jon… 2015-01-01 Night 16:01         00:01             242. 
#> # ℹ 3,090 more rows

# Shuffle only if the variable has at least `limit` observations
pipe.shuffle.limit <- add_shuffle(ShiftsWorked, Employee, limit = 1)
pipe.shuffle.limit$mutate(ShiftsWorked)
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee   Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>      <date>     <chr> <chr>         <chr>             <dbl>
#>  1           1 Maria Cook 2015-01-01 Night 17:01         00:01              78.1
#>  2           2 Nathan Ba… 2015-01-01 Day   08:01         16:01             155. 
#>  3           3 Alexander… 2015-01-01 Day   08:01         16:01              77.8
#>  4           4 Carol Kim  2015-01-01 Day   08:01         15:01             203. 
#>  5           5 Tyler Gut… 2015-01-01 Night 16:01         23:01             211. 
#>  6           6 Samuel Pe… 2015-01-01 Night 17:01         00:01             142. 
#>  7           7 George Ho… 2015-01-01 Rest  NA            NA                  0  
#>  8           8 Jeffrey R… 2015-01-01 Night 17:01         00:01             213. 
#>  9           9 Christina… 2015-01-01 Night 16:01         00:01             219. 
#> 10          10 Laura Jac… 2015-01-01 Night 16:01         00:01             242. 
#> # ℹ 3,090 more rows
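
Because object may also be an existing DeidentList pipeline, steps can in principle be chained. The sketch below assumes the deident package is installed and provides both add_shuffle() and the ShiftsWorked data; the name pipe.two is introduced here for illustration, and no output is shown because shuffled results vary between runs.

```r
library(deident)  # assumed to provide add_shuffle() and ShiftsWorked

# Start a pipeline from the data, then extend the existing pipeline
# with a second shuffling step on another variable.
pipe.two <- add_shuffle(ShiftsWorked, Employee)
pipe.two <- add_shuffle(pipe.two, Shift)

# Inspect the steps recorded so far.
pipe.two$deident_methods

# Apply every step in the pipeline to the data.
pipe.two$mutate(ShiftsWorked)
```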