De-identification via replacement — add

add_pseudonymize() adds a pseudonymization step to a transformation pipeline. When the step is run, each term that has not been seen before is assigned a new random alphanumeric string, while a term that has already been transformed reuses the string previously assigned to it.

add_pseudonymize(object, ..., lookup = list())

Arguments

object: Either a data.frame, tibble, or existing DeidentList pipeline.
...: Variables to be transformed.
lookup: A pre-existing list of name-value pairs defining intended pseudonymizations. Each instance of 'name' is replaced with 'value' on transformation.
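
The replacement behavior described above, including the role of lookup, can be illustrated with a minimal base R sketch. This is not the package's implementation: pseudonymize_sketch() and its n_char parameter are hypothetical names introduced here, and the five-character length is an assumption taken from the example output on this page.

```r
# Hypothetical helper for illustration only; not part of the package.
pseudonymize_sketch <- function(x, lookup = list(), n_char = 5) {
  pool <- c(letters, LETTERS, 0:9)   # alphanumeric characters to draw from
  known <- unlist(lookup)            # named character vector, NULL if empty
  # Terms with no replacement yet (missing values are left as NA).
  new_terms <- setdiff(unique(x[!is.na(x)]), names(known))
  # Draw one random string per unseen term. Collisions are not checked here.
  fresh <- vapply(
    new_terms,
    function(term) paste(sample(pool, n_char, replace = TRUE), collapse = ""),
    character(1)
  )
  mapping <- c(known, fresh)
  list(values = unname(mapping[x]), lookup = as.list(mapping))
}

# "Bob" is fixed by the lookup; "Ann" receives a random string, used twice.
first <- pseudonymize_sketch(c("Ann", "Bob", "Ann"), lookup = list(Bob = "B1"))

# Passing the returned lookup back in keeps earlier replacements stable.
second <- pseudonymize_sketch(c("Bob", "Cy", "Ann"), lookup = first$lookup)
```

In this sketch the mapping is returned alongside the transformed values so that it can be reused on later data; how the package stores its own mapping is not shown here.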

Value

A 'DeidentList' representing the untrained transformation pipeline. The object contains fields:

deident_methods: a list of each step in the pipeline (consisting of variables and method)

and methods:

mutate: apply the pipeline to a new data set
to_yaml: serialize the pipeline to a '.yml' file

Examples


# Basic usage
pipe.pseudonymize <- add_pseudonymize(ShiftsWorked, Employee)
pipe.pseudonymize$mutate(ShiftsWorked)
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>    <date>     <chr> <chr>         <chr>             <dbl>
#>  1           1 n6ajf    2015-01-01 Night 17:01         00:01              78.1
#>  2           2 IwNIF    2015-01-01 Day   08:01         16:01             155. 
#>  3           3 J26Z1    2015-01-01 Day   08:01         16:01              77.8
#>  4           4 ox8RD    2015-01-01 Day   08:01         15:01             203. 
#>  5           5 Grs7g    2015-01-01 Night 16:01         23:01             211. 
#>  6           6 WOLOF    2015-01-01 Night 17:01         00:01             142. 
#>  7           7 dlqdf    2015-01-01 Rest  NA            NA                  0  
#>  8           8 siZKP    2015-01-01 Night 17:01         00:01             213. 
#>  9           9 59DXe    2015-01-01 Night 16:01         00:01             219. 
#> 10          10 sfcIr    2015-01-01 Night 16:01         00:01             242. 
#> # ℹ 3,090 more rows

# Supply a lookup so that "Kyle Wilson" is replaced with "Kyle"
pipe.pseudonymize2 <- add_pseudonymize(ShiftsWorked, Employee,
  lookup = list("Kyle Wilson" = "Kyle")
)
pipe.pseudonymize2$mutate(ShiftsWorked)
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>    <date>     <chr> <chr>         <chr>             <dbl>
#>  1           1 CSmIB    2015-01-01 Night 17:01         00:01              78.1
#>  2           2 raxif    2015-01-01 Day   08:01         16:01             155. 
#>  3           3 ZxbqT    2015-01-01 Day   08:01         16:01              77.8
#>  4           4 ZChKS    2015-01-01 Day   08:01         15:01             203. 
#>  5           5 X4eLw    2015-01-01 Night 16:01         23:01             211. 
#>  6           6 qoGA5    2015-01-01 Night 17:01         00:01             142. 
#>  7           7 atM46    2015-01-01 Rest  NA            NA                  0  
#>  8           8 bTiKo    2015-01-01 Night 17:01         00:01             213. 
#>  9           9 MQtf6    2015-01-01 Night 16:01         00:01             219. 
#> 10          10 q95aY    2015-01-01 Night 16:01         00:01             242. 
#> # ℹ 3,090 more rows