Re-using Methods

NB: this vignette covers advanced usage of deident. If you are just getting started, we recommend reading the other vignettes first.

The deident package implements several deidentification methods, and one of its key advantages is that a method can be re-used and shared across data sets thanks to the ‘stateful’ nature of its design.

If you wish to share a method between different pipelines, the cleanest approach is to initialize the method of interest as its own object and then pass it into the first pipeline:

library(deident)

psu <- Pseudonymizer$new()

name_pipe <- starwars |>
  deident(psu, name)

apply_deident(starwars, name_pipe)
#> # A tibble: 87 × 14
#>    name  height  mass hair_color    skin_color eye_color birth_year sex   gender
#>    <chr>  <int> <dbl> <chr>         <chr>      <chr>          <dbl> <chr> <chr> 
#>  1 SwlKL    172    77 blond         fair       blue            19   male  mascu…
#>  2 UUEdL    167    75 NA            gold       yellow         112   none  mascu…
#>  3 ieexo     96    32 NA            white, bl… red             33   none  mascu…
#>  4 mb92Q    202   136 none          white      yellow          41.9 male  mascu…
#>  5 9QeuR    150    49 brown         light      brown           19   fema… femin…
#>  6 8HfdV    178   120 brown, grey   light      blue            52   male  mascu…
#>  7 HIyvQ    165    75 brown         light      blue            47   fema… femin…
#>  8 gF0fi     97    32 NA            white, red red             NA   none  mascu…
#>  9 vjF0H    183    84 black         light      brown           24   male  mascu…
#> 10 qZ3vE    182    77 auburn, white fair       blue-gray       57   male  mascu…
#> # ℹ 77 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

Having called apply_deident, the Pseudonymizer psu has learned an encoding for each string in starwars$name. If any of these strings appears again, it will be replaced in the same way, so we can build a second pipeline that re-uses psu:

combined.frm <- data.frame(
  ID = c(head(starwars$name, 5), head(ShiftsWorked$Employee, 5))
)

reused_pipe <- combined.frm |>
  deident(psu, ID)

apply_deident(combined.frm, reused_pipe)
#>       ID
#> 1  SwlKL
#> 2  UUEdL
#> 3  ieexo
#> 4  mb92Q
#> 5  9QeuR
#> 6  zbiPl
#> 7  fS5Hb
#> 8  OGqaB
#> 9  G4XIp
#> 10 EOl8m

Since the first 5 entries of combined.frm$ID are the same as the first 5 entries of starwars$name, the first 5 rows of the two transformed data sets are also the same.
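To make the idea of a ‘stateful’ method more concrete, the following base R sketch mimics the behaviour described above with a closure that keeps a lookup table between calls. It is an illustration only and not how deident implements Pseudonymizer: make_pseudonymizer, its n_chars argument and the example names are all hypothetical.

```r
# Hypothetical illustration only: this is NOT deident's internal implementation.
# The returned function keeps a lookup table alive between calls, so a value
# seen before is always replaced by the pseudonym it was given the first time.
make_pseudonymizer <- function(n_chars = 5) {
  lookup <- character(0)  # named vector: original value -> pseudonym

  # Draw one random alphanumeric code of length n_chars
  random_code <- function() {
    paste(sample(c(letters, LETTERS, 0:9), n_chars, replace = TRUE),
          collapse = "")
  }

  function(x) {
    x <- as.character(x)
    # Only values not yet in the lookup table need a new pseudonym
    new_keys <- setdiff(unique(x[!is.na(x)]), names(lookup))
    for (key in new_keys) {
      code <- random_code()
      # Redraw on the rare chance that the code is already in use
      while (code %in% lookup) code <- random_code()
      lookup[key] <<- code
    }
    unname(lookup[x])
  }
}

pseudonymize <- make_pseudonymizer()
first  <- pseudonymize(c("Luke", "Leia", "Han"))
second <- pseudonymize(c("Leia", "Chewbacca", "Luke"))

# Values shared by both calls keep the same pseudonym
identical(first[2], second[1])
identical(first[1], second[3])
```

Because the lookup table lives in the closure rather than in any one data set, every call that goes through the same pseudonymize function sees the same mappings, which is the property the shared psu object provides in the pipelines above.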