pidpos

pidpos()
Proper Noun Detection
summary(<pidpos>)
Summarize a pidpos report.
tag_data_frame()
Tag a data frame with part-of-speech tags
udpipe_factory()
Create a UDPipe tagging function
custom_tagger()
Convert a POS tagging function to a tagger for the pidpos package
filter_to_proper_nouns()
Filter a tagged data frame to proper nouns

Redaction tools

report_to_redaction_rules()
Initialize redaction rules
redact()
Redact PID
parse_redacter()
Parse a data frame into a redaction function with optional caching.
redaction_function_factory()
Replacement rules to redaction function
batched_redact()
A wrapper for efficient redaction.

Replacement Utilities

auto_replace()
Apply a replacement function to a rules.frm.
make_hashing_replacement()
Function factory for hashing replacement.
make_random_replacement()
Function factory for random replacement.
make_replacement_function()
Wrapper for custom replacement functions
get_replacement_cache() key_lookup() value_lookup()
Access the cache of replacements

Folder-level API

report_on_folder()
Generate PID reports across folder structure
get_distinct_redaction_rules()
Combine multiple PID reports into a single rule set
redact_at_folder()
Redact PID across folder structure

Package Utilities

browse_model_location()
Open the folder that holds the UDPipe models.
browse_udpipe_repo()
Open the GitHub link to the 'english-ewt-2.5' UD model.
enable_local_models()
Set the model folder to a local 'pidpos_models' sub-folder.
enable_package_models()
Set the model folder to the package data folder.
register_reader()
Add a reader function for a specific file extension.
set_udpipe_version()
Set the udpipe model repository version.
reinstate_default_reader()
Reinstate the default read functionality for csv, tsv, xls, and xlsx files.
merge_redactions()
Remove PID from a data frame via a merge

Datasets

the_one_in_massapequa
The One in Massapequa
sentence_frm
A short data frame of free text including PID. Used for basic examples and tests.
raw_redaction_rules
An example of the redaction rules produced by the pidpos function, built from the first 20 rows of the_one_in_massapequa.