Part of the folder-level API: this function is the folder-level equivalent of
pidpos(). Reports are written to the directory given by report_dir.
Usage
report_on_folder(
data_path,
report_dir = "Proper Noun Reports",
tagger = "english-ewt",
filter_func = filter_to_proper_nouns,
chunk_size = 100,
to_ignore = c(),
export_function = NULL,
verbose = FALSE
)
Arguments
- data_path
The file path at which data is stored
- report_dir
The location to write PID reports to
- tagger
Either a string naming a UDPipe model (see udpipe::udpipe_download_model for the list of models) or a custom tagging function (see vignette("custom-functions") for details of what is required).
- filter_func
A function to filter the tagged instances. See the 'Custom Filtering Functions' section of vignette("custom-functions") for more details.
- chunk_size
The number of sentences to tag at a time. The optimal value has yet to be determined.
- to_ignore
A vector of column names to be ignored by the algorithm. Intended for variables that produce strong false positives, such as IDs or ICD-10 codes.
- export_function
A function to control exporting the reports to disk. Current options are export_as_tree and export_flat.
- verbose
Boolean flag - if TRUE will...
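To make the chunk_size argument concrete, the sketch below shows how a chunk size of 100 would partition 250 sentences into batches. It is written in base R and is an illustration only: chunk_sentences is a hypothetical helper, not part of this package, and the package's own batching may differ.

```r
# Illustrative only: how a chunk size partitions a vector of sentences.
# chunk_sentences() is a hypothetical helper, not the package's internal code.
chunk_sentences <- function(sentences, chunk_size = 100) {
  stopifnot(chunk_size >= 1)
  if (length(sentences) == 0) {
    return(list())
  }
  # Give each sentence a chunk index: 1 for the first chunk_size items, and so on.
  chunk_id <- ceiling(seq_along(sentences) / chunk_size)
  unname(split(sentences, chunk_id))
}

sentences <- paste("Sentence number", seq_len(250))
chunks <- chunk_sentences(sentences, chunk_size = 100)

# Number of sentences in each chunk
lengths(chunks)
#> [1] 100 100  50
```

Under this scheme a smaller chunk_size means more, smaller batches passed to the tagger, while a larger one means fewer, bigger batches.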
Examples
{
input_dir <- withr::local_tempdir()
output_dir <- withr::local_tempdir()
dir.create(input_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
example_data <- data.frame(
text = "Joey went to London",
stringsAsFactors = FALSE
)
utils::write.csv(example_data,
file.path(input_dir, "example.csv"),
row.names = FALSE
)
paths <- report_on_folder(input_dir, report_dir = output_dir)
paths
}
#> $example
#> [1] "/tmp/RtmpaO95Ns/file1cfc66bf9529/example.csv"
#>
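As a variation on the example above, the sketch below builds input data containing the kind of identifier columns that to_ignore is intended for, and shows which columns would be left for tagging. The column names (patient_id, icd10) are made up for illustration, and exact matching on column names is an assumption rather than documented behaviour.

```r
# Hypothetical input with an ID column and an ICD-10 column alongside free text.
example_data <- data.frame(
  patient_id = c("A001", "A002"),
  icd10 = c("J45.9", "E11.9"),
  text = c("Joey went to London", "No names here"),
  stringsAsFactors = FALSE
)

to_ignore <- c("patient_id", "icd10")

# Columns that would remain once the ignored ones are excluded.
# Assumption: to_ignore is matched against column names exactly.
columns_to_scan <- setdiff(names(example_data), to_ignore)
columns_to_scan
#> [1] "text"

# The corresponding call, using only the documented arguments (not run here):
# report_on_folder(input_dir, report_dir = output_dir, to_ignore = to_ignore)
```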
