
Part of the folder-level API: this function is the folder-level equivalent of pidpos(). Reports are saved in the directory given by report_dir.

Usage

report_on_folder(
  data_path,
  report_dir = "Proper Noun Reports",
  tagger = "english-ewt",
  filter_func = filter_to_proper_nouns,
  chunk_size = 100,
  to_ignore = c(),
  export_function = NULL,
  verbose = FALSE
)

Arguments

data_path

The path to the folder in which the data is stored.

report_dir

The directory to write PID reports to.

tagger

Either a string naming a UDPipe model (see udpipe::udpipe_download_model for the list of models) or a custom tagging function (see vignette("custom-functions") for details of what is required).

filter_func

A function to filter the tagged instances. See the 'Custom Filtering Functions' section of vignette("custom-functions") for more details.

chunk_size

The number of sentences to tag at a time. The optimal value has yet to be determined.
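
As a rough illustration of what tagging in chunks means, the base R sketch below splits a vector of sentences into groups of at most chunk_size. It is only a sketch under assumptions: the sentences vector is made-up placeholder data, and the package's internal chunking may work differently.

```r
# Hypothetical input: 250 placeholder sentences (not real data).
sentences <- paste("Sentence", seq_len(250))
chunk_size <- 100

# Assign each sentence a chunk index: 1 for the first 100, 2 for the next 100, ...
chunk_id <- ceiling(seq_along(sentences) / chunk_size)

# Split into a list with one element per chunk.
chunks <- split(sentences, chunk_id)

# Number of sentences in each chunk.
lengths(chunks)
```

A larger chunk_size means fewer, larger tagging calls; a smaller one means more, smaller calls.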

to_ignore

A vector of column names to be ignored by the algorithm. Intended to be used for variables that are giving strong false positives, such as IDs or ICD-10 codes.
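
The effect of to_ignore can be pictured as dropping the named columns before anything is tagged. The base R sketch below shows that idea only; the column names patient_id, icd10_code and notes are hypothetical, and this is not necessarily how the function handles ignored columns internally.

```r
# Hypothetical data: an ID column, an ICD-10 code column and a free-text column.
example_data <- data.frame(
  patient_id = c("A001", "A002"),
  icd10_code = c("J45", "E11"),
  notes = c("Joey went to London", "Seen in clinic"),
  stringsAsFactors = FALSE
)

# Columns that would otherwise produce strong false positives.
to_ignore <- c("patient_id", "icd10_code")

# Keep every column whose name is not listed in to_ignore.
kept <- example_data[, !(names(example_data) %in% to_ignore), drop = FALSE]

names(kept)
```

With this data only the notes column remains, so only its text would be considered.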

export_function

A function to control exporting the reports to disk. Current options are export_as_tree and export_flat.

verbose

Boolean flag - if TRUE will...

Examples

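{
  # Temporary input and output folders, cleaned up automatically by withr
  input_dir <- withr::local_tempdir()
  output_dir <- withr::local_tempdir()

  # Make sure both folders exist (no warning if they already do)
  dir.create(input_dir, recursive = TRUE, showWarnings = FALSE)
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

  # A single-row data set containing two proper nouns
  example_data <- data.frame(
    text = "Joey went to London",
    stringsAsFactors = FALSE
  )

  # Write the data set into the input folder as a CSV file
  utils::write.csv(example_data,
    file.path(input_dir, "example.csv"),
    row.names = FALSE
  )

  # Report on every file in the input folder, writing reports to output_dir
  paths <- report_on_folder(input_dir, report_dir = output_dir)

  # Named list of paths, one entry per input file
  paths
}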
#> $example
#> [1] "/tmp/RtmpaO95Ns/file1cfc66bf9529/example.csv"
#>