get_writer_profiles: Estimate Writer Profiles

Description

Estimate writer profiles from handwritten documents scanned and saved as PNG files. Each file in input_dir is split into component shapes called graphs with process_batch_dir. The graphs are then sorted into clusters of similar shapes using the cluster template and get_clusters_batch. An estimate of the writer profile for a document is the proportion of graphs from that document assigned to each of the clusters in template. The writer profiles are estimated by running get_cluster_fill_counts. If measure is counts, then the cluster fill counts are returned. If measure is rates, then get_cluster_fill_rates is also run and the cluster fill rates are returned.
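
The relationship between the two measures can be sketched in base R. This is only an illustration of the arithmetic, not the package's implementation: the data frame, its column names (docname, cluster1, cluster2, cluster3), and the numbers are all hypothetical.

```r
# Hypothetical cluster fill counts for two documents and three clusters.
# Column names and values are illustrative, not the package's actual output.
counts <- data.frame(
  docname = c("w0001_doc1", "w0002_doc1"),
  cluster1 = c(2, 5),
  cluster2 = c(6, 5),
  cluster3 = c(2, 10)
)

cluster_cols <- c("cluster1", "cluster2", "cluster3")

# A cluster fill rate is a document's count in one cluster divided by
# that document's total number of graphs, so each row of rates sums to 1.
rates <- counts
rates[cluster_cols] <- counts[cluster_cols] / rowSums(counts[cluster_cols])

print(rates)
```

Counts preserve how many graphs each document contains, while rates put documents of different lengths on a common scale.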

Usage

get_writer_profiles(
  input_dir,
  measure = "counts",
  num_cores = 1,
  template = templateK40,
  writer_indices = NULL,
  doc_indices = NULL,
  output_dir = NULL
)

Value

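A data frame containing the cluster fill counts or the cluster fill rates for the documents in input_dir, depending on measure.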

Arguments

input_dir: A filepath to a folder containing one or more handwritten documents, scanned and saved as PNG file(s).
measure: A character string: either counts or rates. counts returns the cluster fill counts, i.e., the number of graphs assigned to each cluster. rates returns the cluster fill rates, i.e., the proportion of graphs assigned to each cluster.
num_cores: An integer greater than or equal to 1 giving the number of cores to use for parallel processing.
template: Optional. A cluster template created with make_clustering_template. The default is templateK40.
writer_indices: A vector of the start and stop character positions of the writer ID in the file names.
doc_indices: A vector of the start and stop character positions of the document name in the file names.
output_dir: Optional. A filepath to a folder to save the RDS files created by process_batch_dir and get_clusters_batch. If no folder is supplied, the RDS files will be saved to the temporary directory and then deleted before the function terminates.
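
As a rough illustration of what writer_indices and doc_indices describe, the sketch below extracts substrings from file names with base R's substr. The file names and index values are hypothetical, and whether the package extracts the IDs in exactly this way is an assumption.

```r
# Hypothetical file names; the naming scheme is an assumption for illustration.
files <- c("w0001_s01_pWOZ_r01.png", "w0002_s01_pWOZ_r01.png")

# Start and stop character positions within each file name.
writer_indices <- c(2, 5)   # characters 2 through 5 hold the writer ID
doc_indices <- c(7, 18)     # characters 7 through 18 hold the document name

# substr() is vectorized over the file names.
writers <- substr(files, writer_indices[1], writer_indices[2])
doc_names <- substr(files, doc_indices[1], doc_indices[2])

print(data.frame(file = files, writer = writers, doc = doc_names))
```

Position-based indices like these only work when every file name follows the same fixed-width pattern.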

Details

The functions process_batch_dir and get_clusters_batch take upwards of 30 seconds per document, and the results are saved to RDS files in output_dir > graphs and output_dir > clusters, respectively. If output_dir is NULL, then the results are saved to the temporary directory and deleted before the function terminates.

Examples

# \donttest{
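# Locate the example documents installed with the package
docs <- system.file("extdata", package = "handwriter")

# Estimate writer profiles as cluster fill counts and plot them
profiles <- get_writer_profiles(docs, measure = "counts")
plot_writer_profiles(profiles)

# Estimate writer profiles as cluster fill rates and plot them
profiles <- get_writer_profiles(docs, measure = "rates")
plot_writer_profiles(profiles)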
# }
