parse_path() parses parts of a path, i.e., anything separated by
"/", "-", "_" or ".", and adds them as a new variable. Parts that do not
consist of letters only, or of a real word, can be filtered via the argument keep.
webtrack data.frame with the same columns as wt
and a new column called 'path_split' (or, if varname not equal to 'url', '<varname>_path_split')
containing parts as a comma-separated string.
Arguments
wt
webtrack data object
varname
character. name of the column from which to extract the host.
Defaults to "url".
keep
character. Defines which types of path components to keep.
If set to "all", anything is kept. If "letters_only", only parts
containing letters are kept. If "words_only", only parts constituting
English words (as defined by the Word Game Dictionary,
cf. https://cran.r-project.org/web/packages/words/index.html) are kept.
Support for more languages will be added in future.
decode
logical. Whether to decode the path (see utils::URLdecode()), default to TRUE