eHDPrep (version 1.3.3)

skipgram_freq: Report Skipgram Frequency

Description

Measures the frequency of skipgrams (combinations of words in free text that are not necessarily contiguous) and reports it in a tibble, as both counts and percentages.

Usage

skipgram_freq(skipgram_tokens, min_freq = 1)

Value

Data frame containing the frequency of each skipgram, both as an absolute count and relative to the length of the input variable.

Arguments

skipgram_tokens

Output of skipgram_identify.

min_freq

Minimum percentage frequency of occurrence a skipgram must have to be retained. Default = 1.

References

Guthrie, D., Allison, B., Liu, W., Guthrie, L. & Wilks, Y. A Closer Look at Skip-gram Modelling. in Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06) (European Language Resources Association (ELRA), 2006).

Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A (2018). “quanteda: An R package for the quantitative analysis of textual data.” Journal of Open Source Software, 3(30), 774. doi:10.21105/joss.00774 <https://doi.org/10.21105/joss.00774>, <https://quanteda.io>.

Feinerer I, Hornik K (2020). tm: Text Mining Package. R package version 0.7-8, <https://CRAN.R-project.org/package=tm>.

Ingo Feinerer, Kurt Hornik, and David Meyer (2008). Text Mining Infrastructure in R. Journal of Statistical Software 25(5): 1-54. URL: https://www.jstatsoft.org/v25/i05/.

See Also

Principle underlying function: tokens_ngrams

Other free text functions: extract_freetext(), skipgram_append(), skipgram_identify()

Examples

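library(eHDPrep)

# Load the example dataset shipped with the package
data(example_data)
# Identify skipgrams in the free-text column, allowing up to 5 interrupting words
toks_m <- skipgram_identify(x = example_data$free_text,
                            ids = example_data$patient_id,
                            max_interrupt_words = 5)
# Retain skipgrams with a minimum percentage frequency of 0.5
skipgram_freq(toks_m, min_freq = 0.5)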
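
The relationship between the count, the percentage, and the min_freq threshold can be illustrated with a minimal base R sketch. It is not the package's implementation. It assumes a hypothetical presence matrix (one row per record, one column per skipgram), that the percentage is taken relative to the number of records, and that the threshold is inclusive. The names presence and skipgram_freq_sketch are invented for illustration.

```r
# Hypothetical presence matrix: one row per record, one column per skipgram.
# (Assumed shape for illustration; not the actual output of skipgram_identify.)
presence <- matrix(
  c(1, 0, 1,
    1, 0, 0,
    0, 0, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("id_", 1:4),
                  c("chest_pain", "mild_fever", "short_breath"))
)

# Hypothetical helper, named to avoid confusion with eHDPrep::skipgram_freq().
skipgram_freq_sketch <- function(presence, min_freq = 1) {
  counts <- colSums(presence)                  # records containing each skipgram
  percentage <- counts / nrow(presence) * 100  # relative to the number of records
  out <- data.frame(skipgram = names(counts),
                    count = unname(counts),
                    percentage = unname(percentage),
                    stringsAsFactors = FALSE)
  # Assumption: the threshold is inclusive (percentage >= min_freq)
  out <- out[out$percentage >= min_freq, , drop = FALSE]
  out <- out[order(-out$count), , drop = FALSE]  # most frequent first
  rownames(out) <- NULL
  out
}

res <- skipgram_freq_sketch(presence, min_freq = 30)
print(res)
```

With this toy input, a threshold of 30 drops the skipgram that appears in only one of the four records (25%) and keeps the other two.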
