This function turns a corpus of texts into a quanteda tokens object of sentences.
Usage
tokenize_sents(corpus, model = "en_core_web_sm")
Value
A quanteda tokens object where each token is a sentence.
Arguments
corpus
A quanteda corpus object, typically the output of the create_corpus() function or the output of contentmask().
model
The spaCy model to use for sentence segmentation. The default is "en_core_web_sm".
Details
The function first splits each text into paragraphs at new line markers and then uses spaCy to split each paragraph into sentences. The input can be either a plain-text corpus or the output of contentmask(). This function is required to prepare the data for lambdaG().
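The two-step logic described above (paragraphs first, then sentences) can be illustrated with a minimal base-R sketch. This is not the package's implementation. The helper split_into_sentences() is hypothetical, and a naive regular expression (break after ".", "!" or "?" followed by whitespace) stands in for spaCy's sentence segmenter, which handles abbreviations and other edge cases far better.

```r
# Hypothetical helper, for illustration only: mirrors the two steps of
# tokenize_sents() without quanteda or spaCy.
# Assumption: a naive regex replaces spaCy's sentence segmentation.
split_into_sentences <- function(texts) {
  lapply(texts, function(txt) {
    # Step 1: split the text into paragraphs at new line markers
    paragraphs <- trimws(unlist(strsplit(txt, "\n", fixed = TRUE)))
    paragraphs <- paragraphs[nzchar(paragraphs)]  # drop empty lines
    # Step 2: split each paragraph into sentences
    # (naive rule: break on whitespace that follows ., ! or ?)
    unlist(lapply(paragraphs, function(p) {
      strsplit(p, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
    }))
  })
}

texts <- c(doc1 = "This is one sentence. This is another.\nA new paragraph starts here")
split_into_sentences(texts)
```

Splitting at new lines before segmenting guarantees that no sentence crosses a paragraph boundary, even when a paragraph does not end with punctuation, as with the last line above. The result is a named list of character vectors, one per document, which is the kind of input quanteda's as.tokens() can convert into a tokens object whose tokens are sentences.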