Tokenise text into a sequence of words. The function uses strsplit to split the text at characters matching the [:space:] and [:punct:] character classes.
tokenize_spaces_punct(x)
x: a character string of length 1
Returns a character vector with the sequence of words in x.
tokenize_spaces_punct("This just splits. Text.alongside\nspaces right?")
tokenize_spaces_punct("Also .. multiple punctuations or ??marks")
tokenize_spaces_punct("Joske Vermeulen")
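The description says the split relies on strsplit with the [:space:] and [:punct:] character classes. A minimal sketch of that idea in base R follows. The name tokenize_sketch is hypothetical, and treating a run of consecutive separators as a single split point is an assumption; the package's actual implementation and output may differ.

```r
# Hypothetical re-implementation for illustration only;
# this is NOT the package's source code.
tokenize_sketch <- function(x) {
  # The documented argument is a character string of length 1.
  stopifnot(is.character(x), length(x) == 1)

  # Split at one or more whitespace or punctuation characters
  # (assumption: consecutive separators count as a single split point).
  tokens <- strsplit(x, split = "[[:space:][:punct:]]+")[[1]]

  # strsplit yields an empty first element when the input starts
  # with a separator, so drop empty strings.
  tokens[nzchar(tokens)]
}

tokenize_sketch("This just splits. Text.alongside\nspaces right?")
# "This" "just" "splits" "Text" "alongside" "spaces" "right"

tokenize_sketch("Also .. multiple punctuations or ??marks")
# "Also" "multiple" "punctuations" "or" "marks"
```

Under this sketch punctuation is discarded rather than returned as separate tokens; whether tokenize_spaces_punct behaves the same way is best confirmed by running the examples above.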