Obtain a tokenised data frame by splitting text along a regular expression.
This is the inverse operation of paste.data.frame.
strsplit.data.frame(
data,
term,
group,
split = "[[:space:][:punct:][:digit:]]+",
...
)
A tokenised data frame containing one row per token.
This data.frame has the columns from group and term, where the text in column term is split into tokens by the provided regular expression.
data: a data.frame or data.table
term: a character string with the name of the column in data which you want to split into tokens
group: a string with a column name, or a character vector of column names, from data indicating identifiers of groups. The text in term will be split into tokens by group.
split: a regular expression indicating how to split the term column. Defaults to splitting by spaces, punctuation symbols or digits. This will be passed on to strsplit.
...: further arguments passed on to strsplit
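To make the effect of the split argument concrete, the following base R sketch calls strsplit directly on a single string, once with the default pattern and once with a literal space and fixed = TRUE. The sample sentence and the variable names are illustrative assumptions, not part of the package.

```r
# Illustrative input; the sentence is made up for this sketch.
sentence <- "Hello, world 42 times"

# Default pattern: runs of spaces, punctuation and digits act as separators,
# so punctuation and numbers do not survive as tokens.
default_tokens <- strsplit(sentence, split = "[[:space:][:punct:][:digit:]]+")[[1]]
print(default_tokens)

# Literal space only: fixed = TRUE treats split as plain text, not a regex,
# so punctuation stays attached and numbers are kept as tokens.
fixed_tokens <- strsplit(sentence, split = " ", fixed = TRUE)[[1]]
print(fixed_tokens)
```

With the default pattern the comma and the number are dropped, whereas the literal split keeps "Hello," and "42" as tokens.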
paste.data.frame, strsplit
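# Load the example data shipped with udpipe
data(brussels_reviews, package = "udpipe")

# Split the feedback column into tokens, one row per token, grouped by review id
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id")
head(x)

# Group by several columns at once
x <- strsplit.data.frame(brussels_reviews,
                         term = c("feedback"),
                         group = c("listing_id", "language"))
head(x)

# Split on a literal space only; fixed = TRUE is passed on to strsplit
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id",
                         split = " ", fixed = TRUE)
head(x)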
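
For readers without the example data at hand, here is a minimal base R sketch of the same idea: one row per token, with the group identifier repeated alongside. It is not the package implementation, and the toy data frame and its column names (doc_id, text) are hypothetical. The last step pastes the tokens back together per group to illustrate the inverse direction.

```r
# Toy data; the column names doc_id and text are hypothetical.
reviews <- data.frame(
  doc_id = c("a", "b"),
  text   = c("Nice flat, great host", "Clean room 2 beds"),
  stringsAsFactors = FALSE
)

# Split each text into tokens; the result is a list with one element per row.
tokens <- strsplit(reviews$text, split = "[[:space:][:punct:][:digit:]]+")

# Repeat each group identifier once per token and stack everything
# into a long data frame with one row per token.
tokenised <- data.frame(
  doc_id = rep(reviews$doc_id, times = lengths(tokens)),
  text   = unlist(tokens),
  stringsAsFactors = FALSE
)
print(tokenised)

# Inverse direction: paste the tokens of each group back into one string.
rejoined <- aggregate(text ~ doc_id, data = tokenised, FUN = paste, collapse = " ")
print(rejoined)
```

In this sketch the round trip is only approximate: the characters matched by the split pattern (here the comma and the digit) are discarded during tokenisation, so pasting the tokens back cannot restore them.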