predict.crf: Predict the label sequence based on the Conditional Random Field

Description

Predict the label sequence for new data using a fitted Conditional Random Field model.

Usage

# S3 method for crf
predict(
  object,
  newdata,
  group,
  type = c("marginal", "sequence"),
  trace = FALSE,
  ...
)

Arguments

object

an object of class crf as returned by crf

newdata

a character matrix of attributes describing the label sequence y, or an object which can be coerced to a character matrix. This data should be provided in the same format as was used for training the model.

group

an integer or character vector of length nrow(newdata) indicating the group the sequence y belongs to (e.g. a document or sentence identifier)

type

either 'marginal' or 'sequence', to get predictions at the level of newdata or at the level of the sequence group, respectively. Defaults to 'marginal'.

trace

a logical indicating whether to show the trace of the labelling output. Defaults to FALSE.

...

not used

Value

If type is 'marginal': a data.frame with columns label and marginal, containing the Viterbi-decoded predicted label and its marginal probability. If type is 'sequence': a data.frame with columns group and probability, containing the probability of the sequence for each sequence group.
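
As a rough sketch of how the 'marginal' output might be used downstream, the base R snippet below builds a small hand-made data.frame with the label and marginal columns described above and combines it with group identifiers. The values, the group vector and the 0.7 threshold are invented for illustration and do not come from a fitted model. The snippet also assumes that predictions are returned in the same row order as newdata.

```r
# Hypothetical stand-in for the result of predict(..., type = "marginal"):
# one row per row of newdata, with the columns named in the Value section.
scores <- data.frame(
  label    = c("B-LOCATION", "I-LOCATION", "O", "O", "B-PERSON"),
  marginal = c(0.91, 0.64, 0.98, 0.55, 0.72),
  stringsAsFactors = FALSE
)

# Hypothetical group identifiers, one per row of newdata.
group <- c("doc1", "doc1", "doc1", "doc2", "doc2")

# Assumption: rows come back in the order of newdata, so the identifiers
# can be attached by position.
result <- data.frame(group = group, scores, stringsAsFactors = FALSE)

# Flag predictions whose marginal probability falls below an
# illustrative threshold.
threshold <- 0.7
result$uncertain <- result$marginal < threshold

# Lowest marginal probability within each sequence group.
weakest <- aggregate(marginal ~ group, data = result, FUN = min)

print(result)
print(weakest)
```

Flagging low marginals in this way is only one possible use; a suitable threshold depends on the task and is not prescribed by the package.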

Examples

# NOT RUN {
library(udpipe)
data(airbnb_chunks, package = "crfsuite")
udmodel <- udpipe_download_model("dutch-lassysmall")
udmodel <- udpipe_load_model(udmodel$file_model)
airbnb_tokens <- unique(airbnb_chunks[, c("doc_id", "text")])
airbnb_tokens <- udpipe_annotate(udmodel, 
                                 x = airbnb_tokens$text, 
                                 doc_id = airbnb_tokens$doc_id)
airbnb_tokens <- as.data.frame(airbnb_tokens)
x <- merge(airbnb_chunks, airbnb_tokens)
x <- crf_cbind_attributes(x, terms = c("upos", "lemma"), by = "doc_id")
model <- crf(y = x$chunk_entity, 
             x = x[, grep("upos|lemma", colnames(x))], 
             group = x$doc_id, 
             method = "lbfgs", options = list(max_iterations = 5)) 
scores <- predict(model, 
                  newdata = x[, grep("upos|lemma", colnames(x))], 
                  group = x$doc_id, type = "marginal")
head(scores)
scores <- predict(model, 
                  newdata = x[, grep("upos|lemma", colnames(x))], 
                  group = x$doc_id, type = "sequence")
head(scores)


## cleanup for CRAN
file.remove(model$file_model)
file.remove("modeldetails.txt")
file.remove(udmodel$file)
# }