monkey_classify: Monkeylearn classify from a dataframe column or vector of texts

Description

Classifies each row of a dataframe (or each element of a vector of texts) independently, using a Monkeylearn classifier module.

Usage

monkey_classify(input, col = NULL, key = monkeylearn_key(quiet = TRUE),
  classifier_id = "cl_oFKL5wft", params = NULL, texts_per_req = NULL,
  unnest = TRUE, .keep_all = TRUE, verbose = TRUE, ...)

Arguments

input

A dataframe or vector of texts (each text smaller than 50kB)

col

If input is a dataframe, the unquoted name of the character column containing text to classify

key

The API key

classifier_id

The ID of the classifier

params

Parameters for the module as a named list.

texts_per_req

Number of texts to be processed per request. Minimum value is the number of texts in input; max is 200, as per [Monkeylearn documentation](docs.monkeylearn.com/article/api-reference/). If NULL, we default to 200, or, if there are fewer than 200 texts, the length of the input.

unnest

Should the output column be unnested?

.keep_all

If input is a dataframe, should non-col columns be retained in the output?

verbose

Whether to output messages about batch requests and progress of processing.

...

Other arguments

Value

A data.frame (tibble) with the cleaned input (empty strings removed) and the classification for each row, unnested when unnest = TRUE (the default shown in Usage) and kept as a nested column otherwise. The result carries an attribute "headers", itself a data.frame (tibble), which includes the number of remaining queries as "x.query.limit.remaining".

Details

Find IDs of classifiers using https://app.monkeylearn.com/main/explore.

This function relates the rows in your original dataframe or elements in your vector to a classification particular to that row. This allows you to know which row of your original dataframe is associated with which classification. Each row of the dataframe is classified separately from all of the others, but the number of classifications a particular input row is assigned may vary (unless you specify a fixed number of outputs in params).

The texts_per_req parameter only specifies the number of rows sent to the API at a time; it does not lump these together for classification as a group. Varying this parameter does not affect the final output, but it does affect speed: one batched request of x texts is faster than x single-text requests (see http://help.monkeylearn.com/frequently-asked-questions/queries/can-i-classify-or-extract-more-than-one-text-with-one-api-request). Even when batched, each text still counts as one query, so batching does not reduce the number of queries charged against your API limit. See the [Monkeylearn API docs](docs.monkeylearn.com/article/api-reference/) for more details.
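
The batching described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: batch_texts() is a hypothetical helper, and the NULL default simply mirrors the rule stated for texts_per_req (200, or the length of the input if it is shorter).

```r
# Hypothetical helper, not part of the monkeylearn package: split a vector
# of texts into consecutive batches of at most `texts_per_req` elements.
batch_texts <- function(texts, texts_per_req = NULL) {
  if (is.null(texts_per_req)) {
    # Assumed default, following the argument description: 200 or fewer.
    texts_per_req <- min(200, length(texts))
  }
  batch_id <- ceiling(seq_along(texts) / texts_per_req)
  unname(split(texts, batch_id))
}

texts <- c("text a", "text b", "text c", "text d", "text e")
batches <- batch_texts(texts, texts_per_req = 2)

n_requests <- length(batches)        # requests that would be sent: 3
n_queries  <- sum(lengths(batches))  # texts, each counted as one query: 5
```

Changing texts_per_req changes n_requests but never n_queries, which is why batching affects speed and not the number of queries used.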

To check how many calls you can still make to the API, use attr(output, "headers")$x.query.limit.remaining; attr(output, "headers")$x.query.limit.limit gives the overall query limit.
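
As a rough sketch of how these headers might be used before sending more texts, the snippet below builds a mock object in place of a real monkey_classify() result, so it runs without an API key. The header values are invented for illustration, and as.numeric() is applied defensively on the assumption that the values may be stored as character.

```r
# Mock object standing in for the value returned by monkey_classify();
# the numbers below are made up for illustration only.
output <- data.frame(txt = c("text a", "text b"), stringsAsFactors = FALSE)
attr(output, "headers") <- data.frame(
  x.query.limit.limit = "50000",
  x.query.limit.remaining = "49998",
  stringsAsFactors = FALSE
)

headers   <- attr(output, "headers")
remaining <- as.numeric(headers$x.query.limit.remaining)
limit     <- as.numeric(headers$x.query.limit.limit)

# Only proceed if enough queries remain for the texts we plan to send.
n_texts <- 10
can_proceed <- remaining >= n_texts
```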

Examples

# NOT RUN {
# Texts in several languages; each one is classified independently.
text1 <- "Hauràs de dirigir-te al punt de trobada del grup al que et vulguis unir."
text2 <- "i want to buy an iphone"
text3 <- "Je déteste ne plus avoir de dentifrice."
text4 <- "I hate not having any toothpaste."
request_df <- tibble::as_tibble(list(txt = c(text1, text2, text3, text4)))
# Keep the result so that its "headers" attribute can be inspected.
output <- monkey_classify(request_df, txt, texts_per_req = 2, unnest = TRUE)
attr(output, "headers")
# }