
mscsweblm4r (version 0.1.2)

weblmCalculateJointProbability: Calculates the joint probability that a sequence of words will appear together.

Description

This function calculates the joint probability that a particular sequence of words will appear together. All input strings must be in ASCII format.
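For background, the textbook N-gram language model factors a joint probability with the chain rule and then truncates each conditioning history to the previous N - 1 words, where N corresponds to the orderOfNgram argument. This page does not document how the service computes its values internally, so treat the following as the standard model rather than a confirmed description of the service. Here w_1, ..., w_m are the words of one input sequence, and history indices below 1 are simply dropped:

```latex
P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})
  \approx \prod_{i=1}^{m} P(w_i \mid w_{i-N+1}, \dots, w_{i-1})

\log P(w_1, \dots, w_m) \approx \sum_{i=1}^{m} \log P(w_i \mid w_{i-N+1}, \dots, w_{i-1})
```

Because every factor is at most 1, each log term is at most 0. This is why the reported log probabilities are negative, and why appending a word (another factor no larger than 1) never raises the value under this model.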

Internally, this function invokes the Microsoft Cognitive Services Web Language Model REST API documented at https://www.microsoft.com/cognitive-services/en-us/web-language-model-api/documentation.

You MUST have a valid Microsoft Cognitive Services account and an API key for this function to work properly. See https://www.microsoft.com/cognitive-services/en-us/pricing for details.

Usage

weblmCalculateJointProbability(inputWords, modelToUse = "body", orderOfNgram = 5L)

Arguments

inputWords
(character vector) Word sequences for which to calculate the joint probability; each element is one sequence (for example, "where is"). Must be in ASCII format.
modelToUse
(character) Language model to use; supported values: "title", "anchor", "query", or "body" (optional, default: "body")
orderOfNgram
(integer) Order of the N-gram to use; supported values: 1L, 2L, 3L, 4L, or 5L (optional, default: 5L)
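The constraints above can be checked locally before any request is sent. The sketch below is a hypothetical helper, checkWeblmArgs(), which is not part of mscsweblm4r; it only mirrors the documented rules (ASCII-only character input, one of the four model names, an integer order from 1L to 5L) using base R's iconv() and match.arg().

```r
# Hypothetical helper (NOT part of mscsweblm4r): validates arguments locally
# against the documented constraints of weblmCalculateJointProbability().
checkWeblmArgs <- function(inputWords, modelToUse = "body", orderOfNgram = 5L) {
  # inputWords must be a non-empty character vector
  if (!is.character(inputWords) || length(inputWords) == 0L) {
    stop("inputWords must be a non-empty character vector")
  }
  # iconv() returns NA for elements that cannot be converted to ASCII
  # (and for NA inputs), so any NA marks an invalid element
  nonAscii <- is.na(iconv(inputWords, from = "UTF-8", to = "ASCII"))
  if (any(nonAscii)) {
    stop("non-ASCII input: ", paste(inputWords[nonAscii], collapse = ", "))
  }
  # modelToUse must be one of the documented language models
  modelToUse <- match.arg(modelToUse, c("title", "anchor", "query", "body"))
  # orderOfNgram must be a single integer between 1L and 5L
  if (!is.integer(orderOfNgram) || length(orderOfNgram) != 1L ||
      !(orderOfNgram %in% 1:5)) {
    stop("orderOfNgram must be one of 1L, 2L, 3L, 4L or 5L")
  }
  invisible(list(inputWords = inputWords, modelToUse = modelToUse,
                 orderOfNgram = orderOfNgram))
}
```

For example, checkWeblmArgs(c("where is", "San Francisco"), "query", 4L) passes, while an order of 6L or an input such as "caf\u00e9" stops with an error. Note that match.arg() also accepts unambiguous partial names such as "qu".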

Value

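An S3 object of class weblm. The results are stored in the results data frame inside this object, which contains the word sequences (column words) and their log probabilities (column probability). The object also holds the raw JSON response (json) and the underlying request (request).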
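Once the results data frame is available, ordinary base R operations apply. The sketch below builds a stand-in data frame from the example output shown on this page (in real use it would be jointProbabilities$results), ranks the sequences, and converts log probabilities to plain probabilities. This page does not state the base of the logarithm, so the conversion assumes base 10; swap 10^x for exp(x) if the values are natural logs.

```r
# Stand-in for jointProbabilities$results, using the example output on this
# page; in real use this comes from weblmCalculateJointProbability().
results <- data.frame(
  words = c("where", "is", "san", "francisco", "where is",
            "san francisco", "where is san francisco"),
  probability = c(-3.378, -2.607, -3.292, -4.051, -3.961, -4.086, -7.998),
  stringsAsFactors = FALSE
)

# Rank sequences from most to least likely. The logarithm is monotonic,
# so sorting log values gives the same order as sorting probabilities.
ranked <- results[order(results$probability, decreasing = TRUE), ]

# ASSUMPTION: base-10 log probabilities; use exp() instead for natural logs.
ranked$rawProbability <- 10^ranked$probability
print(ranked)
```

Sorting on the log values directly avoids underflow for long sequences, whose raw probabilities can become extremely small.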

Examples

## Not run: 
#  tryCatch({
# 
#    # Calculate the joint probability that each sequence of words will appear together
#    jointProbabilities <- weblmCalculateJointProbability(
#      inputWords = c("where", "is", "San", "Francisco", "where is",
#                     "San Francisco", "where is San Francisco"),  # ASCII only
#      modelToUse = "query",                     # "title"|"anchor"|"query"|"body"(default)
#      orderOfNgram = 4L                         # 1L|2L|3L|4L|5L(default)
#    )
# 
#    # Class and structure of jointProbabilities
#    class(jointProbabilities)
#    #> [1] "weblm"
# 
#    str(jointProbabilities, max.level = 1)
#    #> List of 3
#    #>  $ results:'data.frame':  7 obs. of  2 variables:
#    #>  $ json   : chr "{"results":[{"words":"where","probability":-3.378}, __truncated__ ]}
#    #>  $ request:List of 7
#    #>   ..- attr(*, "class")= chr "request"
#    #>  - attr(*, "class")= chr "weblm"
# 
#    # Print results (pandoc.table() comes from the pander package)
#    pander::pandoc.table(jointProbabilities$results)
#    #> ------------------------------------
#    #>         words           probability
#    #> ---------------------- -------------
#    #>         where             -3.378
#    #>
#    #>           is              -2.607
#    #>
#    #>          san              -3.292
#    #>
#    #>       francisco           -4.051
#    #>
#    #>        where is           -3.961
#    #>
#    #>     san francisco         -4.086
#    #>
#    #> where is san francisco    -7.998
#    #> ------------------------------------
# 
#  }, error = function(err) {
# 
#    # Print the message of the caught error
#    message(conditionMessage(err))
# 
#  })
# ## End(Not run)
