googlenlp


The googlenlp package provides an R interface to Google's Cloud Natural Language API.

"Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app." [source]

There are four main features of the API, all of which are available through this R package [source] (a short sketch of the corresponding functions follows this list):

  • Syntax Analysis: "Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence."
  • Entity Analysis: "Identify entities and label by types such as person, organization, location, events, products and media."
  • Sentiment Analysis: "Understand the overall sentiment expressed in a block of text."
  • Multi-Language: "Enables you to easily analyze text in multiple languages including English, Spanish and Japanese."
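Each of these features maps to a function in the package (analyze_syntax, analyze_entities, and analyze_sentiment; see the function reference at the end of this page). The calls below are only a minimal sketch, assuming the package is installed and an API key is configured as described in the following sections; the text_body argument name here mirrors annotate_text and is an assumption.

library(googlenlp)

sample_text <- "Google was founded by Larry Page and Sergey Brin."

analyze_syntax(text_body = sample_text)     # tokens, parts of speech, dependency parse
analyze_entities(text_body = sample_text)   # people, organizations, locations, etc.
analyze_sentiment(text_body = sample_text)  # overall sentiment of the text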

Resources

Installation

You can install the development version from GitHub:

devtools::install_github("BrianWeinstein/googlenlp")

Authentication

To use the API, you'll first need to create a Google Cloud project, enable billing, and get an API key.

Configuration

Load the package and set your API key. There are two ways to do this.

Method A (preferred)

Method A adds your API key as a variable in your .Renviron file. With this method, you only need to complete the setup process once.

library(googlenlp)

configure_googlenlp() # follow the instructions printed to the console
#> googlenlp setup instructions:
#>  1. Your ~/.Renviron file will now open in a new window/tab.
#>     *** If it doesn't open, run:  file.edit("~/.Renviron") ***
#>  2. To use the API, you'll first need to create a Google Cloud project and enable billing (https://cloud.google.com/natural-language/docs/getting-started).
#>  3. Next you'll need to get an API key (https://cloud.google.com/natural-language/docs/common/auth).
#>  4. In your  ~/.Renviron  file, replace the ENTER_YOUR_API_KEY_HERE with your Google Cloud API key.
#>  5. Save your ~/.Renviron file.
#>  6. *** Restart your R session for changes to take effect. ***
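
After restarting R, you can check that the key is being read from your .Renviron file. gcnlp_key() (listed in the function reference below as "Retrieve API key") should return the stored key; this is just a quick sanity check, assuming the setup above completed.

library(googlenlp)

gcnlp_key() # should return the API key saved in ~/.Renviron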

Method B

Method B defines your API key as a session-level variable. Under this method, you'll need to set your API key at the beginning of each R session.

library(googlenlp)

set_api_key("MY_API_KEY") # replace this with your API key

Getting started

Define the text you'd like to analyze.

text <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.
         Sundar Pichai said in his keynote that users love their new Android phones."

The annotate_text function analyzes the text's syntax (sentences and tokens), entities, sentiment, and language, and returns the results as a five-element list.

analyzed <- annotate_text(text_body = text)

str(analyzed, max.level = 1)
#> List of 5
#>  $ sentences        :Classes 'tbl_df', 'tbl' and 'data.frame':   2 obs. of  4 variables:
#>  $ tokens           :Classes 'tbl_df', 'tbl' and 'data.frame':   32 obs. of  17 variables:
#>  $ entities         :Classes 'tbl_df', 'tbl' and 'data.frame':   10 obs. of  8 variables:
#>  $ documentSentiment:'data.frame':   1 obs. of  2 variables:
#>  $ language         : chr "en"
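
The individual elements can be accessed directly; for example, the detected language and the document-level sentiment (a one-row data frame, per the structure above):

analyzed$language
#> [1] "en"

analyzed$documentSentiment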

Sentences

"Sentence extraction breaks up the stream of text into a series of sentences." [API Documentation]

  • beginOffset indicates the (zero-based) character index at which the sentence begins (with UTF-8 encoding).
  • The magnitude and score fields quantify each sentence's sentiment — see the Document Sentiment section for more details.

analyzed$sentences
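
Because analyzed$sentences is a tibble, it works directly with dplyr verbs. As a small sketch using the score field described above, you could sort the sentences from most positive to most negative:

library(dplyr)

analyzed$sentences %>%
  arrange(desc(score))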

Tokens

"Tokenization breaks the stream of text up into a series of tokens, with each token usually corresponding to a single word. The Natural Language API then processes the tokens and, using their locations within sentences, adds syntactic information to the tokens." [API Documentation]

  • lemma indicates the token's "root" word, and can be useful in standardizing the word within the text.
  • tag indicates the token's part of speech.
  • Additional column definitions are outlined here and here.

analyzed$tokens
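
As with the sentences, the tokens tibble can be summarized with dplyr; for example, counting tokens by part-of-speech tag, or pulling out just the lemma and tag columns:

library(dplyr)

analyzed$tokens %>%
  count(tag, sort = TRUE)   # how often each part of speech appears

analyzed$tokens %>%
  select(lemma, tag)        # root word and part of speech for each token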

Install

install.packages('googlenlp')

Version

0.2.0

License

MIT + file LICENSE

Maintainer

Brian Weinstein

Last Published

July 13th, 2018

Functions in googlenlp (0.2.0)

  • analyze_syntax: analyze_syntax
  • flatten_tokens: Flatten tokens
  • flatten_entities: Flatten entities
  • configure_googlenlp: Configure your computer or a server to connect to the Google Cloud Natural Language API via R functions
  • gcnlp_key: Retrieve API key
  • analyze_sentiment: analyze_sentiment
  • flatten_sentences: Flatten sentences
  • analyze_entities: analyze_entities
  • annotate_text: annotate_text
  • gcnlp_post: Send a POST request to the Google Cloud Natural Language API
  • set_api_key: Manually set access credentials
  • get_config_file: Fetch session-specific gcnlp default values
  • flatten_sentiment: Flatten sentiment