enigma (version 0.1.1)

enigma_stats: Get statistics on columns of a dataset from Enigma.

Description

Get statistics on columns of a dataset from Enigma.

Usage

enigma_stats(dataset = NULL, select = NULL, operation = NULL, by = NULL,
  of = NULL, limit = 500, search = NULL, where = NULL, sort = NULL,
  page = NULL, key = NULL, ...)

Arguments

dataset
Dataset name. Required.
select
(character) Column to get statistics on. Required.
operation
(character) Operation to run on a given column. For a numerical column, valid operations are sum, avg, stddev, variance, max, min and frequency. For a date column, valid operations are max, min and frequency. For all other columns, the only valid operation is frequency.
by
(character) Compound operation to run on a given pair of columns. Valid compound operations are sum and avg. When running a compound operation query, the of parameter is required (see below).
of
(character) Numerical column to compare against when running a compound operation. Required when using the by parameter.
limit
(numeric) Limit the number of frequency, compound sum, or compound average results returned. Max: 500; Default: 500.
search
(character) Filter results by returning only rows that match a search query. By default this searches the entire table for matching text. To search particular fields only, use the query format "@fieldname query". To match multiple queries, the | (or) operator can be used.
where
(character) Filter results with a SQL-style "where" clause. Only applies to numerical columns - use the search parameter for strings. Valid operators are >, < and =. Only one where clause per request is currently supported.
sort
(character) Sort frequency, compound sum, or compound average results in a given direction. + denotes ascending order, - denotes descending order.
page
(numeric) Paginate frequency, compound sum, or compound average results and return the nth page of results. Pages are calculated based on the current limit, which defaults to 500.
key
(character) Required. An Enigma API key. Supply it in the function call, store it in your .Rprofile file, or set it with options(enigmaKey = ""). Obtain an API key by creating an account with Enigma at http://enigma.io.
...
Named options passed on to GET
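
The argument rules above can be sketched as a small pre-flight check that runs without an API key. This is only an illustration: valid_operations(), check_stats_args() and page_rows() are hypothetical helpers, not functions in the enigma package, and the sketch assumes that column types other than numeric and date support only frequency and that pages are numbered from 1.

```r
# Hypothetical helpers for illustration; none of these are part of the enigma package.

# Operations allowed for a column type, following the argument descriptions.
# Assumption: any type other than "numeric" or "date" supports only "frequency".
valid_operations <- function(column_type) {
  switch(column_type,
    numeric = c("sum", "avg", "stddev", "variance", "max", "min", "frequency"),
    date    = c("max", "min", "frequency"),
    "frequency"
  )
}

# Return a character vector of problems; an empty vector means the
# arguments are consistent with the documented rules.
check_stats_args <- function(column_type, operation = NULL, by = NULL,
                             of = NULL, limit = 500) {
  problems <- character(0)
  if (!is.null(operation) && !operation %in% valid_operations(column_type)) {
    problems <- c(problems,
                  sprintf("'%s' is not valid for a %s column", operation, column_type))
  }
  if (!is.null(by) && !by %in% c("sum", "avg")) {
    problems <- c(problems, "by must be 'sum' or 'avg'")
  }
  if (!is.null(by) && is.null(of)) {
    problems <- c(problems, "of is required when by is set")
  }
  if (limit > 500) {
    problems <- c(problems, "limit cannot exceed 500")
  }
  problems
}

# Result positions covered by a page at a given limit.
# Assumption: pages are numbered from 1.
page_rows <- function(page, limit = 500) {
  seq((page - 1) * limit + 1, page * limit)
}

check_stats_args("date", operation = "sum")   # one problem: sum on a date column
check_stats_args("numeric", by = "avg")       # one problem: of is missing
range(page_rows(2, limit = 100))              # first and last position on page 2
```

Returning a vector of problems rather than stopping at the first one lets a caller report every inconsistent argument at once.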

Examples

# After obtaining an API key from Enigma's website, either pass it to the
# function call with the key parameter or set it in your options
# (see the instructions for the key parameter above)

# stats on a varchar column
cbase <- 'com.crunchbase.info.companies.acquisition'
enigma_stats(dataset=cbase, select='acquired_month')

# stats on a numeric column
enigma_stats(dataset=cbase, select='price_amount')

# stats on a date column
pakistan <- 'gov.pk.secp.business-registry.all-entities'
enigma_metadata(dataset=pakistan)
enigma_stats(dataset=pakistan, select='registration_date')

# stats on a date column, by the average of a numeric column
aust <- 'gov.au.government-spending.federal-contracts'
enigma_metadata(dataset=aust)
enigma_stats(dataset=aust, select='contractstart', by='avg', of='value')

# Get frequency of distances traveled, and plot
## get columns for the air carrier dataset
dset <- 'us.gov.dot.rita.trans-stats.air-carrier-statistics.t100d-market-all-carrier'
enigma_metadata(dset)$columns$table[, 1:4]
out <- enigma_stats(dset, select='distance')
library("ggplot2")
## the frequency table is converted to numeric columns before plotting
df <- out$result$frequency
df <- data.frame(distance = as.numeric(df$distance), count = as.numeric(df$count))
ggplot(df, aes(distance, count)) +
  geom_bar(stat = "identity") +
  geom_point() +
  theme_grey(base_size = 18) +
  labs(y = "flights", x = "distance (miles)")
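
The conversion step in the last example can be tried offline with a mocked frequency table. The sketch below is an assumption about the shape of out$result$frequency (two columns, distance and count, stored as character); the values are made up and do not come from Enigma.

```r
# Mocked stand-in for out$result$frequency. The column names follow the
# example above; the values and the character storage are assumptions.
freq <- data.frame(
  distance = c("1000", "100", "250"),
  count    = c("3", "12", "7"),
  stringsAsFactors = FALSE
)

# Same conversion as in the example: coerce both columns to numeric.
df <- data.frame(
  distance = as.numeric(freq$distance),
  count    = as.numeric(freq$count)
)

# Order by distance so the rows read from shortest to longest flight.
df <- df[order(df$distance), ]
str(df)
```

The resulting df can be passed to the ggplot() call above unchanged.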