Learn R Programming

⚠️There's a newer version (1.1.3) of this package.Take me there.

AzureKusto

R interface to Kusto, also known as Azure Data Explorer, a fast and highly scalable data exploration service.

Installation

You can install the development version from GitHub. Note that if you are using Microsoft R, AzureKusto requires recent versions of some packages which will likely not be in your default MRAN snapshot. You can set the repository to CRAN before installing.

options(repos="https://cloud.r-project.org")
devtools::install_github("cloudyr/AzureKusto")

Example usage

Kusto endpoint interface

Connect to a Kusto cluster by instantiating a kusto_database_endpoint object with the cluster URI and database name.


library(AzureKusto)

Samples <- kusto_database_endpoint(server="https://help.kusto.windows.net", database="Samples")

# To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FPD8GZPY9 to authenticate.
# Waiting for device code in browser...
# Press Esc/Ctrl + C to abort
# Authentication complete.

Now you can issue queries to the Kusto database with run_query and get the results back as a data.frame.


res <- run_query(Samples, "StormEvents | summarize EventCount = count() by State | order by State asc")

head(res)

##            State EventCount
## 1        ALABAMA       1315
## 2         ALASKA        257
## 3 AMERICAN SAMOA         16
## 4        ARIZONA        340
## 5       ARKANSAS       1028
## 6 ATLANTIC NORTH        188

run_query() also supports query parameters. Simply pass your parameters as additional keyword arguments and they will be escaped and interpolated into the query string.


res <- run_query(Samples, "MyFunction(lim)", lim=10L)

Command statements work much the same way, except that they do not accept parameters.


res <- run_query(Samples, ".show tables | count")

dplyr Interface

The package also implements a dplyr-style interface for building a query upon a tbl_kusto object and then running it on the remote Kusto database and returning the result as a regular tibble object with collect().


library(dplyr)

StormEvents <- tbl_kusto(Samples, "StormEvents")

q <- StormEvents %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
    arrange(State)

show_query(q)

## <KQL> database('Samples').['StormEvents']
## | summarize ['EventCount'] = count() by ['State']
## | order by ['State'] asc

collect(q)

## # A tibble: 67 x 2
##    State          EventCount
##    <chr>               <dbl>
##  1 ALABAMA              1315
##  2 ALASKA                257
##  3 AMERICAN SAMOA         16
##  4 ARIZONA               340
##  5 ARKANSAS             1028
##  6 ATLANTIC NORTH        188
##  7 ATLANTIC SOUTH        193
##  8 CALIFORNIA            898
##  9 COLORADO             1654
## 10 CONNECTICUT           148
## # ... with 57 more rows

tbl_kusto also accepts query parameters, in case the Kusto source table is a parameterized function:


MyFunctionDate <- tbl_kusto(Samples, "MyFunctionDate(dt)", dt=as.Date("2019-01-01"))

MyFunctionDate %>%
    select(StartTime, EndTime, EpisodeId, EventId, State) %>%
    head() %>%
    collect()

## # A tibble: 6 x 5
##   StartTime           EndTime             EpisodeId EventId State         
##   <dttm>              <dttm>                  <int>   <int> <chr>         
## 1 2007-09-29 08:11:00 2007-09-29 08:11:00     11091   61032 ATLANTIC SOUTH
## 2 2007-09-18 20:00:00 2007-09-19 18:00:00     11074   60904 FLORIDA       
## 3 2007-09-20 21:57:00 2007-09-20 22:05:00     11078   60913 FLORIDA       
## 4 2007-12-30 16:00:00 2007-12-30 16:05:00     11749   64588 GEORGIA       
## 5 2007-12-20 07:50:00 2007-12-20 07:53:00     12554   68796 MISSISSIPPI   
## 6 2007-12-20 10:32:00 2007-12-20 10:36:00     12554   68814 MISSISSIPPI   

DBI interface

AzureKusto implements a subset of the DBI specification for interacting with databases. It should be noted that Kusto is quite different to the SQL databases that DBI targets, which affects the behaviour of certain DBI methods and renders other moot.

library(DBI)

# connect to the server: basically a wrapper for kusto_database_endpoint()
Samples <- dbConnect(AzureKusto(),
                     server="https://help.kusto.windows.net",
                     database="Samples")

dbListTables(Samples)

## [1] "StormEvents"       "demo_make_series1" "demo_series2"     
## [4] "demo_series3"      "demo_many_series1"

dbExistsTable(Samples, "StormEvents")

##[1] TRUE

dbGetQuery(Samples, "StormEvents | summarize ct = count()")

##      ct
## 1 59066

Azure Resource Manager interface

On the admin side, AzureKusto extends the framework supplied by the AzureRMR to support Kusto. Methods are provided to create and delete clusters and databases, and manage database principals.

# create a new Kusto cluster
az <- AzureRMR::get_azure_login()
ku <- az$
    get_subscription("sub_id")$
    get_resource_group("rgname")$
    create_kusto_cluster("mykustocluster")

# create a new database
db1 <- ku$create_database("database1")

# add a user
db1$add_principals("myusername", role="User", fqn="aaduser=username@mydomain")

Copy Link

Version

Install

install.packages('AzureKusto')

Monthly Downloads

1,411

Version

1.0.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Hong Ooi

Last Published

April 25th, 2019

Functions in AzureKusto (1.0.1)

dbGetQuery,AzureKustoConnection,character-method

DBI methods for Kusto queries and commands
add_op_single

Append an operation representing a single-table verb to the tbl_kusto object's ops list
AzureKustoDriver-class

Kusto DBI driver class
AzureKustoResult-class

Kusto DBI result class
base_scalar

Scalar operator translations (infix and prefix)
base_window

Window function translations
compute.tbl_kusto

Execute the query, store the results in a table, and return a reference to the new table
az_kusto

Kusto/Azure Data Explorer cluster resource class
copy_to.kusto_database_endpoint

This function uploads a local data frame into a remote data source, creating the table definition as needed. If the table exists, it will append the data to the existing table. If not, it will create a new table.
ident_q

Pass an already-escaped string to Kusto
ingest_local

Ingestion functions for Kusto
is_kusto_database

Information functions
AzureKusto

DBI interface: connect to a Kusto cluster
AzureKustoConnection-class

Kusto DBI connection class
az_kusto_database

Kusto/Azure Data Explorer database resource class
base_agg

Aggregation function translations
inner_join.tbl_kusto_abstract

Join methods for Kusto tables
kql_build

Build the tbl object into a data structure representing a Kusto query
kql_infix

Return a function representing a scalar KQL infix operator
kql_prefix

Return a function representing a scalar KQL prefix function
flatten_query

Walks the tree of ops and builds a stack.
kql_build.op_mutate

dplyr's mutate verb can include aggregations, but Kusto's extend verb cannot. If the mutate contains no aggregations, then it can emit an extend clause. If the mutate contains an aggregation and the tbl is ungrouped, then it must emit a summarize clause grouped by all variables. If the mutate contains an aggregation and the tbl is grouped, then it must join to a subquery containing the summarize clause.
reexports

Objects exported from other packages
kql_escape_ident_q

Pass through an already-escaped Kusto identifier
kql_escape_ident

Escape a Kusto identifer with [' ']
get_kusto_cluster

Get existing Kusto/Azure Data Explorer cluster
kql_translate_env

Build a kql_variant class out of the environments holding scalar and aggregation function definitions
add_op_join

Append a join operation to the tbl_kusto object's ops list
kql_translator

Builds an environment from a list of R -> Kusto query language translation pairs.
translate_kql

Translate R expressions into Kusto Query Language equivalents.
kql_window

Return a function representing a KQL window function
run_query

Run a query or command against a Kusto database
op_base

The "base case" operation representing the tbl itself and its column variables
kusto-DBI

DBI interface to Kusto
tbl_kusto

A tbl object representing a table in a Kusto database.
op_double

A double-table verb, e.g. joins, setops
add_op_set_op

Append a set operation to the tbl_kusto object's ops list
tbl_kusto_abstract

Create a local lazy tbl
create_kusto_cluster

Create Kusto/Azure Data Explorer cluster
delete_kusto_cluster

Delete Kusto/Azure Data Explorer cluster
kusto_database_endpoint

Endpoints for communicating with a Kusto database
escape

Escape/quote a string.
show_query.tbl_kusto_abstract

Translate a sequence of dplyr operations on a tbl into a Kusto query string.
kql

Tag character strings as Kusto Query Language. Assumes the string is valid and properly escaped.
summarise.tbl_kusto_abstract

Summarise method for Kusto tables
kql_aggregate

Return a function representing a KQL aggregation function
op_grps

Look up the applicable grouping variables for an operation based on the data source and preceding sequence of operations
op_single

A class representing a single-table verb
build_kql

Build a KQL string.
collect.tbl_kusto

Compile the preceding dplyr oprations into a kusto query, execute it on the remote server, and return the result as a tibble.
get_kusto_token

Manage AAD authentication tokens for Kusto clusters
ident

Flag a character string as a Kusto identifier
kql_escape_logical

Escape a Kusto logical value. Converts TRUE/FALSE to true / false
kql_escape_string

Escape a Kusto string by single-quoting
kql_quote

Helper function for quoting kql elements.
kql_render

Render a set of operations on a tbl_kusto_abstract to a Kusto query
op_vars

Look up the applicable variables in scope for a given operation based on the data source and preceding sequence of operations
partial_eval

Partially evaluate an expression.
dbReadTable,AzureKustoConnection,character-method

DBI methods for Kusto table management