botor v0.2.0

0

Monthly downloads

0th

Percentile

'AWS Python SDK' ('boto3') for R

Fork-safe, raw access to the 'Amazon Web Services' ('AWS') 'SDK' via the 'boto3' 'Python' module, and convenient helper functions to query the 'Simple Storage Service' ('S3') and 'Key Management Service' ('KMS'), partial support for 'IAM', the 'Systems Manager Parameter Store' and 'Secrets Manager'.

Readme

botor: Reticulate wrapper on 'boto3'

Project Status: Active – The project has reached a stable, usable state and is being actively developed. CRAN Build Status Code Coverage

This R package provides raw access to the 'Amazon Web Services' ('AWS') 'SDK' via the 'boto3' Python module and some convenient helper functions (currently for S3 and KMS) and workarounds, eg taking care of spawning new resources in forked R processes.

Installation

This package requires Python to be installed along with the boto3 Python module, which can be installed from R via:

reticulate::py_install('boto3')

If that might result in technical problems that you cannot solve, then it's probably easier to install a standalone Python along with the system dependencies etc via rminiconda.

Once the Python dependencies are resolved, you can either install from CRAN or the most recent (development version) of botor can be installed from GitHub:

remotes::install_github('daroczig/botor')

Loading the package

Loading the botor package might take a while as it will also import the boto3 Python module in the background:

system.time(library(botor))
#>    user  system elapsed 
#>   1.131   0.250   1.191

Getting started

Quick examples:

  1. Check the currently used AWS user's name:

     iam_whoami()
     #> [1] "gergely-dev"
    
  2. Read a csv file stored in S3 using a helper function:

     s3_read('s3://botor/example-data/mtcars.csv', read.csv)
     #>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
     #> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
     #> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
     #> ...
    
  3. Encrypt a string via KMS using a helper function:

     kms_encrypt('alias/key', 'secret')
     #> [1] "QWERTY..."
    
  4. Get more info on the currently used AWS user calling the IAM client directly:

     iam()$get_user()
    
  5. Create a new client to a service without helper functions:

     ec2 <- botor_client('ec2')
     ec2$describe_vpcs()
    

AWS Auth

The botor package by default will use the credentials and related options set in environmental variables or in the ~/.aws/config and ~/.aws/credentials files. If you need to specify a custom profile or AWS region etc, there are various options with different complexity and flexibility:

  • set the related environment variable(s) before loading botor
  • call the botor() function with the relevant argument to set the config of the default session for the botor helper functions, eg
botor(region_name = 'eu-west-42')
  • if you need to manage multiple sessions, then use the raw boto3 object from the botor package and boto3.session.Session to init these custom sessions and the required clients/resources on the top of those, eg
my_custom_session1 <- boto3$Session(region_name = 'us-west-1')
my_custom_s3_client1 <- my_custom_session1$client('s3')
my_custom_session2 <- boto3$Session(region_name = 'us-west-2')
my_custom_s3_client2 <- my_custom_session2$client('s3')

Using the raw boto3 module

The botor package provides the boto3 object with full access to the boto3 Python SDK. Quick example on listing all S3 buckets:

library(botor)
s3 <- boto3$resource('s3')
library(reticulate)
iter_next(s3$buckets$pages())

Note that this approach requires a stable understanding of the boto3 Python module, plus a decent familiarity with reticulate as well (see eg iter_next) -- so you might want to rather consider using the helper functions described below.

Using the default botor session

Calling botor() will provide you a default boto3 session that is cached internally. You can always override the default session by calling botor() again with new arguments. See eg setting the default boto3 session to use us-west-2:

botor(region_name = 'us-west-2')
botor()$resource('s3')

A great advantage of using botor() instead of custom sessions is that it's fork-safe. See eg:

attr(botor(), 'pid')
#> [1] 31225
attr(botor(), 'pid')
#> [1] 31225

lapply(1:2, function(i) attr(botor(), 'pid'))
#> [[1]]
#> [1] 31225
#>
#> [[2]]
#> [1] 31225

mclapply(1:2, function(i) attr(botor(), 'pid'), mc.cores = 2)
#> [[1]]
#> [1] 13209
#> 
#> [[2]]
#> [1] 13210

Convenient helper functions

Besides the botor pre-initialized default Boto3 session, the package also provides some further R helper functions for the most common AWS actions, like interacting with S3 or KMS. Note, that the list of these functions is pretty limited for now, but you can always fall back to the raw Boto3 functions if needed. PRs on new helper functions are appreciated :)

Examples:

  1. Listing all S3 buckets takes some time as it will first initialize the S3 Boto3 client in the background:

     system.time(s3_list_buckets())[['elapsed']]
     #> [1] 1.426
    
  2. But the second query is much faster as reusing the same s3 Boto3 resource:

     system.time(s3_list_buckets())[['elapsed']]
     #> [1] 0.323
    
  3. Unfortunately, sharing the same Boto3 resource between (forked) processes is not ideal, so botor will take care of that by spawning new resources in the forked threads:

     library(parallel)
     simplify2array(mclapply(1:4, function(i) system.time(s3_list_buckets())[['elapsed']], mc.cores = 2))
     #> [1] 1.359 1.356 0.406 0.397
    
  4. Want to speed it up more?

     library(memoise)
     s3_list_buckets <- memoise(s3_list_buckets)
     simplify2array(mclapply(1:4, function(i) system.time(s3_list_buckets())[['elapsed']], mc.cores = 2))
     #> [1] 1.330 1.332 0.000 0.000
    

The currently supported resources and features via helper functions: https://daroczig.github.io/botor/reference/index.html

Error handling

The convenient helper functions try to suppress the boring Python traceback and provide you only the most relevant information on the error. If you want to see the full tracelog and more details after an error, call reticulate::py_last_error(). When working with the raw boto3 wrapper, you may find botor:::trypy useful as well.

s3_download_file('s3://bottttor/example-data/mtcars.csv', tempfile())
#> Error in s3_download_file("s3://bottttor/example-data/mtcars.csv", tempfile()) : 
#>   Python `ClientError`: An error occurred (404) when calling the HeadObject operation: Not Found

s3_read('s3://botor/example-data/mtcars2.csv', read.csv)
#> Error in s3_download(object, t) : 
#>   Python `ClientError`: An error occurred (403) when calling the HeadObject operation: Forbidden

botor(region_name = 'us-west-2')
s3_read('s3://botor/example-data/mtcars.csv', read.csv)
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> ...

Logging

botor uses the logger package to write log messages to the console by default with the following log level standards:

  • TRACE start of an AWS query (eg just about to start listing all S3 buckets in an AWS account)
  • DEBUG summary on the result of an AWS query (eg number of S3 buckets found in an AWS account)
  • INFO currently not used
  • WARN currently not used
  • ERROR something bad happened and logging extra context besides what's being returned in the error message
  • FATAL currently not used

The default log level threshold set to DEBUG. If you want to update that, use the package name for the namespace argument of log_threshold from the logger package, eg to enable all log messages:

library(logger)
log_threshold(TRACE, namespace = 'botor')

s3_download_file('s3://botor/example-data/mtcars.csv', tempfile())
#> TRACE [2019-01-11 14:48:07] Downloading s3://botor/example-data/mtcars.csv to '/tmp/RtmpCPNrOk/file6fac556567d4' ...
#> DEBUG [2019-01-11 14:48:09] Downloaded 1303 bytes from s3://botor/example-data/mtcars.csv and saved at '/tmp/RtmpCPNrOk/file6fac556567d4'

Or update to not fire the less important messages than warnings:

library(logger)
log_threshold(WARN, namespace = 'botor')

You can use the same approach to set custom (or more than one) log appenders, eg writing the log messages to files, a database etc -- check the logger docs for more details.

Why the name?

botor means "goofy" in Hungarian. This is how I feel when looking back to all the dev hours spent on integrating the AWS Java SDK in R -- this includes AWR.KMS, where I ended up debugging and fixing many issues in forked processes, but AWR.Kinesis still rocks :)

The name also reminds you that it's not exactly boto3, as eg you have to use $ instead of . to access methods.

Functions in botor

Name Description
base64_dec Base64-decode a string into raw bytes using Python's base64 module
botor_session Internal boto3 session
botor_client Creates an initial or reinitialize an already existing AWS client or resource cached in the package's namespace
iam_whoami Get the current AWS username
s3 The default, fork-safe Amazon Simple Storage Service (S3) client on the top of botor
s3_ls List objects at an S3 path
iam_get_user Retrieves information about the specified IAM user, including the user's creation date, path, unique ID, and ARN
iam The default, fork-safe IAM client on the top of botor
s3_list_buckets List all S3 buckets
require_python_module Imports and caches a Python module
botor_session_pid Look up the PID used to initialize the Boto3 session
kinesis The default, fork-safe Kinesis client on the top of botor
s3_write Write an R object into S3
clients boto3 clients cache
ssm The default, fork-safe AWS Systems Manager (SSM) client on the top of botor
base64_enc Base64-encode raw bytes using Python's base64 module
botor_session_uuid Look up the UUID of the initialized Boto3 session
mime_guess Guess the type of a file based on the filename using mimetypes Python module
require_python_builtins Imports and caches a Python module
s3_copy Copy an object from one S3 location to another
s3_delete Delete an object stored in S3
kms_encrypt_file Encrypt file via KMS
kms_generate_data_key Generate a data encryption key for envelope encryption via KMS
s3_split_uri Split the bucket name and object key from the S3 URI
s3_object Create an S3 Object reference from an URI
coerce_bytes_literals_to_string Transforms a python2 string literal or python3 bytes literal into an R string
s3_read Download and read a file from S3, then clean up
s3_upload_file Upload a file to S3
boto3 Raw access to the boto3 module imported at package load time
boto3_version boto3 version
kinesis_describe_stream Describes the specified Kinesis data stream
kinesis_get_records Gets data records from a Kinesis data stream's shard
ssm_get_parameter Read AWS System Manager's Parameter Store
kms_decrypt Decrypt cipher into plain text via KMS
kms The default, fork-safe KMS client on the top of botor
sts_whoami Returns details about the IAM user or role whose credentials are used to call the operation
check_s3_uri Check if an argument looks like an S3 bucket
kinesis_get_shard_iterator Gets an Amazon Kinesis shard iterator
kinesis_put_record Writes a single data record into an Amazon Kinesis data stream
kms_decrypt_file Decrypt file via KMS
s3_download_file Download a file from S3
trypy Extract error message from a Python exception
kms_encrypt Encrypt plain text via KMS
s3_exists Checks if an object exists in S3
uuid Generate UUID using Python's uuid module
botor The default, fork-safe Boto3 session
No Results!

Last month downloads

Details

Type Package
SystemRequirements boto3 (https://aws.amazon.com/sdk-for-python)
Date 2019-09-23
URL https://daroczig.github.io/botor
BugReports https://github.com/daroczig/botor/issues
RoxygenNote 6.1.1
License AGPL-3
Encoding UTF-8
NeedsCompilation no
Packaged 2019-09-23 00:39:27 UTC; daroczig
Repository CRAN
Date/Publication 2019-10-21 17:40:02 UTC

Include our badge in your README

[![Rdoc](http://www.rdocumentation.org/badges/version/botor)](http://www.rdocumentation.org/packages/botor)