mongo: MongoDB client

Description

Connect to a MongoDB collection. Returns a mongo connection object with the methods listed below. Connections are automatically pooled among collection and gridfs objects that use the same database.

Usage

mongo(
  collection = "test",
  db = "test",
  url = "mongodb://localhost",
  verbose = FALSE,
  options = ssl_options()
)

Arguments

collection

name of collection

db

name of database

url

address of the mongodb server in mongo connection string URI format

verbose

emit additional diagnostic output

options

additional connection options such as SSL keys/certs.
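As a rough illustration of how these arguments fit together, the sketch below wraps a call to mongo() in a helper. The helper name connect_secure, the host, the collection and database names, and the certificate file names are all hypothetical placeholders, and the ?ssl=true URI option is an assumption about how the target server is configured. The cert and ca arguments are passed to ssl_options() as file paths.

```r
library(mongolite)

# Hypothetical helper: host, names, and certificate paths are placeholders.
connect_secure <- function(host, pem_file, ca_file) {
  mongo(
    collection = "flights",   # placeholder collection name
    db = "demo",              # placeholder database name
    url = sprintf("mongodb://%s:27017/?ssl=true", host),
    verbose = TRUE,
    options = ssl_options(cert = pem_file, ca = ca_file)
  )
}

# Usage (needs a reachable server and real certificate files,
# so it is left commented out):
# con <- connect_secure("db.example.com", "client.pem", "ca.pem")
```

Wrapping the call in a function keeps the connection details in one place; nothing is contacted until the helper is actually called.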

Value

On success, returns a pointer to a collection on the server. The collection can be accessed through the methods described below.
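Methods are called on the returned object with the $ operator. The following is a minimal sketch, not a definitive recipe: demo_methods is a hypothetical function name, the records are made-up sample data, and running it assumes a reachable server and a scratch collection that may be dropped.

```r
library(mongolite)

# Hypothetical walk-through; `con` is any object returned by mongo().
demo_methods <- function(con) {
  # Insert two records given as JSON strings (one string per record)
  con$insert(c('{"name": "ann", "age": 31}', '{"name": "bob", "age": 45}'))

  # Modify one field of the first matching record
  con$update('{"name": "ann"}', '{"$set": {"age": 32}}')

  # Read records one at a time; one() returns NULL once exhausted
  it <- con$iterate('{}', sort = '{"age": 1}')
  while (!is.null(rec <- it$one())) {
    message(rec$name, ": ", rec$age)
  }

  # Count the records, then remove the scratch collection
  n <- con$count()
  con$drop()
  n
}

# Usage, assuming a server on localhost (left commented out):
# demo_methods(mongo("scratch"))
```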

Methods

aggregate(pipeline = '{}', handler = NULL, pagesize = 1000, iterate = FALSE): Execute a pipeline using the Mongo aggregation framework. Set iterate = TRUE to return an iterator instead of a data frame.
count(query = '{}'): Count the number of records matching a given query. Default counts all records in collection.
disconnect(gc = TRUE): Disconnect collection. The connection gets disconnected once the client is not used by collections in the pool.
distinct(key, query = '{}'): List unique values of a field given a particular query.
drop(): Delete entire collection with all data and metadata.
export(con = stdout(), bson = FALSE, query = '{}', fields = '{}', sort = '{"_id":1}'): Streams all data from collection to a connection in jsonlines format (similar to mongoexport). Alternatively when bson = TRUE it outputs the binary bson format (similar to mongodump).
find(query = '{}', fields = '{"_id" : 0}', sort = '{}', skip = 0, limit = 0, handler = NULL, pagesize = 1000): Retrieve fields from records matching query. The default handler returns all data as a single data frame.
import(con, bson = FALSE): Stream import data in jsonlines format from a connection, similar to the mongoimport utility. Alternatively when bson = TRUE it assumes the binary bson format (similar to mongorestore).
index(add = NULL, remove = NULL): List, add, or remove indexes from the collection. The add and remove arguments can each be either a field name or a json object. Returns a data frame with the current indexes.
info(): Returns collection statistics and server info (if available).
insert(data, pagesize = 1000, stop_on_error = TRUE, ...): Insert rows into the collection. Argument 'data' must be a data frame, a named list (for a single record), or a character vector of json strings (one string for each row). For lists and data frames, arguments in ... get passed to jsonlite::toJSON.
iterate(query = '{}', fields = '{"_id":0}', sort = '{}', skip = 0, limit = 0): Runs query and returns iterator to read single records one-by-one.
mapreduce(map, reduce, query = '{}', sort = '{}', limit = 0, out = NULL, scope = NULL): Performs a map reduce query. The map and reduce arguments are strings containing a JavaScript function. Set out to a string to store results in a collection instead of returning.
remove(query = "{}", just_one = FALSE): Remove record(s) matching query from the collection.
rename(name, db = NULL): Change the name or database of a collection. Changing name is cheap, changing database is expensive.
replace(query, update = '{}', upsert = FALSE): Replace matching record(s) with value of the update argument.
run(command = '{"ping": 1}', simplify = TRUE): Run a raw mongodb command on the database. If the command returns data, output is simplified by default, but this can be disabled.
update(query, update = '{"$set":{}}', upsert = FALSE, multiple = FALSE): Modify fields of matching record(s) with value of the update argument.
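Most methods above take their query, fields, and sort arguments as JSON strings. When the values come from R variables, one option is to build those strings with jsonlite::toJSON instead of pasting text together. The helper as_query below is a hypothetical name introduced only for this sketch.

```r
library(jsonlite)

# Hypothetical helper: turn named R values into a JSON query string.
as_query <- function(...) {
  # auto_unbox = TRUE writes length-one vectors as scalars, not arrays
  as.character(toJSON(list(...), auto_unbox = TRUE))
}

q <- as_query(month = 1, day = 1)
print(q)

# Nested lists map to query operators such as $gt
far <- as_query(distance = list(`$gt` = 3000))
print(far)

# Usage with a connection `m` (needs a server, so left commented out):
# m$count(q)
# m$find(far, fields = '{"_id":0, "carrier":1}')
```

Note that an empty list serializes to '[]' rather than '{}', so for a match-all query it is simpler to keep passing the literal '{}'.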

Details

This manual page is deliberately minimal; see the mongolite user manual for more details and worked examples.

References

Mongolite User Manual

Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805. https://arxiv.org/abs/1403.2805

Examples


# NOT RUN {
# Connect to demo server
con <- mongo("mtcars", url =
  "mongodb+srv://readwrite:test@cluster0-84vdt.mongodb.net/test")
if(con$count() > 0) con$drop()
con$insert(mtcars)
stopifnot(con$count() == nrow(mtcars))

# Query data
mydata <- con$find()
stopifnot(all.equal(mydata, mtcars))
con$drop()

# Automatically disconnect when connection is removed
rm(con)
gc()

# }
# NOT RUN {
# dplyr example
library(nycflights13)

# Insert some data
m <- mongo(collection = "nycflights")
m$drop()
m$insert(flights)

# Basic queries
m$count('{"month":1, "day":1}')
jan1 <- m$find('{"month":1, "day":1}')

# Sorting
jan1 <- m$find('{"month":1,"day":1}', sort='{"distance":-1}')
head(jan1)

# Sorting on large data requires index
m$index(add = "distance")
allflights <- m$find(sort='{"distance":-1}')

# Select columns
jan1 <- m$find('{"month":1,"day":1}', fields = '{"_id":0, "distance":1, "carrier":1}')

# List unique values
m$distinct("carrier")
m$distinct("carrier", '{"distance":{"$gt":3000}}')

# Tabulate
m$aggregate('[{"$group":{"_id":"$carrier", "count": {"$sum":1}, "average":{"$avg":"$distance"}}}]')

# Map-reduce (binning)
hist <- m$mapreduce(
  map = "function(){emit(Math.floor(this.distance/100)*100, 1)}",
  reduce = "function(id, counts){return Array.sum(counts)}"
)

# Stream jsonlines into a connection
tmp <- tempfile()
m$export(file(tmp))

# Remove the collection
m$drop()

# Import from jsonlines stream from connection
dmd <- mongo("diamonds")
dmd$import(url("http://jeroen.github.io/data/diamonds.json"))
dmd$count()

# Remove the collection
dmd$drop()
# }
