
Connect to a MongoDB collection. Returns a mongo connection object with the methods listed below. Connections are automatically pooled between collection and gridfs objects that use the same database.
mongo(
collection = "test",
db = "test",
url = "mongodb://localhost",
verbose = FALSE,
options = ssl_options()
)
On success, returns a pointer to a collection on the server. Use the methods described below to work with the collection.
collection: name of the collection
db: name of the database
url: address of the MongoDB server in mongo connection string URI format
verbose: emit additional output
options: additional connection options, such as SSL keys and certificates
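As a rough sketch of how the arguments above fit together, the snippet below connects with explicit values and counts the records. The collection name "mycol" and database name "mydb" are placeholders invented for illustration, and the `run_live` flag is a hypothetical guard: the calls inside it need a reachable MongoDB server, so it is off by default.

```r
# Sketch only: "mycol" and "mydb" are placeholder names, not from the manual page
run_live <- FALSE  # set to TRUE only when a MongoDB server is reachable

conn_args <- list(
  collection = "mycol",              # hypothetical collection name
  db         = "mydb",               # hypothetical database name
  url        = "mongodb://localhost",
  verbose    = TRUE                  # ask for additional output
)

if (run_live) {
  library(mongolite)
  con <- do.call(mongo, conn_args)   # same as mongo(collection = "mycol", ...)
  print(con$count())                 # count all records in the collection
  con$disconnect()
}
```

The `options` argument is left at its default here; pass the result of `ssl_options()` when the server requires SSL keys or certificates.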
aggregate(pipeline = '{}', handler = NULL, pagesize = 1000, iterate = FALSE)
Execute a pipeline using the Mongo aggregation framework. Set iterate = TRUE to return an iterator instead of a data frame.
count(query = '{}')
Count the number of records matching a given query. By default, counts all records in the collection.
disconnect(gc = TRUE)
Disconnect the collection. The underlying connection is closed once the client is no longer used by any collection in the pool.
distinct(key, query = '{}')
List unique values of a field given a particular query.
drop()
Delete entire collection with all data and metadata.
export(con = stdout(), bson = FALSE, query = '{}', fields = '{}', sort = '{"_id":1}')
Streams all data from the collection to a connection in jsonlines format (similar to mongoexport). Alternatively, when bson = TRUE, it outputs the binary bson format (similar to mongodump).
find(query = '{}', fields = '{"_id" : 0}', sort = '{}', skip = 0, limit = 0, handler = NULL, pagesize = 1000)
Retrieve fields from records matching query. The default handler returns all data as a single data frame.
import(con, bson = FALSE)
Stream import data in jsonlines format from a connection, similar to the mongoimport utility. Alternatively, when bson = TRUE, it assumes the binary bson format (similar to mongorestore).
index(add = NULL, remove = NULL)
List, add, or remove indexes from the collection. The add and remove arguments can each be either a field name or a JSON object. Returns a data frame with the current indexes.
info()
Returns collection statistics and server info (if available).
insert(data, pagesize = 1000, stop_on_error = TRUE, ...)
Insert rows into the collection. The data argument must be a data frame, a named list (for a single record), or a character vector of JSON strings (one string per row). For lists and data frames, arguments in ... are passed to jsonlite::toJSON.
iterate(query = '{}', fields = '{"_id":0}', sort = '{}', skip = 0, limit = 0)
Runs query and returns iterator to read single records one-by-one.
mapreduce(map, reduce, query = '{}', sort = '{}', limit = 0, out = NULL, scope = NULL)
Performs a map reduce query. The map and reduce arguments are strings containing a JavaScript function. Set out to a string to store the results in a collection instead of returning them.
remove(query = "{}", just_one = FALSE)
Remove record(s) matching query from the collection.
rename(name, db = NULL)
Change the name or database of a collection. Changing name is cheap, changing database is expensive.
replace(query, update = '{}', upsert = FALSE)
Replace matching record(s) with the value of the update argument.
run(command = '{"ping": 1}', simplify = TRUE)
Run a raw mongodb command on the database. If the command returns data, output is simplified by default, but this can be disabled.
update(query, update = '{"$set":{}}', upsert = FALSE, multiple = FALSE)
Modify fields of matching record(s) with the value of the update argument.
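The methods above that modify or step through records can be sketched as follows. This is an illustration under assumptions, not part of the package documentation: the collection name "mtcars_demo" and the field "reviewed" are made up, the `run_live` flag is a hypothetical guard for calls that need a reachable MongoDB server, and the loop assumes the iterator returned by iterate() offers a one() method that returns NULL once the records are exhausted.

```r
# Sketch only: names marked "hypothetical" are not from the manual page
run_live <- FALSE  # set to TRUE only when a MongoDB server is reachable

# Queries and updates are passed to the methods as JSON strings
query  <- '{"cyl": 6}'
change <- '{"$set": {"reviewed": true}}'   # "reviewed" is a hypothetical field

if (run_live) {
  library(mongolite)
  con <- mongo("mtcars_demo")              # hypothetical collection name
  con$insert(mtcars)

  # update(): modify every matching record instead of only the first one
  con$update(query, change, multiple = TRUE)

  # iterate(): read the matching records one by one
  it <- con$iterate(query, fields = '{"_id": 0, "mpg": 1}')
  while (!is.null(rec <- it$one())) {
    print(rec$mpg)
  }

  # remove(): delete the matching records, then drop the whole collection
  con$remove(query)
  con$drop()
}
```

Keeping the JSON strings in variables makes it easy to reuse the same query across count(), find(), update(), and remove().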
This manual page is deliberately minimal; see the mongolite user manual for more details and worked examples.
Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805. https://arxiv.org/abs/1403.2805
# Connect to demo server
con <- mongo("mtcars", url =
"mongodb+srv://readwrite:test@cluster0-84vdt.mongodb.net/test")
if(con$count() > 0) con$drop()
con$insert(mtcars)
stopifnot(con$count() == nrow(mtcars))
# Query data
mydata <- con$find()
stopifnot(all.equal(mydata, mtcars))
con$drop()
# Automatically disconnect when connection is removed
rm(con)
gc()
if (FALSE) {
# Larger example using the nycflights13 data
library(nycflights13)
# Insert some data
m <- mongo(collection = "nycflights")
m$drop()
m$insert(flights)
# Basic queries
m$count('{"month":1, "day":1}')
jan1 <- m$find('{"month":1, "day":1}')
# Sorting
jan1 <- m$find('{"month":1,"day":1}', sort='{"distance":-1}')
head(jan1)
# Sorting on large data requires index
m$index(add = "distance")
allflights <- m$find(sort='{"distance":-1}')
# Select columns
jan1 <- m$find('{"month":1,"day":1}', fields = '{"_id":0, "distance":1, "carrier":1}')
# List unique values
m$distinct("carrier")
m$distinct("carrier", '{"distance":{"$gt":3000}}')
# Tabulate
m$aggregate('[{"$group":{"_id":"$carrier", "count": {"$sum":1}, "average":{"$avg":"$distance"}}}]')
# Map-reduce (binning)
hist <- m$mapreduce(
map = "function(){emit(Math.floor(this.distance/100)*100, 1)}",
reduce = "function(id, counts){return Array.sum(counts)}"
)
# Stream jsonlines into a connection
tmp <- tempfile()
m$export(file(tmp))
# Remove the collection
m$drop()
# Import from jsonlines stream from connection
dmd <- mongo("diamonds")
dmd$import(url("http://jeroen.github.io/data/diamonds.json"))
dmd$count()
# Remove the collection
dmd$drop()
}