tidyjson (version 0.2.4)

json_schema: Create a schema for a JSON document or collection

Description

Returns a JSON document that captures the 'schema' of the collection of document(s) passed in, as a JSON string. The schema collapses complex JSON into a simple form using the rules listed under Details.

Usage

json_schema(.x, type = c("string", "value"))

Arguments

.x

a json string or tbl_json object

type

whether to capture scalar nodes using the string that defines their type (e.g., "logical") or as a representative value (e.g., "true")

Value

a character string JSON document that represents the schema of the collection

Details

  • string -> "string", e.g., "a sentence" -> "string"

  • number -> "number", e.g., 32000.1 -> "number"

  • true -> "logical", e.g., true -> "logical"

  • false -> "logical", e.g., false -> "logical"

  • null -> "null", e.g., null -> "null"

  • array -> [<type>] e.g., [1, 2] -> ["number"]

  • object -> "name": <type> e.g., "age": 32 -> "age": "number"

For more complex JSON objects, ties are broken by taking the most complex example (using json_complexity), and then by type (using json_types).

This means that if a name has varying schema across documents, the most complex schema will be chosen as being representative. Similarly, if the elements of an array vary in schema, the most complex element is chosen, and if arrays vary in schema across documents, the most complex is chosen.
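The tie-breaking rule can be seen with a small sketch. The two documents below are made up for illustration, and the comments describe what the rule above implies rather than verified output:

```r
library(tidyjson)

# Hypothetical collection: the name "a" is a scalar in one document
# and a nested object in the other
docs <- c('{"a": 1}', '{"a": {"b": [1, 2]}}')

# json_complexity() reports a complexity measure for each document
docs %>% json_complexity

# Per the rule above, the schema for "a" should follow the more
# complex (nested) form rather than the scalar one
schema <- docs %>% json_schema
writeLines(schema)
```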

Note that json_schema can be slow for large JSON document collections; you may want to sample your JSON collection first.
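One way to sample first, sketched under the assumption that the collection is held as a character vector with one JSON document per element (the `docs` vector and the sample size of 50 are made up for illustration):

```r
library(tidyjson)

# Hypothetical collection: 1000 small JSON documents, one per element
docs <- sprintf('{"id": %d, "tags": ["a", "b"]}', 1:1000)

set.seed(1)  # make the sample reproducible
sampled <- docs[sample(length(docs), 50)]

# Compute the schema from the sample only
sampled_schema <- sampled %>% json_schema
writeLines(sampled_schema)
```

Because the most complex example is chosen as representative, a sample can miss rare documents that are more complex than the ones drawn, so the resulting schema may be simpler than that of the full collection.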

See Also

json_structure to recursively structure all documents into a single data frame

Examples

# NOT RUN {
library(tidyjson)

# A simple string
'"string"' %>% json_schema %>% writeLines

# A simple object
'{"name": "value"}' %>% json_schema %>% writeLines

# A more complex JSON array
json <- '[{"a": 1}, [1, 2], "a", 1, true, null]'

# Using type = 'string' (default)
json %>% json_schema %>% writeLines

# Using type = 'value' to show a representative value
json %>% json_schema(type = "value") %>% writeLines

# Schema of the first 10 GitHub issues
library(dplyr)
issues %>% gather_array %>% slice(1:10) %>%
  json_schema(type = "value") %>% writeLines
# }
