Read a JSON file into a Spark DataFrame
spark_read_json(sc, name, path, options = list(), repartition = 0,
memory = TRUE, overwrite = TRUE)
sc: The Spark connection.
name: The name of the table.
path: The path to the file. Must be accessible from the cluster. Supports the "hdfs://", "s3n://", and "file://" protocols.
options: A list of strings with additional options.
repartition: The number of partitions used to distribute the table. Use 0 (the default) to avoid partitioning.
memory: Whether to load the data eagerly into memory.
overwrite: Whether to overwrite the table with the given name if it already exists.
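As a rough illustration of how the arguments fit together, the sketch below reads a small local file. It assumes a local Spark installation reachable through spark_connect(master = "local"), a Unix-like temporary path, and that the file is in JSON Lines layout (one JSON object per line), which is what Spark's JSON reader expects by default. The table name "people" and the sample records are made up for the example.

```r
library(sparklyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Write a tiny JSON Lines file: one JSON object per line.
json_path <- tempfile(fileext = ".json")
writeLines(
  c('{"name": "Ada", "age": 36}',
    '{"name": "Linus", "age": 28}'),
  json_path
)

# Read the file into a Spark DataFrame registered under the (hypothetical) name "people".
people <- spark_read_json(
  sc,
  name = "people",
  path = paste0("file://", json_path),  # local file system scheme
  repartition = 0,                      # default: do not repartition
  memory = TRUE,                        # load eagerly into memory
  overwrite = TRUE                      # replace an existing "people" table
)

# Bring the rows back into R to inspect them.
people_local <- dplyr::collect(people)
print(people_local)

spark_disconnect(sc)
```

For large files, setting memory = FALSE avoids loading the data eagerly at read time.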
You can read data from HDFS (hdfs://), S3 (s3n://), as well as the local file system (file://).
If you are reading from a secure S3 bucket, be sure that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are both defined.
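One possible way to define these variables from within R is sketched below. The credential values are placeholders, and setting them before opening the Spark connection is an assumption made here so that the Spark process can see them. In practice the variables are often set outside R, for example in the shell or in .Renviron.

```r
# Placeholder values: substitute real credentials, ideally loaded from a
# secure store rather than written into source code.
Sys.setenv(
  AWS_ACCESS_KEY_ID = "<your-access-key-id>",
  AWS_SECRET_ACCESS_KEY = "<your-secret-access-key>"
)

# Check that both variables are defined (non-empty) before connecting to Spark.
aws_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
aws_defined <- all(nzchar(Sys.getenv(aws_vars)))
stopifnot(aws_defined)
```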
Other functions for reading and writing data: spark_read_csv, spark_read_parquet, spark_write_csv, spark_write_json, spark_write_parquet