spark_read_csv
From sparklyr v0.3.9
by Javier Luraschi
Read a CSV file into a Spark DataFrame
Usage
spark_read_csv(sc, name, path, header = TRUE, delimiter = ",",
  quote = "\"", escape = "\\", charset = "UTF-8", null_value = NULL,
  options = list(), repartition = 0, memory = TRUE, overwrite = TRUE)
Arguments
- sc
- The Spark connection
- name
- Name of table
- path
- The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://", and "file://" protocols.
- header
- Should the first row of data be used as a header? Defaults to TRUE.
- delimiter
- The character used to delimit each column, defaults to ",".
- quote
- The character used to quote fields, defaults to "\"".
- escape
- The character used to escape other characters, defaults to "\\".
- charset
- The character set, defaults to "UTF-8".
- null_value
- The character to use for null values, defaults to NULL.
- options
- A list of strings with additional options.
- repartition
- The number of partitions used to distribute the generated table, or 0 (default) to avoid partitioning
- memory
- Load data eagerly into memory
- overwrite
- Overwrite the table with the given name if it already exists
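For instance, these arguments combine as in the following sketch (the connection settings, table name, and file path are illustrative, not prescribed by the package):

library(sparklyr)

# Connect to a local Spark instance (any valid master would do)
sc <- spark_connect(master = "local")

# Read a semicolon-delimited file, treating the string "NA" as null
# ("flights" and "data/flights.csv" are hypothetical names)
flights <- spark_read_csv(sc, name = "flights", path = "data/flights.csv",
                          delimiter = ";", null_value = "NA",
                          repartition = 4)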
Details
You can read data from HDFS (hdfs://), S3 (s3n://), as well as the local file system (file://).
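For example, each of the supported path forms looks like this (the bucket, directory, and file names are placeholders):

# Local file system
local_tbl <- spark_read_csv(sc, "local_data", "file:///tmp/data.csv")

# HDFS
hdfs_tbl <- spark_read_csv(sc, "hdfs_data", "hdfs:///user/me/data.csv")

# S3
s3_tbl <- spark_read_csv(sc, "s3_data", "s3n://my-bucket/data.csv")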
If you are reading from a secure S3 bucket, be sure that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are both defined.
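One way to define them from R before connecting is a plain Sys.setenv() call (the key values below are placeholders; in practice they typically come from your shell environment or an .Renviron file):

# Placeholder credentials; never hard-code real keys in scripts
Sys.setenv(AWS_ACCESS_KEY_ID = "your-access-key-id",
           AWS_SECRET_ACCESS_KEY = "your-secret-access-key")

secure_tbl <- spark_read_csv(sc, "secure_data", "s3n://private-bucket/data.csv")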
When header is FALSE, the column names are generated with a V prefix; e.g. V1, V2, ...
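For example (the file path is hypothetical, and the printed names assume a three-column file):

library(dplyr)

# A headerless file yields auto-generated column names V1, V2, ...
raw_tbl <- spark_read_csv(sc, "raw_data", "data/no_header.csv",
                          header = FALSE)
tbl_vars(raw_tbl)  # e.g. "V1" "V2" "V3"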
Value
Reference to a Spark DataFrame / dplyr tbl
See Also
Other reading and writing data: spark_read_json, spark_read_parquet, spark_write_csv, spark_write_json, spark_write_parquet