spark_read_csv

The name to assign to the newly generated table.

name

The path to the file. Needs to be accessible from the cluster.
Supports the "hdfs://", "s3n://" and "file://" protocols.

path

Boolean; should the first row of data be used as a header?
Defaults to TRUE.

header

columns

Boolean; should column types be automatically inferred?
Requires one extra pass over the data. Defaults to TRUE.

infer_schema

The character used to delimit each column. Defaults to ','.

delimiter

The character used as a quote. Defaults to '"'.

quote

The character used to escape other characters. Defaults to '\'.

escape

The character set. Defaults to "UTF-8".

charset

The character to use for null, or missing, values. Defaults to NULL.

null_value

A list of strings with additional options.

options

The number of partitions used to distribute the
generated table. Use 0 (the default) to avoid partitioning.

repartition

Boolean; should the data be loaded eagerly into memory? (That
is, should the table be cached?)

memory

Boolean; should an existing table with the given name be
overwritten?

overwrite
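
The parsing arguments above can be combined in a single call. The following is a minimal sketch, not an official example: the helper `read_semicolon_csv`, the table name "readings" and the sample file are hypothetical, and `sc` is assumed to be an open Spark connection (for instance one returned by `spark_connect()`), an argument this section does not describe.

```r
# Write a small semicolon-delimited file in which single quotes wrap text
# fields and "NA" marks a missing value. The file is illustrative only.
readings_path <- tempfile(fileext = ".csv")
writeLines(
  c("id;city;temp",
    "1;'Oslo';-3.5",
    "2;'Lima';NA"),
  readings_path
)

# Hypothetical helper, not part of sparklyr. `sc` is assumed to be an open
# Spark connection; sparklyr is only needed when the function is called.
read_semicolon_csv <- function(sc, path) {
  sparklyr::spark_read_csv(
    sc,
    name = "readings",     # name of the generated table
    path = path,           # must be accessible from the cluster
    header = TRUE,         # first row holds the column names
    infer_schema = TRUE,   # one extra pass to detect column types
    delimiter = ";",       # override the default ','
    quote = "'",           # override the default '"'
    null_value = "NA",     # treat "NA" as a missing value
    repartition = 0,       # 0 (the default) avoids repartitioning
    memory = FALSE,        # do not cache the table eagerly
    overwrite = TRUE       # replace an existing table named "readings"
  )
}
```

Setting `memory = FALSE` skips the eager caching step, which may be preferable when the file is read once and immediately filtered or aggregated.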


Read a tabular data file into a Spark DataFrame.
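
As an illustration of a basic read, the sketch below writes a small CSV file and wraps the call in a helper. The helper `load_cars` and the table name "cars" are hypothetical; `sc` is assumed to be an open Spark connection, and the "file://" prefix assumes a Unix-like path in local mode.

```r
# Create a small CSV file from a built-in dataset. On a real cluster the
# file would need to be reachable from the cluster, for example through
# "hdfs://" or "s3n://".
cars_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), cars_path, row.names = FALSE)

# Hypothetical helper, not part of sparklyr. `sc` is assumed to be an open
# Spark connection; sparklyr is only needed when the function is called.
load_cars <- function(sc, path) {
  sparklyr::spark_read_csv(
    sc,
    name = "cars",        # name assigned to the generated table
    path = path,
    header = TRUE,        # use the first row as column names
    infer_schema = TRUE   # infer column types (one extra pass)
  )
}

# Typical use (requires a Spark installation, so it is not run here):
# sc <- sparklyr::spark_connect(master = "local")
# cars_tbl <- load_cars(sc, paste0("file://", cars_path))
# sparklyr::spark_disconnect(sc)
```

With the defaults described in this section, the remaining arguments (delimiter, quote, escape, charset) do not need to be supplied for a standard comma-separated UTF-8 file.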


Provision, connect and interface to Apache Spark from within R.
This package supports connecting to local and remote Apache Spark clusters,
provides a 'dplyr' compatible back-end, and provides an interface to Spark's
built-in machine learning algorithms.

Javier Luraschi

sparklyr

R Interface to Apache Spark

spark_read_csv function

spark_read_csv: Read a CSV file into a Spark DataFrame

Description

Usage

Arguments

Details

See Also