Create a SparkDataFrame representing the database table accessible via a JDBC URL. Additional JDBC database connection properties can be set (...)
read.jdbc(url, tableName, partitionColumn = NULL, lowerBound = NULL,
upperBound = NULL, numPartitions = 0L, predicates = list(), ...)
Arguments:

url: JDBC database URL of the form jdbc:subprotocol:subname

tableName: the name of the table in the external database

partitionColumn: the name of a column of integral type that will be used for partitioning

lowerBound: the minimum value of partitionColumn, used to decide partition stride

upperBound: the maximum value of partitionColumn, used to decide partition stride

numPartitions: the number of partitions. This, along with lowerBound (inclusive) and upperBound (exclusive), forms partition strides for generated WHERE clause expressions used to split the column partitionColumn evenly. Defaults to SparkContext.defaultParallelism when unset.

predicates: a list of conditions in the WHERE clause; each one defines one partition

...: additional JDBC database connection named properties
Value: SparkDataFrame
Details: Only one of partitionColumn or predicates should be set. Partitions of the table will be retrieved in parallel based on numPartitions or on the predicates. Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
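To make the stride behavior concrete, the sketch below mimics (in plain R, with no Spark dependency) how partition WHERE clauses can be derived from lowerBound, upperBound, and numPartitions. The helper `stride_predicates` is hypothetical and for illustration only; Spark's actual clause generation differs in details such as NULL handling on the boundary partitions.

```r
# Hypothetical helper: build one WHERE-clause predicate per partition from
# lowerBound (inclusive), upperBound (exclusive), and numPartitions.
stride_predicates <- function(col, lower, upper, n) {
  stride <- (upper - lower) %/% n  # integer width of each partition
  sapply(seq_len(n), function(i) {
    lo <- lower + (i - 1) * stride
    hi <- lo + stride
    if (i == 1) {
      # first partition is left-unbounded so no rows below lowerBound are lost
      sprintf("%s < %d", col, hi)
    } else if (i == n) {
      # last partition is right-unbounded so no rows above upperBound are lost
      sprintf("%s >= %d", col, lo)
    } else {
      sprintf("%s >= %d AND %s < %d", col, lo, col, hi)
    }
  })
}

stride_predicates("index", 0, 10000, 4)
```

A list built this way could equally be passed as the predicates argument, which is why only one of partitionColumn or predicates should be supplied.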
Examples:
## Not run:
sparkR.session()
jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
df <- read.jdbc(jdbcUrl, "table", predicates = list("field<=123"), user = "username")
df2 <- read.jdbc(jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
upperBound = 10000, user = "username", password = "password")
## End(Not run)