Uploads a local data frame to Spark/Databricks using multi-row
INSERT INTO ... VALUES (...), (...), ... statements. This is the same
mechanism insertTable() now uses by default on Spark, so you only need
insertTableSpark() directly when you want to tune batchSize. Multi-row
VALUES inserts are much faster than the INSERT ... SELECT ... UNION ALL
approach, which Spark's planner struggles with (benchmarked at roughly 50x
faster for 1000 rows).
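
To make the batching idea concrete, here is a minimal base R sketch of how a data frame could be turned into multi-row INSERT statements. It is an illustration only, not the package's implementation: the helpers sql_literal() and build_insert_statements(), and the table name my_schema.person, are hypothetical, and the quoting rules are deliberately simplified.

```r
# Hypothetical helpers for illustration; not part of the package.

# Render one column as SQL literals (simplified quoting rules).
sql_literal <- function(col) {
  out <- if (is.numeric(col)) {
    as.character(col)
  } else if (is.logical(col)) {
    ifelse(col, "TRUE", "FALSE")
  } else {
    # Strings, factors, dates: quote and double any embedded single quotes.
    paste0("'", gsub("'", "''", as.character(col), fixed = TRUE), "'")
  }
  out[is.na(col)] <- "NULL"
  out
}

# Build one INSERT statement per batch of at most `batch_size` rows.
build_insert_statements <- function(df, table, batch_size = 5000L) {
  stopifnot(is.data.frame(df), nrow(df) > 0, batch_size >= 1)
  literals <- unname(lapply(df, sql_literal))
  # One "(v1, v2, ...)" tuple per row.
  tuples <- paste0("(", do.call(paste, c(literals, sep = ", ")), ")")
  batches <- split(tuples, ceiling(seq_along(tuples) / batch_size))
  cols <- paste(names(df), collapse = ", ")
  vapply(
    batches,
    function(b) {
      paste0("INSERT INTO ", table, " (", cols, ") VALUES ",
             paste(b, collapse = ", "))
    },
    character(1),
    USE.NAMES = FALSE
  )
}

df <- data.frame(id = 1:3, name = c("a", "O'Neil", NA))
stmts <- build_insert_statements(df, "my_schema.person", batch_size = 2L)
print(stmts)
```

With three rows and a batch size of 2, this yields two statements: one carrying two tuples and one carrying the remaining row. Fewer, larger statements mean fewer round trips, which is the trade-off batchSize controls.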
insertTableSpark(cdm, name, table, overwrite = TRUE, batchSize = 5000L)

Returns a cdm_table referencing the newly inserted table.
cdm: A cdm_reference or db_cdm source object backed by a
Spark/Databricks connection. Must have a writeSchema.
name: Name of the destination table (a single character string).
table: A local data frame to upload.
overwrite: If TRUE (default), drop the table first if it exists.
batchSize: Number of rows per INSERT statement. Default 5000.
Larger batches reduce round trips, but Spark imposes a query-size
limit (~16 MB). Reduce batchSize if you hit "query too large" errors
with very wide tables.
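
As a rough guide for choosing batchSize on wide tables, the sketch below estimates how many rows fit under a query-size budget. It is a heuristic under stated assumptions, not something the package provides: suggest_batch_size() is a hypothetical helper, the 16 MB figure is the approximate limit mentioned above, and the 50% safety margin and the 4 bytes of per-value overhead (quotes, comma, space) are arbitrary choices.

```r
# Hypothetical helper for illustration; not part of the package.
suggest_batch_size <- function(df,
                               limit_bytes = 16 * 1024^2, # ~16 MB, approximate
                               safety = 0.5,              # assumed headroom
                               max_batch = 5000L) {
  stopifnot(is.data.frame(df), nrow(df) > 0)
  # Render every value as text, treating missing values as NULL.
  as_text <- lapply(df, function(col) {
    x <- as.character(col)
    x[is.na(x)] <- "NULL"
    x
  })
  # Approximate bytes per row: text size plus 4 bytes of overhead per value.
  row_bytes <- Reduce(`+`, lapply(as_text, function(x) {
    nchar(x, type = "bytes") + 4
  }))
  budget <- limit_bytes * safety
  # Size the batch for the widest row, capped at max_batch, at least 1.
  as.integer(max(1, min(max_batch, floor(budget / max(row_bytes)))))
}

narrow <- data.frame(id = 1:10, x = letters[1:10])
wide <- data.frame(notes = strrep("x", 10000))

narrow_batch <- suggest_batch_size(narrow)
wide_batch <- suggest_batch_size(wide)
print(c(narrow = narrow_batch, wide = wide_batch))
```

For the narrow table the estimate stays at the default cap of 5000, while the table with a 10,000-character column gets a much smaller batch. Treat the result as a starting point: escaping can inflate string sizes, so lower the value further if "query too large" errors persist.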
Only intended for Spark connections. For other dialects use
insertTable().