insertTableSpark

Uploads a local data frame to Spark/Databricks using multi-row
<code>INSERT INTO ... VALUES (...), (...), ...</code> statements. This is the same
mechanism <code>insertTable()</code> now uses by default on Spark, so you only need to call
<code>insertTableSpark()</code> directly when you want to tune <code>batchSize</code>. Multi-row
VALUES inserts are substantially faster than the <code>INSERT ... SELECT ... UNION ALL</code> approach, which Spark's planner struggles with (benchmarked at roughly 50x faster
for 1000 rows).
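To make the batching idea concrete, here is a minimal base R sketch of how a data frame could be turned into multi-row <code>INSERT</code> statements. It is not the package's implementation: <code>build_insert_statements()</code> is a hypothetical helper, it only handles numeric, logical, and character-like columns, and it assumes Spark's default backslash escaping for string literals.

```r
# Hypothetical helper (not part of CDMConnector): returns one multi-row
# INSERT statement per batch of at most `batchSize` rows.
build_insert_statements <- function(table, name, batchSize = 5000) {
  stopifnot(is.data.frame(table), nrow(table) > 0, batchSize >= 1)

  # Render one column as SQL literals.
  to_literal <- function(x) {
    if (is.numeric(x)) {
      out <- as.character(x)
    } else if (is.logical(x)) {
      out <- ifelse(x, "TRUE", "FALSE")
    } else {
      # Assumption: backslash escaping, as in Spark SQL's default parser mode.
      s <- gsub("\\", "\\\\", as.character(x), fixed = TRUE)
      s <- gsub("'", "\\'", s, fixed = TRUE)
      out <- paste0("'", s, "'")
    }
    out[is.na(x)] <- "NULL"  # missing values become SQL NULL
    out
  }

  cols <- unname(lapply(table, to_literal))
  # One "(v1, v2, ...)" tuple per row.
  tuples <- paste0("(", do.call(paste, c(cols, sep = ", ")), ")")
  # Group tuples into consecutive batches of at most batchSize rows.
  batches <- split(tuples, ceiling(seq_along(tuples) / batchSize))
  header <- paste0(
    "INSERT INTO ", name, " (", paste(names(table), collapse = ", "), ") VALUES "
  )
  vapply(
    batches,
    function(b) paste0(header, paste(b, collapse = ", ")),
    character(1),
    USE.NAMES = FALSE
  )
}

df <- data.frame(
  person_id = 1:5,
  source = c("a", "b", NA, "d", "e"),
  stringsAsFactors = FALSE
)
stmts <- build_insert_statements(df, "my_schema.my_table", batchSize = 2)
length(stmts)  # 3 statements: rows 1-2, rows 3-4, row 5
```

Each statement carries many rows, so the number of round trips to the server falls from one per row to one per batch, which is the effect <code>batchSize</code> controls.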

Provides tools for working with observational health data in the
Observational Medical Outcomes Partnership (OMOP) Common Data Model format with a pipe-friendly syntax.
Common data model database table references are stored in a single compound object along with metadata.

Ger Inberg

CDMConnector

Connect to an OMOP Common Data Model

Adam Black

Artem Gorbachev

Edward Burn

Marti Catala Sabate

Ioanna Nika

insertTableSpark function

<dl><dt>cdm</dt>
<dd>A <code>cdm_reference</code> or <code>db_cdm</code> source object backed by a
Spark/Databricks connection. Must have a <code>writeSchema</code>.</dd>
<dt>name</dt>
<dd>Name of the destination table (a character vector of length 1).</dd>
<dt>table</dt>
<dd>A local data frame to upload.</dd>
<dt>overwrite</dt>
<dd>If <code>TRUE</code> (default), drop the table first if it exists.</dd>
<dt>batchSize</dt>
<dd>Number of rows per <code>INSERT</code> statement. Default 5000.
Larger batches reduce round trips, but Spark imposes a query-size
limit (~16MB). Reduce <code>batchSize</code> if you hit "query too large" errors with very
wide tables.</dd></dl>
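The <code>batchSize</code> trade-off can be estimated before uploading. The sketch below is a rough heuristic, not something the package provides: <code>suggest_batch_size()</code> is a hypothetical helper, and it assumes the ~16MB limit applies to the length of the statement text and that the first rows of the table are representative of the rest.

```r
# Hypothetical helper (not part of CDMConnector): suggests a batchSize that
# keeps each INSERT statement well under an assumed query-size limit.
suggest_batch_size <- function(table,
                               limitBytes = 16 * 1024^2, # ~16MB from the docs
                               safety = 0.5,             # use half the limit
                               maxBatch = 5000,          # documented default
                               sampleRows = 100) {
  stopifnot(is.data.frame(table), nrow(table) > 0)
  sampleTable <- utils::head(table, sampleRows)

  # Bytes needed to print every cell of the sample, per column.
  cellBytes <- vapply(
    sampleTable,
    function(x) as.numeric(sum(nchar(as.character(x), type = "bytes"))),
    numeric(1)
  )

  # Average bytes per row, plus 4 bytes per cell for quotes and separators.
  bytesPerRow <- sum(cellBytes) / nrow(sampleTable) + 4 * ncol(table)

  budget <- limitBytes * safety
  max(1, min(maxBatch, floor(budget / bytesPerRow)))
}

narrow <- data.frame(id = 1:10)
wide <- data.frame(note = rep(strrep("x", 1e6), 3), stringsAsFactors = FALSE)

suggest_batch_size(narrow)  # capped at maxBatch
suggest_batch_size(wide)    # much smaller, since each row is about 1MB

# With a live Spark/Databricks-backed cdm object (assumed to exist), the
# result could be passed on:
# insertTableSpark(cdm, name = "my_table", table = wide,
#                  batchSize = suggest_batch_size(wide))
```

The safety factor is deliberately conservative because the true limit and the exact size of the generated SQL are not known from the client side.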

Arguments

Fast bulk insert of a local table on Spark / Databricks — insertTableSpark

