Downloads the specified BERT checkpoint from the Google Research collection or other repositories.
Usage

download_BERT_checkpoint(
  model = c("bert_base_uncased", "bert_base_cased", "bert_large_uncased",
            "bert_large_cased", "bert_large_uncased_wwm", "bert_large_cased_wwm",
            "bert_base_multilingual_cased", "bert_base_chinese",
            "scibert_scivocab_uncased", "scibert_scivocab_cased",
            "scibert_basevocab_uncased", "scibert_basevocab_cased"),
  dir = NULL,
  url = NULL,
  force = FALSE,
  keep_archive = FALSE,
  archive_type = NULL
)
Arguments

model: Character vector. Which model checkpoint to download.
dir: Character vector. Destination directory for checkpoints. Leave NULL to allow RBERT to choose a directory automatically. The path is determined from the dir parameter if supplied, then from the `RBERT.dir` option (set using set_BERT_dir), then from an "RBERT" folder in the user cache directory (determined using user_cache_dir). If you provide a dir, the `RBERT.dir` option will be updated to that location. Note that the checkpoint will create a subdirectory inside this dir. A sketch of this lookup order follows the argument list.
url: Character vector. An optional url from which to download a checkpoint. Overrides the model parameter if not NULL.
force: Logical. Download even if the checkpoint already exists in the specified directory? Default FALSE.
keep_archive: Logical. Keep the zip (or other archive) file? Leave as FALSE to save space.
archive_type: How is the checkpoint archived? We currently support "zip" and "tar-gzip". Leave NULL to infer from the url.
Value

If successful, returns the path to the downloaded checkpoint.
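Because the path is returned, it can be captured and reused; a short sketch:

# Capture the checkpoint location for later use with other RBERT functions.
ckpt_path <- download_BERT_checkpoint("bert_base_uncased")
ckpt_path  # a subdirectory inside the resolved checkpoint directory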
Details

download_BERT_checkpoint knows about several pre-trained BERT checkpoints. You can specify these checkpoints using the model parameter. Alternatively, you can supply a direct url to any BERT TensorFlow checkpoint.
model                        | layers | hidden | heads | parameters | special
bert_base_*                  | 12     | 768    | 12    | 110M       |
bert_large_*                 | 24     | 1024   | 16    | 340M       |
bert_large_*_wwm             | 24     | 1024   | 16    | 340M       | whole word masking
bert_base_multilingual_cased | 12     | 768    | 12    | 110M       | 104 languages
bert_base_chinese            | 12     | 768    | 12    | 110M       | Chinese Simplified and Traditional
scibert_scivocab_*           | 12     | 768    | 12    | 110M       | Trained using the full text of 1.14M scientific papers (18% computer science, 82% biomedical), with a science-specific vocabulary.
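Because url overrides model, a checkpoint can also be fetched from an explicit address. A hedged sketch follows; the url below points at the original Google Research BERT-Base uncased release and is illustrative, not guaranteed to be current:

# Download from an explicit url; per archive_type above, the archive type
# is inferred from the ".zip" suffix when archive_type is left NULL.
download_BERT_checkpoint(
  url = "https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip"
)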
Examples

# NOT RUN {
# Download the 12-layer BERT-Base uncased checkpoint.
download_BERT_checkpoint("bert_base_uncased")

# Download the 24-layer BERT-Large uncased checkpoint.
download_BERT_checkpoint("bert_large_uncased")

# Download to an explicitly chosen directory.
temp_dir <- tempdir()
download_BERT_checkpoint("bert_base_uncased", dir = temp_dir)
# }