Downloads the specified BERT checkpoint from the Google Research collection or other repositories.
download_BERT_checkpoint(
model = c("bert_base_uncased", "bert_base_cased", "bert_large_uncased",
"bert_large_cased", "bert_large_uncased_wwm", "bert_large_cased_wwm",
"bert_base_multilingual_cased", "bert_base_chinese", "scibert_scivocab_uncased",
"scibert_scivocab_cased", "scibert_basevocab_uncased", "scibert_basevocab_cased"),
dir = NULL,
url = NULL,
force = FALSE,
keep_archive = FALSE,
archive_type = NULL
)

model: Character vector. Which model checkpoint to download.

dir: Character vector. Destination directory for checkpoints. Leave
NULL to allow RBERT to choose a directory automatically. The path is
determined from the dir parameter if supplied, followed by the
`RBERT.dir` option (set using set_BERT_dir), followed by an "RBERT"
folder in the user cache directory (determined using
user_cache_dir). If you provide a dir, the `RBERT.dir` option will be
updated to that location. Note that the checkpoint will be placed in
a subdirectory of this dir.
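
To make the lookup order concrete, here is a minimal sketch of the
resolution logic described above. The helper name
resolve_checkpoint_dir is hypothetical (it is not part of RBERT's
API), and it assumes user_cache_dir comes from the rappdirs package:

resolve_checkpoint_dir <- function(dir = NULL) {
  if (!is.null(dir)) {
    # Supplying dir also updates the RBERT.dir option.
    options(RBERT.dir = dir)
    return(dir)
  }
  # Otherwise fall back to the RBERT.dir option, if set.
  opt_dir <- getOption("RBERT.dir")
  if (!is.null(opt_dir)) {
    return(opt_dir)
  }
  # Final fallback: an "RBERT" folder in the user cache directory.
  rappdirs::user_cache_dir("RBERT")
}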

url: Character vector. An optional URL from which to download a
checkpoint. Overrides the model parameter if not NULL.
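
For example, to download BERT-Base uncased directly from its Google
Research link (this URL is taken from the google-research/bert README;
check there for current links):

download_BERT_checkpoint(
  url = "https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip"
)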

force: Logical. Download even if the checkpoint already exists in the
specified directory? Default FALSE.

keep_archive: Logical. Keep the zip (or other archive) file? Leave as
FALSE to save space.

archive_type: How is the checkpoint archived? Currently "zip" and
"tar-gzip" are supported. Leave NULL to infer the archive type from
the url.

Value: If successful, returns the path to the downloaded checkpoint.
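
For instance, the returned path can be captured and inspected (a
sketch; assumes the download succeeds):

checkpoint_path <- download_BERT_checkpoint("bert_base_uncased")
list.files(checkpoint_path)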

download_BERT_checkpoint knows about several pre-trained BERT
checkpoints. You can specify these checkpoints using the model
parameter. Alternatively, you can supply a direct URL to any BERT
TensorFlow checkpoint.

| model                        | layers | hidden | heads | parameters | special |
|------------------------------|--------|--------|-------|------------|---------|
| bert_base_*                  | 12     | 768    | 12    | 110M       |         |
| bert_large_*                 | 24     | 1024   | 16    | 340M       |         |
| bert_large_*_wwm             | 24     | 1024   | 16    | 340M       | whole word masking |
| bert_base_multilingual_cased | 12     | 768    | 12    | 110M       | 104 languages |
| bert_base_chinese            | 12     | 768    | 12    | 110M       | Chinese Simplified and Traditional |
| scibert_scivocab_*           | 12     | 768    | 12    | 110M       | trained on the full text of 1.14M scientific papers (18% computer science, 82% biomedical), with a science-specific vocabulary |

# Not run: downloading a checkpoint requires a network connection
download_BERT_checkpoint("bert_base_uncased")
download_BERT_checkpoint("bert_large_uncased")

# Download into a specific directory
temp_dir <- tempdir()
download_BERT_checkpoint("bert_base_uncased", dir = temp_dir)