Downloads the specified BERT checkpoint from the Google Research collection or other repositories.
Usage

download_BERT_checkpoint(
  model = c("bert_base_uncased", "bert_base_cased", "bert_large_uncased",
            "bert_large_cased", "bert_large_uncased_wwm", "bert_large_cased_wwm",
            "bert_base_multilingual_cased", "bert_base_chinese",
            "scibert_scivocab_uncased", "scibert_scivocab_cased",
            "scibert_basevocab_uncased", "scibert_basevocab_cased"),
  dir = NULL,
  url = NULL,
  force = FALSE,
  keep_archive = FALSE,
  archive_type = NULL
)
Arguments

model: Character vector. Which model checkpoint to download.
dir: Character vector. Destination directory for checkpoints. Leave NULL to allow RBERT to choose a directory automatically. The path is determined from the dir parameter if supplied, then from the `RBERT.dir` option (set using set_BERT_dir), then from an "RBERT" folder in the user cache directory (determined using user_cache_dir). If you provide a dir, the `RBERT.dir` option will be updated to that location. Note that the checkpoint will create a subdirectory inside this dir. A sketch of this lookup order follows the argument list.
url: Character vector. An optional url from which to download a checkpoint. Overrides the model parameter if not NULL.
force: Logical. Download even if the checkpoint already exists in the specified directory? Default FALSE.
keep_archive: Logical. Keep the zip (or other archive) file? Leave as FALSE to save space.
archive_type: How is the checkpoint archived? We currently support "zip" and "tar-gzip". Leave NULL to infer from the url.
Value

If successful, returns the path to the downloaded checkpoint.
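Because the path is returned, it can be captured and reused; a short sketch:

# Capture the checkpoint location for later use with other RBERT functions.
ckpt_path <- download_BERT_checkpoint("bert_base_uncased")
ckpt_path  # a subdirectory inside the resolved checkpoint directory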
Details

download_BERT_checkpoint knows about several pre-trained BERT checkpoints. You can specify these checkpoints using the model parameter. Alternatively, you can supply a direct url to any BERT TensorFlow checkpoint.
model                        | layers | hidden | heads | parameters | special
bert_base_*                  | 12     | 768    | 12    | 110M       |
bert_large_*                 | 24     | 1024   | 16    | 340M       |
bert_large_*_wwm             | 24     | 1024   | 16    | 340M       | whole word masking
bert_base_multilingual_cased | 12     | 768    | 12    | 110M       | 104 languages
bert_base_chinese            | 12     | 768    | 12    | 110M       | Chinese Simplified and Traditional
scibert_scivocab_*           | 12     | 768    | 12    | 110M       | Trained using the full text of 1.14M scientific papers (18% computer science, 82% biomedical), with a science-specific vocabulary.
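Because url overrides model, a checkpoint can also be fetched from an explicit address. A hedged sketch follows; the url below points at the original Google Research BERT-Base uncased release and is illustrative, not guaranteed to be current:

# Download from an explicit url; per archive_type above, the archive type
# is inferred from the ".zip" suffix when archive_type is left NULL.
download_BERT_checkpoint(
  url = "https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip"
)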
Examples

# NOT RUN {
# Download the 12-layer BERT-Base uncased checkpoint.
download_BERT_checkpoint("bert_base_uncased")

# Download the 24-layer BERT-Large uncased checkpoint.
download_BERT_checkpoint("bert_large_uncased")

# Download to an explicitly chosen directory.
temp_dir <- tempdir()
download_BERT_checkpoint("bert_base_uncased", dir = temp_dir)
# }