Construct a BertConfig object from the supplied parameter values.
BertConfig(
vocab_size,
hidden_size = 768L,
num_hidden_layers = 12L,
num_attention_heads = 12L,
intermediate_size = 3072L,
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
max_position_embeddings = 512L,
type_vocab_size = 16L,
initializer_range = 0.02
)
vocab_size: Integer; vocabulary size of inputs_ids in BertModel.
hidden_size: Integer; size of the encoder layers and the pooler layer.

num_hidden_layers: Integer; number of hidden layers in the Transformer encoder.

num_attention_heads: Integer; number of attention heads for each attention layer in the Transformer encoder.

intermediate_size: Integer; the size of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.

hidden_act: The non-linear activation function (function or string) in the encoder and pooler.

hidden_dropout_prob: Numeric; the dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

attention_probs_dropout_prob: Numeric; the dropout ratio for the attention probabilities.

max_position_embeddings: Integer; the maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

type_vocab_size: Integer; the vocabulary size of the token_type_ids passed into BertModel.

initializer_range: Numeric; the stdev of the truncated_normal_initializer for initializing all weight matrices.
Returns an object of class BertConfig.
BertConfig(vocab_size = 30522L)
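As an illustrative sketch (assuming the RBERT package is attached), a config for a smaller-than-default BERT variant can be built by overriding a few of the documented defaults; only vocab_size is required:

```r
# Hypothetical example: a reduced-size BERT configuration.
# All argument names and defaults are taken from the signature above;
# the specific values chosen here are illustrative, not prescribed.
small_config <- BertConfig(
  vocab_size = 30522L,        # must match the tokenizer's vocabulary size
  hidden_size = 256L,         # narrower encoder than the 768L default
  num_hidden_layers = 4L,     # fewer Transformer layers than the 12L default
  num_attention_heads = 4L,   # hidden_size should be divisible by this
  intermediate_size = 1024L   # feed-forward width, commonly 4 * hidden_size
)
```

Arguments left unspecified (hidden_act, the dropout probabilities, max_position_embeddings, type_vocab_size, initializer_range) keep the defaults shown in the usage block.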