Use this when your inputs are in string or integer format, and you have a
vocabulary file that maps each value to an integer ID. By default,
out-of-vocabulary values are ignored. Use either (but not both) of
num_oov_buckets and default_value to specify how to include
out-of-vocabulary values. For input dictionary features, features[key] is
either a tensor or a sparse tensor object. If it is a tensor object, missing
values can be represented by -1 for int and '' for string. Note that these
values are independent of the default_value argument.
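The lookup described above can be illustrated with a small base-R sketch. It does not call tfestimators or TensorFlow. The helper lookup_ids() is hypothetical, and the sketch assumes that IDs are zero-based line positions in the vocabulary file, which matches how the underlying TensorFlow column is documented but is not stated on this page.

```r
# Hypothetical helper, not part of tfestimators: emulates the documented
# lookup for the case num_oov_buckets = 0.
lookup_ids <- function(values, vocabulary_file, default_value = -1L) {
  vocab <- readLines(vocabulary_file)
  # match() is 1-based; subtract 1 so IDs are zero-based line positions
  # (an assumption, see the note above the block).
  ids <- match(values, vocab) - 1L
  # Values not found in the vocabulary receive default_value.
  ids[is.na(ids)] <- as.integer(default_value)
  ids
}

# Write a three-entry vocabulary file, one value per line.
vocab_path <- tempfile(fileext = ".txt")
writeLines(c("red", "green", "blue"), vocab_path)

# "purple" is out of vocabulary.
ids_default <- lookup_ids(c("green", "purple", "red"), vocab_path)
ids_custom <- lookup_ids(c("green", "purple", "red"), vocab_path,
                         default_value = 0L)
```

With the default, "purple" maps to -1. With default_value = 0L it shares ID 0 with "red", which shows why the fallback ID should be chosen with care.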
column_categorical_with_vocabulary_file(
  ...,
  vocabulary_file,
  vocabulary_size,
  num_oov_buckets = 0L,
  default_value = NULL,
  dtype = tf$string
)
...: Expression(s) identifying input feature(s). Used as the column name and the dictionary key for feature parsing configs, feature tensors, and feature columns.
vocabulary_file: The vocabulary file name.
vocabulary_size: Number of elements in the vocabulary. This must be no greater than the number of entries in vocabulary_file; if it is less, later entries are ignored.
num_oov_buckets: Non-negative integer, the number of out-of-vocabulary buckets. All out-of-vocabulary inputs will be assigned IDs in the range [vocabulary_size, vocabulary_size+num_oov_buckets) based on a hash of the input value. A positive num_oov_buckets cannot be specified with default_value.
default_value: The integer ID value to return for out-of-vocabulary feature values, defaults to -1. This cannot be specified with a positive num_oov_buckets.
dtype: The type of features. Only string and integer types are supported.
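How vocabulary_size, num_oov_buckets, and default_value interact can be sketched in base R. This is an emulation under stated assumptions, not the library's implementation: assign_ids() and toy_hash() are hypothetical names, toy_hash() is a stand-in and is not the hash TensorFlow uses, and in-vocabulary IDs are assumed to be zero-based positions.

```r
# Stand-in hash for illustration only; TensorFlow uses a different hash.
toy_hash <- function(value) sum(utf8ToInt(value))

# Hypothetical helper, not part of tfestimators.
assign_ids <- function(values, vocab, vocabulary_size,
                       num_oov_buckets = 0L, default_value = NULL) {
  stopifnot(vocabulary_size >= 1L, vocabulary_size <= length(vocab),
            num_oov_buckets >= 0L)
  if (num_oov_buckets > 0L && !is.null(default_value)) {
    stop("Specify either num_oov_buckets or default_value, not both.")
  }
  # Entries beyond vocabulary_size are ignored.
  kept <- vocab[seq_len(vocabulary_size)]
  ids <- match(values, kept) - 1L
  oov <- is.na(ids)
  if (num_oov_buckets > 0L) {
    # OOV values land in [vocabulary_size, vocabulary_size + num_oov_buckets).
    hashes <- vapply(values[oov], toy_hash, integer(1), USE.NAMES = FALSE)
    ids[oov] <- as.integer(vocabulary_size + hashes %% num_oov_buckets)
  } else {
    ids[oov] <- if (is.null(default_value)) -1L else as.integer(default_value)
  }
  ids
}

vocab <- c("red", "green", "blue", "yellow")
# "yellow" is cut off by vocabulary_size = 3, so it is treated as OOV.
bucketed <- assign_ids(c("blue", "yellow", "purple"), vocab,
                       vocabulary_size = 3L, num_oov_buckets = 2L)
fallback <- assign_ids(c("blue", "yellow"), vocab, vocabulary_size = 3L)
```

In this sketch the bucketed IDs for "yellow" and "purple" fall in {3, 4}, while the fallback call returns -1 for "yellow" because no buckets are configured and default_value is left unset.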
A categorical column with a vocabulary file.
ValueError: vocabulary_file is missing.
ValueError: vocabulary_size is missing or < 1.
ValueError: num_oov_buckets is not a non-negative integer.
ValueError: dtype is neither string nor integer.
Other feature column constructors: column_bucketized(), column_categorical_weighted(), column_categorical_with_hash_bucket(), column_categorical_with_identity(), column_categorical_with_vocabulary_list(), column_crossed(), column_embedding(), column_numeric(), input_layer()