Use this when your inputs are in string or integer format, and you have a
vocabulary file that maps each value to an integer ID. By default,
out-of-vocabulary values are ignored. Use either (but not both) of
num_oov_buckets and default_value to specify how to include
out-of-vocabulary values. For input dictionary features, features[key] is
either a tensor or a sparse tensor object. If it is a tensor, missing values can
be represented by -1 for int and '' for string. Note that these values are
independent of the default_value argument.
column_categorical_with_vocabulary_file(
...,
vocabulary_file,
vocabulary_size,
num_oov_buckets = 0L,
default_value = NULL,
dtype = tf$string
)
...: Expression(s) identifying input feature(s). Used as the column name and
the dictionary key for feature parsing configs, feature tensors, and feature
columns.
vocabulary_file: The vocabulary file name.
vocabulary_size: Number of elements in the vocabulary. This must be no
greater than the number of entries in vocabulary_file; if it is smaller, the
later entries are ignored.
num_oov_buckets: Non-negative integer, the number of out-of-vocabulary
buckets. All out-of-vocabulary inputs are assigned IDs in the range
[vocabulary_size, vocabulary_size + num_oov_buckets) based on a hash of the
input value. A positive num_oov_buckets cannot be combined with
default_value.
default_value: The integer ID to return for out-of-vocabulary feature
values; defaults to -1. This cannot be combined with a positive
num_oov_buckets.
dtype: The type of features. Only string and integer types are supported.
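The ID assignment described by these arguments can be sketched in base R. This is only an illustration under stated assumptions: lookup_ids() is a hypothetical helper, not part of the package, and its character-code sum is a toy stand-in for whatever hash TensorFlow applies, so the bucket chosen for a given value will not match the real one.

```r
# Hypothetical helper: mimics the documented ID assignment in base R.
# The hash below is a toy stand-in, not the one TensorFlow uses.
lookup_ids <- function(values, vocabulary,
                       vocabulary_size = length(vocabulary),
                       num_oov_buckets = 0L, default_value = NULL) {
  if (num_oov_buckets > 0L && !is.null(default_value)) {
    stop("specify either num_oov_buckets or default_value, not both")
  }
  vocabulary_size <- as.integer(vocabulary_size)
  # Entries past vocabulary_size are ignored.
  vocab <- vocabulary[seq_len(vocabulary_size)]
  # In-vocabulary values get zero-based IDs in file order.
  ids <- match(values, vocab) - 1L
  oov <- is.na(ids)
  if (num_oov_buckets > 0L) {
    # Out-of-vocabulary values are hashed into
    # [vocabulary_size, vocabulary_size + num_oov_buckets).
    toy_hash <- vapply(values[oov], function(v) sum(utf8ToInt(v)),
                       numeric(1), USE.NAMES = FALSE)
    ids[oov] <- vocabulary_size + as.integer(toy_hash %% num_oov_buckets)
  } else {
    # Otherwise they all receive default_value (-1 when unset).
    ids[oov] <- if (is.null(default_value)) -1L else as.integer(default_value)
  }
  ids
}

vocab <- c("red", "green", "blue", "yellow")
# "yellow" sits past vocabulary_size = 3, so it is out-of-vocabulary too.
lookup_ids(c("green", "purple", "red"), vocab, vocabulary_size = 3L)
# 1 -1 0
```

With num_oov_buckets = 2L, the same call would place "purple" at ID 3 or 4 instead of -1; which of the two depends on the hash.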
A categorical column with a vocabulary file.
ValueError: vocabulary_file is missing.
ValueError: vocabulary_size is missing or < 1.
ValueError: num_oov_buckets is not a non-negative integer.
ValueError: dtype is neither string nor integer.
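The conditions above can be checked on the R side before the column is built. The following is a minimal sketch, assuming a hypothetical check_vocab_args() helper that is not part of the package and that represents dtype as a plain string ("string" or "integer") rather than a TensorFlow dtype object.

```r
# Hypothetical validator mirroring the four documented error conditions.
# dtype is a plain string here, an assumption made to keep this base R only.
check_vocab_args <- function(vocabulary_file = NULL, vocabulary_size = NULL,
                             num_oov_buckets = 0L, dtype = "string") {
  if (is.null(vocabulary_file) || !nzchar(vocabulary_file)) {
    stop("vocabulary_file is missing")
  }
  if (is.null(vocabulary_size) || vocabulary_size < 1) {
    stop("vocabulary_size is missing or < 1")
  }
  if (!is.numeric(num_oov_buckets) || num_oov_buckets < 0 ||
      num_oov_buckets != round(num_oov_buckets)) {
    stop("num_oov_buckets is not a non-negative integer")
  }
  if (!dtype %in% c("string", "integer")) {
    stop("dtype is neither string nor integer")
  }
  invisible(TRUE)
}

# Passes silently for a valid combination of arguments.
check_vocab_args("colors.txt", vocabulary_size = 3L, num_oov_buckets = 1L)
```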
Other feature column constructors:
column_bucketized(),
column_categorical_weighted(),
column_categorical_with_hash_bucket(),
column_categorical_with_identity(),
column_categorical_with_vocabulary_list(),
column_crossed(),
column_embedding(),
column_numeric(),
input_layer()
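A short usage sketch for column_categorical_with_vocabulary_file() follows. The feature name "color" and the file contents are made up for illustration, and the call is guarded so that it only runs when tfestimators and a TensorFlow installation that still ships feature columns are available.

```r
# Write a small vocabulary file: one entry per line. The zero-based line
# position becomes the integer ID (red = 0, green = 1, blue = 2).
vocab_path <- file.path(tempdir(), "colors.txt")
writeLines(c("red", "green", "blue"), vocab_path)

# Only build the column when tfestimators and TensorFlow are installed.
if (requireNamespace("tfestimators", quietly = TRUE) &&
    reticulate::py_module_available("tensorflow")) {
  color_col <- tfestimators::column_categorical_with_vocabulary_file(
    "color",                       # hypothetical feature name
    vocabulary_file = vocab_path,
    vocabulary_size = 3L,
    num_oov_buckets = 1L           # all unseen colors map to ID 3
  )
}
```

Using a single out-of-vocabulary bucket keeps unseen values as one extra category instead of assigning them the default ID of -1.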