Helper function to supply data in batches of a data iterator when
constructing a DMatrix from external memory through xgb.ExtMemDMatrix()
or through xgb.QuantileDMatrix.from_iterator().
This function is only meant to be called inside of a callback function (which
is passed as argument to function xgb.DataIter() to construct a data iterator)
when constructing a DMatrix through external memory - otherwise, one should call
xgb.DMatrix() or xgb.QuantileDMatrix().
The object that results from calling this function directly is not like
an xgb.DMatrix: it cannot be used to train a model or to get predictions. Its only
use is to supply data to an iterator, from which a DMatrix is then constructed.
For more information and for example usage, see the documentation for xgb.ExtMemDMatrix().
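As a rough illustration of the intended call pattern, the sketch below returns one xgb.DataBatch per call from an iterator callback. It assumes that xgb.DataIter() takes an environment plus f_next and f_reset functions, and that f_next signals the end of the data by returning NULL. The names iterator_env, iterator_next, iterator_reset, x_batches and y_batches are made up for this example.

```r
library(xgboost)

# Split a small in-memory dataset into three batches to imitate
# data that would normally be read from disk one piece at a time.
data(mtcars)
y <- mtcars[, 1]
x <- as.matrix(mtcars[, -1])
batch_idx <- rep(1:3, length.out = nrow(x))
x_batches <- lapply(1:3, function(i) x[batch_idx == i, , drop = FALSE])
y_batches <- lapply(1:3, function(i) y[batch_idx == i])

# Hypothetical iterator state: a counter plus the pre-split batches.
iterator_env <- as.environment(
  list(iter = 0L, x_batches = x_batches, y_batches = y_batches)
)

# Callback that supplies the next batch, or NULL once all are consumed.
iterator_next <- function(env) {
  i <- env[["iter"]] + 1L
  if (i > length(env[["x_batches"]])) {
    return(NULL)
  }
  env[["iter"]] <- i
  xgb.DataBatch(
    data = env[["x_batches"]][[i]],
    label = env[["y_batches"]][[i]]
  )
}

# Callback that rewinds the iterator to the first batch.
iterator_reset <- function(env) {
  env[["iter"]] <- 0L
}

data_iterator <- xgb.DataIter(
  env = iterator_env,
  f_next = iterator_next,
  f_reset = iterator_reset
)
dm <- xgb.ExtMemDMatrix(data_iterator, cache_prefix = tempdir(), nthread = 1)
```

Here xgb.DataBatch() is only ever called inside iterator_next; the object that can be used for training is dm, not the individual batches.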
xgb.DataBatch(
data,
label = NULL,
weight = NULL,
base_margin = NULL,
feature_names = colnames(data),
feature_types = NULL,
group = NULL,
qid = NULL,
label_lower_bound = NULL,
label_upper_bound = NULL,
feature_weights = NULL
)
An object of class xgb.DataBatch, which is just a list containing the
data and parameters passed here. It does not inherit from xgb.DMatrix.
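A small check along the following lines can make the class relationship concrete. It is only a sketch: it calls the function directly, outside an iterator, which is not its intended use, and the toy matrix and labels are invented for the example.

```r
library(xgboost)

# Toy 2x2 matrix and labels, for illustration only.
batch <- xgb.DataBatch(
  data = matrix(c(1, 2, 3, 4), nrow = 2),
  label = c(0, 1)
)

is_batch <- inherits(batch, "xgb.DataBatch")  # class described above
is_dmatrix <- inherits(batch, "xgb.DMatrix")  # should not be a DMatrix
is_plain_list <- is.list(batch)               # underlying storage is a list
```

Per the description above, is_batch and is_plain_list are expected to be TRUE and is_dmatrix FALSE.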
Data belonging to this batch.
Note that not all of the input types supported by xgb.DMatrix() are possible
to pass here. Supported types are:
matrix, with types numeric, integer, and logical. Note that for types
integer and logical, missing values might not be automatically recognized
as such - see the documentation for parameter missing in xgb.ExtMemDMatrix()
for details on this.
data.frame, with the same types as supported by xgb.DMatrix() and the same
conversions applied to it. See the documentation for parameter data in
xgb.DMatrix() for details on it.
CSR matrices, as class dgRMatrix from package "Matrix".
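The three supported input types listed above can be sketched as follows. The toy data is invented, and the example assumes that coercing a dense numeric matrix with as(x, "RsparseMatrix") from package "Matrix" yields a dgRMatrix.

```r
library(xgboost)
library(Matrix)

# Toy dense numeric matrix with named columns.
x_dense <- matrix(c(1, 0, 2, 0, 0, 3), nrow = 3)
colnames(x_dense) <- c("a", "b")

# 1. Base R matrix.
batch_matrix <- xgb.DataBatch(data = x_dense)

# 2. data.frame (column types are handled as in xgb.DMatrix()).
batch_df <- xgb.DataBatch(data = as.data.frame(x_dense))

# 3. CSR sparse matrix: row-oriented class dgRMatrix from "Matrix".
x_csr <- as(x_dense, "RsparseMatrix")
batch_csr <- xgb.DataBatch(data = x_csr)
```

Column-oriented sparse matrices (class dgCMatrix) are not in the list above, so they would presumably need converting to dgRMatrix first.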
Label of the training data. For classification problems, should be passed encoded as integers with numeration starting at zero.
Weight for each instance.
Note that, for ranking tasks, weights are per-group: one weight is assigned to each group (not to each data point). This is because only the relative ordering of data points within each group matters, so it doesn't make sense to assign weights to individual data points.
Base margin used for boosting from existing model.
In the case of multi-output models, one can also pass multi-dimensional base_margin.
Set names for features. Overrides column names in data frame and matrix.
Note: columns are not referenced by name when calling predict, so the column order there
must be the same as in the DMatrix construction, regardless of the column names.
Set types for features.
If data is a data.frame and feature_types is not supplied,
feature types will be deduced automatically from the column types.
Otherwise, one can pass a character vector with the same length as number of columns in data,
with the following possible values:
"c", which represents categorical columns.
"q", which represents numeric columns.
"int", which represents integer columns.
"i", which represents logical (boolean) columns.
Note that, while categorical types are treated differently from the rest for model fitting purposes, the other types do not influence the generated model, but have effects in other functionalities such as feature importances.
Important: Categorical features, if specified manually through feature_types, must
be encoded as integers with numeration starting at zero, and the same encoding needs to be
applied when passing data to predict(). Even if passing factor types, the encoding will
not be saved, so make sure that factor columns passed to predict have the same levels.
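One way to keep a manual categorical encoding consistent between training and prediction is sketched below. The column names, level names and data are hypothetical. The idea is to fix the factor levels once and derive zero-based integer codes from them in both places.

```r
library(xgboost)

# Fix the level order once and reuse it everywhere (hypothetical levels).
train_levels <- c("low", "mid", "high")

# Training data: convert the factor to zero-based integer codes.
grade_train <- factor(c("mid", "low", "high", "low"), levels = train_levels)
x_train <- cbind(
  size = c(1.5, 2.0, 0.7, 3.1),
  grade = as.integer(grade_train) - 1L
)

# Mark the first column as numeric ("q") and the second as categorical ("c").
batch <- xgb.DataBatch(
  data = x_train,
  label = c(0, 1, 0, 1),
  feature_types = c("q", "c")
)

# Prediction data: apply the same levels so the codes match.
grade_new <- factor(c("high", "mid"), levels = train_levels)
code_new <- as.integer(grade_new) - 1L
```

Because as.integer() on a factor is one-based, subtracting 1L gives the zero-based numeration required for categorical features.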
Group sizes for all ranking groups.
Query ID for data samples, used for ranking.
Lower bound for survival training.
Upper bound for survival training.
Set feature weights for column sampling.
xgb.DataIter(), xgb.ExtMemDMatrix().