Create an xgb.QuantileDMatrix object (the exact same class as would be returned by
calling xgb.QuantileDMatrix(), with the same advantages and limitations) from
external data supplied by xgb.DataIter(), potentially passed in batches from
a bigger set that might not fit entirely in memory, in the same way as xgb.ExtMemDMatrix().
Note that, while external data is only loaded through the iterator (so the full data need not be held in memory at once), the quantized representation of the data is created in memory, concatenated from multiple calls to the data iterator. The quantized version is typically lighter than the original data, so this representation may fit in memory even when the full data does not.
For more information, see the guide 'Using XGBoost External Memory Version': https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html
xgb.QuantileDMatrix.from_iterator(
data_iterator,
missing = NA,
nthread = NULL,
ref = NULL,
max_bin = NULL
)
An 'xgb.DMatrix' object, with subclass 'xgb.QuantileDMatrix'.
A data iterator structure as returned by xgb.DataIter(),
which includes an environment shared between function calls, and functions to access
the data in batches on-demand.
A float value that represents missing values in data.
Note that, while functions like xgb.DMatrix() can take a generic NA and interpret it
correctly for different types like numeric and integer, if an NA value is passed here,
it will not be adapted for different input types.
For example, in R integer types, missing values are represented by integer number -2147483648
(since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes NA,
which is interpreted as a floating-point NaN by xgb.ExtMemDMatrix() and by
xgb.QuantileDMatrix.from_iterator(), these integer missing values will not be treated as missing.
This should not pose any problem for numeric types, since they do have an inherent NaN value.
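The integer caveat above can be sidestepped before the data reaches the iterator. The following is a minimal base-R sketch of one possible workaround, not something this function prescribes: it assumes the batch is a small integer matrix (the name x_int is made up for illustration) and converts it to double storage so that the missing entry becomes a numeric NA, which is a floating-point NaN value at the machine level.

```r
# Hypothetical batch: an integer matrix with one missing entry.
x_int <- matrix(c(1L, NA_integer_, 3L, 4L), nrow = 2)
typeof(x_int)  # "integer"

# Convert the storage type to double before handing the batch over.
# NA_integer_ becomes NA_real_, which matches the default missing = NA.
x_dbl <- x_int
storage.mode(x_dbl) <- "double"
typeof(x_dbl)  # "double"

# The missing entry is still flagged, and the shape is unchanged.
is.na(x_dbl)
dim(x_dbl)
```

A converted matrix like x_dbl could then be returned from the iterator's batch function in place of the integer one, at the cost of the extra memory a double matrix takes.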
Number of threads used for creating DMatrix.
The training dataset that provides quantile information, needed when creating
validation/test dataset with xgb.QuantileDMatrix(). Supplying the training DMatrix
as a reference means that the same quantisation applied to the training data is
applied to the validation/test data.
The number of histogram bins; it should be consistent with the training parameter
max_bin.
This is only supported when constructing a QuantileDMatrix.
xgb.DataIter(), xgb.DataBatch(), xgb.ExtMemDMatrix(),
xgb.QuantileDMatrix()
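
The following is a minimal sketch of how the pieces described above might fit together, assuming a recent xgboost R package that exports xgb.DataIter(), xgb.DataBatch() and xgb.QuantileDMatrix.from_iterator(). The helper make_iterator() and the train/validation row split are hypothetical and exist only for illustration. The data here already sits in memory, so batching brings no real benefit; in practice each batch would be loaded from disk or another source on demand.

```r
library(xgboost)

data(mtcars)
x <- as.matrix(mtcars[, -1])
y <- mtcars[, 1]

# Hypothetical helper: wraps in-memory data in an iterator
# that serves the rows in two consecutive batches.
make_iterator <- function(x, y) {
  n <- nrow(x)
  half <- n %/% 2
  env <- as.environment(list(
    iter = 0,
    x = x,
    y = y,
    batches = list(seq_len(half), seq(half + 1, n))
  ))

  f_next <- function(env) {
    curr_iter <- env[["iter"]]
    if (curr_iter >= length(env[["batches"]])) {
      return(NULL)  # no batches left: signals the end of the stream
    }
    rows <- env[["batches"]][[curr_iter + 1]]
    env[["iter"]] <- curr_iter + 1
    xgb.DataBatch(
      data = env[["x"]][rows, , drop = FALSE],
      label = env[["y"]][rows]
    )
  }

  # Moves the iterator back to its beginning.
  f_reset <- function(env) {
    env[["iter"]] <- 0
  }

  xgb.DataIter(env = env, f_next = f_next, f_reset = f_reset)
}

train_rows <- 1:24
valid_rows <- 25:32

# Training data: the quantile cuts are computed from these batches.
dtrain <- xgb.QuantileDMatrix.from_iterator(
  make_iterator(x[train_rows, ], y[train_rows]),
  nthread = 1
)

# Validation data: reuses the quantisation of the training data via 'ref'.
dvalid <- xgb.QuantileDMatrix.from_iterator(
  make_iterator(x[valid_rows, ], y[valid_rows]),
  nthread = 1,
  ref = dtrain
)

dim(dtrain)
dim(dvalid)
```

Keeping the iteration counter in the shared environment is what lets the reset function rewind the stream, since the iterator may be traversed more than once while the quantized matrix is built.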