- formula
Object of class formula describing the model to fit, of the form y ~ x1 + x2 + ..., where y is a numeric response and x1, x2, ... are vectors of covariates. Interactions are not supported.
- data
Training data of class data.frame containing the variables in the model.
- x
Matrix of covariates, alternative to formula and data.
- y
Vector of responses, alternative to formula and data.
- max_leaves
Maximum number of leaves for the grown tree.
- cp
Complexity parameter, the minimum relative loss decrease required to split a node. A split is only performed if the loss decrease is larger than cp * initial_loss, where initial_loss is the loss of the initial estimate using only a stump.
- min_sample
Minimum number of observations per leaf. A split is only performed if both resulting leaves have at least min_sample observations.
- mtry
Number of randomly selected covariates to consider for a split. If NULL, all covariates are available for each split.
- fast
If TRUE, only the optimal splits in the new leaves are evaluated, and the previously optimal splits in the other leaves, together with their potential loss decrease, are reused. If FALSE, all possible splits in all leaves are re-evaluated after every split.
- Q_type
Type of deconfounding, one of 'trim', 'pca', or 'no_deconfounding'. 'trim' corresponds to the Trim transform (Ćevid et al., 2020) as implemented in the doubly debiased lasso (Guo et al., 2022); 'pca' corresponds to the PCA transformation (Paul et al., 2008). See get_Q.
- trim_quantile
Quantile for the Trim transform, only needed for 'trim'; see get_Q.
- q_hat
Assumed confounding dimension, only needed for 'pca'; see get_Q.
- Qf
Spectral transformation. If NULL, it is estimated internally using get_Q.
- A
Numerical anchor of class matrix. See get_W.
- gamma
Strength of distributional robustness, \(\gamma \in [0, \infty]\). See get_W.
- gpu
If TRUE, the calculations are performed on the GPU, provided it is properly set up.
- mem_size
Number of split candidates that can be evaluated at once. This trades memory for speed; decrease it if the memory is insufficient or the GPU is too small.
- max_candidates
Maximum number of split points that are
proposed at each node for each covariate.
- Q_scale
Should the data be scaled to estimate the spectral transformation? Default is TRUE, so as not to reduce the signal of high-variance covariates; we do not know of a scenario where this hurts.
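The formula/data pair and the x/y pair are described as alternatives. A minimal base-R sketch of how a formula maps onto an x matrix and a y vector, including a rejection of interaction terms, is below; formula_to_xy is a hypothetical helper written for illustration, not a function of the package.

```r
# Hypothetical helper (not part of the package): convert formula + data into
# the x / y pair that the alternative x and y arguments take.
formula_to_xy <- function(formula, data) {
  mf <- model.frame(formula, data)
  tt <- attr(mf, "terms")
  # terms of order > 1 (e.g. x1:x2) are interactions, which are not supported
  if (any(attr(tt, "order") > 1)) stop("interactions are not supported")
  x <- model.matrix(tt, mf)
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]  # drop the intercept column
  list(x = x, y = model.response(mf))
}

set.seed(1)
dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
xy <- formula_to_xy(y ~ x1 + x2, dat)
dim(xy$x)  # 10 2: one row per observation, one column per covariate
```

Calling formula_to_xy(y ~ x1 * x2, dat) stops with an error, mirroring the statement that interactions are not supported.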
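The cp rule compares the loss decrease of a candidate split with cp * initial_loss. The sketch below assumes plain squared-error loss, ignores the spectral transformation, and takes the "initial estimate" to be the single-leaf fit (the overall mean); sse and split_allowed are hypothetical names.

```r
# Hypothetical illustration of the cp rule, assuming plain squared-error loss
# and ignoring the spectral transformation.
sse <- function(v) sum((v - mean(v))^2)

split_allowed <- function(y_node, goes_left, cp, initial_loss) {
  decrease <- sse(y_node) - sse(y_node[goes_left]) - sse(y_node[!goes_left])
  decrease > cp * initial_loss
}

y <- c(1, 1.2, 0.9, 5, 5.1, 4.8)
initial_loss <- sse(y)                 # loss of the single-leaf fit (assumption)
goes_left <- rep(c(TRUE, FALSE), each = 3)
split_allowed(y, goes_left, cp = 0.01, initial_loss)   # TRUE
split_allowed(y, goes_left, cp = 0.999, initial_loss)  # FALSE: threshold too high
```

Because the threshold scales with initial_loss, cp acts as a relative criterion: the same cp value behaves the same way regardless of the scale of y.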
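The min_sample constraint restricts which thresholds on a covariate are admissible. A small sketch, assuming observations with x <= t go to the left child; valid_thresholds is a hypothetical helper.

```r
# Hypothetical helper: thresholds on one covariate that leave at least
# min_sample observations in each child (rule assumed: x <= t goes left).
valid_thresholds <- function(x, min_sample) {
  s <- sort(unique(x))
  cand <- s[-length(s)]  # the largest value would leave the right child empty
  n_left <- vapply(cand, function(t) sum(x <= t), integer(1))
  cand[n_left >= min_sample & (length(x) - n_left) >= min_sample]
}

valid_thresholds(1:10, min_sample = 3)  # 3 4 5 6 7
```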
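The mtry argument limits each split search to a random subset of covariates. A sketch of that selection, drawn anew for every split as the description suggests; candidate_covariates is a hypothetical name.

```r
# Hypothetical sketch of mtry: indices of the covariates eligible for one split.
candidate_covariates <- function(p, mtry = NULL) {
  if (is.null(mtry)) return(seq_len(p))  # NULL: all covariates are available
  sort(sample.int(p, mtry))              # otherwise a random subset of size mtry
}

set.seed(42)
candidate_covariates(10, mtry = 3)  # three distinct indices out of 1..10
candidate_covariates(10)            # 1..10
```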
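The fast option reuses cached best splits for leaves that were not just split. The sketch below grows a tree on a single covariate with plain squared error, where leaves do not interact and caching is exact; under a spectral transformation, which mixes observations across leaves, cached values can presumably become stale, which is what fast = FALSE guards against. best_split and grow_fast are hypothetical helpers.

```r
# Hypothetical sketch of fast = TRUE on one covariate with squared-error loss.
best_split <- function(x, y, idx) {
  sse <- function(v) sum((v - mean(v))^2)
  xs <- x[idx]; ys <- y[idx]
  cand <- sort(unique(xs)); cand <- cand[-length(cand)]
  if (length(cand) == 0) return(list(decrease = -Inf, threshold = NA))
  dec <- vapply(cand, function(t) sse(ys) - sse(ys[xs <= t]) - sse(ys[xs > t]),
                numeric(1))
  k <- which.max(dec)
  list(decrease = dec[k], threshold = cand[k])
}

grow_fast <- function(x, y, max_leaves) {
  leaves <- list(seq_along(y))               # each leaf stores observation indices
  cache <- list(best_split(x, y, leaves[[1]]))
  while (length(leaves) < max_leaves) {
    gains <- vapply(cache, `[[`, numeric(1), "decrease")
    j <- which.max(gains)
    if (!is.finite(gains[j]) || gains[j] <= 0) break
    idx <- leaves[[j]]; thr <- cache[[j]]$threshold
    children <- list(idx[x[idx] <= thr], idx[x[idx] > thr])
    # only the two new leaves get a fresh split search;
    # the cached best splits of all other leaves are reused unchanged
    leaves <- c(leaves[-j], children)
    cache <- c(cache[-j], lapply(children, function(ix) best_split(x, y, ix)))
  }
  leaves
}

x <- 1:8
y <- c(0, 0, 0, 0, 10, 10, 20, 20)
leaves <- grow_fast(x, y, max_leaves = 3)  # leaves 1:4, 5:6 and 7:8
```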
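The Q_type, trim_quantile and q_hat arguments select a spectral transformation. The sketch below is a hypothetical stand-in (spectral_Q is our own name, not the package's get_Q) assuming both transforms are built from the thin SVD X = U D V^T: 'trim' caps every singular value at its trim_quantile quantile, and 'pca' projects out the first q_hat left singular vectors. Any centering or scaling that get_Q may apply beforehand is not reproduced.

```r
# Hypothetical stand-in for the spectral transformation; NOT the package's get_Q.
spectral_Q <- function(X, type = c("trim", "pca", "no_deconfounding"),
                       trim_quantile = 0.5, q_hat = 0) {
  type <- match.arg(type)
  n <- nrow(X)
  if (type == "no_deconfounding") return(diag(n))
  sv <- svd(X)
  if (type == "trim") {
    tau <- unname(quantile(sv$d, trim_quantile))
    shrink <- pmin(1, tau / sv$d)  # singular values above tau are capped at tau
    return(diag(n) - sv$u %*% diag(1 - shrink, nrow = length(shrink)) %*% t(sv$u))
  }
  # pca: remove the first q_hat left singular directions
  U_q <- sv$u[, seq_len(q_hat), drop = FALSE]
  diag(n) - U_q %*% t(U_q)
}

set.seed(1)
X <- matrix(rnorm(20 * 10), 20, 10)
d <- svd(X)$d
Q_trim <- spectral_Q(X, "trim", trim_quantile = 0.5)
Q_pca  <- spectral_Q(X, "pca", q_hat = 2)
svd(Q_trim %*% X)$d  # equals pmin(d, median(d))
```

After 'trim' the singular values of Q X are min(d_i, tau); after 'pca' the top q_hat of them are zero and the rest are unchanged.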
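For A and gamma, a common choice is the anchor-regression transformation W = I - (1 - sqrt(gamma)) Pi_A, where Pi_A projects onto the column space of A. Whether get_W uses exactly this form is an assumption here; anchor_W is a hypothetical name, and the sketch only handles finite gamma.

```r
# Hypothetical stand-in for get_W, assuming W = I - (1 - sqrt(gamma)) * Pi_A.
anchor_W <- function(A, gamma) {
  A <- as.matrix(A)
  Pi_A <- A %*% solve(crossprod(A), t(A))  # orthogonal projection onto span(A)
  diag(nrow(A)) - (1 - sqrt(gamma)) * Pi_A
}

set.seed(3)
A <- matrix(rnorm(30 * 2), 30, 2)
r <- rnorm(30)                             # e.g. a residual vector
W <- anchor_W(A, gamma = 4)
Pi_A <- A %*% solve(crossprod(A), t(A))
# ||W r||^2 = ||(I - Pi_A) r||^2 + gamma * ||Pi_A r||^2
c(sum((W %*% r)^2), sum(((diag(30) - Pi_A) %*% r)^2) + 4 * sum((Pi_A %*% r)^2))
```

Under this form, gamma = 1 gives the identity (no robustness adjustment), gamma = 0 discards the part of the residuals explained by A, and larger gamma penalizes that part more strongly.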
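The mem_size trade-off can be sketched by assuming that candidate evaluations are grouped into batches of at most mem_size: each batch builds an n × batch-size indicator matrix, so smaller batches need less memory but more passes. batched_decrease is a hypothetical helper using plain squared-error loss.

```r
# Hypothetical helper: squared-error loss decrease for every threshold,
# computed in batches of at most mem_size thresholds.
batched_decrease <- function(x, y, thresholds, mem_size) {
  sse <- function(v) sum((v - mean(v))^2)
  total <- sse(y)
  ids <- seq_along(thresholds)
  batches <- split(ids, ceiling(ids / mem_size))
  out <- lapply(batches, function(b) {
    # n x length(b) matrix of left-child indicators: the memory that mem_size bounds
    L <- outer(x, thresholds[b], `<=`)
    apply(L, 2, function(l) total - sse(y[l]) - sse(y[!l]))
  })
  unlist(out, use.names = FALSE)
}

x <- 1:8
y <- c(0, 0, 0, 0, 10, 10, 20, 20)
dec_small <- batched_decrease(x, y, thresholds = 1:7, mem_size = 3)
dec_large <- batched_decrease(x, y, thresholds = 1:7, mem_size = 100)
which.max(dec_small)  # 4: the split x <= 4 gives the largest decrease
```

The result does not depend on mem_size; only peak memory and the number of passes change.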
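For max_candidates, one plausible scheme, assumed here rather than taken from the package, proposes evenly spaced empirical quantiles of the covariate instead of every distinct value; split_candidates is a hypothetical helper.

```r
# Hypothetical sketch: at most max_candidates split points for one covariate,
# assuming evenly spaced empirical quantiles are used.
split_candidates <- function(x, max_candidates) {
  u <- sort(unique(x))
  if (length(u) <= max_candidates) return(u)  # few distinct values: use them all
  probs <- seq(0, 1, length.out = max_candidates)
  unique(quantile(x, probs = probs, type = 1, names = FALSE))
}

set.seed(4)
length(split_candidates(rnorm(1000), max_candidates = 50))  # at most 50
split_candidates(c(1, 1, 2, 3), max_candidates = 10)          # 1 2 3
```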
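The motivation for Q_scale can be illustrated in base R. When one covariate lives on a much larger scale, the top singular direction of the unscaled X is essentially that covariate alone, so a spectral transformation that shrinks the top directions would also shrink its signal. This is an illustrative sketch under that reading, not the package's code.

```r
set.seed(2)
X <- matrix(rnorm(100 * 5), 100, 5)
X[, 1] <- 1000 * X[, 1]                       # one covariate on a much larger scale
u1_raw    <- svd(X, nu = 1, nv = 0)$u         # top left singular vector, unscaled
u1_scaled <- svd(scale(X), nu = 1, nv = 0)$u  # same after scaling each column
# |correlation| of each top direction with covariate 1
abs(cor(u1_raw, X[, 1]))     # close to 1: the top direction is covariate 1
abs(cor(u1_scaled, X[, 1]))  # typically much smaller
```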