partDSA(x, y, wt=rep(1, nrow(x)), x.test=x, y.test=y, wt.test, control=DSA.control(), sleigh)
DSA.control(vfold=10, minsplit=20, minbuck=round(minsplit/3), cut.off.growth=10,
            MPD=0.1, missing="impute.at.split", loss.function="default",
            wt.method="KM", brier.vec=NULL, leafy=0,
            leafy.random.num.variables.per.split=4, leafy.num.trees=50,
            leafy.subsample=0, save.input=FALSE, boost=0, boost.rounds=100,
            cox.vec=NULL, IBS.wt=NULL, partial=NULL)
y
The length of this vector should equal the number of rows in x.
wt
Default is a vector of ones with length equal to the number of training set observations.
x.test
The number of columns of x.test should equal the number of columns of x. The default is x.
y.test
The length of this vector should equal the number of rows in x.test. The default value is y.
wt.test
The default value is wt if x.test wasn't specified; otherwise it is a vector of ones with length equal to the number of test set observations.
control
An object created by the DSA.control function. Default value is the result of calling DSA.control with no arguments.
sleigh
A sleigh object to allow the cross-validation to be performed in parallel using the nws package. If not specified, the cross-validation will be executed sequentially.
save.input
Indicates whether x and y should be saved in the object returned by partDSA. If FALSE, x and y are set to NULL. The default value is FALSE.
missing
set to "no" indicates that there is no missing data and
will raise an error if missing data is found in the dataset. Setting missing="impute.at.split" will use a data
imputation method similar to that in CRUISE (Kim and Loh, 2001). At each
split, the non-missing observations for a given variable will be used
to find the best split, and the missing observations will be imputed
based on the mean or mode (depending on whether the variable is
continuous or categorical, respectively) of the non-missing observations in that node.
Once the node assignment of these missing observations is determined
using the imputed values, the imputed values are returned to their
missing status. For missing values in the test set, the grand mean or
mode of the corresponding variable in the training set is used.
Including variables that are entirely missing will result in an error.
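The imputation rule described above can be illustrated with a small base-R sketch. This is not partDSA's internal code: the helper impute_value and the toy data are hypothetical, and the sketch only shows the mean-or-mode fill, the temporary nature of the filled values, and the use of the training-set value for test data.

```r
# Hypothetical helper (not part of partDSA): the value used to fill in
# missing entries of one variable, based on its non-missing observations.
impute_value <- function(v) {
  observed <- v[!is.na(v)]
  if (length(observed) == 0) {
    stop("variable is entirely missing")
  }
  if (is.numeric(observed)) {
    mean(observed)                    # continuous variable: mean
  } else {
    tab <- table(observed)
    names(tab)[which.max(tab)]        # categorical variable: mode (first on ties)
  }
}

# Toy node with one continuous and one categorical variable.
node <- data.frame(
  age   = c(30, NA, 50, 40),
  group = factor(c("a", "b", NA, "b"))
)

age_fill   <- impute_value(node$age)    # mean of the observed ages
group_fill <- impute_value(node$group)  # most frequent observed level

# Temporary copy used only to decide node assignment; node$age itself
# keeps its NA, mirroring "returned to their missing status".
age_tmp <- ifelse(is.na(node$age), age_fill, node$age)

# Test-set values are filled from the training-set mean, not the test set.
test_age <- c(NA, 25)
test_tmp <- ifelse(is.na(test_age), impute_value(node$age), test_age)
```

Keeping the filled values in a separate vector, rather than overwriting the data, is one simple way to ensure the missing entries are still missing at the next split.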
library(partDSA)
library(MASS)   # provides the Boston housing data
set.seed(6442)  # make the train/test split reproducible
n <- nrow(Boston)
tr.n <- floor(n / 2)
train.index <- sample(1:n, tr.n, replace=FALSE)
test.index <- (1:n)[-train.index]
# Column 14 (medv) is the outcome; the remaining columns are predictors.
x <- Boston[train.index, -14]
y <- Boston[train.index, 14]
x.test <- Boston[test.index, -14]
y.test <- Boston[test.index, 14]
control <- DSA.control(vfold=1) # no cross-validation
partDSA(x, y, x.test=x.test, y.test=y.test, control=control)
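Before a call like the one above, the size constraints stated for the arguments can be checked up front. The following base-R sketch is an illustration only: check_dsa_inputs is a hypothetical helper, not a partDSA function, and it uses the built-in mtcars data so that it runs without partDSA installed. It also mirrors the documented default for wt.test (wt when x.test is not supplied, otherwise a vector of ones).

```r
# Hypothetical pre-flight check (not part of partDSA).
check_dsa_inputs <- function(x, y, wt = rep(1, nrow(x)),
                             x.test = x, y.test = y, wt.test = NULL) {
  if (is.null(wt.test)) {
    # Documented default: wt if x.test wasn't specified, otherwise ones.
    wt.test <- if (missing(x.test)) wt else rep(1, nrow(x.test))
  }
  stopifnot(
    length(y) == nrow(x),             # one outcome per training row
    length(wt) == nrow(x),            # one weight per training row
    ncol(x.test) == ncol(x),          # same predictors in both sets
    length(y.test) == nrow(x.test),   # one outcome per test row
    length(wt.test) == nrow(x.test)   # one weight per test row
  )
  invisible(wt.test)
}

# Toy split of a built-in data set: column 1 (mpg) is the outcome.
x_tr <- mtcars[1:20, -1]
y_tr <- mtcars$mpg[1:20]
x_te <- mtcars[21:32, -1]
y_te <- mtcars$mpg[21:32]

wt_te <- check_dsa_inputs(x_tr, y_tr, x.test = x_te, y.test = y_te)

# Dropping a column from the test predictors violates the column check.
bad <- try(check_dsa_inputs(x_tr, y_tr, x.test = x_te[, -1], y.test = y_te),
           silent = TRUE)
```

Here wt_te is the resolved test weight vector, and bad holds the error raised by the mismatched column count.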