s2net (version 1.0)

s2Data: Data wrapper for s2net.

Description

This function preprocesses the data to fit a semi-supervised linear joint-trained model.

Usage

s2Data(xL, yL, xU = NULL, preprocess = T)

Arguments

xL

The labeled data. Could be a matrix or data.frame.

yL

The labels associated with xL. Could be a vector, matrix or data.frame, of factor or numeric types.

xU

The unlabeled data (optional). Could be a matrix or data.frame.

preprocess

Should the input data be pre-processed? Possible values are:

TRUE (default) The data is converted to a matrix. Factor variables are automatically coded using model.matrix. The data is scaled, and constant columns are removed.

FALSE Do nothing. Keep in mind that the theoretical framework assumes that xL is centered. Unless you are absolutely sure, avoid this.

Another object of class s2Data that was obtained from similar data (same original variables). This is useful with train/validation sets, to apply to the validation data the same transformation that was applied to the training data.
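The idea of reusing a training transformation on validation data can be sketched in base R. This is only an illustration under stated assumptions: the toy matrices and the variable names (x_train, x_valid, rm_cols) are hypothetical, and the code is not s2net's internal implementation.

```r
# Illustrative only: a base-R analogue of learning a transformation on
# training data and applying the same one to validation data.
set.seed(1)
x_train <- matrix(rnorm(40), nrow = 10, ncol = 4)
x_train[, 3] <- 1  # a constant column, which carries no information
x_valid <- matrix(rnorm(20), nrow = 5, ncol = 4)
x_valid[, 3] <- 1

# Learn the transformation from the training data only.
rm_cols <- apply(x_train, 2, function(col) var(col) == 0)
train_scaled <- scale(x_train[, !rm_cols, drop = FALSE])
center <- attr(train_scaled, "scaled:center")
col_scale <- attr(train_scaled, "scaled:scale")

# Apply the stored centers and scales to the validation data,
# instead of recomputing them from the validation data itself.
valid_scaled <- scale(x_valid[, !rm_cols, drop = FALSE],
                      center = center, scale = col_scale)
```

Passing a previous s2Data object as preprocess plays the role of the stored rm_cols, center and col_scale here, so that training and validation data end up on the same scale.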

Value

Returns an object of S3 class s2Data with fields

xL

Transformed labeled data

yL

Transformed labels. If yL was a factor, it is converted to numeric, and the base category is kept in base.

xU

Transformed unlabeled data

type

Type of task. This one is inferred from the response labels.

base

Base category for classification (the category coded as 0)

In addition the following attributes are stored.

pr:rm_cols

logical vector of removed columns

pr:center

column center

pr:scale

column scale

pr:ycenter

Center of yL (regression only)

pr:yscale

Scale of yL (regression only)
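Because these attribute names contain a colon, they are read with attr() rather than with $. A minimal sketch follows, using a mock list in place of a real s2Data object. The numeric values are made up, and the convention scaled = (original - center) / scale is an assumption, not something this page confirms.

```r
# Mock object standing in for an s2Data result; the values are made up.
mock <- list(yL = c(-1.2, 0.3, 0.9))
attr(mock, "pr:ycenter") <- 23.5
attr(mock, "pr:yscale") <- 7.8

# Attribute names containing ":" are accessed with attr().
y_center <- attr(mock, "pr:ycenter")
y_scale <- attr(mock, "pr:yscale")

# Assumed convention: scaled = (original - center) / scale,
# so the original units are recovered by inverting it.
y_original <- mock$yL * y_scale + y_center
```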

See Also

s2Fista

Examples

# NOT RUN {
data("auto_mpg")

train = s2Data( xL = auto_mpg$P1$xL,
                  yL = auto_mpg$P1$yL,
                  xU = auto_mpg$P1$xU,
                  preprocess = TRUE )
show(train)

# Notice how the ordered factor variable $cylinders is handled:
# .L (linear), .Q (quadratic), .C (cubic) and ^4
head(train$xL)

# If you want to do validation with the unlabeled data
idx = sample(length(auto_mpg$P1$yU), 200)

train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU[idx, ])

valid = s2Data(xL = auto_mpg$P1$xU[-idx, ], yL = auto_mpg$P1$yU[-idx], preprocess = train)

test = s2Data(xL = auto_mpg$P1$xU[idx, ], yL = auto_mpg$P1$yU[idx], preprocess = train)

train
valid
test
# }
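The .L, .Q, .C and ^4 column suffixes mentioned in the example come from the polynomial contrasts that model.matrix uses by default for ordered factors. The sketch below shows this on a hypothetical toy data frame (not auto_mpg itself). Whether s2Data keeps the intercept column is not stated here, so the sketch only shows the raw model.matrix output.

```r
# Hypothetical toy data: a five-level ordered factor.
toy <- data.frame(
  cylinders = factor(c(3, 4, 5, 6, 8, 4, 8),
                     levels = c(3, 4, 5, 6, 8), ordered = TRUE)
)

# Ordered factors are coded with orthogonal polynomial contrasts,
# giving one column per polynomial degree (levels minus one).
design <- model.matrix(~ cylinders, data = toy)
colnames(design)
```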