sharp (version 1.4.7)

Split: Splitting observations into non-overlapping sets

Description

Generates a list of length(tau) non-overlapping sets of observation IDs.

Usage

Split(data, family = NULL, tau = c(0.5, 0.25, 0.25))

Value

A list of length(tau) non-overlapping sets of observation IDs.

Arguments

data

vector or matrix of data. In regression, this should be the outcome data.

family

type of regression model. This argument is defined as in glmnet. Possible values include "gaussian" (linear regression), "binomial" (logistic regression), "multinomial" (multinomial regression), and "cox" (survival analysis).

tau

vector of the proportions of observations in each of the sets.

Details

With categorical outcomes (i.e. when the family argument is set to "binomial", "multinomial" or "cox"), the split is stratified: the proportion of observations from each category in each set is representative of that in the full sample.
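
The idea of a split that preserves category proportions can be illustrated with a minimal base R sketch. This is not the implementation used in sharp: the helper name stratified_split and its rounding rule are assumptions made purely for illustration.

```r
# Hypothetical helper (not part of sharp): split observation IDs into
# length(tau) sets while preserving the category proportions of y
stratified_split <- function(y, tau = c(0.5, 0.25, 0.25)) {
  ids <- vector("list", length(tau))
  for (level in unique(y)) {
    # Shuffle the IDs of the observations in this category
    idx <- which(y == level)
    idx <- idx[sample.int(length(idx))]
    # Cut points: cumulative proportions scaled to the category size
    cuts <- round(cumsum(tau) / sum(tau) * length(idx))
    starts <- c(0, cuts[-length(cuts)]) + 1
    for (k in seq_along(tau)) {
      if (cuts[k] >= starts[k]) {
        ids[[k]] <- c(ids[[k]], idx[starts[k]:cuts[k]])
      }
    }
  }
  ids
}

set.seed(1)
y <- rep(c(0, 1), times = c(80, 20)) # 20% of observations in category 1
ids <- stratified_split(y)
sapply(ids, length) # set sizes: 50, 25, 25
sapply(ids, function(x) mean(y[x])) # proportion of category 1 in each set: 0.2
```

Splitting within each category and then pooling the pieces is what keeps the category proportions of every set equal (up to rounding) to those of the full sample.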

Examples

library(sharp)

# Splitting into 3 sets (default tau = c(0.5, 0.25, 0.25))
simul <- SimulateRegression()
ids <- Split(data = simul$ydata)
lapply(ids, length)

# Balanced splits with respect to a binary variable
simul <- SimulateRegression(family = "binomial")
ids <- Split(data = simul$ydata, family = "binomial")
lapply(ids, FUN = function(x) {
  table(simul$ydata[x, ])
})
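
A list of ID sets like the one returned above would typically be used to subset the data, for example into training, validation and test sets. The sketch below shows one way to do this in base R. The objects xdata, ydata and ids are stand-ins generated here so that the block runs without sharp; they are assumptions, not output of Split().

```r
# Stand-in data and ID sets (assumed for illustration, not produced by sharp)
set.seed(1)
n <- 100
xdata <- matrix(rnorm(n * 3), nrow = n)
ydata <- rnorm(n)
shuffled <- sample.int(n)
ids <- list(shuffled[1:50], shuffled[51:75], shuffled[76:100])

# Checking that the sets are non-overlapping: no ID appears twice
anyDuplicated(unlist(ids)) == 0

# Subsetting the predictors and outcome with each set of IDs
subsets <- lapply(ids, function(x) {
  list(x = xdata[x, , drop = FALSE], y = ydata[x])
})
sapply(subsets, function(s) nrow(s$x))
```

Using drop = FALSE keeps each predictor subset a matrix even if a set contains a single observation.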