actuaRE (version 0.1.7)

simulatedclustereddata: Simulated data sets to illustrate the package functionality

Description

tweedietraindata and tweedietestdata are synthetically generated data frames that illustrate the functionality of the package. Each data set has 250 000 observations, and both were generated with the same settings.

Usage

data(tweedietraindata)
data(tweedietestdata)
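
A minimal sketch of loading and inspecting both data sets, assuming the actuaRE package is installed. The guard with requireNamespace() and the name hasActuaRE are illustrative additions, not part of the package documentation.

```r
# Check whether the package is available before trying to load its data
hasActuaRE <- requireNamespace("actuaRE", quietly = TRUE)

if (hasActuaRE) {
  # Load both data sets from the package into the current environment
  data("tweedietraindata", package = "actuaRE")
  data("tweedietestdata", package = "actuaRE")

  # Inspect the column types and the first few rows
  str(tweedietraindata)
  head(tweedietestdata)
}
```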

Format

y

the Tweedie-distributed outcome variable

cluster

the cluster identifier

subcluster

the subcluster nested within cluster

x1

covariate 1

x2

covariate 2

x3

covariate 3

x4

covariate 4

x5

covariate 5
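
Because subcluster is nested within cluster, every subcluster label should occur in exactly one cluster. A minimal sketch of that check on a small hypothetical data frame follows; the toy values and the names toy, nParents, and isNested are illustrative and not taken from the package data.

```r
# Hypothetical toy data with the same two grouping columns as the data sets
toy <- data.frame(
  cluster    = c(1, 1, 1, 2, 2, 2),
  subcluster = c(1, 1, 2, 6, 7, 7)
)

# Number of distinct clusters in which each subcluster occurs
nParents <- tapply(toy$cluster, toy$subcluster, function(z) length(unique(z)))

# TRUE when every subcluster belongs to exactly one cluster
isNested <- all(nParents == 1)
isNested
```

The same two steps should apply unchanged to the cluster and subcluster columns of tweedietraindata once it is loaded.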

Details

See the examples for how the data sets were generated.
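
Read off the example code, the simulation appears to follow a Tweedie model with a log link and nested random effects. Written out, with j indexing clusters, k subclusters within a cluster, and i observations (the symbols are notation chosen here, not the package's):

```latex
\begin{aligned}
\eta_{ijk} &= \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + U_j + U_{jk}, \\
\mu_{ijk}  &= \exp(\eta_{ijk}), \\
y_{ijk}    &\sim \text{Tweedie}(\mu_{ijk},\ \phi = 1,\ p = 1.5).
\end{aligned}
```

In the code, the covariates and the coefficients are standard normal draws, there is no intercept, and the random effects are normal draws that are centered and scaled with scale(): the cluster effects across all clusters, the subcluster effects separately within each cluster.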

Examples

  # The data sets were generated as follows
  lapply(c("magrittr", "dplyr", "data.table", "tweedie"), library, character.only = TRUE)

  # Simulate the population from which both data sets are sampled
  set.seed(1)
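  nClusters    = 5      # number of clusters
  nSubclusters = 5      # number of subclusters per cluster
  p            = 5      # number of covariates
  # Centered and scaled random effects: one per cluster, one per subcluster
  Uj           = scale(rnorm(nClusters))
  Ujk          = do.call("c", lapply(seq_len(nClusters), function(x) scale(rnorm(nSubclusters))))
  nPop         = 1e6    # population size
  nSample      = 50     # observations drawn per subcluster
  nTest        = 1e3    # not used below
  X            = replicate(p, rnorm(nPop))   # covariate matrix (nPop x p)
  Beta         = rnorm(p)                    # fixed-effect coefficients
  cluster      = sample(seq_len(nClusters), nPop, TRUE)
  # Assign each observation a subcluster within its cluster;
  # cluster i gets the labels (i - 1) * nSubclusters + 1, ..., i * nSubclusters
  subcluster   = NULL
  uniqueCl     = sort(unique(cluster))
  for(cl in uniqueCl) {
    subcluster[cluster == cl] <- sample(
      1 - seq_len(nSubclusters) + which(cl == uniqueCl) * nSubclusters,
      sum(cluster == cl),
      TRUE)
  }
  table(subcluster, cluster)   # check the nesting
  # Linear predictor on the log scale: fixed effects plus both random effects
  eta       = X %*% Beta + Uj[match(cluster, seq_len(nClusters))] +
              Ujk[match(subcluster, seq_len(nClusters * nSubclusters))]
  # Tweedie outcome with mean exp(eta), dispersion 1 and power 1.5
  y         = rtweedie(nPop, mu = exp(as.vector(eta)), phi = 1, power = 1.5)
  wt        = runif(nPop)   # uniform(0, 1) draws stored as wt
  Dt        = data.frame(y, X, wt, cluster, subcluster)
  colnames(Dt) %<>% tolower   # X1, ..., X5 become x1, ..., x5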

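  # Training set: nSample observations drawn from every subcluster
  tweedietraindata = Dt %>%
    group_by(subcluster) %>%
    sample_n(size = nSample) %>%
    as.data.table

  # Test set: a second draw of the same size from the same population
  tweedietestdata = Dt %>%
    group_by(subcluster) %>%
    sample_n(size = nSample) %>%
    as.data.table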
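
The subcluster labelling and the per-subcluster sampling in the code above can be reproduced at toy scale in base R. This is a sketch under assumptions: the sizes are shrunk, split() and lapply() stand in for group_by() and sample_n(), and the names ids, pop, and drawn are hypothetical.

```r
set.seed(1)
nClusters    <- 2
nSubclusters <- 3
nSample      <- 4

# Labels for cluster i run from i * nSubclusters down to (i - 1) * nSubclusters + 1
ids <- lapply(seq_len(nClusters), function(i) {
  1 - seq_len(nSubclusters) + i * nSubclusters
})
# ids[[1]] is 3 2 1 and ids[[2]] is 6 5 4

# Toy population: 20 rows per subcluster, subclusters 1-3 in cluster 1, 4-6 in cluster 2
pop <- data.frame(
  cluster    = rep(seq_len(nClusters), each = nSubclusters * 20),
  subcluster = rep(sort(unlist(ids)), each = 20)
)

# Draw nSample rows without replacement from every subcluster
drawn <- do.call(rbind, lapply(split(pop, pop$subcluster), function(d) {
  d[sample(nrow(d), nSample), , drop = FALSE]
}))

# Every subcluster contributes the same number of rows
table(drawn$subcluster)
```

Sampling a fixed number of rows per subcluster gives a balanced design: the number of rows in each draw is the number of subclusters times nSample, regardless of how unevenly the population is spread over the subclusters.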
