
freqweights (version 0.0.1)

smartround: Smart round of variables

Description

Smart round of variables

Usage

smartround(x, ndistinct = 100, freq = ~1, stats = c("mean", "median"),
  method = c("centroid", "median", "ward"), short = FALSE)

Arguments

x
a numeric matrix or data frame. It must contain the variables used in the freq formula
ndistinct
number of distinct values you want to obtain. If there are missing values, you will get ndistinct + 1 distinct values.
freq
a one-sided, single term formula specifying frequency weights
stats
statistic used to compute each distinct value
method
method to calculate the different groups. See hclustvfreq
short
logical. If TRUE, the function returns the data collected in a frequency table. Otherwise, it returns an object with the same number of rows as the original data.
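
As a rough illustration of what frequency weights mean, the base R sketch below treats each row as standing for as many identical observations as its weight says. The data frame `d` and its column names are made up for this example and are not part of the package.

```r
# Hypothetical data: each value occurs 'freq' times
d <- data.frame(value = c(1, 2, 5), freq = c(3, 1, 2))

# Expanding the rows by their weights ...
expanded <- rep(d$value, times = d$freq)

# ... gives the same mean as weighting the compact table
weighted <- weighted.mean(d$value, w = d$freq)
print(all.equal(mean(expanded), weighted))
```

A compact table of this kind, with a formula such as ~freq naming the weight column, is the sort of input the freq argument describes.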

Value

  • If short is TRUE, it returns a frequency table with ndistinct distinct values; missing values are removed.

    Otherwise, it returns a vector or a data frame with the same number of rows as the original data, in the original order.
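
The difference between the two output shapes can be sketched in base R. The vector `rounded` below is invented for illustration, and the column names `x` and `freq` are assumptions, not necessarily the layout that smartround produces.

```r
# Hypothetical "long" result: one rounded value per original observation
rounded <- c(2, 2, 2, 11, 11, NA)

# "Short" form: one row per distinct value with its frequency.
# table() drops NA by default, mirroring the removal of missing values.
short <- as.data.frame(table(x = rounded), responseName = "freq",
                       stringsAsFactors = FALSE)
short$x <- as.numeric(short$x)

print(length(rounded))  # long form keeps one entry per observation
print(short)            # short form has one row per distinct value
```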

Details

Use this function to reduce the number of unique values in your data set to a specific number. It collects the numeric data (one- or multi-dimensional) into groups and estimates the center of each group.
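
The idea can be sketched with base R alone. This is a minimal illustration of grouping values and replacing each one by its group center; it uses stats::hclust with the "ward.D2" method as a stand-in and is not the package's own implementation, which relies on hclustvfreq.

```r
# Illustrative values with three obvious groups
x <- c(1, 2, 3, 10, 11, 12, 50, 51)

# Hierarchical clustering, then cut the tree into k groups
k <- 3
groups <- cutree(hclust(dist(x), method = "ward.D2"), k = k)

# Replace every value by the mean of its group (the group "center")
rounded <- ave(x, groups, FUN = mean)

print(rounded)
print(length(unique(rounded)))
```

Swapping `mean` for `median` in the last step corresponds loosely to the choice offered by the stats argument.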

See Also

hclustvfreq

Examples

smartround(c(1:10,NA),2)
smartround(c(1:10,NA),2,short=TRUE)
smartround(iris[,1:4],7,short=TRUE)

if(require(hflights)){

  x <- hflights[,c("ArrDelay")]
  xr <- smartround(x, 100)
  print(length(unique(xr))) ## 101: 100 and NA
  ## xr is almost equal to x
  print(cor(x,xr,use="pairwise.complete.obs")) # 0.998

  ## Now, with a 2-dimension data frame

  d0 <- hflights[,c("ArrDelay","DepDelay")]
  t0 <- tablefreq(d0)
  print(nrow(t0)) # there are 11046 distinct cases

  ## we reduce to just 100 distinct cases
  d2 <- smartround(d0,100)
  ## The correlation is greater than 0.96
  print(cor(cbind(d0,d2),use="pairwise.complete.obs"))

  print(system.time(t1 <- smartround(d0,100,short=TRUE)))
  ## this is fast
  print(system.time(tfast <- smartround(t0,100,freq=~freq, short=TRUE)))
  print(all.equal(t1,tfast))
  print(nrow(t1))
}
