wtd.mean: Weighted Statistical Estimates

Description

These functions compute weighted versions of standard estimators. In most cases the weights vector is the same length as x and contains frequency counts that in effect expand x by these counts. weights can also be sampling weights, in which case setting normwt to TRUE will often be appropriate. This rescales weights so that they sum to the number of non-missing elements in x. normwt=TRUE thus reflects the fact that the true sample size is the length of the x vector and not the sum of the original values of weights (which would be appropriate with normwt=FALSE). When weights is all ones, the estimates are identical to unweighted estimates (unless one of the non-default quantile estimation options is specified to wtd.quantile).

When missing data have already been deleted for x, weights, and (in the case of wtd.loess.noiter) y, specifying na.rm=FALSE will save computation time. Omitting the weights argument, or specifying NULL or a zero-length vector, results in the usual unweighted estimates.
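As a minimal base-R sketch of the frequency-weight interpretation (written out by hand, not the package's implementation; the data and variable names are illustrative assumptions), weighting by counts gives the same mean and variance as repeating each value, and the rescaling described for normwt=TRUE makes the weights sum to the number of observations:

```r
# Illustrative data: values and their frequency counts (assumed for this sketch)
x <- c(2, 4, 7)
w <- c(3, 1, 2)

# Expanding x by its counts gives the "long" data that the weights stand for
expanded <- rep(x, times = w)

# Weighted mean and frequency-weighted variance, written out by hand
wmean <- sum(w * x) / sum(w)
wvar  <- sum(w * (x - wmean)^2) / (sum(w) - 1)

all.equal(wmean, mean(expanded))   # TRUE
all.equal(wvar,  var(expanded))    # TRUE

# Rescaling described for normwt=TRUE: the weights now sum to length(x)
w_norm <- w * length(x) / sum(w)
sum(w_norm)                        # 3
```

With sampling weights the sum of the raw weights is not a sample size, which is why the rescaled version is often the more sensible denominator.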

wtd.mean, wtd.var, and wtd.quantile compute weighted means, variances, and quantiles, respectively. wtd.ecdf computes a weighted empirical distribution function. wtd.table computes a weighted frequency table (although only one stratification variable is supported at present). wtd.rank computes weighted ranks, using midranks for ties. These ranks can be used to obtain Wilcoxon tests and rank correlation coefficients. wtd.loess.noiter is a weighted version of loess.smooth when no iterations for outlier rejection are desired. This results in especially good smoothing when y is binary.
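To make the idea of weighted midranks concrete, here is a base-R sketch under the frequency-weight interpretation. It is an illustration with assumed data and variable names, not the code used by wtd.rank: each distinct value receives the average of the ranks its repeated copies would occupy in the expanded data.

```r
# Illustrative data with a tied value (assumed for this sketch)
x <- c(3, 1, 2, 2)
w <- c(1, 2, 1, 3)   # frequency weights

# Total weight at each distinct x, in increasing order of x
ux    <- sort(unique(x))
freqs <- as.numeric(tapply(w, factor(x, levels = ux), sum))

# Midrank of each distinct value: cumulative weight minus half of (weight - 1)
r <- cumsum(freqs) - 0.5 * (freqs - 1)

# Spread the midranks back to the original observations
wrank <- r[match(x, ux)]
wrank                                  # 7.0 1.5 4.5 4.5

# Ordinary midranks of the expanded data, one per distinct value, for comparison
tapply(rank(rep(x, w)), rep(x, w), mean)
```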

num.denom.setup is a utility function for observations that contain numbers of events and numbers of trials. It outputs two observations when the numbers of events and non-events (trials - events) both exceed zero. A vector of subscripts is generated that will do the proper duplications of observations, and a new binary variable y is created along with the usual cell frequencies (weights) for each of the y=0, y=1 cells per observation.
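The restructuring described above can be sketched in base R as follows. The helper name expand_events is hypothetical, the handling of missing or zero denominators is an assumption, and the order of the output rows may differ from what num.denom.setup returns.

```r
# Hypothetical helper sketching the events/trials restructuring;
# it is not the package's code, and its output order is an assumption.
expand_events <- function(num, denom) {
  stopifnot(length(num) == length(denom))
  # Assumption: keep rows with both counts present and at least one trial
  keep   <- which(!is.na(num) & !is.na(denom) & denom > 0)
  events <- num[keep]
  nonev  <- denom[keep] - num[keep]
  list(
    subs    = c(keep[events > 0], keep[nonev > 0]),      # row subscripts
    weights = c(events[events > 0], nonev[nonev > 0]),   # cell frequencies
    y       = rep(c(1, 0), c(sum(events > 0), sum(nonev > 0)))
  )
}

res <- expand_events(num = c(10, NA, 20, 0, 15), denom = c(12, 999, 20, 25, 35))
res$subs      # 1 3 5 1 4 5
res$weights   # 10 20 15 2 25 20
res$y         # 1 1 1 0 0 0
```

Rows 1 and 5 appear twice because they have both events and non-events, row 3 contributes only a y=1 cell, and row 4 only a y=0 cell.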

Usage

wtd.mean(x, weights=NULL, normwt="ignored", na.rm=TRUE)
wtd.var(x, weights=NULL, normwt=FALSE, na.rm=TRUE)
wtd.quantile(x, weights=NULL, probs=c(0, .25, .5, .75, 1), 
             type=c('quantile','(i-1)/(n-1)','i/(n+1)','i/n'), 
             normwt=FALSE, na.rm=TRUE)
wtd.ecdf(x, weights=NULL, 
         type=c('i/n','(i-1)/(n-1)','i/(n+1)'), 
         normwt=FALSE, na.rm=TRUE)
wtd.table(x, weights=NULL, type=c('list','table'), 
          normwt=FALSE, na.rm=TRUE)
wtd.rank(x, weights=NULL, normwt=FALSE, na.rm=TRUE)
wtd.loess.noiter(x, y, weights=rep(1,n), robust=rep(1,n), 
                 span=2/3, degree=1, cell=.13333, 
                 type=c('all','ordered all','evaluate'), 
                 evaluation=100, na.rm=TRUE)
num.denom.setup(num, denom)

Arguments

x

a numeric vector (may be a character or category or factor vector for wtd.table)

num

vector of numerator frequencies

denom

vector of denominators (numbers of trials)

weights

a numeric vector of weights

normwt

specify normwt=TRUE to make weights sum to length(x) after deletion of NAs

na.rm

set to FALSE to suppress checking for NAs

probs

a vector of quantiles to compute. Default is 0 (min), .25, .5, .75, 1 (max).

type

For wtd.quantile, type defaults to quantile to use the same interpolated order statistic method as quantile. Set type to "(i-1)/(n-1)", "i/(n+1)", or "i/n" to use the inverse of the empirical distribution function instead.

y

a numeric vector the same length as x

robust, span, degree, cell, evaluation

see loess.smooth. The defaults are a linear fit (degree=1) and 100 evaluation
points (used if type="evaluate").

Value

wtd.mean and wtd.var return scalars.  wtd.quantile returns a
vector the same length as probs.  wtd.ecdf returns a list whose
elements x and ecdf correspond to unique sorted values of x.
If the first CDF estimate is greater than zero, a point (min(x),0) is
placed at the beginning of the estimates.
See above for wtd.table.  wtd.rank returns a vector the same
length as x (after removal of NAs, depending on na.rm).  See above
for wtd.loess.noiter.

Concept

weighted sampling
grouping
weights

Details

The functions correctly combine weights of observations having
duplicate values of x before computing estimates.
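A small base-R sketch of what combining weights for duplicate values means, with assumed data and variable names (an illustration of the idea, not the package's code): the weights are first summed within each distinct value of x, and estimates such as a weighted empirical CDF are then built from those per-value totals.

```r
# Illustrative data: the value 5 occurs twice with different weights (assumed)
x <- c(5, 2, 5, 9)
w <- c(1, 2, 3, 4)

# Sum the weights within each distinct x (distinct values sorted ascending)
ux <- sort(unique(x))
sw <- as.numeric(tapply(w, factor(x, levels = ux), sum))
data.frame(x = ux, weight_sum = sw)

# Weighted empirical CDF at the distinct values: cumulative weight / total weight
cdf <- cumsum(sw) / sum(sw)
cdf                            # 0.2 0.6 1.0

# With frequency weights this matches the ordinary ECDF of the expanded data
all.equal(cdf, ecdf(rep(x, w))(ux))   # TRUE
```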

wtd.rank does not handle NAs as elegantly as rank if weights is
specified.

References

Research Triangle Institute (1995): SUDAAN User's Manual, Release
6.40, pp. 8-16 to 8-17.

See Also

mean, var, quantile, table, rank, loess.smooth, lowess,
plsmo, ecdf, somers2, describe

Examples

library(Hmisc)
set.seed(1)
x <- runif(500)
wts <- sample(1:6, 500, TRUE)
std.dev <- sqrt(wtd.var(x, wts))
wtd.quantile(x, wts)
death <- sample(0:1, 500, TRUE)
plot(wtd.loess.noiter(x, death, wts, type='evaluate'))
describe(~x, weights=wts)
# describe uses wtd.mean, wtd.quantile, wtd.table
xg <- cut2(x,g=4)
table(xg)
wtd.table(xg, wts, type='table')

# Here is a method for getting stratified weighted means
y <- runif(500)
g <- function(y) wtd.mean(y[,1],y[,2])
summarize(cbind(y, wts), llist(xg), g, stat.name='y')


# Restructure data to generate a dichotomous response variable
# from records containing numbers of events and numbers of trials
num   <- c(10,NA,20,0,15)   # data are 10/12 NA/999 20/20 0/25 15/35
denom <- c(12,999,20,25,35)
w     <- num.denom.setup(num, denom)
w
# attach(my.data.frame[w$subs,])