
These functions compute various weighted versions of standard estimators. In most cases the weights vector is a vector the same length as x, containing frequency counts that in effect expand x by these counts. weights can also be sampling weights, in which case setting normwt to TRUE will often be appropriate. This makes weights sum to the number of non-missing elements in x. normwt=TRUE thus reflects the fact that the true sample size is the length of the x vector and not the sum of the original values of weights (which would be appropriate had normwt=FALSE). When weights is all ones, the estimates are all identical to unweighted estimates (unless one of the non-default quantile estimation options is specified to wtd.quantile). When missing data have already been deleted for x, weights, and (in the case of wtd.loess.noiter) y, specifying na.rm=FALSE will save computation time. Omitting the weights argument or specifying NULL or a zero-length vector will result in the usual unweighted estimates.
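The effect of frequency weights and of normalization can be illustrated with a minimal base R sketch. It does not call wtd.mean; the data and the variable names (expanded, w_norm, and so on) are made up for illustration, and the rescaling shown is only an assumption about what "sum to the length of x" means in practice.

```r
# Base R only; the Hmisc functions themselves are not called here.
x <- c(2, 4, 7)
w <- c(3, 1, 2)                      # frequency counts (illustrative data)

# Frequency weights behave as if each x[i] were repeated w[i] times.
expanded   <- rep(x, times = w)
m_expanded <- mean(expanded)
m_weighted <- sum(w * x) / sum(w)

# The idea behind normwt=TRUE: rescale weights so they sum to length(x).
w_norm <- w * length(x) / sum(w)
m_norm <- sum(w_norm * x) / sum(w_norm)

print(c(expanded = m_expanded, weighted = m_weighted, normalized = m_norm))
```

Rescaling the weights leaves the weighted mean unchanged; it matters for estimators, such as the variance, whose formula depends on the total weight.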
wtd.mean, wtd.var, and wtd.quantile compute weighted means, variances, and quantiles, respectively. wtd.Ecdf computes a weighted empirical distribution function. wtd.table computes a weighted frequency table (although only one stratification variable is supported at present). wtd.rank computes weighted ranks, using midranks for ties. This can be used to obtain Wilcoxon tests and rank correlation coefficients. wtd.loess.noiter is a weighted version of loess.smooth when no iterations for outlier rejection are desired. This results in especially good smoothing when y is binary. wtd.quantile removes any observations with zero weight at the beginning; previously, such observations changed the quantile estimates.
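As a rough illustration of weighted midranks under frequency weights, the base R sketch below sums the weights within each distinct value and assigns each value the average of the ranks its expanded copies would occupy. The function name weighted_midrank is hypothetical, and this is not claimed to be the code used by wtd.rank.

```r
# Hypothetical helper, base R only: midranks under frequency weights.
weighted_midrank <- function(x, w) {
  ux  <- sort(unique(x))
  sw  <- vapply(ux, function(v) sum(w[x == v]), numeric(1))  # weight per distinct value
  cum <- cumsum(sw)
  # Copies of ux[k] would occupy ranks (cum - sw + 1) .. cum; take their average.
  r <- cum - (sw - 1) / 2
  r[match(x, ux)]
}

x <- c(10, 20, 20, 30)
w <- c(2, 1, 3, 1)
wr <- weighted_midrank(x, w)

# Cross-check against ordinary midranks of the expanded data.
expanded <- rep(x, times = w)
check <- tapply(rank(expanded), expanded, mean)[as.character(x)]
print(wr)
```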
num.denom.setup is a utility function for observations that contain numbers of events and numbers of trials. It outputs two observations when both the number of events and the number of non-events (trials - events) exceed zero. A vector of subscripts is generated that will do the proper duplications of observations, and a new binary variable y is created along with the usual cell frequencies (weights) for each of the y=0, y=1 cells per observation.
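The restructuring described above can be sketched in base R as follows. The function expand_events and its return names are hypothetical and only mimic the description; the object returned by num.denom.setup may differ in detail.

```r
# Hypothetical base R sketch of the events/trials expansion described above.
expand_events <- function(num, denom) {
  n <- length(num)
  stopifnot(length(denom) == n)
  w <- c(num, denom - num)              # event counts, then non-event counts
  y <- rep(c(1, 0), each = n)           # y=1 for events, y=0 for non-events
  keep <- !is.na(w) & w > 0             # drop missing or empty cells
  list(subs = rep(seq_len(n), 2)[keep], # row of the original data to duplicate
       weights = w[keep],
       y = y[keep])
}

num   <- c(10, NA, 20, 0, 15)           # data are 10/12 NA/999 20/20 0/25 15/35
denom <- c(12, 999, 20, 25, 35)
res <- expand_events(num, denom)
print(as.data.frame(res))
```

A record with both events and non-events contributes two rows, a record with only one kind contributes one row, and a record with a missing numerator contributes none.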
wtd.mean(x, weights=NULL, normwt="ignored", na.rm=TRUE)
wtd.var(x, weights=NULL, normwt=FALSE, na.rm=TRUE,
        method=c('unbiased', 'ML'))
wtd.quantile(x, weights=NULL, probs=c(0, .25, .5, .75, 1),
             type=c('quantile','(i-1)/(n-1)','i/(n+1)','i/n'),
             normwt=FALSE, na.rm=TRUE)
wtd.Ecdf(x, weights=NULL,
         type=c('i/n','(i-1)/(n-1)','i/(n+1)'),
         normwt=FALSE, na.rm=TRUE)
wtd.table(x, weights=NULL, type=c('list','table'),
          normwt=FALSE, na.rm=TRUE)
wtd.rank(x, weights=NULL, normwt=FALSE, na.rm=TRUE)
wtd.loess.noiter(x, y, weights=rep(1,n),
                 span=2/3, degree=1, cell=.13333,
                 type=c('all','ordered all','evaluate'),
                 evaluation=100, na.rm=TRUE)
num.denom.setup(num, denom)
x: a numeric vector (may be a character, category, or factor vector for wtd.table)
num: vector of numerator frequencies
denom: vector of denominators (numbers of trials)
weights: a numeric vector of weights
normwt: specify normwt=TRUE to make weights sum to length(x) after deletion of NAs. If weights are frequency weights, then normwt should be FALSE, and if weights are normalization (aka reliability) weights, then normwt should be TRUE. In the former case, no check is made that weights are valid frequencies.
na.rm: set to FALSE to suppress checking for NAs
method: determines the estimator type; if 'unbiased' (the default) then the usual unbiased estimate (using Bessel's correction) is returned; if 'ML' then it is the maximum likelihood estimate for a Gaussian distribution. In the latter case, the normwt argument has no effect. Uses stats::cov.wt for both methods.
probs: a vector of quantiles to compute. Default is 0 (min), .25, .5, .75, 1 (max).
type: For wtd.quantile, type defaults to "quantile" to use the same interpolated order statistic method as quantile. Set type to "(i-1)/(n-1)", "i/(n+1)", or "i/n" to use the inverse of the empirical distribution function, using, respectively, (wt - 1)/T, wt/(T+1), or wt/T, where wt is the cumulative weight and T is the total weight (usually total sample size). These three values of type are the possibilities for wtd.Ecdf. For wtd.table the default type is "list", meaning that the function is to return a list containing two vectors: x is the sorted unique values of x and sum.of.weights is the sum of weights for that x. This is the default so that you don't have to convert the names attribute of the result that can be obtained with type="table" to a numeric variable when x was originally numeric. type="table" for wtd.table results in an object that has the same structure as those returned from table. For wtd.loess.noiter the default type is "all", indicating that the function is to return a list containing all the original values of x (including duplicates and without sorting) and the smoothed y values corresponding to them. Set type="ordered all" to sort by x, and type="evaluate" to evaluate the smooth only at evaluation equally spaced points between the observed limits of x.
y: a numeric vector the same length as x
span, degree, cell, evaluation: see loess.smooth. The defaults are a linear fit (degree=1) and 100 evaluation points (used if type="evaluate").
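To make the inverse-ECDF idea behind type="i/n" concrete, here is a minimal base R sketch that returns, for each probability, the smallest x whose cumulative weight proportion wt/T reaches it. The function inv_ecdf_quantile is hypothetical; wtd.quantile may handle ties, interpolation, and boundary cases differently.

```r
# Hypothetical base R sketch of an inverse weighted ECDF using wt/T.
inv_ecdf_quantile <- function(x, w, probs) {
  ux  <- sort(unique(x))
  sw  <- vapply(ux, function(v) sum(w[x == v]), numeric(1))  # combine duplicate x
  cum <- cumsum(sw) / sum(sw)                                # wt / T
  tol <- sqrt(.Machine$double.eps)                           # guard against rounding at p = 1
  vapply(probs, function(p) ux[which(cum >= p - tol)[1]], numeric(1))
}

x <- c(1, 2, 2, 5)
w <- c(1, 2, 1, 4)
q <- inv_ecdf_quantile(x, w, probs = c(0.1, 0.5, 0.9))
print(q)
```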
wtd.mean and wtd.var return scalars. wtd.quantile returns a vector the same length as probs. wtd.Ecdf returns a list whose elements x and Ecdf correspond to unique sorted values of x. If the first CDF estimate is greater than zero, a point (min(x),0) is placed at the beginning of the estimates. See above for wtd.table. wtd.rank returns a vector the same length as x (after removal of NAs, depending on na.rm). See above for wtd.loess.noiter.
The functions correctly combine weights of observations having duplicate values of x before computing estimates.
When normwt=FALSE the weighted variance will not equal the unweighted variance even if the weights are identical. That is because of the subtraction of 1 from the sum of the weights in the denominator of the variance formula. If you want the weighted variance to equal the unweighted variance when weights do not vary, use normwt=TRUE.
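The point about the denominator can be checked numerically with a small base R sketch. The helper wvar below is hypothetical; it implements only the formula described here (weighted sum of squares divided by the sum of the weights minus 1) and is not a copy of wtd.var.

```r
# Hypothetical helper implementing the formula described in the text.
wvar <- function(x, w) {
  xbar <- sum(w * x) / sum(w)
  sum(w * (x - xbar)^2) / (sum(w) - 1)
}

x <- c(1, 3, 4, 8)
w <- rep(2, length(x))                       # identical weights

v_raw  <- wvar(x, w)                         # weights used as given
v_norm <- wvar(x, w * length(x) / sum(w))    # weights rescaled to sum to length(x)
v_unw  <- var(x)                             # ordinary unweighted variance

print(c(raw = v_raw, normalized = v_norm, unweighted = v_unw))
```

With the weights used as given the denominator is 7 rather than 3, so the result differs from var(x); after rescaling, the two agree.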
The articles by Gatz and Smith discuss alternative approaches to estimating the standard error of a weighted mean.
wtd.rank does not handle NAs as elegantly as rank if weights is specified.
Research Triangle Institute (1995): SUDAAN User's Manual, Release 6.40, pp. 8-16 to 8-17.
Gatz DF, Smith L (1995): The standard error of a weighted mean concentration--I. Bootstrapping vs other methods. Atmospheric Env 29:1185-1193.
Gatz DF, Smith L (1995): The standard error of a weighted mean concentration--II. Estimating confidence intervals. Atmospheric Env 29:1195-1200.
https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
mean, var, quantile, table, rank, loess.smooth, lowess, plsmo, Ecdf, somers2, describe
# NOT RUN {
library(Hmisc)   # provides the wtd.* functions, describe, cut2, summarize, llist
set.seed(1)
x <- runif(500)
wts <- sample(1:6, 500, TRUE)
std.dev <- sqrt(wtd.var(x, wts))
wtd.quantile(x, wts)
death <- sample(0:1, 500, TRUE)
plot(wtd.loess.noiter(x, death, wts, type='evaluate'))
describe(~x, weights=wts)
# describe uses wtd.mean, wtd.quantile, wtd.table
xg <- cut2(x,g=4)
table(xg)
wtd.table(xg, wts, type='table')
# Here is a method for getting stratified weighted means
y <- runif(500)
g <- function(y) wtd.mean(y[,1],y[,2])
summarize(cbind(y, wts), llist(xg), g, stat.name='y')
# Empirically determine how methods used by wtd.quantile match with
# methods used by quantile, when all weights are unity
set.seed(1)
u <- eval(formals(wtd.quantile)$type)
v <- as.character(1:9)
r <- matrix(0, nrow=length(u), ncol=9, dimnames=list(u,v))
for(n in c(8, 13, 22, 29)) {
  x <- rnorm(n)
  for(i in 1:5) {
    probs <- sort(runif(9))
    for(wtype in u) {
      wq <- wtd.quantile(x, type=wtype, weights=rep(1,length(x)), probs=probs)
      for(qtype in 1:9) {
        rq <- quantile(x, type=qtype, probs=probs)
        r[wtype, qtype] <- max(r[wtype,qtype], max(abs(wq-rq)))
      }
    }
  }
}
r
# Restructure data to generate a dichotomous response variable
# from records containing numbers of events and numbers of trials
num <- c(10,NA,20,0,15) # data are 10/12 NA/999 20/20 0/25 15/35
denom <- c(12,999,20,25,35)
w <- num.denom.setup(num, denom)
w
# attach(my.data.frame[w$subs,])
# }