Hmisc (version 5.1-2)

HmiscOverview: Overview of Hmisc Library

Description

The Hmisc library contains many functions useful for data analysis, high-level graphics, utility operations, computing sample size and power, translating SAS datasets into R, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX code, recoding variables, and bootstrap repeated measures analysis. Most of these functions were written by F Harrell, but a few were collected from statlib and from s-news; other authors are indicated below. This collection of functions includes all of Harrell's submissions to statlib other than the functions in the rms and display libraries. A few of the functions do not have “Help” documentation.

To make Hmisc load silently, issue options(Hverbose=FALSE) before library(Hmisc).
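
As a minimal sketch of the loading step described above (the requireNamespace() guard is an addition for illustration, so the lines also run where Hmisc is not installed):

```r
# Set the option before attaching Hmisc, as the text above instructs.
options(Hverbose = FALSE)

# Guard (an assumption of this sketch, not part of the original instructions):
# attach Hmisc only when it is installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)
}
```

Messages that R itself prints about masked objects are separate from Hmisc's own startup message; wrapping the call as suppressPackageStartupMessages(library(Hmisc)) is one common way to silence those as well.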

Functions

Function Name         Purpose
abs.error.pred        Computes various indexes of predictive accuracy based on absolute errors, for linear models
addMarginal           Add marginal observations over selected variables
all.is.numeric        Check if character strings are legal numerics
approxExtrap          Linear extrapolation
aregImpute            Multiple imputation based on additive regression, bootstrapping, and predictive mean matching
areg.boot             Nonparametrically estimate transformations for both sides of a multiple additive regression, and bootstrap these estimates and \(R^2\)
ballocation           Optimum sample allocations in 2-sample proportion test
binconf               Exact confidence limits for a proportion and more accurate (narrower!) score stat.-based Wilson interval (Rollin Brant, mod. FEH)
bootkm                Bootstrap Kaplan-Meier survival or quantile estimates
bpower                Approximate power of 2-sided test for 2 proportions. Includes bpower.sim for exact power by simulation.
bpplot                Box-Percentile plot (Jeffrey Banfield, umsfjban@bill.oscs.montana.edu)
bpplotM               Chart extended box plots for multiple variables
bsamsize              Sample size requirements for test of 2 proportions
bystats               Statistics on a single variable by levels of >=1 factors
bystats2              2-way statistics
character.table       Shows numeric equivalents of all latin characters. Useful for putting many special chars. in graph titles (Pierre Joyet, pierre.joyet@bluewin.ch)
ciapower              Power of Cox interaction test
cleanup.import        More compactly store variables in a data frame, and clean up problem data when e.g. Excel spreadsheet had a non-numeric value in a numeric column
combine.levels        Combine infrequent levels of a categorical variable
confbar               Draws confidence bars on an existing plot using multiple confidence levels distinguished using color or gray scale
contents              Print the contents (variables, labels, etc.) of a data frame
cpower                Power of Cox 2-sample test allowing for noncompliance
Cs                    Vector of character strings from list of unquoted names
csv.get               Enhanced importing of comma-separated files, with support for variable labels
cut2                  Like cut with better endpoint label construction and allows construction of quantile groups or groups with given n
datadensity           Snapshot graph of distributions of all variables in a data frame. For continuous variables uses scat1d.
dataRep               Quantify representation of new observations in a database
ddmmmyy               SAS “date7” output format for a chron object
deff                  Kish design effect and intra-cluster correlation
describe              Function to describe different classes of objects. Invoke by saying describe(object). It calls one of the following:
describe.data.frame   Describe all variables in a data frame (generalization of SAS UNIVARIATE)
describe.default      Describe a variable (generalization of SAS UNIVARIATE)
dotchart3             A more flexible version of dotchart
Dotplot               Enhancement of Trellis dotplot allowing for matrix x-var., auto generation of Key function, superposition
drawPlot              Simple mouse-driven drawing program, including a function for fitting Bezier curves
Ecdf                  Empirical cumulative distribution function plot
errbar                Plot with error bars (Charles Geyer, U. Chi., mod FEH)
event.chart           Plot general event charts (Jack Lee, jjlee@mdanderson.org, Ken Hess, Joel Dubin; Am Statistician 54:63-70,2000)
event.history         Event history chart with time-dependent cov. status (Joel Dubin, jdubin@uwaterloo.ca)
find.matches          Find matches (with tolerances) between columns of 2 matrices
first.word            Find the first word in an R expression (R Heiberger)
fit.mult.impute       Fit most regression models over multiple transcan imputations, compute imputation-adjusted variances and avg. betas
format.df             Format a matrix or data frame with much user control (R Heiberger and FE Harrell)
ftupwr                Power of 2-sample binomial test using Fleiss, Tytun, Ury
ftuss                 Sample size for 2-sample binomial test using Fleiss, Tytun, Ury (ftupwr and ftuss both by Dan Heitjan, dheitjan@biostats.hmc.psu.edu)
gbayes                Bayesian posterior and predictive distributions when both the prior and the likelihood are Gaussian
getHdata              Fetch and list datasets on our web site
hdquantile            Harrell-Davis nonparametric quantile estimator with s.e.
histbackback          Back-to-back histograms (Pat Burns, Salomon Smith Barney, London, pburns@dorado.sbi.com)
hist.data.frame       Matrix of histograms for all numeric vars. in data frame. Use hist.data.frame(data.frame.name)
histSpike             Add high-resolution spike histograms or density estimates to an existing plot
hoeffd                Hoeffding's D test (omnibus test of independence of X and Y)
impute                Impute missing data (generic method)
interaction           More flexible version of builtin function
is.present            Tests for non-blank character values or non-NA numeric values
james.stein           James-Stein shrinkage estimates of cell means from raw data
labcurve              Optimally label a set of curves that have been drawn on an existing plot, on the basis of gaps between curves. Also position legends automatically at emptiest rectangle.
label                 Set or fetch a label for an R-object
Lag                   Lag a vector, padding on the left with NA or ''
latex                 Convert an R object to LaTeX (R Heiberger & FE Harrell)
list.tree             Pretty-print the structure of any data object (Alan Zaslavsky, zaslavsk@hcp.med.harvard.edu)
Load                  Enhancement of load
mask                  8-bit logical representation of a short integer value (Rick Becker)
matchCases            Match each case on one continuous variable
matxv                 Fast matrix * vector, handling intercept(s) and NAs
mgp.axis              Version of axis() that uses appropriate mgp from mgp.axis.labels and gets around bug in axis(2, ...) that causes it to assume las=1
mgp.axis.labels       Used by survplot and plot in rms library (and other functions in the future) so that different spacing between tick marks and axis tick mark labels may be specified for x- and y-axes. Use mgp.axis.labels('default') to set defaults. Users can set values manually using mgp.axis.labels(x,y) where x and y are 2nd value of par('mgp') to use. Use mgp.axis.labels(type=w) to retrieve values, where w='x', 'y', 'x and y', 'xy', to get 3 mgp values (first 3 types) or 2 mgp.axis.labels.
minor.tick            Add minor tick marks to an existing plot
mtitle                Add outer titles and subtitles to a multiple plot layout
multLines             Draw multiple vertical lines at each x in a line plot
%nin%                 Opposite of %in%
nobsY                 Compute no. non-NA observations for left hand formula side
nomiss                Return a matrix after excluding any row with an NA
panel.bpplot          Panel function for trellis bwplot - box-percentile plots
panel.plsmo           Panel function for trellis xyplot - uses plsmo
pBlock                Block variables for certain lattice charts
pc1                   Compute first prin. component and get coefficients on original scale of variables
plotCorrPrecision     Plot precision of estimate of correlation coefficient
plsmo                 Plot smoothed x vs. y with labeling and exclusion of NAs. Also allows a grouping variable and plots unsmoothed data.
popower               Power and sample size calculations for ordinal responses (two treatments, proportional odds model)
prn                   prn(expression) does print(expression) but titles the output with 'expression'. Do prn(expression,txt) to add a heading (‘txt’) before the ‘expression’ title
pstamp                Stamp a plot with date in lower right corner (pstamp()). Add ,pwd=T and/or ,time=T to add current directory name or time. Put additional text for label as first argument, e.g. pstamp('Figure 1') will draw 'Figure 1 date'
putKey                Different way to use key()
putKeyEmpty           Put key at most empty part of existing plot
rcorr                 Pearson or Spearman correlation matrix with pairwise deletion of missing data
rcorr.cens            Somers' Dxy rank correlation with censored data
rcorrp.cens           Assess difference in concordance for paired predictors
rcspline.eval         Evaluate restricted cubic spline design matrix
rcspline.plot         Plot spline fit with nonparametric smooth and grouped estimates
rcspline.restate      Restate restricted cubic spline in unrestricted form, and create TeX expression to print the fitted function
reShape               Reshape a matrix into 3 vectors, reshape serial data
rm.boot               Bootstrap spline fit to repeated measurements model, with simultaneous confidence region - least squares using spline function in time
rMultinom             Generate multinomial random variables with varying prob.
samplesize.bin        Sample size for 2-sample binomial problem (Rick Chappell, chappell@stat.wisc.edu)
sas.get               Convert SAS dataset to S data frame
sasxport.get          Enhanced importing of SAS transport dataset in R
Save                  Enhancement of save
scat1d                Add 1-dimensional scatterplot to an axis of an existing plot (like bar-codes, FEH/Martin Maechler, maechler@stat.math.ethz.ch/Jens Oehlschlaegel-Akiyoshi, oehl@psyres-stuttgart.de)
score.binary          Construct a score from a series of binary variables or expressions
sedit                 A set of character handling functions written entirely in R. sedit() does much of what the UNIX sed program does. Other functions included are substring.location, substring<-, replace.string.wild, and functions to check if a string is numeric or contains only the digits 0-9
setTrellis            Set Trellis graphics to use blank conditioning panel strips, line thickness 1 for dot plot reference lines: setTrellis(); 3 optional arguments
show.col              Show colors corresponding to col=0,1,...,99
show.pch              Show all plotting characters specified by pch=. Just type show.pch() to draw the table on the current device.
showPsfrag            Use LaTeX to compile, and dvips and ghostview to display a postscript graphic containing psfrag strings
solvet                Version of solve with argument tol passed to qr
somers2               Somers' rank correlation and c-index for binary y
spearman              Spearman rank correlation coefficient spearman(x,y)
spearman.test         Spearman 1 d.f. and 2 d.f. rank correlation test
spearman2             Spearman multiple d.f. \(\rho^2\), adjusted \(\rho^2\), Wilcoxon-Kruskal-Wallis test, for multiple predictors
spower                Simulate power of 2-sample test for survival under complex conditions. Also contains the Gompertz2, Weibull2, Lognorm2 functions.
spss.get              Enhanced importing of SPSS files using read.spss function
src                   src(name) = source("name.s") with memory
store                 Store an object permanently (easy interface to assign function)
strmatch              Shortest unique identifier match (Terry Therneau, therneau@mayo.edu)
subset                More easily subset a data frame
substi                Substitute one var for another when observations NA
summarize             Generate a data frame containing stratified summary statistics. Useful for passing to trellis.
summary.formula       General table making and plotting functions for summarizing data
summaryD              Summarizing using user-provided formula and dotchart3
summaryM              Replacement for summary.formula(..., method='reverse')
summaryP              Multi-panel dot chart for summarizing proportions
summaryS              Summarize multiple response variables for multi-panel dot chart or scatterplot
summaryRc             Summary for continuous variables using lowess
symbol.freq           X-Y Frequency plot with circles' area prop. to frequency
sys                   Execute unix() or dos() depending on what's running
tabulr                Front-end to tabular function in the tables package
tex                   Enclose a string with the correct syntax for using with the LaTeX psfrag package, for postscript graphics
transace              ace() packaged for easily automatically transforming all variables in a matrix
transcan              Automatic transformation and imputation of NAs for a series of predictor variables
trap.rule             Area under curve defined by arbitrary x and y vectors, using trapezoidal rule
trellis.strip.blank   To make the strip titles in trellis more visible, you can make the backgrounds blank by saying trellis.strip.blank(). Use before opening the graphics device.
t.test.cluster        2-sample t-test for cluster-randomized observations
uncbind               Form individual variables from a matrix
upData                Update a data frame (change names, labels, remove vars, etc.)
units                 Set or fetch "units" attribute - units of measurement for var.
varclus               Graph hierarchical clustering of variables using squared Pearson or Spearman correlations or Hoeffding D as similarities. Also includes the naclus function for examining similarities in patterns of missing values across variables.
wtd.mean
wtd.var
wtd.quantile
wtd.Ecdf
wtd.table
wtd.rank
wtd.loess.noiter
num.denom.setup       Set of functions for obtaining weighted estimates (applies to wtd.mean through num.denom.setup)
xy.group              Compute mean x vs. function of y by groups of x
xYplot                Like trellis xyplot but supports error bars and multiple response variables that are connected as separate lines
ynbind                Combine a series of yes/no true/false present/absent variables into a matrix
zoom                  Zoom in on any graphical display (Bill Dunlap, bill@statsci.com)
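
A few of the entries above can be tried directly. The following is a small sketch rather than a tour of the package: it assumes Hmisc is installed (hence the guard), and the input values and variable names are made up for illustration.

```r
# Guarded so the sketch does nothing, rather than failing, without Hmisc.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)

  # Cs: character vector built from unquoted names
  vars <- Cs(age, sex, bp)

  # %nin%: opposite of %in%; TRUE where the left-hand value is absent
  absent <- c(1, 5) %nin% c(1, 2, 3)

  # trap.rule: trapezoidal-rule area under the points (0,0), (1,1), (2,4)
  area <- trap.rule(x = 0:2, y = c(0, 1, 4))

  # wtd.mean: weighted mean of 1, 2, 3 with weights 1, 1, 2
  wmean <- wtd.mean(c(1, 2, 3), weights = c(1, 1, 2))

  print(vars)
  print(absent)
  print(area)
  print(wmean)
}
```

Each call uses only the arguments shown; the individual help pages (for example ?wtd.mean) describe the remaining options.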

Copyright Notice

GENERAL DISCLAIMER
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

In short: You may use it any way you like, as long as you don't charge money for it, remove this notice, or hold anyone liable for its results. Also, please acknowledge the source and communicate changes to the author.

If this software is used in work presented for publication, kindly reference it using, for example:
Harrell FE (2014): Hmisc: A package of miscellaneous R functions. Programs available from https://hbiostat.org/R/Hmisc/.
Be sure to reference R itself and other libraries used.

Author

Frank E Harrell Jr
Professor of Biostatistics
Vanderbilt University School of Medicine
Nashville, Tennessee
fh@fharrell.com

References

See Alzola CF, Harrell FE (2004): An Introduction to S and the Hmisc and Design Libraries at https://hbiostat.org/R/doc/sintro.pdf for extensive documentation and examples for the Hmisc package.