reshape
Reshape Grouped Data
This function reshapes a data frame between wide format with repeated measurements in separate columns of the same record and long format with the repeated measurements in separate records.
- Keywords
- manip
Usage
reshape(data, varying = NULL, v.names = NULL, timevar = "time",
        idvar = "id", ids = 1:NROW(data),
        times = seq_along(varying[[1]]),
        drop = NULL, direction, new.row.names = NULL,
        sep = ".",
        split = if (sep == "") {
            list(regexp = "[A-Za-z][0-9]", include = TRUE)
        } else {
            list(regexp = sep, include = FALSE, fixed = TRUE)}
        )
Arguments
- data
- a data frame
- varying
- names of sets of variables in the wide format that correspond to single variables in long format (time-varying). This is canonically a list of vectors of variable names, but it can optionally be a matrix of names, or a single vector of names. In each case, the names can be replaced by indices which are interpreted as referring to names(data). See Details for more details and options.
- v.names
- names of variables in the long format that correspond to multiple variables in the wide format. See Details.
- timevar
- the variable in long format that differentiates multiple records from the same group or individual. If more than one record matches, the first will be taken (with a warning).
- idvar
- Names of one or more variables in long format that identify multiple records from the same group/individual. These variables may also be present in wide format.
- ids
- the values to use for a newly created idvar variable in long format.
- times
- the values to use for a newly created timevar variable in long format. See Details.
- drop
- a vector of names of variables to drop before reshaping.
- direction
- character string, partially matched to either "wide" to reshape to wide format, or "long" to reshape to long format.
- new.row.names
- character or NULL: a non-null value will be used for the row names of the result.
- sep
- A character vector of length 1, indicating a separating character in the variable names in the wide format. This is used for guessing v.names and times arguments based on the names in varying. If sep == "", the split is just before the first numeral that follows an alphabetic character. This is also used to create variable names when reshaping to wide format.
- split
- A list with three components, regexp, include, and (optionally) fixed. This allows an extended interface to variable name splitting. See Details.
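As a rough illustration of how drop, idvar and ids interact when reshaping to long format, the following sketch uses a small made-up data frame. The column names (age, note, subject) and the id values are assumptions chosen for the example, not part of the function's interface.

```r
# Hypothetical wide data: no id column, one constant column ('age'),
# and one column ('note') that should be discarded.
wide <- data.frame(age  = c(30, 40),
                   note = c("n1", "n2"),
                   x.1  = c(1, 2),
                   x.2  = c(3, 4))

long <- reshape(wide, direction = "long",
                varying = c("x.1", "x.2"),  # time-varying columns
                drop    = "note",           # removed before reshaping
                idvar   = "subject",        # id column that does not exist yet
                ids     = c("a", "b"))      # values used to fill 'subject'

long
```

Because subject is not a column of the input, its values are taken from ids; had an existing column been named in idvar, ids would not be needed.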
Details
The arguments to this function are described in terms of longitudinal data, as that is the application motivating the functions. A wide longitudinal dataset will have one record for each individual with some time-constant variables that occupy single columns and some time-varying variables that occupy a column for each time point. In long format there will be multiple records for each individual, with some variables being constant across these records and others varying across the records. A long format dataset also needs a time variable identifying which time point each record comes from and an id variable showing which records refer to the same person.
If the data frame resulted from a previous reshape then the operation can be reversed simply by reshape(a). The direction argument is optional and the other arguments are stored as attributes on the data frame.
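A minimal sketch of such a round trip, using an invented three-subject data frame (the names long, wide and back are illustrative only):

```r
# Hypothetical long data: three subjects measured at two time points.
long <- data.frame(id   = rep(1:3, each = 2),
                   time = rep(1:2, times = 3),
                   x    = c(5, 6, 7, 8, 9, 10))

# Long to wide: one row per id, one 'x' column per time point.
wide <- reshape(long, idvar = "id", timevar = "time", direction = "wide")

# The reshaping information travels with the result as an attribute.
names(attributes(wide))

# Reverse the operation without restating any of the arguments.
back <- reshape(wide)
back
```

The rows of back are grouped by time point rather than by subject, so the row order can differ from the original even though the contents match.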
If direction = "wide" and no varying or v.names arguments are supplied it is assumed that all variables except idvar and timevar are time-varying. They are all expanded into multiple variables in wide format.
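For instance, with two measured variables and neither varying nor v.names given, both are expanded. This is a small sketch on made-up data:

```r
# Hypothetical long data with two measured variables, x and y.
long <- data.frame(id   = rep(1:2, each = 2),
                   time = rep(1:2, times = 2),
                   x    = c(1, 2, 3, 4),
                   y    = c(10, 20, 30, 40))

# Neither 'varying' nor 'v.names' is supplied, so both x and y
# are treated as time-varying and spread across time points.
wide <- reshape(long, idvar = "id", timevar = "time", direction = "wide")

names(wide)
wide
```

Supplying v.names = "x" instead would keep y as a single column, with a warning if its values differ within an id.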
If direction = "long" the varying argument can be a vector of column names (or a corresponding index). The function will attempt to guess the v.names and times from these names. The default is variable names like x.1, x.2, where sep = "." specifies to split at the dot and drop it from the name. To have alphabetic followed by numeric times use sep = "".
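The guessing can be sketched on two small invented data frames, one using the default dot separator and one using an underscore with non-numeric times (the column names bp_pre and bp_post are assumptions made for the example):

```r
# Default separator: "x.1" and "x.2" are split at the dot.
wide_dot <- data.frame(id = 1:2, x.1 = c(5, 6), x.2 = c(7, 8))
long_dot <- reshape(wide_dot, direction = "long",
                    varying = c("x.1", "x.2"), idvar = "id")
long_dot   # expected to contain a column 'x' and a 'time' column

# Custom separator: "bp_pre" and "bp_post" are split at the underscore,
# so the guessed times are the labels after it.
wide_us <- data.frame(id = 1:2, bp_pre = c(120, 130), bp_post = c(115, 125))
long_us <- reshape(wide_us, direction = "long",
                   varying = c("bp_pre", "bp_post"), sep = "_", idvar = "id")
long_us    # expected to contain a column 'bp' and a 'time' column
```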
Variable name splitting as described above is only attempted in the case where varying is an atomic vector. If it is a list or a matrix, v.names and times will generally need to be specified, although they will default to, respectively, the first variable name in each set, and sequential times.
Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1, y.1, x.2, y.2.
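When the column names do not follow a pattern that can be split, a list for varying together with explicit v.names and times avoids guessing altogether. A minimal sketch, assuming made-up columns x_a, x_b, y_a and y_b:

```r
# Hypothetical wide data: two variables (x, y) at two occasions (a, b).
wide <- data.frame(id  = 1:2,
                   x_a = c(1, 2),   x_b = c(3, 4),
                   y_a = c(10, 20), y_b = c(30, 40))

long <- reshape(wide, direction = "long",
                # one vector of column names per long-format variable
                varying = list(c("x_a", "x_b"), c("y_a", "y_b")),
                v.names = c("x", "y"),   # names of the long-format variables
                times   = c("a", "b"),   # one value per occasion
                idvar   = "id")

long
```

Each element of the list holds the columns for one long-format variable, in the same order as times.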
The split argument should not usually be necessary. The split$regexp component is passed to either strsplit or regexpr, where the latter is used if split$include is TRUE, in which case the splitting occurs after the first character of the matched string. In the strsplit case, the separator is not included in the result, and it is possible to specify fixed-string matching using split$fixed.
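One situation where split may help is when the separators in the column names are inconsistent. The sketch below assumes a made-up data frame whose columns use both a dot and an underscore, and passes a regular expression that matches either:

```r
# Hypothetical wide data with inconsistent separators in the names.
wide <- data.frame(id = 1:2, x.1 = c(1, 2), x_2 = c(3, 4))

long <- reshape(wide, direction = "long",
                varying = c("x.1", "x_2"),
                # strsplit case: split at "." or "_" and drop the separator
                split = list(regexp = "[._]", include = FALSE, fixed = FALSE),
                idvar = "id")

long
```

With include = FALSE the pattern goes to strsplit, so the matched separator itself is dropped from both the variable name and the time value.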
Value
The reshaped data frame with added attributes to simplify reshaping back to the original form.
Examples
library(stats)
summary(Indometh)
wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
                timevar = "time", direction = "wide")
wide
reshape(wide, direction = "long")
reshape(wide, idvar = "Subject", varying = list(2:12),
        v.names = "conc", direction = "long")
## times need not be numeric
df <- data.frame(id = rep(1:4, rep(2,4)),
                 visit = I(rep(c("Before","After"), 4)),
                 x = rnorm(4), y = runif(4))
df
reshape(df, timevar = "visit", idvar = "id", direction = "wide")
## warns that y is really varying
reshape(df, timevar = "visit", idvar = "id", direction = "wide", v.names = "x")
## unbalanced 'long' data leads to NA fill in 'wide' form
df2 <- df[1:7, ]
df2
reshape(df2, timevar = "visit", idvar = "id", direction = "wide")
## Alternative regular expressions for guessing names
df3 <- data.frame(id = 1:4, age = c(40,50,60,50), dose1 = c(1,2,1,2),
                  dose2 = c(2,1,2,1), dose4 = c(3,3,3,3))
reshape(df3, direction = "long", varying = 3:5, sep = "")
## an example that isn't longitudinal data
state.x77 <- as.data.frame(state.x77)
long <- reshape(state.x77, idvar = "state", ids = row.names(state.x77),
                times = names(state.x77), timevar = "Characteristic",
                varying = list(names(state.x77)), direction = "long")
reshape(long, direction = "wide")
reshape(long, direction = "wide", new.row.names = unique(long$state))
## multiple id variables
df3 <- data.frame(school = rep(1:3, each = 4), class = rep(9:10, 6),
                  time = rep(c(1,1,2,2), 3), score = rnorm(12))
wide <- reshape(df3, idvar = c("school","class"), direction = "wide")
wide
## transform back
reshape(wide)