TraMineR (version 1.8-9)

seqformat: Conversion between sequence formats

Description

Convert a sequence data set from one format to another.

Usage

seqformat(data, var=NULL, id=NULL,
         from, to, compressed=FALSE,
         nrep=NULL, tevent, stsep=NULL, covar=NULL,
         SPS.in=list(xfix="()", sdsep=","),
         SPS.out=list(xfix="()", sdsep=","),
         begin=NULL, end=NULL, status=NULL,
         process=TRUE, pdata=NULL, pvar=NULL,
         limit=100, overwrite=TRUE,
         fillblanks=NULL, tmin=NULL, tmax=NULL, nr="*")

Arguments

data
A data frame or matrix containing sequence data.
var
List of columns with the sequence data. Default is NULL, i.e., all columns. Sequences are assumed to be in compressed form (character strings) when there is a single column and in extended form otherwise.
id
Column containing the 'id' of the sequences. Mandatory with from="SPELL" in order to identify the spells of a same sequence.
from
Format of the input data. One of "STS", "SPS", "SPELL". If data is a sequence object, format is automatically set to "STS".
to
Format for output data. One of "STS", "SPS", "SRS", "DSS", "TSE".
compressed
Logical. Should "STS", "SPS" or "DSS" output be compressed into character strings? Ignored for other output formats.
nrep
Number of shifted replications for output in "SRS" format.
tevent
Transition definition matrix for converting to time-stamped-event ("TSE") format. Should be a matrix of size $d * d$ where $d$ is the number of distinct states appearing in the sequences. In this matrix, the cell $(i,j)$ lists the events asso
stsep
Separator character between successive elements in compressed (character strings) input data. If NULL (default value), the seqfcheck function is called for detecting automatically a separator
covar
When from="STS" or from="SPS", additional column names to be included as covariates in the output data frame. When to="SRS" the covariates are replicated across the shifted replicated rows. Default is NULL
SPS.in
List with the xfix= and sdsep= specifications for the state-duration couples in input data in SPS form. The first specification, xfix, specifies the prefix/suffix character (use a two-character string if
SPS.out
List with the xfix and sdsep specifications for output in SPS format. (see argument SPS.in above.)
nr
Symbol used for missing state in input "SPS" format which will be converted to NA in "STS" representation.
begin
When converting from SPELL, the column with the beginning position of the spell.
end
When converting from SPELL, the column with the end position of the spell.
status
When converting from SPELL, the column with the status of the spell.
process
Logical: When converting from SPELL, should sequences be created on a process time axis? Default is TRUE. Set as FALSE for creating sequences on a calendar time axis.
pdata
When converting from SPELL and process=TRUE, either NULL, "auto" or the name of the data frame containing the individual 'birth' time, that is, the initial time from which the process time will be comput
pvar
When pdata is a data frame, a vector of two names or numbers, the first one specifying the column with the individual 'id', and the second one the 'birth' time.
limit
When converting from SPELL, size of the resulting data frame when creating age sequences (by default ranges from age 1 to age 100)
overwrite
When converting from SPELL, if overwrite is set to TRUE, the most recent episode overwrites the older one when they overlap each other. If set to FALSE, the most recent episode starts in case of overl
fillblanks
When converting from SPELL, if fillblanks is not NULL, gaps between episodes are filled with the fillblanks character value.
tmin
Integer. When converting from SPELL with process=FALSE, defines the starting time of the axis. If set as NULL, the minimum time is taken from the begin column in the data.
tmax
Integer. When converting from SPELL with process=FALSE, defines the ending time. If set as NULL, the value is guessed from the data (not so accurately!).
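
The effect of overwrite=TRUE, fillblanks, tmin and tmax when converting from SPELL on a calendar time axis can be sketched in base R. This is only an illustration of the behaviour described above, not TraMineR's implementation; the data frame spells, its column names and the fill symbol "*" are assumptions made for the example.

```r
# Hypothetical spell data for one individual (names and values are illustrative).
spells <- data.frame(
  id     = c(1, 1, 1),
  begin  = c(1, 3, 7),
  end    = c(4, 5, 8),
  status = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Calendar time axis, playing the role of tmin and tmax.
tmin <- 1
tmax <- 8
sts <- rep(NA_character_, tmax - tmin + 1)

# Apply spells in chronological order, so that a more recent spell
# overwrites an older one where they overlap (the overwrite=TRUE idea).
for (i in order(spells$begin)) {
  pos <- (spells$begin[i]:spells$end[i]) - tmin + 1
  sts[pos] <- spells$status[i]
}

# Fill the remaining gaps between spells (the fillblanks idea).
sts[is.na(sts)] <- "*"

print(sts)
```

Here the second spell overlaps the first at positions 3 and 4 and takes precedence there, while position 6, covered by no spell, receives the fill symbol.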

Value

  • A data frame

Details

The seqformat function is used to convert data from one format to another. The input data is first converted into the STS format and then converted to the output format. Depending on input and output formats, some information can be lost in the conversion process. The output is a matrix, NOT a sequence object to be passed to TraMineR functions for plotting and mining sequences (use the seqdef function for that). See Gabadinho et al. (2009) and Ritschard et al. (2009) for more details on longitudinal data formats and converting between them.
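
How the formats relate, and where information can be lost, can be illustrated in base R with run-length encoding. The sketch below is not how seqformat is implemented; the example sequence and the "-" separator used for the compressed strings are assumptions chosen for illustration, while the "()" and "," follow the SPS.out defaults shown in Usage.

```r
# One sequence in extended STS form: one state per time position.
sts <- c("A", "A", "B", "B", "B", "C")

# Run-length encoding groups identical successive states.
runs <- rle(sts)

# SPS-like form: each distinct successive state paired with its duration.
sps <- paste0("(", runs$values, ",", runs$lengths, ")")

# DSS-like form: only the distinct successive states; durations are dropped,
# so the original STS sequence can no longer be rebuilt from it.
dss <- runs$values

# Compressed forms: one character string per sequence
# (the "-" separator is an assumption for this sketch).
sps_compressed <- paste(sps, collapse = "-")
dss_compressed <- paste(dss, collapse = "-")

# Going back from the state/duration pairs recovers the STS sequence.
sts_back <- rep(runs$values, times = runs$lengths)
```

The round trip through the state/duration pairs is lossless, whereas the sequence of distinct states alone is not.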

References

Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2009). Mining Sequence Data in R with the TraMineR package: A user's guide. Department of Econometrics and Laboratory of Demography, University of Geneva.

Ritschard, G., A. Gabadinho, M. Studer and N. S. Müller (2009). Converting between various sequence representations. In Ras, Z. & Dardzinska, A. (eds.) Advances in Data Management, Springer, 223, 155-175.

See Also

seqdef

Examples

## Converting sequences into SPS format
library(TraMineR)
data(actcal)
actcal.SPS.A <- seqformat(actcal,13:24, from="STS", to="SPS")
head(actcal.SPS.A)

## SPS (compressed) format with no prefix/suffix and "/" as state/duration separator
actcal.SPS.B <- seqformat(actcal,13:24,
	from="STS", to="SPS", compressed=TRUE,
	SPS.out=list(xfix="", sdsep="/"))
head(actcal.SPS.B)

## Converting sequences into DSS (compressed) format
actcal.DSS <- seqformat(actcal,13:24,
	from="STS", to="DSS", compressed=TRUE)
head(actcal.DSS)
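
Since seqformat does not return a sequence object, the original STS columns can be passed to seqdef when a state sequence object is needed. A minimal sketch, assuming the TraMineR package is installed and that seqdef is called with its default input format ("STS"):

```r
library(TraMineR)
data(actcal)

# Converted data: suitable for inspection or export, but not a sequence object.
actcal.SPS <- seqformat(actcal, 13:24, from = "STS", to = "SPS")

# State sequence object built from the same columns, for plotting and mining.
actcal.seq <- seqdef(actcal, 13:24)

# Compare the classes of the two results.
class(actcal.SPS)
class(actcal.seq)
```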
