Currently haven can read and write logical, integer, numeric, character and factor vectors. See labelled() for how labelled variables in Stata are handled in R.
Character vectors will be stored as strL if any components are strl_threshold bytes or longer (and version >= 13); otherwise they will be stored as the appropriate str# type.
read_dta( file, encoding = NULL, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique" )
read_stata( file, encoding = NULL, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique" )
write_dta( data, path, version = 14, label = attr(data, "label"), strl_threshold = 2045 )
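A minimal round-trip sketch of the usage above. The data frame and its column names are hypothetical and chosen only for illustration; the file is written to a temporary path.

```r
library(haven)

# Hypothetical example data with several of the supported column types.
df <- data.frame(
  id    = 1:3,
  score = c(1.5, 2.5, NA),
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE
)

# Write a Stata file to a temporary location, then read it back.
path <- tempfile(fileext = ".dta")
write_dta(df, path, version = 14)
out <- read_dta(path)

# Inspect the column types that come back from the file.
str(out)
```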
Either a path to a file, a connection, or literal data (either a single string or a raw vector).
Files ending in a supported compression extension (such as .gz) will be automatically uncompressed. Files starting with a remote URL scheme (such as ftps://) will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.
Literal data is most useful for examples and tests. To be recognised as literal data, the input must either be wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.
Using a value of clipboard() will read from the system clipboard.
The character encoding used for the file. Generally, only needed for Stata 13 files and earlier. See Encoding section for details.
One or more selection expressions, like in dplyr::select(). Use list() to supply more than one expression. See ?dplyr::select for details on available selection options. Only the specified columns will be read from the file.
Number of lines to skip before reading data.
Maximum number of lines to read.
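The column and row arguments can be combined to read only part of a file. The sketch below assumes a small hypothetical dataset with columns x, y and z, and uses c() to combine selection expressions as in dplyr::select().

```r
library(haven)

# Hypothetical ten-row dataset written to a temporary file.
path <- tempfile(fileext = ".dta")
write_dta(
  data.frame(x = 1:10, y = letters[1:10], z = (1:10) / 2,
             stringsAsFactors = FALSE),
  path
)

# Read only columns x and z, skipping the first two rows
# and reading at most five rows after that.
part <- read_dta(path, col_select = c(x, z), skip = 2, n_max = 5)
part
```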
Treatment of problematic column names:
"minimal": no name repair or checks, beyond basic existence,
"unique": (default value, per the usage above) make sure names are unique and not empty,
"check_unique": no name repair, but check they are unique,
"universal": make the names unique and syntactic,
a function: apply custom name repair (e.g., .name_repair = make.names for names in the style of base R),
a purrr-style anonymous function.
This argument is passed on to the underlying name-repair routine. See its documentation for more details on these terms and the strategies used to enforce them.
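As an illustration of the function form of .name_repair, the sketch below passes base R's toupper so that column names are upper-cased on read. The dataset and its column names are hypothetical; Stata variable names are usually already unique and syntactic, so the string options rarely change anything for .dta files.

```r
library(haven)

# Hypothetical two-column dataset written to a temporary file.
path <- tempfile(fileext = ".dta")
write_dta(data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE), path)

# Custom repair: any function that maps a character vector of names
# to a character vector of the same length can be supplied.
upper <- read_dta(path, .name_repair = toupper)
names(upper)
```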
Data frame to write.
Path to a file where the data will be written.
File version to use. Supports versions 8-15.
Dataset label to use, or NULL. Defaults to the value stored in the "label" attribute of data. Must be <= 80 characters.
Any character vectors with a maximum length greater than strl_threshold bytes will be stored as a long string (strL) instead of a standard string (str#) variable if version >= 13. This defaults to 2045, the maximum length of str# variables. See the Stata long string documentation for more details.
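A short sketch of lowering strl_threshold so that a long character column is written as strL. The threshold of 100 and the 500-byte string are arbitrary values picked for illustration. The storage type is not visible from R after reading, so the example only confirms that the text survives the round trip.

```r
library(haven)

# Hypothetical column holding one short and one long string (500 bytes).
long_text <- strrep("a", 500)
df <- data.frame(note = c("short", long_text), stringsAsFactors = FALSE)

# With a threshold of 100 bytes and version >= 13, the column's maximum
# length exceeds the threshold, so it should be stored as strL.
path <- tempfile(fileext = ".dta")
write_dta(df, path, version = 14, strl_threshold = 100)

back <- read_dta(path)
nchar(back$note, type = "bytes")
```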
A tibble, data frame variant with nice defaults.
Variable labels are stored in the "label" attribute of each variable. They are not printed on the console, but the RStudio viewer will show them.
If a dataset label is defined in Stata, it will be stored in the "label" attribute of the tibble.
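The label attributes described above can be set and inspected with base R's attr(). This is a sketch with hypothetical label text; it relies on write_dta() picking up the dataset label from attr(data, "label") by default, as shown in the usage.

```r
library(haven)

# Hypothetical data with a variable label and a dataset label.
df <- data.frame(age = c(31, 45))
attr(df$age, "label") <- "Age in years"
attr(df, "label") <- "Example survey"

path <- tempfile(fileext = ".dta")
write_dta(df, path)

labelled_out <- read_dta(path)

# Labels are not printed with the tibble, so read the attributes directly.
var_label <- attr(labelled_out$age, "label")
data_label <- attr(labelled_out, "label")
```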
write_dta() returns the input data invisibly.
Prior to Stata 14, files did not declare a text encoding, and the default encoding differed across platforms. If encoding = NULL, haven assumes the encoding is windows-1252, the text encoding used by Stata on Windows. Unfortunately, Stata on Mac and Linux uses a different default encoding, "latin1". If you encounter an error such as "Unable to convert string to the requested encoding", try encoding = "latin1".
For Stata 14 and later, you should not need to manually specify an encoding value unless the value was incorrectly recorded in the source file.
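One way to apply this advice is to retry with "latin1" only when the default read fails with an encoding-related error. The helper name read_dta_with_fallback is hypothetical and not part of haven, and matching on the word "encoding" in the error message is an assumption based on the message quoted above.

```r
library(haven)

# Hypothetical helper: try the default encoding first, then latin1.
read_dta_with_fallback <- function(path) {
  tryCatch(
    read_dta(path),
    error = function(e) {
      # Assumption: encoding failures mention "encoding" in the message.
      if (grepl("encoding", conditionMessage(e), fixed = TRUE)) {
        read_dta(path, encoding = "latin1")
      } else {
        stop(e)
      }
    }
  )
}

# Usage on a small file written to a temporary path.
path <- tempfile(fileext = ".dta")
write_dta(data.frame(city = c("Oslo", "Lima"), stringsAsFactors = FALSE), path)
cities <- read_dta_with_fallback(path)
```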