Downloads (if necessary) and loads VIGITEL survey microdata into R. Data is automatically converted to Parquet format for faster subsequent loading. The data includes survey weights for proper statistical analysis.
vigitel_data(
  year,
  vars = NULL,
  force_download = FALSE,
  parallel = TRUE,
  lazy = FALSE,
  cache_dir = NULL
)
Returns a tibble with the VIGITEL microdata. When multiple years are
requested, a year column is added to identify the source year.
If lazy = TRUE, an Arrow Dataset is returned instead; it can be queried
with dplyr verbs before calling collect().
year: Year(s) of the survey. Can be:
  Single year: 2023
  Range: 2021:2023
  Vector: c(2021, 2023)
  Character: c("2021", "2023")
  All years: "all"
vars: Character vector. Variable names to select, or NULL (the default) to load all variables.
force_download: Logical. If TRUE, re-download and reconvert the data even if a cached copy exists. Default is FALSE.
parallel: Logical. If TRUE, download and process multiple years in parallel. Default is TRUE; parallel processing applies only when multiple years are requested.
lazy: Logical. If TRUE, return an Arrow Dataset for lazy evaluation
instead of loading all data into memory. Useful for filtering large
datasets before collecting; use collect() to retrieve the results.
Default is FALSE.
cache_dir: Character. Optional custom cache directory. If NULL (the default),
the standard user cache directory is used. Use tempdir() for temporary
storage that does not persist across R sessions.
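The year argument accepts several input forms. The sketch below shows one way such inputs could be reduced to a single sorted integer vector. normalize_years() and its available_years argument are hypothetical and not part of the package; the set of years actually available is not stated on this page, so the caller supplies it.

```r
# Hypothetical helper, NOT part of the package API: it only illustrates how the
# accepted `year` forms could be normalized to one integer vector.
normalize_years <- function(year, available_years) {
  # "all" expands to every year the caller declares as available
  if (is.character(year) && length(year) == 1 && tolower(year) == "all") {
    return(sort(as.integer(available_years)))
  }
  # Character years such as c("2021", "2023") are coerced to integers
  years <- suppressWarnings(as.integer(year))
  if (anyNA(years)) {
    stop("`year` must be numeric, coercible to numeric, or \"all\".", call. = FALSE)
  }
  # Reject years outside the declared set
  unavailable <- setdiff(years, available_years)
  if (length(unavailable) > 0) {
    stop("Year(s) not available: ", paste(unavailable, collapse = ", "), call. = FALSE)
  }
  sort(unique(years))
}

avail <- 2021:2023  # hypothetical set of available years, for illustration only
normalize_years(2023, avail)
normalize_years(2021:2023, avail)
normalize_years(c(2021, 2023), avail)
normalize_years(c("2021", "2023"), avail)
normalize_years("all", avail)
```

Coercing everything to integers early means the downstream download and caching logic only has to handle one type.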
On first access, the data are downloaded from the Brazilian Ministry of Health and converted to Parquet format. Subsequent loads read directly from the cached Parquet file, which is significantly faster.
The arrow package is required for Parquet file support. If not
installed, an informative error message will be shown with installation
instructions.
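The download-once, read-Parquet-afterwards pattern described above can be sketched as follows. This is a minimal illustration under assumptions, not the package's implementation: read_with_parquet_cache() and its load_raw callback are hypothetical, and a toy data frame stands in for the real download. Only arrow::write_parquet() and arrow::read_parquet() are real calls.

```r
# Hypothetical helper, NOT the package's implementation: cache a data frame as
# Parquet on first use and read the Parquet file on later calls.
read_with_parquet_cache <- function(cache_file, load_raw, force = FALSE) {
  # Informative error when arrow is missing, mirroring the behavior described above
  if (!requireNamespace("arrow", quietly = TRUE)) {
    stop("Package 'arrow' is required for Parquet support.\n",
         "Install it with install.packages(\"arrow\").", call. = FALSE)
  }
  if (force || !file.exists(cache_file)) {
    raw <- load_raw()                      # slow path: obtain the source data
    arrow::write_parquet(raw, cache_file)  # convert once to Parquet
  }
  arrow::read_parquet(cache_file)          # fast path on subsequent calls
}

# Demo with a toy loader that counts how often it is called
cache <- file.path(tempdir(), "toy.parquet")
unlink(cache)
calls <- 0
slow_loader <- function() {
  calls <<- calls + 1
  data.frame(x = 1:3)
}
a <- read_with_parquet_cache(cache, slow_loader)  # runs the loader, writes Parquet
b <- read_with_parquet_cache(cache, slow_loader)  # reads the cached Parquet file
calls
```

Passing force = TRUE plays the same role as force_download = TRUE: it bypasses the cache and rebuilds the Parquet file.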
For parallel downloads, the function uses the furrr and future
packages if installed. Install them with install.packages(c("furrr", "future"))
to enable parallel processing. The number of workers is automatically set
based on available CPU cores. If these packages are not installed, processing
falls back to sequential mode.
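A minimal sketch of the "parallel if available, otherwise sequential" fallback, assuming a per-year worker function. map_years() is hypothetical, not the package's code; furrr::future_map(), future::plan(), future::multisession and future::availableCores() are real. Capping the worker count at the number of years is an added design choice, not something this page states.

```r
# Hypothetical helper, NOT the package's code: apply `fn` to each year in
# parallel when furrr and future are installed, otherwise sequentially.
map_years <- function(years, fn, parallel = TRUE) {
  use_parallel <- parallel && length(years) > 1 &&
    requireNamespace("furrr", quietly = TRUE) &&
    requireNamespace("future", quietly = TRUE)
  if (use_parallel) {
    # Workers based on available CPU cores, but never more than the number of years
    workers <- min(length(years), future::availableCores())
    old_plan <- future::plan(future::multisession, workers = workers)
    on.exit(future::plan(old_plan), add = TRUE)  # restore the caller's plan
    furrr::future_map(years, fn)
  } else {
    lapply(years, fn)  # sequential fallback
  }
}

res <- map_years(2021:2023, function(y) y * 2, parallel = FALSE)
unlist(res)
```

Restoring the previous plan with on.exit() avoids leaving background workers configured in the user's session after the call returns.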
When lazy = TRUE, the function returns an Arrow Dataset that supports
dplyr operations (filter, select, mutate, etc.) without loading data into
memory. This is useful for working with large datasets or when you only
need a subset of the data. Call collect() to retrieve the results
as a tibble.
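The lazy workflow can be illustrated offline with arrow and dplyr on a small made-up dataset. This sketch is an assumption-based stand-in, not output from vigitel_data(): the toy values are invented, and only the column names are borrowed from this page. With the real function, the Dataset would come from vigitel_data(2023, lazy = TRUE) and the same verbs would apply.

```r
library(arrow)
library(dplyr)

# Made-up toy data standing in for survey microdata (not real VIGITEL values)
toy <- data.frame(
  cidade   = c(1L, 1L, 2L, 2L),
  idade    = c(25L, 40L, 67L, 33L),
  pesorake = c(1.2, 0.8, 1.5, 0.5)
)
path <- file.path(tempdir(), "toy_dataset")
write_dataset(toy, path)  # writes Parquet file(s) into a directory

# With the package this would be: ds <- vigitel_data(2023, lazy = TRUE)
ds <- open_dataset(path)  # lazy: no rows are read into memory yet

query <- ds %>%
  filter(idade >= 30) %>%            # still lazy; builds a query plan
  select(cidade, idade, pesorake)

result <- collect(query)             # executes the query, returns a tibble
result
```

Only the rows and columns that survive filter() and select() are materialized in memory, which is what makes this approach useful for large multi-year extracts.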
The VIGITEL survey uses complex sampling weights. For proper statistical
analysis, use a survey-analysis package such as survey or srvyr.
The weight variable is named pesorake.
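A minimal sketch of a weighted analysis with the survey package, using pesorake as the weight. Two assumptions apply: the data frame is made-up toy data rather than real output of vigitel_data(), and the design is declared with weights only (ids = ~1, no strata or cluster identifiers), which may be simpler than the design appropriate for published estimates.

```r
library(survey)

# Made-up toy data; in practice `df` would come from vigitel_data()
df <- data.frame(
  sexo     = c(1, 2, 1, 2, 2),
  idade    = c(25, 40, 67, 33, 51),
  pesorake = c(1.2, 0.8, 1.5, 0.5, 1.0)
)

# Weights-only design (assumption: no strata or cluster variables are used)
des <- svydesign(ids = ~1, weights = ~pesorake, data = df)

svymean(~idade, des)                # weighted mean of age
svyby(~idade, ~sexo, des, svymean)  # weighted mean of age by sex
```

An unweighted mean(df$idade) would generally differ from the weighted estimate, which is why the weights should not be ignored.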
# single year (uses tempdir() to avoid leaving files on the system)
df <- vigitel_data(2023, cache_dir = tempdir())
# specific variables
df <- vigitel_data(2023, vars = c("cidade", "sexo", "idade", "pesorake"),
                   cache_dir = tempdir())