healthbR

Overview

healthbR provides easy access to Brazilian public health survey data directly from R. The package downloads, caches, and processes data from official sources, returning clean, analysis-ready tibbles following tidyverse conventions.

Currently supported data sources:

  • VIGITEL - Surveillance of Risk Factors for Chronic Diseases by Telephone Survey (Vigilância de Fatores de Risco e Proteção para Doenças Crônicas por Inquérito Telefônico)

Planned for future releases:

  • PNS (National Health Survey)
  • PNAD (National Household Sample Survey)
  • SIM (Mortality Information System)
  • SINASC (Live Birth Information System)
  • SIH (Hospital Information System)

Installation

You can install the development version of healthbR from GitHub:

# install.packages("pak")
pak::pak("SidneyBissoli/healthbR")

Usage

Check available years

library(healthbR)

# list available VIGITEL survey years
vigitel_years()
#> [1] 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
#> [16] 2021 2022 2023

Download and load data

# load data for a single year
df <- vigitel_data(2023)

# load data for multiple years
df <- vigitel_data(2021:2023)

Explore variables

# list variables available in a specific year
vigitel_variables(2023)

# get the data dictionary with variable descriptions
dict <- vigitel_dictionary()

# search for specific variables
dict |>
  dplyr::filter(stringr::str_detect(variable_name, "peso"))
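The search above matches only variable names. As a minimal sketch, assuming the dictionary also carries a free-text description column, the snippet below runs a case-insensitive search over both fields with base R. The `description` column name, the helper `search_dict()`, and the toy rows are illustrative assumptions, not the package's actual dictionary.

```r
# toy stand-in for the dictionary; only `variable_name` is taken from the
# example above, the `description` column and all rows are hypothetical
toy_dict <- data.frame(
  variable_name = c("pesorake", "q6", "diab"),
  description   = c("Peso pos-estratificacao (rake)", "Idade", "Diabetes"),
  stringsAsFactors = FALSE
)

# return rows where the pattern appears in either column, ignoring case
search_dict <- function(dict, pattern) {
  hit <- grepl(pattern, dict$variable_name, ignore.case = TRUE) |
    grepl(pattern, dict$description, ignore.case = TRUE)
  dict[hit, , drop = FALSE]
}

matches <- search_dict(toy_dict, "PESO")
```

Searching both fields helps when a variable's code (such as a question number) gives no hint of its content.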

Survey analysis with srvyr

VIGITEL uses a complex sampling design, so unweighted estimates can be biased. Pass the pesorake weight variable to the survey design for valid inference:

library(dplyr)
library(srvyr)

# create survey design
vigitel_svy <- df |>
  as_survey_design(weights = pesorake)

# calculate weighted prevalence
vigitel_svy |>
  group_by(cidade) |>
  summarize(
    prevalence = survey_mean(diab == 1, na.rm = TRUE),
    n = unweighted(n())
  )
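To make explicit what the weighted point estimate represents, the sketch below reproduces it by hand on toy data: within each city, the weighted prevalence is the sum of the weights of respondents with the condition divided by the sum of all weights. The values are invented for illustration; only the column names come from the example above. The sketch does not compute standard errors, which is what the survey design object is for.

```r
# toy data; values are made up, column names follow the example above
toy <- data.frame(
  cidade   = c(1, 1, 1, 2, 2),
  diab     = c(1, 0, 0, 1, 1),
  pesorake = c(2, 1, 1, 3, 1)
)

# weighted prevalence per city: sum(w * y) / sum(w)
weighted_prev <- sapply(
  split(toy, toy$cidade),
  function(d) sum(d$pesorake * (d$diab == 1)) / sum(d$pesorake)
)

# unweighted prevalence per city, for comparison
unweighted_prev <- tapply(toy$diab == 1, toy$cidade, mean)
```

In the first toy city the single positive respondent carries half of the total weight, so the weighted estimate (0.5) differs from the unweighted one (1/3).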

Performance optimization

healthbR offers three strategies for handling large datasets efficiently:

1. Parquet conversion (recommended for repeated use)

Convert Excel files to Parquet format for 10-20x faster loading:

# convert downloaded files to parquet (one-time operation)
vigitel_convert_to_parquet(2020:2023)

# subsequent loads are much faster
df <- vigitel_data(2020:2023)

2. Parallel downloads

Download multiple years simultaneously (requires optional packages):

# install optional packages for parallel processing
install.packages(c("furrr", "future"))

# uses furrr for parallel processing (2-4 workers)
df <- vigitel_data(2015:2023)
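The snippet above does not show how workers are set up. As a generic illustration of the furrr pattern (not healthbR's internal code), the sketch below maps a stand-in function over several years with two workers and falls back to a sequential `lapply()` when the optional packages are missing. The helper `fetch_year()` is hypothetical and only returns a label.

```r
years <- 2021:2023

# hypothetical stand-in for a per-year download; it only returns a label
fetch_year <- function(year) paste0("vigitel_", year)

if (requireNamespace("furrr", quietly = TRUE) &&
    requireNamespace("future", quietly = TRUE)) {
  # start two background R sessions and keep the previous plan
  old_plan <- future::plan(future::multisession, workers = 2)
  results <- furrr::future_map(years, fetch_year)
  # restore whatever plan was active before
  future::plan(old_plan)
} else {
  # sequential fallback when the optional packages are not installed
  results <- lapply(years, fetch_year)
}
```

Restoring the previous plan keeps the example from changing how other future-based code in the same session runs.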

3. Lazy evaluation with Arrow

For very large datasets, use lazy evaluation to process data without loading everything into memory:

# returns an Arrow Dataset (not loaded into RAM)
df_lazy <- vigitel_data(2020:2023, lazy = TRUE)

# filter and select before collecting
result <- df_lazy |>
  dplyr::filter(cidade == 1) |>
  dplyr::select(q6, q8_anos, pesorake, diab, hart) |>
  dplyr::collect()
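The same filter, select, collect pattern can be tried without downloading anything. The sketch below uses a small in-memory Arrow Table as a stand-in for the on-disk dataset; the toy values are invented, and the base R branch is only a fallback for when arrow or dplyr is not installed.

```r
# toy data; values are made up, column names follow the example above
toy <- data.frame(
  cidade   = c(1, 1, 2),
  diab     = c(1, 0, 1),
  pesorake = c(2.0, 1.5, 3.0)
)

if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  # an in-memory Arrow Table stands in for the on-disk dataset
  lazy_tbl <- arrow::arrow_table(toy)

  # filter() and select() only build a query; nothing is evaluated yet
  query <- dplyr::select(dplyr::filter(lazy_tbl, cidade == 1), diab, pesorake)

  # collect() runs the query and returns a regular tibble
  result <- dplyr::collect(query)
} else {
  # eager base R equivalent when arrow or dplyr is unavailable
  result <- toy[toy$cidade == 1, c("diab", "pesorake")]
}
```

Placing the filter and column selection before `collect()` means only the matching rows and requested columns are materialized in R.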

Data sources

All data is downloaded from official Brazilian Ministry of Health repositories.

Citation

If you use healthbR in your research, please cite it:

citation("healthbR")

Contributing

Contributions are welcome! Please open an issue to discuss proposed changes or submit a pull request.

Code of Conduct

Please note that the healthbR project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

MIT © Sidney da Silva Pereira Bissoli

Install

install.packages('healthbR')

Version

0.1.1

License

MIT + file LICENSE

Maintainer

Sidney Bissoli

Last Published

February 4th, 2026

Functions in healthbR (0.1.1)

  • vigitel_cache_dir - Get VIGITEL cache directory
  • vigitel_clear_cache - Clear VIGITEL cache
  • utils - Utility Functions for healthbR
  • list_sources - List Available Data Sources
  • vigitel_cache_status - Get VIGITEL cache status
  • vigitel_data - Load VIGITEL microdata
  • vigitel_info - Get VIGITEL survey information
  • vigitel_data_single - Load single year of VIGITEL data
  • vigitel_excel_path - Get path to Excel file for a specific year
  • vigitel_convert_to_parquet - Convert Excel file to Parquet format
  • vigitel_base_url - Get VIGITEL base URL
  • vigitel_file_url - Build VIGITEL file URL for a specific year
  • vigitel_download_dictionary - Download VIGITEL data dictionary
  • check_arrow - Check arrow availability and stop with informative message
  • healthbR-package - healthbR: Access Brazilian Public Health Data
  • has_arrow - Check if arrow package is available
  • vigitel_years - List available VIGITEL survey years
  • vigitel_parse_years - Parse year argument
  • vigitel_parquet_path - Get path to Parquet file for a specific year
  • vigitel_variables - List VIGITEL variables
  • vigitel_download - Download VIGITEL microdata for a specific year
  • vigitel_dictionary - Get VIGITEL variable dictionary