Learn R Programming

ipumsr (version 0.9.0)

read_ihgis_codebook: Read metadata from an IHGIS extract's codebook files

Description

[Experimental]

Read the variable metadata contained in an IHGIS extract into an ipums_ddi object.

Because IHGIS variable metadata do not adhere to all the standards of microdata DDI files, some of the ipums_ddi fields will not be populated.

This function is marked as experimental while we determine whether there may be a more robust way to standardize codebook reading across IPUMS aggregate data collections.

Usage

read_ihgis_codebook(cb_file, tbls_file = NULL, raw = FALSE)

Value

If raw = FALSE, an ipums_ddi object with metadata about the variables contained in the data for the extract associated with the given cb_file.

If raw = TRUE, a character vector with one element for each line of the given cb_file.

Arguments

cb_file

Path to a .zip archive containing an IHGIS extract, an IHGIS data dictionary (_datadict.csv) file, or an IHGIS codebook (.txt) file.

tbls_file

If cb_file is the path to an IHGIS data dictionary .csv file, path to the _tables.csv metadata file from the same IHGIS extract. If these files are in the same directory, this file will be automatically loaded. If you have moved this file, provide the path to it here.

raw

If TRUE return a character vector containing the lines of cb_file rather than an ipums_ddi object. Defaults to FALSE.

If TRUE, cb_file must be a .zip archive or a .txt codebook file.

Details

IHGIS extracts store variable and geographic metadata in multiple files:

  • _datadict.csv contains the data dictionary with metadata about the variables included across all files in the extract.

  • _tables.csv contains metadata about all IHGIS tables included in the extract.

  • _geog.csv contains metadata about the tabulation geographies included for any tables in the extract.

  • _codebook.txt contains table and variable metadata in human readable form and contains citation information for IHGIS data.

By default, read_ihgis_codebook() uses information from all these files and assumes they exist in the provided extract (.zip) file or directory. If you have unzipped your IHGIS extract and moved the _tables.csv file, you will need to provide the path to that file in the tbls_file argument. Certain variable metadata can still be loaded without the _geog.csv or _codebook.txt files. However, if raw = TRUE, the _codebook.txt file must be present in the .zip archive or provided to cb_file.

If you no longer have access to these files, consider resubmitting the extract request that produced the data.

Note that IHGIS codebooks contain metadata for all the datasets contained in a given extract. Individual data files from the extract may not contain all of the variables shown in the output of read_ihgis_codebook().

Examples

Run this code
ihgis_file <- ipums_example("ihgis0014.zip")

ihgis_cb <- read_ihgis_codebook(ihgis_file)

# Variable labels and descriptions
ihgis_cb$var_info

# Citation information
ihgis_cb$conditions

# If variable metadata have been lost from a data source, reattach from
# the corresponding `ipums_ddi` object:
ihgis_data <- read_ipums_agg(
  ihgis_file,
  file_select = matches("AAA_g0"),
  verbose = FALSE
)

ihgis_data <- zap_ipums_attributes(ihgis_data)
ipums_var_label(ihgis_data$AAA001)

ihgis_data <- set_ipums_var_attributes(ihgis_data, ihgis_cb)
ipums_var_label(ihgis_data$AAA001)

# Load in raw format
ihgis_cb_raw <- read_ihgis_codebook(ihgis_file, raw = TRUE)

# Use `cat()` to display in the R console in human readable format
cat(ihgis_cb_raw[1:21], sep = "\n")

Run the code above in your browser using DataLab