Define the parameters of an IPUMS NHGIS extract request to be submitted via the IPUMS API.
Use get_metadata_nhgis() to browse and identify data sources for use in NHGIS extract definitions. For general information, see the NHGIS data source overview and the FAQ.
Learn more about the IPUMS API in vignette("ipums-api") and NHGIS extract definitions in vignette("ipums-api-nhgis").
define_extract_nhgis(
  description = "",
  datasets = NULL,
  time_series_tables = NULL,
  shapefiles = NULL,
  geographic_extents = NULL,
  breakdown_and_data_type_layout = NULL,
  tst_layout = NULL,
  data_format = NULL
)
An object of class nhgis_extract containing the extract definition.
description
Description of the extract.
datasets
List of dataset specifications for any datasets to include in the extract request. Use ds_spec() to create a ds_spec object containing a dataset specification. See examples.
time_series_tables
List of time series table specifications for any time series tables to include in the extract request. Use tst_spec() to create a tst_spec object containing a time series table specification. See examples.
shapefiles
Names of any shapefiles to include in the extract request.
geographic_extents
Vector of geographic extents to use for all of the datasets in the extract definition (for instance, to obtain data within a particular state). Use "*" to select all available extents.
Required when any of the datasets included in the extract definition include geog_levels that require extent selection. See get_metadata_nhgis() to determine if a geographic level requires extent selection. At the time of writing, NHGIS supports extent selection only for blocks and block groups.
breakdown_and_data_type_layout
The desired layout of any datasets that have multiple data types or breakdown values.
"single_file" (default) keeps all data types and breakdown values in one file
"separate_files" splits each data type or breakdown value into its own file
Required if any datasets included in the extract definition consist of multiple data types (for instance, estimates and margins of error) or have multiple breakdown values specified. See get_metadata_nhgis() to determine whether a requested dataset has multiple data types.
tst_layout
The desired layout of all time_series_tables included in the extract definition.
"time_by_column_layout" (wide format, default): rows correspond to geographic units, columns correspond to different times in the time series
"time_by_row_layout" (long format): rows correspond to a single geographic unit at a single point in time
"time_by_file_layout": data for different times are provided in separate files
Required when an extract definition includes any time_series_tables.
data_format
The desired format of the extract data file.
"csv_no_header" (default) includes only a minimal header in the first row
"csv_header" includes a second, more descriptive header row
"fixed_width" provides data in a fixed width format
Note that by default, read_nhgis() removes the additional header row in "csv_header" files.
Required when an extract definition includes any datasets or time_series_tables.
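As a rough sketch of how the layout and format arguments fit together, the definition below requests an ACS table (ACS datasets carry both estimates and margins of error) with each data type in its own file and a descriptive header row. The dataset and table codes are reused from the examples on this page; the specific argument choices and the object name acs_extract are illustrative assumptions, not recommendations.

```r
library(ipumsr)

# Illustrative choices only: split data types into separate files and
# request the more descriptive second header row
acs_extract <- define_extract_nhgis(
  description = "ACS table with separate files per data type",
  datasets = ds_spec(
    "2014_2018_ACS5a",
    data_tables = "B01001",
    geog_levels = "state"
  ),
  breakdown_and_data_type_layout = "separate_files",
  data_format = "csv_header"
)

# The chosen values are stored on the extract definition
acs_extract$breakdown_and_data_type_layout
acs_extract$data_format
```

Building the definition does not contact the API, so settings like these can be inspected and adjusted before the request is submitted.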
An NHGIS extract definition must include at least one dataset, time series table, or shapefile specification.
Create an NHGIS dataset specification with ds_spec(). Each dataset must be associated with a selection of data_tables and geog_levels. Some datasets also support the selection of years and breakdown_values.
Create an NHGIS time series table specification with tst_spec(). Each time series table must be associated with a selection of geog_levels and may optionally be associated with a selection of years.
See examples or vignette("ipums-api-nhgis") for more details about specifying datasets and time series tables in an NHGIS extract definition.
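The optional years selection described above could look like the following minimal sketch. The dataset code "1988_1997_CBPa", its table code, and the particular year values are assumptions chosen for illustration; confirm the valid codes for a given source with get_metadata_nhgis() before submitting.

```r
library(ipumsr)

# Assumed codes for illustration: a multi-year dataset restricted to a
# few years, and a time series table restricted to a single year
years_extract <- define_extract_nhgis(
  description = "Dataset and time series table with year selection",
  datasets = ds_spec(
    "1988_1997_CBPa",
    data_tables = "NT1",
    geog_levels = "county",
    years = c("1988", "1989", "1990")
  ),
  time_series_tables = tst_spec(
    "CL8",
    geog_levels = "county",
    years = "1990"
  )
)

# Each specification keeps its own year selection
years_extract$datasets[["1988_1997_CBPa"]]$years
years_extract$time_series_tables[["CL8"]]$years
```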
get_metadata_nhgis() to find data to include in an extract definition.
submit_extract() to submit an extract request for processing.
save_extract_as_json() and define_extract_from_json() to share an extract definition.
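A minimal sketch of sharing a definition through JSON, assuming a temporary file as the destination (the path and the object names here are hypothetical):

```r
library(ipumsr)

shared_extract <- define_extract_nhgis(
  description = "Example NHGIS extract",
  datasets = ds_spec(
    "1990_STF3",
    data_tables = "NP57",
    geog_levels = c("county", "tract")
  )
)

# Hypothetical location; any writable path would do
json_path <- file.path(tempdir(), "nhgis_extract.json")

# Write the definition to disk, then rebuild it from the file
save_extract_as_json(shared_extract, file = json_path, overwrite = TRUE)
restored_extract <- define_extract_from_json(json_path)

restored_extract
```

The JSON file can be passed to a collaborator, who can recreate the definition and submit it under their own API key.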
# Extract definition for tables from an NHGIS dataset
# Use `ds_spec()` to create an NHGIS dataset specification
nhgis_extract <- define_extract_nhgis(
  description = "Example NHGIS extract",
  datasets = ds_spec(
    "1990_STF3",
    data_tables = "NP57",
    geog_levels = c("county", "tract")
  )
)
nhgis_extract
# Use `tst_spec()` to create an NHGIS time series table specification
define_extract_nhgis(
  description = "Example NHGIS extract",
  time_series_tables = tst_spec("CL8", geog_levels = "county"),
  tst_layout = "time_by_row_layout"
)
# To request multiple datasets, provide a list of `ds_spec` objects
define_extract_nhgis(
  description = "Extract definition with multiple datasets",
  datasets = list(
    ds_spec("2014_2018_ACS5a", "B01001", c("state", "county")),
    ds_spec("2015_2019_ACS5a", "B01001", c("state", "county"))
  )
)
# If you need to specify the same table or geographic level for
# many datasets, you may want to build the list of dataset specifications
# before defining your extract request:
dataset_names <- c("2014_2018_ACS5a", "2015_2019_ACS5a")
dataset_spec <- purrr::map(
  dataset_names,
  ~ ds_spec(
    .x,
    data_tables = "B01001",
    geog_levels = c("state", "county")
  )
)
define_extract_nhgis(
  description = "Extract definition with multiple datasets",
  datasets = dataset_spec
)
# You can request datasets, time series tables, and shapefiles in the same
# definition:
define_extract_nhgis(
  description = "Extract with datasets, time series tables, and shapefiles",
  datasets = ds_spec("1990_STF1", c("NP1", "NP2"), "county"),
  time_series_tables = tst_spec("CL6", "state"),
  shapefiles = "us_county_1990_tl2008"
)
# Geographic extents are applied to all datasets in the definition
define_extract_nhgis(
  description = "Extent selection",
  datasets = list(
    ds_spec("2018_2022_ACS5a", "B01001", "blck_grp"),
    ds_spec("2017_2021_ACS5a", "B01001", "blck_grp")
  ),
  geographic_extents = c("010", "050")
)
# Extract specifications can be indexed by name
names(nhgis_extract$datasets)
nhgis_extract$datasets[["1990_STF3"]]
if (FALSE) {
  # Use the extract definition to submit an extract request to the API
  submit_extract(nhgis_extract)
}