censuspyrID
Explorer of Indonesian Population Pyramids from Harmonized and Non-Harmonized Census Data
Overview
The censuspyrID R package provides harmonized and non-harmonized population pyramid datasets
from the Indonesian population censuses (1971–2020).
It includes functions for loading, filtering, and visualizing population data,
as well as an interactive Shiny application for exploring demographic structures
across provinces and census years.
The harmonized dataset (hpop5) uses consistent province codes across censuses
to enable long-term trend analysis despite administrative expansion
(pemekaran), while the non-harmonized dataset (ypop5) retains original province
codes as published in each census year.
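
The idea behind harmonization can be sketched in base R. This is only an illustration, not the package's actual procedure: the `raw` table, the `crosswalk` vector, and all counts are made up, and mapping a newer province back to its parent code is an assumption about how consistent codes might be obtained. The one real fact used is that Banten (code 36) was split from West Java (code 32) in 2000.

```r
# Hypothetical non-harmonized counts: province codes as published per census.
raw <- data.frame(
  year    = c(1990, 1990, 2010, 2010, 2010),
  prov_id = c(31, 32, 31, 32, 36),
  pop     = c(100, 500, 120, 450, 150)  # made-up numbers
)

# Hypothetical crosswalk: each published code maps to the code used for
# long-term comparison. Banten (36) is folded back into West Java (32).
crosswalk <- c("31" = 31, "32" = 32, "36" = 32)

# Attach the harmonized code to every row.
raw$harm_id <- unname(crosswalk[as.character(raw$prov_id)])

# Sum populations within each census year and harmonized province.
harmonized <- aggregate(pop ~ year + harm_id, data = raw, FUN = sum)
harmonized <- harmonized[order(harmonized$year, harmonized$harm_id), ]
print(harmonized)
```

With codes aligned this way, the 2010 figure for code 32 covers the same territory as the 1990 figure, which is what makes a trend comparison meaningful.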
Data sources include:
- IPUMS International (1971–2010)
- Population Census 2020 (BPS Indonesia), via the BPS Census Portal
Both datasets were processed in several steps: aggregation into 5-year age groups, prorate adjustment (redistribution) for missing attributes, and demographic smoothing (Arriaga and Karup–King–Newton methods). See Shryock & Siegel (1976) for prorate adjustment and Aburto et al. (2022) for the demographic smoothing methods.
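
As a rough illustration of the first two steps, the base R sketch below prorates a made-up "age not stated" count across the known ages and then aggregates into 5-year groups. It assumes the simplest form of prorating (proportional to the known distribution); the variable names and numbers are hypothetical, the package's own processing may differ, and smoothing is not shown.

```r
# Made-up single-year counts for one sex; NA marks "age not stated".
age   <- c(0:9, NA)
count <- c(10, 12, 11, 9, 8, 10, 10, 9, 11, 10, 5)

known         <- !is.na(age)
unknown_total <- sum(count[!known])

# Prorate: scale every known-age count by the same factor so that the
# unknown cases are spread in proportion to the known age distribution.
adj <- count[known] * (1 + unknown_total / sum(count[known]))

# Aggregate into 5-year age groups, labelled by their lower bound.
lower <- 5 * (age[known] %/% 5)
pop5  <- vapply(split(adj, lower), sum, numeric(1))
print(pop5)
```

Prorating preserves the overall total: the adjusted counts sum to the original total including the unknown cases.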
Installation & Run
You can install the development version of censuspyrID from GitHub using remotes:
# install remotes if not already installed
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# install censuspyrID from GitHub
remotes::install_github("aripurwantosp/censuspyrID")
The core feature of this package is censuspyrID_explorer(), an interactive Shiny application for visually exploring population pyramids, age profiles, and demographic trends across provinces, census years, and smoothing methods. To launch it from R:
# load package
library(censuspyrID)
# launch the interactive application
censuspyrID_explorer()
The censuspyrID_explorer() function launches the application in your default web browser. See the Help menu within the application for a detailed navigation guide.
Functions
Besides the interactive Shiny application, censuspyrID also provides several functions that you can use directly in your R scripts for data processing and visualization:
| Function | Description |
|---|---|
| load_pop_data() | Loads the main population datasets (hpop5 or ypop5) with optional demographic smoothing applied (Arriaga or Karup–King–Newton). |
| pop_data_by_year() | Filters the data for a specific census year. |
| pop_data_by_reg() | Filters the data for a specific province ID. |
| pyr_single() | Creates a single population pyramid plot for a given region and year. |
| pyr_trends() | Generates trend plots of population pyramids over multiple census years. |
| area_trends() | Plots population proportions across three broad age groups (young, working-age, old) over time. |
| pop_summary() | Prints a formatted statistical summary, including sex ratio and dependency ratios. |
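
For readers unfamiliar with the indicators mentioned above, the following base R sketch computes a sex ratio and dependency ratios from a made-up 5-year age table. It assumes the conventional definitions (males per 100 females; ages 0-14 and 65+ relative to ages 15-64) and does not reproduce the internals of pop_summary(); the `pop` data frame, the `per_age` helper, and all counts are hypothetical.

```r
# Hypothetical 5-year age groups, labelled by lower bound (0, 5, ..., 75).
age_lower <- seq(0, 75, by = 5)

# Helper (illustrative only): one count per age group by broad age band.
per_age <- function(young, working, old) {
  ifelse(age_lower < 15, young, ifelse(age_lower < 65, working, old))
}

pop <- data.frame(
  age_lower = rep(age_lower, times = 2),
  sex       = rep(c("male", "female"), each = length(age_lower)),
  n         = c(per_age(50, 40, 10), per_age(48, 41, 12))  # made-up counts
)

# Sex ratio: males per 100 females.
sex_ratio <- 100 * sum(pop$n[pop$sex == "male"]) / sum(pop$n[pop$sex == "female"])

# Broad age groups: 0-14 (young), 15-64 (working-age), 65+ (old).
broad <- cut(pop$age_lower, breaks = c(0, 15, 65, Inf), right = FALSE,
             labels = c("young", "working", "old"))
by_group <- vapply(split(pop$n, broad), sum, numeric(1))

# Dependency ratios per 100 working-age persons.
young_dr <- 100 * by_group[["young"]] / by_group[["working"]]
old_dr   <- 100 * by_group[["old"]] / by_group[["working"]]
total_dr <- young_dr + old_dr

print(round(c(sex_ratio = sex_ratio, young = young_dr,
              old = old_dr, total = total_dr), 1))
```

The same three broad age groups are the ones area_trends() is described as plotting over time.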
Example Usage
library(censuspyrID)
# Load harmonized population data with Arriaga smoothing
pop_data <- load_pop_data(harmonized = TRUE, smoothing = "arriaga")
# Filter data for the 2020 census
pop_2020 <- pop_data_by_year(pop_data, year = 2020)
# Filter data for a specific province (e.g., DKI Jakarta, province_id = 31)
pop_jakarta <- pop_data_by_reg(pop_2020, reg = 31)
# Create a single population pyramid for Jakarta in 2020
pyr_single(pop_jakarta, reg_code = 31, year = 2020)
# Generate pyramid trends for Jakarta across all census years
pyr_trends(pop_data, reg_code = 31)
# Plot age-structure trends (0-14, 15-64, 65+) for Jakarta
area_trends(pop_data, reg_code = 31)
# Print a summary with sex ratio and dependency ratios
pop_summary(pop_jakarta)