run.mig.mcmc: Run Markov chain Monte Carlo for parameters of net migration rate model

Description

Runs MCMCs for simulating the net migration rate of all countries of the world or for locations specified by users, using the Bayesian hierarchical model of Azose & Raftery (2015).

Usage

run.mig.mcmc(
  output.dir,
  nr.chains = 3,
  iter = 50000,
  thin = 1,
  replace.output = FALSE,
  annual = FALSE,
  start.year = 1950,
  present.year = 2020,
  wpp.year = 2019,
  my.mig.file = NULL,
  sigma.c.min = 1e-04,
  a.ini = NULL,
  a.half.width = NULL,
  mu.ini = NULL,
  exclude.from.world = NULL,
  pop.denom = 1,
  seed = NULL,
  parallel = FALSE,
  nr.nodes = nr.chains,
  buffer.size = 1000,
  verbose = TRUE,
  verbose.iter = 10,
  ...
)

Value

An object of class bayesMig.mcmc.set which is a list with two components:

meta: An object of class bayesMig.mcmc.meta. It contains information that is common to all chains. Most items are the same as in bayesTFR.mcmc.meta. In addition, mig.rates is a matrix of the observed migration rates with NAs in spots that were not used for estimation. mig.rates.all is a similar matrix but contains all data, regardless if used for estimation or not. Item user.data is a logical indicating if the migration rates are given by the user (TRUE) or are taken from a wpp package (FALSE).
mcmc.list: A list of objects of class bayesMig.mcmc, one for each MCMC. Information stored here is specific to each MCMC chain, similarly to bayesTFR.mcmc.

Arguments

output.dir: A file path pointing to the directory in which to store results.
nr.chains: An integer number of independent Markov chains to run.
iter: The number of iterations to run per Markov chain.
thin: Thinning interval. A chain with 1000 iterations thinned by 20 will return a final count of 50 iterations.
replace.output: If the specified output directory already exists, should it be overwritten?
annual: If TRUE, the model assumes the underlying data is on annual time scale.
start.year: Start year for using historical data.
present.year: End year for using historical data.
wpp.year: Year for which WPP data is used if no user data is provided via my.mig.file. In such a case, the function loads a package called wpp\(x\) where \(x\) is the wpp.year and generates historical migration rates using the migration and pop datasets.
my.mig.file: File name containing user-specified historical time series of migration rates for all locations that should be included in the simulation. It should be a tab-separated file. For structure, see Details below.
sigma.c.min, a.ini, mu.ini: Settings for the parameters of the model (see Azose & Raftery 2015), such as minimum value and initial values. Initial values (*.ini) can be given as a vector of length nr.chains, giving one initial value per chain. By default the initial values are equidistantly spread between their respective ranges.
a.half.width: Half width for Metropolis proposals of the a parameter. This argument can greatly influence the convergence and it is dependent on the scale of the data. By default it is set to 0.01 for 5-year data defined as rate per population; to 0.03 for 5-year data defined as per 1000; to 0.3 for annual data per population; to 0.5 for annual data per 1000. If the default does not yield satisfactory results, use the function estimate.a.hw to estimate an appropriate value, based on an existing simulation. Also it is important to set the pop.denom argument correctly.
exclude.from.world: Vector of location codes that should be excluded from estimating the hyperparameters. These would be for example small locations or locations with unusual patters. Note that location-specific parameters are generated for all locations, regardless of this setting.
pop.denom: Denominator used to generate the input migration rates. It is used to derive an appropriate scaler for the priors and conditional distributions. Typically, this will be either 1 (default) if the rates are defined as per population, or 1000, if the rates are per 1000 population. Use this argument only if user-specified rates are supplied via the my.mig.file argument.
seed: Seed of the random number generator. If NULL no seed is set. It can be used to generate reproducible results.
parallel: Whether to run code in parallel.
nr.nodes: Relevant only if parallel is TRUE. It gives the number of nodes for running the simulation in parallel. By default it equals to the number of chains.
buffer.size: Buffer size (in number of iterations) for keeping data in the memory before flushing to disk.
verbose: Whether or not to print status updates to console window while the code is running.
verbose.iter: If verbose is TRUE, the number of iterations to wait between printing updates.
...: Additional parameters to be passed to the function performParallel, if parallel is TRUE.

Details

The function creates an object of class bayesMig.mcmc.meta and stores it in output.dir. It launches nr.chains MCMCs, either sequentially or in parallel. Parameter traces of each chain are stored as ASCII files in a subdirectory of output.dir, called mcx where x is the identifier of that chain. There is one file per parameter, named after the parameter with the suffix “.txt”. Location-specific parameters have the suffix _countryc where c is the location code. In addition to the trace files, each mcx directory contains the object bayesMig.mcmc in binary format. All chain-specific files are written onto disk after the first, last and each \(i\)-th (thinned) iteration, where \(i\) is given by the argument buffer.size.

By default (if no data is passed via the my.mig.file argument), the function loads observed data (further denoted as WPP dataset), from the migration and pop datasets in the wpp\(x\) package where \(x\) is the wpp.year. Net migration rates are computed as migration(\(t\)) / (population(\(t_e\)) - migration(\(t\))) where \(t_e\) means the end of time period \(t\). For an annual simulation and wpp.year set to 2022, \(t = t_e\) because the population in year \(t\) is considered at the end of the year. If wpp.year is smaller than 2022 and annual is TRUE the default dataset is interpolated from 5-year data.

The argument my.mig.file can be used to overwrite the default data. It should be a tab-separated file. If it is used, it should contain net migration rates for all locations to be used in the simulation, as no WPP data is used in such a case. The structure of the file has the same format as the migration dataset, but the values should be rates (instead of counts). Use the argument pop.denom to define the scale of the denominator in these rates, i.e. if the rates are to be interpreted as per population (default) or some other scale. Each row in the my.mig.file file corresponds to a location. It does not have to be necessarily a country - it can be for example a subnational unit. It must contain columns “country_code” or “code” (unique identifier of the location), “name”, and columns representing 5-year time intervals (if annual is FALSE), e.g., “1995-2000”, “2000-2005” etc., or single years (if annual is TRUE). An example dataset of annual net migration rates for US states is included in the package, see example below.

Optionally, the my.mig.file can contain columns called “first.observed” and/or “last.observed”, containing for each location the year of the first and last observation, respectively. In such a case, any data before and after those time points will be ignored. Furthermore, the function mig.predict fills in the missing values after the last observation, using the median of the BHM procedure.

If there are countries or locations that should be excluded from influencing the hyperparameters, for example small countries or locations with unique migration patterns, their codes should be included in the argument exclude.from.world. These locations will still get their parameters simulated and thus, will be included in a projection. Alternatively if my.mig.file is used, these locations can be determined using an additional column, called “include_code”. Value 2 means the location is included in the BHM; value 1 means it's excluded but location-specific parameters are generated; value 0 means the location is ignored.

References

Azose, J. J., & Raftery, A. E. (2015). Bayesian probabilistic projection of international migration. Demography, 52(5), 1627-1650.

Examples

Run this code

# \donttest{
# Toy simulation for US states
us.mig.file <- file.path(find.package("bayesMig"), "extdata", "USmigrates.txt")
sim.dir <- tempfile()
m <- run.mig.mcmc(nr.chains = 3, iter = 100, thin = 1, my.mig.file = us.mig.file, 
        annual = TRUE, output.dir = sim.dir)
summary(m)
summary(m, "Washington")

mig.partraces.plot(m)
mig.partraces.cs.plot("California", m)

# later one can access the object from disk
m <- get.mig.mcmc(sim.dir)
 
unlink(sim.dir, recursive = TRUE)
# For a country-level simulation, see example in ?bayesMig. 
# }

Run the code above in your browser using DataLab