createMRGobject: Create a single object containing all necessary objects for multiResGrid functions

Description

Create a single object containing all necessary objects for multiResGrid functions

Prints MRG-objects

Usage

createMRGobject(
  ifg,
  ress = c(1, 5, 10, 20, 40) * 1000,
  geovar = c("GEO_LCT", "geometry"),
  srvNames = NULL,
  vars = NULL,
  weights = NULL,
  dummy = "RECORDS",
  mincount = 10,
  countFeatureOrTotal = "feature",
  nlarge = 2,
  plim = 0.85,
  verbose = FALSE,
  nclus = 1,
  clusType = NULL,
  domEstat = TRUE,
  consistencyCheck = FALSE,
  outfile = NULL,
  splitlim = 5e+07,
  checkDominance = TRUE,
  checkReliability = FALSE,
  userfun = NULL,
  strat = NULL,
  confrules = "individual",
  suppresslim = 0,
  sumsmall = FALSE,
  suppresslimSum = 0,
  reliabilitySplit = TRUE,
  pseudoreg = NULL,
  plotIntermediate = FALSE,
  addIntermediate = FALSE,
  locAdj = "LL",
  postProcess = TRUE,
  rounding = -1,
  remCols = TRUE,
  ...
)
# S3 method for MRG
print(x, ...)

Value

A list containing the necessary elements for further processing with the MRG-package, referred to as being of class MRG.

Arguments

ifg: Either a data.frame or tibble or sf-object with the locations and the data of the survey or census data, or a list of such objects.
ress: A vector with the different resolutions
geovar: Name of geodata variable in the objects. Must be the same for all of the surveys/censuses, if the data sets are not submitted as sf-objects
srvNames: Names for the different surveys or censuses if ifg is a list. Typically it could be survey years. Not necessary if ifg is a named list
vars: Variable(s) of interest that should be aggregated (necessary when ifg is used for individual farm specific anonymization rules)
weights: Extrapolation factor(s) (weights) w_i of unit i in the sample of units n_c falling into a specific cell c. Weights are used for disclosure control measures. A weight of 1 will be used if missing. If only one weight is given, it will be used for all variables. If the length is more than one, the length has to be equal to the number of variables. If the same weight is used for several variables, it must be repeated in the weights-vector
dummy: The name of a dummy variable for the number of records if a list is provided. Defaults to "RECORDS", but can be replaced by something more specific for particular usage, such as "HOLDING" for agricultural data
mincount: The minimum number of farms for a grid cell (threshold rule)
countFeatureOrTotal: Should the frequency limit be applied to records with a positive value for a certain feature, or to all records, independent of the value of the feature?
nlarge: Number of largest contributors considered by the dominance rule: the nlarge largest farms may account for at most a share plim of the total value for the variable in the grid cell (see details of gridData)
plim: See nlarge
verbose: Indicates if some extra output should be printed. Usually TRUE/FALSE, but can also have a value of 2 for multiResGrid for even more output.
nclus: Number of clusters to use for parallel processing. No parallelization is used for nclus = 1.
clusType: The type of cluster; see makeCluster for more details. The default of makeCluster is used if type is missing or NA
domEstat: Should the dominance rule be applied as in the IFS handbook (TRUE), where the weights are rounded before finding the first nlarge contributors, or should it be the first nlarge contributors*weight, where also fractions are considered (FALSE)?
consistencyCheck: logical; whether consistency between the gridded values and the similar values from ifg should be checked. The gridded value is derived from rasterize and the second one from st_join. The two methods can in some cases treat border cases between grid cells differently.
outfile: File to direct the output in case of parallel processing, see makeCluster for more details.
splitlim: For large data sets: split the data set into batches of approximately splitlim size
checkDominance: Logical - should the dominance rule be applied?
checkReliability: Logical - should the prediction variance be checked, and used for the aggregation? This considerably increases computation time
userfun: This gives the possibility to add a user defined function with additional confidentiality rules which the grid cell has to pass, based on the individual records
strat: Column name defining the strata for stratified sampling, used if checkReliability is TRUE
confrules: Should the frequency rule (number of holdings) refer to the number of holdings with a value of the individual vars above zero ("individual") or the total number of holdings in the data set ("total")?
suppresslim: Parameter that can be used to avoid that almost empty grid cells are merged with cells with considerably higher number of observations. The value is a minimum share of the total potential new cell for a grid cell to be aggregated. See below for more details.
sumsmall: Logical; should the suppresslimSum value be applied on the sum of small grid cells within the lower resolution grid cell? Note that different combinations of suppresslim and suppresslimSum values might not give completely intuitive results. For instance, if both are equal, then a higher value can lead to more grid cells being left unaggregated for smaller grid sizes, leading to aggregation for a large grid cell
suppresslimSum: Parameter similar to suppresslim, but affecting the total of grid cells to be suppressed
reliabilitySplit: Logical or number - parameter to be used in calculation of the reliability (if checkReliability = TRUE). It can either give the number of groups, or if TRUE, it will create groups of approximately 50,000 records per group. If FALSE, the data set will not be split, regardless of its size.
pseudoreg: A column with regions to be used to define pseudostrata if checkReliability is TRUE. This is used for the cases when one or more strata have only a single record (and the weight is different from one). This makes variance calculation impossible, so such strata are merged into a pseudostratum. If pseudoreg is given (for example a column with the country name, or NUTS2 region), the pseudostrata will be created separately for each pseudoreg region.
plotIntermediate: Logical or number - make a simple plot showing which grid cells have already passed the frequency rule. If plotIntermediate = TRUE, the function will wait 5 seconds after plotting before continuing; otherwise it will wait plotIntermediate seconds.
addIntermediate: Logical; will add a list of all intermediate himgs and lohs (overlay of himg and the lower resolution grid) as an attribute to the object to be returned
locAdj: Parameter to adjust the coordinates if they are exactly on the borders between grid cells. The values can either be FALSE, or "jitter" (adding a small random value to the coordinates, essentially spreading them randomly around the real location), "UR", "UL", "LR" or "LL", to describe which corner of the grid cell the location belongs to (upper right, upper left, lower right or lower left).
postProcess: Logical; should the postprocessing be done as part of creation of the multiresolution grid (TRUE), or be done in a separate step afterwards (FALSE). The second option is useful when wanting to check the confidential grid cells of the final map
rounding: either logical (FALSE) or an integer indicating the number of decimal places to be used. Negative values are allowed (such as the default value rounding to the closest 10). See also the details for digits in round.
remCols: Logical; Should intermediate columns be removed? Can be set to FALSE for further analyses. Temporary columns will not be removed if their names partly match the variable names of vars
...: Other parameters to underlying print functions
x: MRG-object, created by call to createMRGobject
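
The threshold rule (mincount), the dominance rule (nlarge, plim) and the rounding argument can be illustrated for a single grid cell with a few lines of base R. This is only a simplified sketch under stated assumptions: cell_passes is a hypothetical helper that is not part of the MRG package, it counts records with a positive value (as with countFeatureOrTotal = "feature"), and it ignores the domEstat weight rounding and all other options. The package's actual checks may differ in detail.

```r
# Hypothetical helper for illustration only; NOT a function of the MRG package.
# Checks one grid cell against a simplified threshold and dominance rule.
cell_passes <- function(values, weights = rep(1, length(values)),
                        mincount = 10, nlarge = 2, plim = 0.85) {
  contrib <- values * weights                      # weighted contributions
  enough_records <- sum(values > 0) >= mincount    # threshold (frequency) rule
  total <- sum(contrib)
  largest <- head(sort(contrib, decreasing = TRUE), nlarge)
  top_share <- if (total > 0) sum(largest) / total else 0
  not_dominated <- top_share <= plim               # dominance rule
  enough_records && not_dominated
}

even_cell      <- rep(10, 12)               # 12 farms of equal size
dominated_cell <- c(500, 400, rep(1, 10))   # two farms hold almost everything
small_cell     <- rep(10, 5)                # fewer than mincount farms

cell_passes(even_cell)        # TRUE
cell_passes(dominated_cell)   # FALSE, the two largest farms exceed plim
cell_passes(small_cell)       # FALSE, too few records

# rounding = -1 follows the digits convention of round(): nearest 10
round(c(1234, 147.6), digits = -1)   # 1230 150
```

In this simplified picture, a cell that fails either rule would have to be merged into a coarser resolution from ress before its value can be shown.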

Details

The function creates a single object containing both the mapped data and the parameters for further processing. This ensures that all processing is done with the same variables.
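
The idea of bundling data and parameters in one classed list, with an S3 print method and parameters that can be overridden per call, can be sketched in base R. Everything here is an assumption made for illustration: the class toyMRG and the functions make_toy_object and toy_process are hypothetical and do not reproduce the internal structure of a real MRG object.

```r
# Hypothetical toy class for illustration; NOT the structure used by the MRG package.
make_toy_object <- function(data, ress, mincount = 10) {
  structure(list(data = data, ress = ress, mincount = mincount),
            class = "toyMRG")
}

# S3 print method: a short summary instead of dumping the whole list
print.toyMRG <- function(x, ...) {
  cat("toyMRG object with", nrow(x$data), "records\n")
  cat("resolutions:", paste(x$ress, collapse = ", "), "\n")
  invisible(x)
}

# A processing function takes its parameters from the object,
# unless they are overridden in the call
toy_process <- function(obj, ...) {
  pars <- modifyList(unclass(obj), list(...))
  pars[c("ress", "mincount")]
}

obj <- make_toy_object(data.frame(UAA = c(10, 20, 30)),
                       ress = c(1, 5, 10) * 1000)
obj$mincount <- 5                           # update a parameter in the object
toy_process(obj)$mincount                   # 5
toy_process(obj, mincount = 20)$mincount    # 20, overridden in the call
```

Keeping the parameters next to the data means that repeated calls start from the same settings, and any deviation is visible in the call itself.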

Examples


# \donttest{
library(sf)
library(dplyr)

# These are SYNTHETIC agricultural FSS data 
data(ifs_dk) # Census data

# Create spatial data
ifg = fssgeo(ifs_dk, locAdj = "LL")

ress = 1000*2^(1:7)
MRGobject = createMRGobject(ifg = ifg, ress = ress, vars = "UAA")
# Run the adaptive grid function with only the farm number as confidentiality rule
himg1 = multiResGrid(MRGobject)

# Parameters can be updated in the object or in the call to multiResGrid
MRGobject$suppresslim = 0.02
himg2 = multiResGrid(MRGobject)
himg3 = multiResGrid(MRGobject, suppresslim = 0.05)

# This exemplifies how a list can be passed to the function, representing different
# survey years, which will then be used to create consistent grid cells for 
# the different survey years. The differences in this example are just some random 
# changes to farms and areas
ifg2020 = ifg
nd = dim(ifg2020)[1]
rmult = function(x) {x*runif(length(x), 0.95, 1.05)}
ifg2010 = ifg2020 %>% slice(sample(1:nd, floor(nd*0.9))) %>% 
                      mutate_at(c("UAA", "UAAXK0000_ORG"), rmult)
ifg2015 = ifg2020 %>% slice(sample(1:nd, floor(nd*0.95))) %>% 
                      mutate_at(c("UAA", "UAAXK0000_ORG"), rmult)
# srvNames are not necessary when the list is named as here, 
# but could be passed as srvNames = c(2010, 2015, 2020)
MRGobject2 = createMRGobject(ifg = list("2010" = ifg2010, "2015" = ifg2015, "2020" = ifg2020), 
          dummy = "HOLDING", vars = c("UAA", "UAAXK0000_ORG"), ress = ress)
himg4 = multiResGrid(MRGobject2)
# }
