representr (version 0.1.1)

pp_weights: Get posterior weights for each record after record linkage, using posterior prototyping.

Description

Get posterior weights for each record after record linkage, using posterior prototyping.

Usage

pp_weights(
  data,
  posterior_linkage,
  rep_method,
  parallel = TRUE,
  cores = NULL,
  ...,
  scale = FALSE
)

Arguments

data

A data frame of records to be represented.

posterior_linkage

An m x n matrix of posterior cluster ids from record linkage. Each row gives the cluster assignment of every record in data for one iteration of the sampler.

rep_method

Which method to use for representation. Valid options include "proto_minimax" and "proto_random".

parallel

Logical flag indicating whether to use parallel computation (via foreach).

cores

If specified, the number of cores to use with foreach.

...

Additional parameters passed to the cluster representation function. See the minimax or random methods. If passing a probability to the random method, it must be a list with one element per iteration in lambda. Each element must itself be a list with one element per cluster, and each of those must be a vector of probabilities with one entry per row in the cluster, i.e. prob[[iteration]][[cluster]].

scale

If the "proto_minimax" method is specified, a logical flag indicating whether the column-type distance function should be scaled so that each distance takes a value in [0, 1]. Defaults to FALSE.
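
The nested probability list described under ... can be hard to picture, so here is a minimal sketch in base R of one way to build it. The toy linkage matrix, the uniform probabilities, and the ordering of clusters by sorted cluster id are all assumptions made for illustration; the ordering the package actually expects is not stated here and should be checked against the random method's documentation.

```r
# Toy posterior linkage (hypothetical): m = 2 iterations (rows),
# n = 5 records (columns).
post_link <- matrix(c(1, 1, 2, 3, 3,
                      1, 2, 2, 2, 3),
                    nrow = 2, byrow = TRUE)

# For each iteration, build one probability vector per cluster, with one
# entry per record in that cluster. Uniform probabilities are used purely
# for illustration; clusters are ordered by sorted cluster id (assumption).
prob <- lapply(seq_len(nrow(post_link)), function(i) {
  cluster_sizes <- as.vector(table(post_link[i, ]))
  lapply(cluster_sizes, function(k) rep(1 / k, k))
})

length(prob)        # one element per iteration
length(prob[[1]])   # one element per cluster in iteration 1
prob[[2]][[2]]      # probabilities for the second cluster in iteration 2
```

A list like this would presumably be supplied through ... when rep_method is "proto_random"; the exact argument name should be confirmed in the random method's help page.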

Examples

# NOT RUN {
library(representr)
data(rl_reg1)

# make a fake posterior distribution for the linkage
m <- 10
n <- nrow(rl_reg1)
post_link <- matrix(sample(seq_len(n), n*m, replace = TRUE), nrow = m)

# get the posterior prototyping weights
col_type <- c("string", "string", "numeric", "numeric", "numeric", "categorical", "ordinal",
    "numeric", "numeric")
orders <- list(education = c("Less than a high school diploma", "High school graduates, no college",
    "Some college or associate degree", "Bachelor's degree only", "Advanced degree"))
weights <- c(.25, .25, .05, .05, .1, .15, .05, .05, .05)
pp_weight <- pp_weights(rl_reg1, post_link, "proto_minimax", distance = dist_col_type,
    col_type = col_type, weights = weights, orders = orders, scale = TRUE, parallel = FALSE)

# threshold by posterior prototyping weights
head(rl_reg1[pp_weight > 0.5, ])

# }
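
To make the layout of posterior_linkage concrete, the following base R sketch builds a small fake linkage matrix the same way as the example above and inspects it. The sizes m = 3 and n = 6 and the seed are arbitrary choices for illustration, and no package functions are called.

```r
set.seed(1)  # arbitrary seed, only so repeated runs give the same matrix

m <- 3  # number of sampler iterations (rows)
n <- 6  # number of records (columns)
post_link <- matrix(sample(seq_len(n), n * m, replace = TRUE), nrow = m)

# One row per iteration, one column per record.
dim(post_link)

# Number of distinct clusters in each iteration.
n_clusters <- apply(post_link, 1, function(ids) length(unique(ids)))
n_clusters

# Records that share a cluster with record 1 in the first iteration.
same_cluster <- which(post_link[1, ] == post_link[1, 1])
same_cluster
```

Each row can be read on its own: two records are linked in a given iteration exactly when their entries in that row are equal.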
