est_bilog: Item Calibration via BILOG-MG

Description

The function est_bilog facilitates item calibration through BILOG-MG. It offers two modes of operation: executing BILOG-MG in batch mode or processing pre-generated BILOG-MG output files. When using the former, ensure BILOG-MG is installed in the directory specified by bilog_exe_folder.

In the latter case, if the necessary BILOG-MG files (e.g., "<analysis_name>.PAR", "<analysis_name>.PH1", etc.) exist and overwrite = FALSE, there is no need for the BILOG-MG program itself. This function is capable of parsing BILOG-MG output without it.

Both BILOG-MG 3.0 and BILOG-MG 4.0 are supported. Refer to the bilog_exe_folder argument for guidance on selecting the desired version.
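
The two modes can be sketched as follows. This is a minimal illustration, assuming est_bilog is available from the irt package and that BILOG-MG sits in the default folder shown in the Usage section. The simulated data, the target directory, and the can_run guard are illustrative choices, not part of the function.

```r
# Simulate dichotomous (0/1) responses from a 2PL model:
# 500 examinees by 10 items.
set.seed(123)
n_examinee <- 500
n_item <- 10
theta <- rnorm(n_examinee)              # examinee abilities
a <- runif(n_item, 0.8, 1.6)            # item slopes
b <- rnorm(n_item)                      # item thresholds
p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
resp <- matrix(rbinom(length(p), size = 1, prob = p),
               nrow = n_examinee, ncol = n_item,
               dimnames = list(paste0("S", seq_len(n_examinee)),
                               paste0("Item_", seq_len(n_item))))

# Hypothetical locations; adjust both to the local setup.
target_dir <- file.path(tempdir(), "bilog_run")
dir.create(target_dir, showWarnings = FALSE, recursive = TRUE)
bilog_exe_folder <- file.path("C:/Program Files/BILOGMG")

# Guard so the calls are skipped when the package or program is missing.
can_run <- requireNamespace("irt", quietly = TRUE) &&
  dir.exists(bilog_exe_folder)

if (can_run) {
  # Mode 1: run BILOG-MG in batch mode on the data.
  calib <- irt::est_bilog(x = resp, model = "2PL", target_dir = target_dir,
                          analysis_name = "bilog_calibration",
                          overwrite = TRUE,
                          bilog_exe_folder = bilog_exe_folder)

  # Mode 2: read the files written above without running BILOG-MG again.
  calib_read <- irt::est_bilog(x = NULL, model = "2PL",
                               target_dir = target_dir,
                               analysis_name = "bilog_calibration",
                               overwrite = FALSE)
}
```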

Usage

est_bilog(
  x = NULL,
  model = "3PL",
  target_dir = getwd(),
  analysis_name = "bilog_calibration",
  items = NULL,
  examinee_id_var = NULL,
  group_var = NULL,
  logistic = TRUE,
  num_of_alternatives = NULL,
  criterion = 0.01,
  num_of_quadrature = 81,
  max_em_cycles = 100,
  newton = 20,
  reference_group = NULL,
  fix = NULL,
  scoring_options = c("METHOD=1", "NOPRINT"),
  calib_options = c("NORMAL"),
  prior_ability = NULL,
  prior_ip = NULL,
  overwrite = FALSE,
  show_output_on_console = TRUE,
  bilog_exe_folder = file.path("C:/Program Files/BILOGMG")
)

Value

A list with the following elements is returned:

"ip": An Itempool-class object holding the item parameters. Check ...$converged to ensure the model has converged before using ip. This element is not created when model = "CTT".
"score": A data frame object containing information on examinee scores such as items attempted (tried), items answered correctly (right), estimated examinee scores (ability), standard errors of ability estimates (se), and response string probabilities (prob). This element is not created when model = "CTT".
"ctt": Classical Test Theory (CTT) statistics, including p-values, biserial, and point-biserial estimates calculated by BILOG-MG. If there are groups, group-specific CTT statistics can be found in ctt$group$GROUP-NAME. Overall statistics for the entire group are located at ctt$overall.
"failed_items": A data frame containing items that could not be estimated.
"syntax": The syntax file.
"em_cycles": E-M Cycles of the calibration.
"newton_cycles": Newton Cycles of the calibration.
"cycle": The number of cycles run before calibration converges or fails to converge.
"largest_change": The largest change observed between the last two cycles.
"neg_2_log_likelihood": -2 Log Likelihood value of the last step of the E-M cycles. See also $em_cycles. This value is NULL when the model does not converge. This element is not created when model = "CTT".
"posterior_dist": Posterior quadrature points and weights.
"input": A list object that stores the arguments passed to the function.
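
The convergence check recommended for ip can be wrapped in a small guard. The sketch below assumes the returned list carries a logical element named converged at its top level. The mock results and the get_converged_ip helper are hypothetical and only illustrate the pattern.

```r
# Hypothetical helper: return the item pool only when calibration converged.
get_converged_ip <- function(result) {
  if (!isTRUE(result$converged)) {
    stop("Calibration did not converge; item parameters should not be used.")
  }
  result$ip
}

# Mock objects standing in for est_bilog() output (not real results).
mock_ok <- list(ip = "item parameters", converged = TRUE)
mock_failed <- list(ip = NULL, converged = FALSE)

ip <- get_converged_ip(mock_ok)

# The failed case raises an error; capture its message for inspection.
failed <- tryCatch(get_converged_ip(mock_failed),
                   error = function(e) conditionMessage(e))
```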

Arguments

x

Either a data.frame, matrix, or a Response_set-class object. Set this to NULL if you only intend to read BILOG-MG output from target_dir.

model

Specifies the item model. Options include:

"1PL": One-parameter logistic model.
"2PL": Two-parameter logistic model.
"3PL": Three-parameter logistic model.
"CTT": Return only Classical Test Theory statistics such as p-values, point-biserial and biserial correlations.

The default is "3PL".

target_dir

The directory where BILOG-MG analysis and data files will be stored. The default is the current working directory (i.e., getwd()).

analysis_name

A concise filename (without extension) used for the data files created for the analysis.

items

A vector of column names or numbers in x representing the responses. If no entry for item names is desired in the syntax file, set items = "none".

examinee_id_var

The column name or number containing individual subject IDs. If not provided (i.e., examinee_id_var = NULL), the program will check whether the data provided has row names and use them as subject IDs.

group_var

The column name or number containing group membership information for multi-group calibration. Ideally, the grouping variable should be represented by single-digit integers. If other data types are provided, integer values will be automatically assigned to the variables. The default is NULL, indicating no multi-group analysis will be performed.
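
To keep control over the coding, group labels can be recoded to integers before calling the function. The following is a sketch with made-up data and column names. The automatic assignment performed by est_bilog may order the codes differently.

```r
# Hypothetical data with a character grouping variable.
dat <- data.frame(
  examinee_id = paste0("S", 1:6),
  grade = c("Grade 7", "Grade 8", "Grade 7", "Grade 9", "Grade 8", "Grade 9"),
  stringsAsFactors = FALSE
)

# Map each distinct label to an integer code (sorted label order).
dat$group <- as.integer(factor(dat$grade))

# Keep a lookup table so codes can be traced back to the original labels.
group_key <- unique(dat[order(dat$group), c("group", "grade")])
```

The new column could then be supplied as group_var = "group".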

logistic

A logical value indicating whether to use logistic calibration.

If TRUE, the calibration assumes the natural metric of the logistic response function in all calculations.
If FALSE, the logit is multiplied by a factor of 1.7 to obtain the metric of the normal-ogive model.

The default value is TRUE.
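
The effect of the 1.7 factor can be checked numerically. The sketch below uses illustrative slope and threshold values and only base R functions. It compares the logistic curve, with and without the factor, to the normal-ogive curve.

```r
# Item response probabilities for one item under both metrics.
a <- 1.2      # slope (illustrative value)
b <- 0.3      # threshold (illustrative value)
theta <- seq(-4, 4, by = 0.01)

p_logistic <- plogis(a * (theta - b))        # natural logistic metric
p_scaled <- plogis(1.7 * a * (theta - b))    # logit multiplied by 1.7
p_ogive <- pnorm(a * (theta - b))            # normal-ogive model

# Largest gap between the scaled logistic curve and the normal ogive.
max_gap_scaled <- max(abs(p_scaled - p_ogive))

# Largest gap when the scaling factor is left out.
max_gap_unscaled <- max(abs(p_logistic - p_ogive))
```

With the factor applied, the two curves stay within 0.01 of each other over this grid; without it, the gap is considerably larger.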

num_of_alternatives

An integer specifying the maximum number of response alternatives in the raw data. This value is used as an automatic starting value for estimating pseudo-guessing parameters.

The default value is NULL, in which case a model-specific default is used: 5 for 3PL, and 1000 for 1PL and 2PL. This value will be represented in the BILOG-MG control file as: NALT = num_of_alternatives.

criterion

The convergence criterion for EM and Newton iterations. The default value is 0.01.

num_of_quadrature

The number of quadrature points used in MML estimation. The default value is 81. This value will be represented in the BILOG-MG control file as: NQPT = num_of_quadrature. For comparison, BILOG-MG's own default is 20 when there is more than one group and 10 otherwise.

max_em_cycles

An integer (0, 1, ...) representing the maximum number of EM cycles. This value will be represented in the BILOG-MG control file as: CYCLES = max_em_cycles. The default value is 100.

newton

An integer (0, 1, ...) representing the number of Gauss-Newton iterations following EM cycles. This value will be represented in the BILOG-MG control file as: NEWTON = newton. The default value is 20.

reference_group

A value indicating which group's ability distribution will be set to mean = 0 and standard deviation = 1. For example, if the group_var has values 1 and 2 representing two different groups, setting reference_group = 2 will result in the group with code 2 having an ability distribution with mean 0 and standard deviation 1.

When groups are assumed to come from a single population, set this value to 0.

The default value is NULL.

This value will be represented in the BILOG-MG control file as: REFERENCE = reference_group.

fix

Specifies whether the parameters of specific items are free to be estimated or should be held fixed at their starting values. This argument accepts a data.frame with an item_id column listing the items whose parameters will be held fixed, together with columns a, b, and c holding the parameter values. See the examples section for a demonstration.
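
A data frame for fix might be built as follows. This is a sketch only: the item IDs and parameter values are made up, and the column layout (item_id, a, b, c) follows the description above.

```r
# Hypothetical fixed values for two anchor items; the item IDs are assumed
# to match the item names used in the response data.
fix_pars <- data.frame(
  item_id = c("Item_1", "Item_2"),
  a = c(1.10, 0.85),     # slopes to hold fixed
  b = c(-0.40, 0.75),    # thresholds to hold fixed
  c = c(0.20, 0.25),     # lower asymptotes to hold fixed
  stringsAsFactors = FALSE
)

# The data frame would then be passed as: est_bilog(..., fix = fix_pars)
```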

scoring_options

A string vector of keywords/options to be included in the SCORE section of the BILOG-MG syntax. If scoring individual examinees is not needed, set this to NULL.

The default value is c("METHOD=1", "NOPRINT"), where scale scores are estimated using Maximum Likelihood estimation and the scoring process is not printed to the R console (if show_output_on_console = TRUE).

The primary option to add to this vector is "METHOD=n". The available options are:

"METHOD=1": Maximum Likelihood (ML)
"METHOD=2": Expected a Posteriori (EAP)
"METHOD=3": Maximum a Posteriori (MAP)

Additionally, you can include the following keywords:

"NOPRINT": Suppresses the display of scores on the R console.

"FIT": Computes the likelihood ratio chi-square goodness-of-fit statistic for each response pattern.

"NQPT=(list)", "IDIST=n", "PMN=(list)", "PSD=(list)", "RSCTYPE=n", "LOCATION=(list)", "SCALE=(list)", "INFO=n", "BIWEIGHT", "YCOMMON", "POP", "MOMENTS", "FILE", "READF", "REFERENCE=n", "NFORMS=n"

Refer to the BILOG-MG manual for detailed explanations of these keywords/options.
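
As a convenience, the scoring_options vector can be assembled from a method name. The make_scoring_options helper below is hypothetical and not part of the package. It only maps the three methods listed above to their "METHOD=n" keywords.

```r
# Hypothetical helper: build a scoring_options vector from a method name.
make_scoring_options <- function(method = c("ML", "EAP", "MAP"),
                                 extra = "NOPRINT") {
  method <- match.arg(method)
  method_code <- c(ML = 1, EAP = 2, MAP = 3)[[method]]
  c(paste0("METHOD=", method_code), extra)
}

eap_options <- make_scoring_options("EAP")
map_fit_options <- make_scoring_options("MAP", extra = c("NOPRINT", "FIT"))
```

Here eap_options holds "METHOD=2" and "NOPRINT", and could be passed as scoring_options = eap_options.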

calib_options

A string vector of additional keywords/options for the CALIB section in the BILOG-MG syntax. This is in addition to the keywords NQPT, CYCLES, NEWTON, CRIT, and REFERENCE.

The default value is c("NORMAL").

Including "NORMAL" in calib_options assumes that the prior distributions of ability in the population follow a normal distribution.

Including "COMMON" estimates a common value for the lower asymptote for all items in the 3PL model.

If you're calibrating items using the Rasch model, set the argument model = "Rasch" instead of adding "RASCH" to calib_options.

Additional keywords/options that can be added to calib_options include:

"PRINT=n", "IDIST=n", "PLOT=n", "DIAGNOSIS=n", "REFERENCE=n", "SELECT=(list)", "RIDGE=(a,b,c)", "ACCEL=n", "NSD=n", "EMPIRICAL", "FIXED", "TPRIOR", "SPRIOR", "GPRIOR", "NOTPRIOR", "NOSPRIOR", "NOGPRIOR", "READPRIOR", "NOFLOAT", "FLOAT", "NOADJUST", "GROUP-PLOT", "NFULL", "CHI=(a,b)"

Refer to the BILOG-MG manual for detailed explanations of these keywords/options.

NOTE: Do not add the following keywords to calib_options as they are already included in other arguments: NQPT, CYCLES, NEWTON, CRIT, REFERENCE.
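
A quick pre-check can catch reserved keywords before the call. The find_reserved_options helper below is hypothetical, and the option values are illustrative. It flags any entry whose keyword duplicates one of the arguments named in the note.

```r
# Keywords that est_bilog sets through its own arguments.
reserved <- c("NQPT", "CYCLES", "NEWTON", "CRIT", "REFERENCE")

# Hypothetical helper: return the entries of calib_options that clash.
find_reserved_options <- function(calib_options) {
  # Keep the keyword part before any "=" and normalize case and spacing.
  keyword <- toupper(trimws(sub("=.*$", "", calib_options)))
  calib_options[keyword %in% reserved]
}

ok_options <- find_reserved_options(c("NORMAL", "COMMON", "ACCEL=0.5"))
clashing <- find_reserved_options(c("NORMAL", "NQPT=31", "crit=0.001"))
```

An empty result, as in ok_options, means the vector is safe to pass. The clashing entries should be moved to num_of_quadrature and criterion instead.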

prior_ability

Prior ability refers to the quadrature points and weights representing the discrete finite distribution of ability for the groups. It should be structured as a list in the following format:

list(<GROUP-NAME-1> = list(points = ...., weights = ...), <GROUP-NAME-2> = list(points = ...., weights = ...), ... )

Here, <GROUP-NAME-1> refers to the name of the first group, <GROUP-NAME-2> refers to the name of the second group, and so on.

Please refer to the examples section for a practical implementation.
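
A list in this format might be constructed as follows. This is a sketch under stated assumptions: the group names "1" and "2", the grid of points, and the normal_weights helper are illustrative, and rescaling the weights to sum to 1 is a choice made here rather than a documented requirement.

```r
# Quadrature points shared by both groups (illustrative grid).
points <- seq(-4, 4, length.out = 21)

# Hypothetical helper: discretized normal weights, rescaled to sum to 1.
normal_weights <- function(points, mean = 0, sd = 1) {
  w <- dnorm(points, mean = mean, sd = sd)
  w / sum(w)
}

# "1" and "2" are assumed group names; use the values found in group_var.
prior_ability <- list(
  "1" = list(points = points, weights = normal_weights(points)),
  "2" = list(points = points, weights = normal_weights(points, mean = 0.5))
)
```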

prior_ip

Specify prior distributions for item parameters. The default value is NULL, in which case BILOG-MG defaults will be used. To specify priors, provide a list containing one or more of the following elements:

"ALPHA": "'alpha' parameters for the beta prior distribution of lower asymptote (guessing) parameters"
"BETA": "'beta' parameters for the beta prior distribution of lower asymptote (guessing) parameters"
"SMU": prior means for slope parameters
"SSIGMA": prior standard deviations for slope parameters
"TMU": prior means for threshold parameters
"TSIGMA": prior standard deviations for threshold parameters

Quoted descriptions were taken from the BILOG-MG manual.

Examples:

A specific set of priors: list(ALPHA = 4, BETA = 3, SMU = 1, SSIGMA = 1.648, TMU = 0, TSIGMA = 2)
A very strong prior for guessing which almost fixes all guessing parameters at 0.2: list(ALPHA = 1000000, BETA = 4000000)
Fix guessing at 0.25: list(ALPHA = 1000000, BETA = 3000000)

In general, one can adjust the alpha and beta parameters to achieve a desired outcome, considering that the mode of the beta distribution is calculated as:

$$mode = \frac{\alpha - 1}{\alpha + \beta - 2}$$
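
The formula can be used in both directions. The helpers below are hypothetical names introduced for illustration: beta_mode evaluates the formula, and beta_pars_for_mode solves it for ALPHA and BETA given a target mode and a chosen sum alpha + beta.

```r
# Mode of a Beta(alpha, beta) distribution (valid for alpha > 1 and beta > 1).
beta_mode <- function(alpha, beta) {
  (alpha - 1) / (alpha + beta - 2)
}

# Hypothetical helper: ALPHA and BETA for a target mode. A larger `total`
# (alpha + beta) gives a stronger prior.
beta_pars_for_mode <- function(mode, total) {
  alpha <- mode * (total - 2) + 1
  list(ALPHA = alpha, BETA = total - alpha)
}

mode_strong_20 <- beta_mode(1000000, 4000000)   # close to 0.20
mode_strong_25 <- beta_mode(1000000, 3000000)   # close to 0.25

# A weaker prior with the same mode of 0.2.
prior_ip <- beta_pars_for_mode(mode = 0.2, total = 22)
```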

Additionally, setting SSIGMA or TSIGMA to a very small value (for example, TSIGMA = 0.005 or SSIGMA = 0.001) effectively fixes the corresponding item parameters. Be aware that this may lead to convergence issues.

Note: A non-null prior_ip value will automatically add the READPRIOR option to the CALIB section.

overwrite

If set to TRUE, any existing BILOG-MG analysis files with the same name in target_dir will be overwritten. The default is FALSE.

show_output_on_console

A logical value indicating whether to capture and display the output of the command on the R console. The default is TRUE.

bilog_exe_folder

The directory containing the BILOG-MG executable files. This function supports two versions: BILOG-MG 3 and BILOG-MG 4. For BILOG-MG version 3, the directory should include the files "blm1.exe", "blm2.exe", and "blm3.exe". The default location for version 3 is file.path("C:/Program Files/BILOGMG"). If you have version 4 installed, the argument should point to the directory where "BLM64.exe" is located, which is typically "C:/Program Files/BILOG-MG/x64".
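
Before running a calibration, the folder can be checked for the expected executables. The detect_bilog_version helper below is hypothetical and relies only on the file names given above. The detection logic used internally by est_bilog may differ.

```r
# Hypothetical helper: guess the BILOG-MG version from the files in a folder.
detect_bilog_version <- function(folder) {
  v3_files <- c("blm1.exe", "blm2.exe", "blm3.exe")
  if (file.exists(file.path(folder, "BLM64.exe"))) {
    return(4L)
  }
  if (all(file.exists(file.path(folder, v3_files)))) {
    return(3L)
  }
  # Neither set of executables was found.
  NA_integer_
}

version_default <- detect_bilog_version("C:/Program Files/BILOGMG")
version_x64 <- detect_bilog_version("C:/Program Files/BILOG-MG/x64")
```

A result of NA indicates that neither version was found in the folder, so only pre-generated output files could be read.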

Author

Emre Gonulates