est_bilog: Run BILOG-MG in batch mode

Description

est_bilog either runs BILOG-MG in batch mode or reads output files that BILOG-MG has already generated. In the first case, BILOG-MG must be installed on your computer in the bilog_exe_folder directory.

In the latter case, when the required BILOG-MG files are present (i.e. "<analysis_name>.PAR", "<analysis_name>.PH1", "<analysis_name>.PH2" and "<analysis_name>.PH3" all exist) and overwrite = FALSE, the BILOG-MG program is not needed: the function reads the existing output directly.
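The file check described above can be sketched in base R. The helper name bilog_output_exists is hypothetical and is not part of the documented function; it only tests for the four file names listed above.

```r
# Hypothetical helper: TRUE only when all four BILOG-MG output files
# named in the description exist in target_dir.
bilog_output_exists <- function(target_dir, analysis_name) {
  extensions <- c("PAR", "PH1", "PH2", "PH3")
  paths <- file.path(target_dir, paste0(analysis_name, ".", extensions))
  all(file.exists(paths))
}

# Check the default analysis name in the current working directory.
bilog_output_exists(getwd(), "bilog_calibration")
```

If this returns TRUE and overwrite = FALSE, the description above suggests that the existing output will be read rather than regenerated.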

Usage

est_bilog(
  x = NULL,
  model = "3PL",
  target_dir = getwd(),
  analysis_name = "bilog_calibration",
  items = NULL,
  examinee_id_var = NULL,
  group_var = NULL,
  logistic = TRUE,
  num_of_alternatives = NULL,
  criterion = 0.01,
  num_of_quadrature = 81,
  max_em_cycles = 100,
  newton = 20,
  reference_group = NULL,
  fix = NULL,
  scoring_options = c("METHOD=1", "NOPRINT"),
  calib_options = c("NORMAL"),
  prior_ability = NULL,
  prior_ip = NULL,
  overwrite = FALSE,
  show_output_on_console = TRUE,
  bilog_exe_folder = file.path("C:/Program Files/BILOGMG")
)

Value

A list of the following objects:

"ip": An Itempool-class object holding the item parameters. Check whether the model converged (using ...$converged) before interpreting or using ip. This element will not be created when model = "CTT".
"score": A data frame that holds the number of items the examinee attempted (tried), the number of items the examinee answered correctly (right), the estimated ability of the examinee (ability), the standard error of the ability estimate (se), and the probability of the response string (prob). This element will not be created when model = "CTT".
"ctt": The Classical Test Theory (CTT) statistics, such as p-value, biserial and point-biserial, estimated by BILOG-MG. If there are groups, the CTT statistics for each group can be found in ctt$group$GROUP-NAME. Overall statistics for the whole group are in ctt$overall.
"failed_items": A data frame consisting of the items that could not be estimated.
"syntax": The syntax file.
"converged": A logical value indicating whether the model converged. If the value is TRUE, the model converged. This element will not be created when model = "CTT".
"cycle": Number of cycles run before the calibration converged or failed to converge.
"largest_change": Largest change between the last two cycles.
"neg_2_log_likelihood": -2 log-likelihood value. This value is NULL when the model does not converge. This element will not be created when model = "CTT".
"input": A list object that stores the arguments that are passed to the function.

Arguments

x

Either a data.frame, matrix or Response_set-class object. When the data is not necessary, i.e. user only wants to read the BILOG-MG output from the target_dir, then this can be set to NULL.

model

The model of the items. The value is one of the following:

"1PL": One-parameter logistic model.

"2PL": Two-parameter logistic model.

"3PL": Three-parameter logistic model.

"CTT": Return only Classical Test Theory statistics such as p-values, point-biserial and biserial correlations.

The default value is "3PL".

target_dir

The directory/folder where the BILOG-MG analysis and data files will be saved. The default value is the current working directory, i.e. getwd().

analysis_name

A short file name that will be used for the data files created for the analysis.

items

A vector of column names or column numbers of x that identify the response columns. If no entry for item names is desired in the syntax file, set items = "none".

examinee_id_var

The column name or number that contains individual subject IDs. If none is provided (i.e. examinee_id_var = NULL), the program will check whether the data provided has row names.

group_var

The column name or number that contains group membership information, if multi-group calibration is desired. Ideally, the grouping variable is represented by single-digit integers. If another type of data is provided, an integer value will automatically be assigned to each group. The default value is NULL, in which case no multi-group analysis is performed.
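Because the exact automatic mapping from non-integer group labels to integers is not described here, one option is to recode the grouping variable yourself so that the codes are known in advance. The following is a minimal sketch in base R; the labels are made up, and the first-appearance ordering is a choice made for this example, not necessarily what est_bilog does internally.

```r
# Illustrative group labels (hypothetical data).
group <- c("pilot", "main", "main", "pilot", "retest")

# Assign integer codes in order of first appearance: pilot = 1, main = 2, retest = 3.
group_code <- as.integer(factor(group, levels = unique(group)))

data.frame(group, group_code)
```

Recoding up front also makes it clear which integer to pass as reference_group.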

logistic

A logical value. If TRUE, LOGISTIC keyword will be added to the BILOG-MG command file which means the calibration will assume the natural metric of the logistic response function in all calculations. If FALSE, the logit is multiplied by D = 1.7 to obtain the metric of the normal-ogive model. The default value is TRUE.
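The effect of the scaling constant can be illustrated with a small base R sketch. The function p_2pl and the item parameter values are assumptions made for this illustration; the code only shows the general relationship between the logistic metric, the D = 1.7 metric and the normal ogive, not BILOG-MG's internal computation.

```r
# 2PL response probability. D = 1 is the natural logistic metric;
# D = 1.7 rescales the logit to approximate the normal-ogive metric.
p_2pl <- function(theta, a, b, D = 1) {
  plogis(D * a * (theta - b))
}

theta <- seq(-3, 3, by = 0.5)
p_logistic <- p_2pl(theta, a = 1.2, b = 0.3)           # logistic metric
p_scaled   <- p_2pl(theta, a = 1.2, b = 0.3, D = 1.7)  # scaled logistic
p_ogive    <- pnorm(1.2 * (theta - 0.3))               # normal ogive

# The scaled logistic curve tracks the normal ogive closely.
max(abs(p_scaled - p_ogive))
```

In practice this means that slope estimates obtained with logistic = TRUE and logistic = FALSE differ by roughly a factor of 1.7.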

num_of_alternatives

An integer specifying the maximum number of response alternatives in the raw data. 1/num_of_alternatives is used by the analysis as the automatic starting value for estimating the pseudo-guessing parameters.

The default value is NULL. In this case, for 3PL, 5 will be used and for 1PL and 2PL, 1000 will be used.

This value will be represented in BILOG-MG control file as: NALT = num_of_alternatives.
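The default rule above can be written out as a short sketch. The helper guessing_start is hypothetical and only mirrors the defaults stated in this section (5 for 3PL, 1000 for 1PL and 2PL).

```r
# Hypothetical helper: starting value for the pseudo-guessing parameter
# implied by num_of_alternatives, using the defaults described above.
guessing_start <- function(model, num_of_alternatives = NULL) {
  if (is.null(num_of_alternatives)) {
    num_of_alternatives <- if (model == "3PL") 5 else 1000
  }
  1 / num_of_alternatives
}

guessing_start("3PL")     # 1/5
guessing_start("2PL")     # 1/1000
guessing_start("3PL", 4)  # 1/4, e.g. four-option multiple-choice items
```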

criterion

Convergence criterion for EM and Newton iterations. The default value is 0.01.

num_of_quadrature

The number of quadrature points in MML estimation. The default value is 81. This value will be represented in the BILOG-MG control file as: NQPT = num_of_quadrature. The BILOG-MG default value is 20 if there is more than one group, and 10 otherwise.

max_em_cycles

An integer (0, 1, ...) representing the maximum number of EM cycles. This value will be represented in BILOG-MG control file as: CYCLES = max_em_cycles. The default value is 100.

newton

An integer (0, 1, ...) representing the number of Gauss-Newton iterations following EM cycles. This value will be represented in the BILOG-MG control file as: NEWTON = newton. The default value is 20.

reference_group

Specifies which group's ability distribution will be set to mean = 0 and standard deviation = 1. For example, if the value is 1, the group whose code is 1 will have an ability distribution with mean 0 and standard deviation 1. When the groups are assumed to come from a single population, set this value to 0.

The default value is NULL.

This value will be represented in BILOG-MG control file as: REFERENCE = reference_group.

fix

This argument specifies whether the parameters of specific items are free to be estimated or are held fixed at their starting values. It accepts a data.frame with an item_id column listing the items whose parameters will be held fixed, together with their a, b and c parameter values. See the examples section for a demonstration.
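A data frame of the shape described above might look like the following sketch. The column names item_id, a, b and c are read from the description; the item IDs and parameter values are invented for illustration.

```r
# Hypothetical items and values: hold the parameters of two items fixed.
fix_df <- data.frame(
  item_id = c("Item_1", "Item_4"),
  a = c(1.00, 1.35),   # slope (discrimination)
  b = c(-0.50, 0.80),  # threshold (difficulty)
  c = c(0.20, 0.25)    # lower asymptote (guessing)
)

fix_df
```

Such an object would then be passed as fix = fix_df, with item_id values matching the item names in x.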

scoring_options

A string vector of keywords/options that will be added to the SCORE section in BILOG-MG syntax. Set the value of scoring_options to NULL if scoring of individual examinees is not necessary.

The default value is c("METHOD=1", "NOPRINT") where scale scores will be estimated using Maximum Likelihood estimation and the scoring process will not be printed to the R console (if show_output_on_console = TRUE).

The main option to be added to this vector is "METHOD=n". The following options are available:

"METHOD=1": Maximum Likelihood (ML)

"METHOD=2": Expected a Posteriori (EAP)

"METHOD=3": Maximum a Posteriori (MAP)

In addition to the "METHOD=n" keyword, the following keywords can be added:

"NOPRINT": Suppresses the display of the scores on the R console.

"FIT": Computes the likelihood ratio chi-square goodness-of-fit statistic for each response pattern.

"NQPT=(list)", "IDIST=n", "PMN=(list)", "PSD=(list)", "RSCTYPE=n", "LOCATION=(list)", "SCALE=(list)", "INFO=n", "BIWEIGHT", "YCOMMON", "POP", "MOMENTS", "FILE", "READF", "REFERENCE=n", "NFORMS=n"

See BILOG-MG manual for more details about these keywords/options.
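As a small illustration of assembling this argument, the sketch below builds a scoring_options vector from an estimator name. The helper scoring_method is hypothetical; the method codes are the ones listed above.

```r
# Hypothetical helper: translate an estimator name into the "METHOD=n" keyword.
scoring_method <- function(estimator = c("ML", "EAP", "MAP")) {
  estimator <- match.arg(estimator)
  codes <- c(ML = 1, EAP = 2, MAP = 3)
  paste0("METHOD=", codes[[estimator]])
}

# EAP scoring, no score listing on the console, with pattern fit statistics.
scoring_options <- c(scoring_method("EAP"), "NOPRINT", "FIT")
scoring_options
```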

calib_options

A string vector of keywords/options that will be added to the CALIB section in BILOG-MG syntax in addition to the keywords NQPT, CYCLES, NEWTON, CRIT, REFERENCE.

The default value is c("NORMAL").

When "NORMAL" is included in calib_options, the prior distribution of ability in the population is assumed to be normal.

When "COMMON" is included in calib_options, a common value for the lower asymptote for all items in the 3PL model will be estimated.

If the items will be calibrated using the Rasch model, set model = "Rasch" instead of adding the "RASCH" keyword to calib_options.

The following keywords/options can be added to calib_options:

"PRINT=n", "IDIST=n", "PLOT=n", "DIAGNOSIS=n", "SELECT=(list)", "RIDGE=(a,b,c)", "ACCEL=n", "NSD=n", "COMMON", "EMPIRICAL", "NORMAL", "FIXED", "TPRIOR", "SPRIOR", "GPRIOR", "NOTPRIOR", "NOSPRIOR", "NOGPRIOR", "READPRIOR", "NOFLOAT", "FLOAT", "NOADJUST", "GROUP-PLOT", "NFULL", "CHI=(a,b)".

See BILOG-MG manual for more details about these keywords/options.

NOTE: Do not add any of the following keywords to calib_options since they will already be included:

NQPT, CYCLES, NEWTON, CRIT, REFERENCE
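A quick pre-check for these reserved keywords can be sketched as follows. The helper check_calib_options is hypothetical and is not part of the documented function; it simply strips any "=value" part and compares the keyword against the reserved list above.

```r
# Keywords that the function already writes to the CALIB section.
reserved <- c("NQPT", "CYCLES", "NEWTON", "CRIT", "REFERENCE")

# Hypothetical helper: stop if calib_options contains a reserved keyword.
check_calib_options <- function(calib_options) {
  keywords <- toupper(trimws(sub("=.*$", "", calib_options)))
  clashes <- intersect(keywords, reserved)
  if (length(clashes) > 0) {
    stop("Reserved keyword(s) in calib_options: ",
         paste(clashes, collapse = ", "))
  }
  invisible(calib_options)
}

check_calib_options(c("NORMAL", "COMMON"))  # passes silently
```

A call such as check_calib_options(c("NORMAL", "NQPT=31")) would raise an error, since NQPT is controlled through num_of_quadrature.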

prior_ability

The quadrature points and weights of the discrete finite representation of the prior ability distribution for each group. It should be a list in the following form:

list(<GROUP-NAME-1> = list(points = ...., weights = ...), <GROUP-NAME-2> = list(points = ...., weights = ...), ... )

GROUP-NAME-1 is the name of the first group, GROUP-NAME-2 is the name of the second group, etc.

See examples section for an example implementation.
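A list of this form could be built as in the sketch below, which uses normalized normal densities as weights. The group names, the number of points and the distribution parameters are all assumptions made for illustration; whether the number of points must agree with num_of_quadrature is not stated in this section, so check the BILOG-MG manual.

```r
# Quadrature points shared by both groups (illustrative choice).
points <- seq(-4, 4, length.out = 21)

# Hypothetical helper: normal density at each point, rescaled to sum to 1.
normal_weights <- function(points, mean = 0, sd = 1) {
  w <- dnorm(points, mean = mean, sd = sd)
  w / sum(w)
}

# Group names are placeholders; use the group names found in your data.
prior_ability <- list(
  Group_1 = list(points = points, weights = normal_weights(points)),
  Group_2 = list(points = points,
                 weights = normal_weights(points, mean = 0.5, sd = 1.2))
)

str(prior_ability)
```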

prior_ip

Specifies prior distributions for item parameters. The default value is NULL, in which case the BILOG-MG defaults will be used. To specify priors, provide a list with one or more of the following elements:

"ALPHA": "'alpha' parameters for the beta prior distribution of lower asymptote (guessing) parameters"

"BETA": "'beta' parameters for the beta prior distribution of lower asymptote (guessing) parameters."

"SMU": prior means for slope parameters

"SSIGMA": prior standard deviations for slope parameters

"TMU": prior means for threshold parameters

"TSIGMA": prior standard deviations for threshold parameters

Quoted descriptions were taken from BILOG-MG manual.

Here are a couple of examples:

list(ALPHA = 4, BETA = 3, SMU = 1, SSIGMA = 1.648, TMU = 0, TSIGMA = 2)

A very strong prior for guessing that almost fixes all guessing parameters at 0.2:

list(ALPHA = 1000000, BETA = 4000000)

A very strong prior that almost fixes guessing at 0.25:

list(ALPHA = 1000000, BETA = 3000000)

More generally, one can adjust the alpha and beta parameters to obtain a desired guessing value, given that the mode of the beta distribution is:

$$mode = \frac{\alpha - 1}{\alpha + \beta - 2}$$
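The mode formula can be checked numerically, and inverted to find a beta value for a target guessing value. The helpers beta_mode and beta_for_mode are hypothetical names introduced for this sketch; beta_for_mode is just the formula above solved for beta.

```r
# Mode of a beta distribution (valid for alpha > 1 and beta > 1).
beta_mode <- function(alpha, beta) {
  (alpha - 1) / (alpha + beta - 2)
}

# Solve mode = (alpha - 1) / (alpha + beta - 2) for beta.
beta_for_mode <- function(alpha, target_mode) {
  (alpha - 1) / target_mode - alpha + 2
}

beta_mode(1000000, 4000000)  # very close to 0.2
beta_mode(1000000, 3000000)  # very close to 0.25

# A much weaker prior with its mode at 0.2:
b <- beta_for_mode(alpha = 5, target_mode = 0.2)
beta_mode(5, b)
```

Larger alpha and beta values with the same mode concentrate the prior more tightly around that mode, which is why the examples above use values in the millions to nearly fix the guessing parameter.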

Also, one can set SSIGMA or TSIGMA to a very small value (for example, TSIGMA = 0.005 or SSIGMA = 0.001) to effectively fix the corresponding item parameters. Note that such restrictions may cause convergence issues.

Note that a non-null prior_ip value will automatically add the READPRIOR option to the CALIB section.

overwrite

If TRUE and BILOG-MG analysis files with the same name already exist in the target path, these files will be overwritten.

show_output_on_console

A logical value (not NA) indicating whether to capture the output of the command and show it on the R console. The default value is TRUE.

bilog_exe_folder

The location of the "blm1.exe", "blm2.exe" and "blm3.exe" files. The default location is file.path("C:/Program Files/BILOGMG").
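With the executables in place, a complete call might look like the following sketch. Several things here are assumptions: the package name irt is not stated on this page, the response data are simulated noise used only to show the shape of the input, and the call is guarded so that it runs only when "blm1.exe" is found in the default folder.

```r
set.seed(123)

# Simulated 0/1 responses: 200 examinees by 10 items (illustrative data only).
resp <- as.data.frame(matrix(rbinom(200 * 10, size = 1, prob = 0.6), nrow = 200))
names(resp) <- paste0("Item_", 1:10)

bilog_dir <- file.path("C:/Program Files/BILOGMG")  # default from Usage

# Run only if the package (assumed to be called "irt") and BILOG-MG are available.
can_run <- requireNamespace("irt", quietly = TRUE) &&
  file.exists(file.path(bilog_dir, "blm1.exe"))

if (can_run) {
  res <- irt::est_bilog(
    x = resp,
    model = "2PL",
    target_dir = tempdir(),
    analysis_name = "demo_2pl",
    overwrite = TRUE,
    bilog_exe_folder = bilog_dir
  )
  # Inspect item parameters and scores only after checking convergence.
  if (isTRUE(res$converged)) {
    print(res$ip)
    print(head(res$score))
  } else {
    message("Calibration did not converge after ", res$cycle, " cycles.")
  }
}
```

Checking converged first follows the advice in the Value section, since ip should not be interpreted when the calibration failed to converge.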

Author

Emre Gonulates