The function est_bilog facilitates item calibration through BILOG-MG.
It offers two modes of operation: executing BILOG-MG in batch mode or
processing pre-generated BILOG-MG output files. When using the former, ensure
BILOG-MG is installed in the directory specified by bilog_exe_folder.
In the latter case, if the necessary BILOG-MG files (e.g.,
"<analysis_name>.PAR", "<analysis_name>.PH1", etc.) exist and
overwrite = FALSE, the BILOG-MG program itself is not needed: the function
can parse existing BILOG-MG output without it.
Both BILOG-MG 3.0 and BILOG-MG 4.0 are supported. Refer to the
bilog_exe_folder argument for guidance on selecting the desired version.
est_bilog(
  x = NULL,
  model = "3PL",
  target_dir = getwd(),
  analysis_name = "bilog_calibration",
  items = NULL,
  examinee_id_var = NULL,
  group_var = NULL,
  logistic = TRUE,
  num_of_alternatives = NULL,
  criterion = 0.01,
  num_of_quadrature = 81,
  max_em_cycles = 100,
  newton = 20,
  reference_group = NULL,
  fix = NULL,
  scoring_options = c("METHOD=1", "NOPRINT"),
  calib_options = c("NORMAL"),
  prior_ability = NULL,
  prior_ip = NULL,
  overwrite = FALSE,
  show_output_on_console = TRUE,
  bilog_exe_folder = file.path("C:/Program Files/BILOGMG")
)
A list with the following elements is returned:
An Itempool-class object holding the item parameters. Check ...$converged
to ensure the model has converged before using ip. This element is not
created when model = "CTT".
A data frame object containing information on examinee scores such as items
attempted (tried), items answered correctly (right), estimated examinee
scores (ability), standard errors of ability estimates (se), and response
string probabilities (prob). This element is not created when
model = "CTT".
Classical Test Theory (CTT) statistics, including p-values,
biserial, and point-biserial estimates calculated by BILOG-MG. If there
are groups, group-specific CTT statistics can be found in
ctt$group$GROUP-NAME. Overall statistics for the entire group
are located at ctt$overall.
A data frame containing items that could not be estimated.
The syntax file.
E-M Cycles of the calibration.
Newton Cycles of the calibration.
The number of cycles run before calibration converges or fails to converge.
The largest change observed between the last two cycles.
-2 Log Likelihood value of the last step of the E-M cycles. See also
$em_cycles. This value is NULL when the model does not converge. This
element is not created when model = "CTT".
Posterior quadrature points and weights.
A list object that stores the arguments passed to the function.
x
Either a data.frame, matrix, or a Response_set-class object. Set this to
NULL if you only intend to read BILOG-MG output from target_dir.
model
Specifies the item model. Options include:
"1PL"
One-parameter logistic model.
"2PL"
Two-parameter logistic model.
"3PL"
Three-parameter logistic model.
"CTT"
Return only Classical Test Theory statistics such as p-values, point-biserial and biserial correlations.
The default is "3PL".
target_dir
The directory where BILOG-MG analysis and data files will
be stored. The default is the current working directory (i.e., getwd()).
analysis_name
A concise filename (without extension) used for the data files created for the analysis.
items
A vector of column names or numbers in x representing the
responses. If no entry for item names is desired in the syntax file, set
items = "none".
examinee_id_var
The column name or number containing individual
subject IDs. If not provided (i.e., examinee_id_var = NULL), the
program will check whether the data provided has row names and use them as
subject IDs.
group_var
The column name or number containing group membership
information for multi-group calibration. Ideally, the grouping variable
should be represented by single-digit integers. If other data types are
provided, integer values will be automatically assigned to the variables.
The default is NULL, indicating no multi-group analysis will be
performed.
logistic
A logical value indicating whether to use logistic calibration.
If TRUE, the calibration assumes the natural metric of the
logistic response function in all calculations.
If FALSE, the logit is multiplied by a factor of 1.7 to
obtain the metric of the normal-ogive model.
The default value is TRUE.
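The effect of the 1.7 factor can be checked numerically. The following base-R sketch is not part of the package; the grid z and all variable names are illustrative. It compares the logistic function, with and without the 1.7 scaling, against the normal-ogive (standard normal CDF) response function.

```r
# Grid of values for the logit; the range and step are illustrative choices.
z <- seq(-4, 4, by = 0.01)

# Logistic response function in its natural metric.
p_logistic <- plogis(z)

# Logistic response function with the logit multiplied by 1.7.
p_scaled <- plogis(1.7 * z)

# Normal-ogive response function.
p_normal <- pnorm(z)

# Largest absolute gap between each logistic curve and the normal ogive.
max_diff_natural <- max(abs(p_logistic - p_normal))
max_diff_scaled <- max(abs(p_scaled - p_normal))
```

On this grid the scaled curve stays within 0.01 of the normal ogive, while the unscaled curve deviates by more than 0.1, which is why the scaling matters when comparing parameters across the two metrics.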
num_of_alternatives
An integer specifying the maximum number of response alternatives in the raw data. This value is used as an automatic starting value for estimating pseudo-guessing parameters.
The default value is NULL, in which case 5 is used for the 3PL model and
1000 for the 1PL and 2PL models. This value will be represented in the
BILOG-MG control file as: NALT = num_of_alternatives.
criterion
The convergence criterion for EM and Newton iterations. The default value is 0.01.
num_of_quadrature
The number of quadrature points used in MML
estimation. The default value is 81. This value will be represented in the
BILOG-MG control file as: NQPT = num_of_quadrature. Note that BILOG-MG's
own default is 20 when there is more than one group and 10 otherwise.
max_em_cycles
An integer (0, 1, ...) representing the maximum number
of EM cycles. This value will be represented in the BILOG-MG control file
as: CYCLES = max_em_cycles. The default value is 100.
newton
An integer (0, 1, ...) representing the number of Gauss-Newton
iterations following EM cycles. This value will be represented in the
BILOG-MG control file as: NEWTON = newton. The default value is 20.
reference_group
A value indicating which group's ability distribution
will be set to mean = 0 and standard deviation = 1. For example, if the
group_var has values 1 and 2 representing two different groups,
setting reference_group = 2 will result in the group with code 2
having an ability distribution with mean 0 and standard deviation 1.
When groups are assumed to come from a single population, set this value to 0.
The default value is NULL.
This value will be represented in the BILOG-MG control file as: REFERENCE = reference_group.
fix
Specifies whether the parameters of specific items are free to be
estimated or should be held fixed at their starting values. This argument
accepts a data.frame with an item_id column identifying the items whose
parameters will be held fixed, along with a, b, and c columns giving the
parameter values. See the examples section for a demonstration.
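As a minimal sketch of the data.frame layout described above, the following base-R snippet builds such an object. The item IDs and parameter values are made up for illustration, and the object name fix_pars is hypothetical.

```r
# Hypothetical item IDs and parameter values, for illustration only.
# Each row names one item whose parameters should be held fixed.
fix_pars <- data.frame(
  item_id = c("Item_1", "Item_5"),
  a = c(1.2, 0.9),    # slope (discrimination) values
  b = c(-0.5, 1.1),   # threshold (difficulty) values
  c = c(0.20, 0.25)   # lower asymptote (guessing) values
)
```

An object like this would then be supplied through the fix argument (fix = fix_pars), with item_id values presumably matching the item names used in the response data.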
scoring_options
A string vector of keywords/options to be included in
the SCORE section of the BILOG-MG syntax. If scoring individual
examinees is not needed, set this to NULL.
The default value is c("METHOD=1", "NOPRINT"), where scale scores
are estimated using Maximum Likelihood estimation and the scoring process
is not printed to the R console (if show_output_on_console = TRUE).
The primary option to add to this vector is "METHOD=n". The
available options are:
"METHOD=1": Maximum Likelihood (ML)
"METHOD=2": Expected a Posteriori (EAP)
"METHOD=3": Maximum a Posteriori (MAP)
Additionally, you can include the following keywords:
"NOPRINT": Suppresses the display of scores on the R console.
"FIT": Computes the likelihood ratio chi-square goodness-of-fit
statistic for each response pattern.
"NQPT=(list)", "IDIST=n", "PMN=(list)", "PSD=(list)", "RSCTYPE=n",
"LOCATION=(list)", "SCALE=(list)", "INFO=n", "BIWEIGHT", "YCOMMON", "POP",
"MOMENTS", "FILE", "READF", "REFERENCE=n", "NFORMS=n"
Refer to the BILOG-MG manual for detailed explanations of these keywords/options.
calib_options
A string vector of additional keywords/options for the
CALIB section in the BILOG-MG syntax. This is in addition to the
keywords NQPT, CYCLES, NEWTON, CRIT, and REFERENCE.
The default value is c("NORMAL").
Including "NORMAL" in calib_options assumes that the prior
distributions of ability in the population follow a normal distribution.
Including "COMMON" estimates a common value for the lower asymptote
for all items in the 3PL model.
If you're calibrating items using the "RASCH" model, set
the argument model = "Rasch" instead of adding "RASCH" to calib_options.
Additional keywords/options that can be added to calib_options include:
- "PRINT=n"
- "IDIST=n"
- "PLOT=n"
- "DIAGNOSIS=n"
- "REFERENCE=n"
- "SELECT=(list)"
- "RIDGE=(a,b,c)"
- "ACCEL=n"
- "NSD=n"
- "EMPIRICAL"
- "FIXED"
- "TPRIOR"
- "SPRIOR"
- "GPRIOR"
- "NOTPRIOR"
- "NOSPRIOR"
- "NOGPRIOR"
- "READPRIOR"
- "NOFLOAT"
- "FLOAT"
- "NOADJUST"
- "GROUP-PLOT"
- "NFULL"
- "CHI=(a,b)"
Refer to the BILOG-MG manual for detailed explanations of these keywords/options.
NOTE: Do not add the following keywords to calib_options as they are
already included in other arguments: NQPT, CYCLES, NEWTON, CRIT, REFERENCE.
prior_ability
The quadrature points and weights representing the discrete finite distribution of ability for the groups. It should be structured as a list in the following format:
list(<GROUP-NAME-1> = list(points = ...., weights = ...),
     <GROUP-NAME-2> = list(points = ...., weights = ...),
     ...
     )
Here, <GROUP-NAME-1> refers to the name of the first group, <GROUP-NAME-2> refers to the name of the second group, and so on.
Please refer to the examples section for a practical implementation.
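One way to build a list in this format is sketched below in base R. The group names ("pilot", "operational"), the number and range of quadrature points, the helper normal_weights, and the choice to normalize the weights so they sum to 1 are all assumptions made for illustration; they are not taken from the package.

```r
# Ten equally spaced quadrature points on [-4, 4]; the count and range
# are illustrative choices, not package defaults.
quad_points <- seq(-4, 4, length.out = 10)

# Hypothetical helper: normal-density weights at the given points,
# normalized so that they sum to 1 (an assumption of this sketch).
normal_weights <- function(points, mean = 0, sd = 1) {
  w <- dnorm(points, mean = mean, sd = sd)
  w / sum(w)
}

# One entry per group; "pilot" and "operational" are made-up group names.
prior_ability <- list(
  pilot = list(
    points = quad_points,
    weights = normal_weights(quad_points)
  ),
  operational = list(
    points = quad_points,
    weights = normal_weights(quad_points, mean = 0.5)
  )
)
```

The list names would need to correspond to the group names in the data, and each weights vector has the same length as its points vector.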
prior_ip
Specify prior distributions for item parameters. The default
value is NULL, in which case BILOG-MG defaults will be used. To
specify priors, provide a list containing one or more of the following
elements:
"ALPHA"
"'alpha' parameters for the beta prior distribution of lower asymptote (guessing) parameters"
"BETA"
"'beta' parameters for the beta prior distribution of lower asymptote (guessing) parameters."
"SMU"
prior means for slope parameters
"SSIGMA"
prior standard deviations for slope parameters
"TMU"
prior means for threshold parameters
"TSIGMA"
prior standard deviations for threshold parameters
Quoted descriptions were taken from the BILOG-MG manual.
Examples:
A specific set of priors:
list(ALPHA = 4, BETA = 3, SMU = 1, SSIGMA = 1.648, TMU = 0, TSIGMA = 2)
A very strong prior for guessing which almost fixes all guessing
parameters at 0.2:
list(ALPHA = 1000000, BETA = 4000000)
Fix guessing at 0.25:
list(ALPHA = 1000000, BETA = 3000000)
In general, one can adjust the alpha and beta parameters to achieve a desired outcome, considering that the mode of the beta distribution is calculated as:
$$mode = \frac{\alpha - 1}{\alpha + \beta - 2}$$
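The mode formula can be used to verify the examples above and to pick ALPHA and BETA for a target guessing value. The following base-R sketch is illustrative only: beta_mode and beta_prior_for_mode are hypothetical helpers, not functions from the package, and strength is an assumed name for the sum ALPHA + BETA.

```r
# Mode of a Beta(alpha, beta) distribution, valid for alpha > 1 and beta > 1.
beta_mode <- function(alpha, beta) {
  (alpha - 1) / (alpha + beta - 2)
}

# Modes implied by the two strong-prior examples.
mode_strong_02 <- beta_mode(1000000, 4000000)   # very close to 0.2
mode_strong_025 <- beta_mode(1000000, 3000000)  # very close to 0.25

# Hypothetical helper: choose ALPHA and BETA so that the mode equals
# `target` and ALPHA + BETA equals `strength`. A larger strength gives a
# more concentrated (stronger) prior.
beta_prior_for_mode <- function(target, strength) {
  alpha <- target * (strength - 2) + 1
  list(ALPHA = alpha, BETA = strength - alpha)
}

# Example: a moderately strong prior centered on a guessing value of 0.2.
moderate_prior <- beta_prior_for_mode(target = 0.2, strength = 1000)
```

Substituting the helper's ALPHA into the formula gives (ALPHA - 1) / (strength - 2) = target, so the resulting list can be passed as prior_ip with the intended mode.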
Additionally, setting SSIGMA or TSIGMA to a very small value
effectively fixes the item parameters. For example, TSIGMA = 0.005
or SSIGMA = 0.001. Be aware that this may lead to convergence
issues.
Note: A non-null prior_ip value will automatically add the
READPRIOR option to the CALIB section.
overwrite
If set to TRUE, any existing BILOG-MG analysis files
with the same name in the target path will be overwritten.
The default is FALSE.
show_output_on_console
A logical value indicating whether to capture
and display the output of the command on the R console. The default is
TRUE.
bilog_exe_folder
The directory containing the BILOG-MG executable
files. This function supports two versions: BILOG-MG 3 and BILOG-MG 4. For
BILOG-MG version 3, the directory should include the files
"blm1.exe", "blm2.exe", and "blm3.exe". The default
location for version 3 is file.path("C:/Program Files/BILOGMG"). If
you have version 4 installed, the argument should point to the directory
where "BLM64.exe" is located, which is typically
"C:/Program Files/BILOG-MG/x64".
Emre Gonulates