RJafroc (version 1.0.1)

RJafroc-package: Modeling, Analysis, Validation and Visualization of Observer Performance Studies in Diagnostic Radiology

Description

Software for assessing medical imaging systems, radiologists or computer aided detection (CAD) algorithms. Models of observer performance include the binormal model, the contaminated binormal model (CBM) and the radiological search model (RSM). The software and its applications are described in a book: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples. Taylor-Francis LLC; 2017. The observer performance data collection paradigms include receiver operating characteristic (ROC) and its location specific extensions, primarily free-response ROC (FROC) and the location ROC (LROC). ROC data consists of a single rating per image. The rating is the perceived confidence level that the image is of a diseased patient. FROC data consists of a variable number (including zero) of mark-rating pairs per image, where a mark is the location of a clinically reportable suspicious region and the rating is the corresponding confidence level that it is a true lesion. LROC data consists of a rating and a forced localization of the most suspicious region on every image. In this software higher ratings always represent greater confidence in presence of disease. The software supersedes the Windows version of JAFROC software V4.2.1, http://www.devchakraborty.com. This package is incompatible with RJafroc version 0.1.1. In addition to improvements and new functions, existing functions have been renamed for better organization. Data file related function names are preceded by Df, curve fitting functions by Fit, included data sets by dataset, plotting functions by Plot, significance testing functions by St, sample size related functions by Ss, data simulation functions by Simulate and utility functions by Util.
The software implements a number of figures of merit (FOMs) for quantifying performance, and functions for visualizing empirical operating characteristics, e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC). Three ROC ratings data curve-fitting algorithms are implemented: the Binormal Model (BM), the Contaminated Binormal Model (CBM) and the Radiological Search Model (RSM). Unlike the BM, the CBM and the RSM predict "proper" ROC curves that do not cross the chance diagonal or display inappropriate "hooks" near the upper right corner of the plots. Besides the usual case-classification accuracy measured by the area under the ROC curve, RSM fitting yields measures of search and lesion-classification performance. Search performance is the ability to find lesions while avoiding finding non-lesions. Lesion-classification performance is the ability to correctly distinguish found lesions from found non-lesions. For fully crossed study designs, termed multiple-reader multiple-case (MRMC), significance testing of reader-averaged FOM differences between modalities is implemented via both the Dorfman-Berbaum-Metz and the Obuchowski-Rockette methods. Also implemented are single treatment analyses, which allow comparison of the performance of a group of radiologists to a specified value, or comparison of CAD to a group of radiologists interpreting the same cases. A crossed-modality analysis is implemented wherein there are two crossed treatment factors and the desire is to determine performance in each treatment factor averaged over all levels of the other factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict the numbers of readers and cases required in a pivotal study to achieve a desired power.
Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's international collaborations. The package is used extensively in online appendices of the referenced book.

Arguments

Abbreviations and definitions

  • a: The separation or "a" parameter of the conventional binormal model

  • AFROC curve: plot of LLF (ordinate) vs. FPF, where FPF is inferred using highest rating of NL marks on non-diseased cases

  • AFROC: alternative FROC, see Chakraborty 1989

  • AFROC1 curve: plot of LLF (ordinate) vs. FPF1, where FPF1 is inferred using highest rating of NL marks on ALL cases

  • \(\alpha\): The significance level of the test of the null hypothesis of no treatment effect

  • AUC: area under curve; e.g., ROC-AUC = area under ROC curve, an example of a FOM

  • b: The width or "b" parameter of the conventional binormal model

  • Binormal model: two unequal variance normal distributions, one at zero and one at \(\mu\), for modeling ROC ratings; \(\sigma\) is the std. dev. ratio of diseased to non-diseased distributions

  • CAD: computer aided detection algorithm

  • CBM: contaminated binormal model: two equal variance normal distributions for modeling ROC ratings; the diseased distribution is bimodal, with one peak at zero and one at \(\mu\); the integrated fraction at \(\mu\) is \(\alpha\) (not to be confused with \(\alpha\) of NH testing)

  • CI: The (1-\(\alpha\)) confidence interval for the stated statistic

  • Crossed modality: a dataset containing two modality factors, with the levels of the two factors crossed, see paper by Thompson et al

  • DBM: Dorfman-Berbaum-Metz, a significance testing method for detecting a treatment effect in MRMC studies

  • DBMH: Hillis' modification of the DBM method

  • ddf: Denominator degrees of freedom of appropriate \(F\)-test; the corresponding ndf is I - 1

  • Empirical AUC: trapezoidal area under curve, same as the Wilcoxon statistic for ROC paradigm

  • FN: false negative, a diseased case classified as non-diseased

  • FOM: figure of merit, a quantitative measure of performance, performance metric

  • FP: false positive, a non-diseased case classified as diseased

  • FPF: number of FPs divided by number of non-diseased cases

  • FROC curve: plot of LLF (ordinate) vs. NLF

  • FROC: free-response ROC (a data collection paradigm where each image yields a random number, 0, 1, 2,..., of mark-rating pairs)

  • FRRC: Analysis that treats readers as fixed and cases as random factors

  • I: total number of modalities, indexed by \(i\)

  • image/case: used interchangeably; a case can consist of several images of the same patient in the same modality

  • iMRMC: A text file format used for ROC data by FDA/CDRH researchers

  • individual dataset: A single modality single reader dataset.

  • Intrinsic: Used in connection with RSM; a parameter that is independent of the RSM \(\mu\) parameter, but whose meaning may not be as transparent as the corresponding physical parameter

  • J: total number of readers, indexed by \(j\)

  • JAFROC file format: A .xlsx file format, applicable to ROC, FROC and LROC paradigms

  • JAFROC FOM: trapezoidal area under AFROC curve; this term is obsolete; use AFROC-AUC instead

  • JAFROC: jackknife AFROC: Windows software for analyzing observer performance data: no longer updated, replaced by current package; the name is a misnomer as the jackknife is used only for significance testing; alternatively, the bootstrap could be used; what distinguishes FROC from ROC analysis is the use of the AFROC-AUC as the FOM. With this change, the DBM or the OR method can be used for significance testing

  • JAFROC1 FOM: trapezoidal area under AFROC1 curve; this term is obsolete; use AFROC1-AUC instead

  • K: total number of cases, K = K1 + K2, indexed by \(k\)

  • K1: total number of non-diseased cases, indexed by \(k1\)

  • K2: total number of diseased cases, indexed by \(k2\)

  • LL: lesion localization i.e., a mark that correctly locates an existing localized lesion; TP is a special case, when the proximity criterion is lax (i.e., "acceptance radius" is large)

  • LLF: number of LLs divided by the total number of lesions

  • LROC: location receiver operating characteristic, a data collection paradigm where each image yields a single rating and one location

  • lrc/MRMC: A text file format used for ROC data by University of Iowa researchers

  • mark: the location of a suspected diseased region

  • maxLL: maximum number of lesions per case in dataset

  • maxNL: maximum number of NL marks per case in dataset

  • MRMC: multiple reader multiple case (each reader interprets each case in each modality, i.e. fully crossed study design)

  • ndf: Numerator degrees of freedom of appropriate \(F\)-test, usually number of treatments minus one

  • NH: The null hypothesis that all treatment effects are zero; rejected if the \(p\)-value is smaller than \(\alpha\)

  • NL: non-lesion localization, of which FP is a special case, i.e., a mark that does not correctly locate any existing localized lesion(s)

  • NLF: number of NLs divided by the total number of cases

  • Operating characteristic: A plot of normalized correct decisions on diseased cases along ordinate vs. normalized incorrect decisions on non-diseased cases

  • Operating point: A point on an operating characteristic, e.g., (FPF, TPF) represents an operating point on an ROC

  • OR: Obuchowski-Rockette, a significance testing method for detecting a treatment effect in MRMC studies

  • ORH: Hillis' modification of the OR method

  • Physical parameter: Used in connection with RSM; a parameter whose meaning is more transparent than the corresponding intrinsic parameter, but which depends on the RSM \(\mu\) parameter

  • Proximity criterion / acceptance radius: Used in connection with FROC (or LROC data); the "nearness" criterion is used to determine if a mark is close enough to a lesion to be counted as a LL (or correct localization); otherwise it is counted as a NL (or incorrect localization)

  • p-value: the probability, under the null hypothesis, that the observed treatment effects, or larger, could occur by chance

  • Proper: a proper fit does not inappropriately fall below the chance diagonal, does not display a "hook" near the upper right corner

  • PROPROC: Metz's binormal model based fitting of proper ROC curves

  • RSM, Radiological Search Model: two unit variance normal distributions for modeling NL and LL ratings; four parameters, \(\mu\), \(\nu\)', \(\lambda\)' and \(\zeta\)1

  • Rating: Confidence level assigned to a case; higher values indicate greater confidence in presence of disease; -Inf is allowed but NA is not allowed

  • Reader/observer/radiologist/CAD: used interchangeably

  • RJafroc: the current software

  • ROC: receiver operating characteristic, a data collection paradigm where each image yields a single rating and location information is ignored

  • ROC curve: plot of TPF (ordinate) vs. FPF, as threshold is varied; an example of an operating characteristic

  • ROCFIT: Metz software for binormal model based fitting of ROC data

  • ROI: region-of-interest (each case is divided into a number of ROIs and the reader assigns an ROC rating to each ROI)

  • RRFC: Analysis that treats readers as random and cases as fixed factors

  • RRRC: Analysis that treats both readers and cases as random factors

  • RSCORE-II: original software for binormal model based fitting of ROC data

  • RSM: Radiological search model, also method for fitting a proper ROC curve to ROC data

  • RSM-\(\zeta\)1: Lowest reporting threshold, determines if suspicious region is actually marked

  • RSM-\(\lambda\): Intrinsic parameter of RSM corresponding to \(\lambda\)', independent of \(\mu\)

  • RSM-\(\lambda\)': Physical Poisson parameter of RSM, average number of latent NLs per case; depends on \(\mu\)

  • RSM-\(\mu\): separation of the unit variance distributions of RSM

  • RSM-\(\nu\): Intrinsic parameter of RSM, corresponding to \(\nu\)', independent of \(\mu\)

  • RSM-\(\nu\)': binomial parameter of RSM, probability that lesion is found

  • SE: sensitivity, same as \(TPF\)

  • Significance testing: determining the p-value of a statistical test

  • SP: specificity, same as \(1-FPF\)

  • Threshold: Reporting criteria: if confidence exceeds a threshold value, report case as diseased, otherwise report non-diseased

  • TN: true negative, a non-diseased case classified as non-diseased

  • TP: true positive, a diseased case classified as diseased

  • TPF: number of TPs divided by number of diseased cases

  • Treatment/modality: used interchangeably, for example, computed tomography (CT) images vs. magnetic resonance images (MRI)

  • wAFROC curve: plot of weighted LLF (ordinate) vs. FPF, where FPF is inferred using highest rating of NL marks on non-diseased cases ONLY

  • wAFROC1 curve: plot of weighted LLF (ordinate) vs. FPF1, where FPF1 is inferred using highest rating of NL marks on ALL cases

  • wJAFROC FOM: weighted trapezoidal area under AFROC curve; this term is obsolete; use wAFROC-AUC instead; this is the recommended FOM

  • wJAFROC1 FOM: weighted trapezoidal area under AFROC1 curve; only use if there are zero non-diseased cases
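Several of the ROC definitions above (FPF, TPF, threshold, operating point, empirical AUC) can be illustrated with a short base R sketch. The ratings are made-up illustrative values and the helper names OperatingPoint, WilcoxonAuc and TrapezoidalAuc are hypothetical; they are not functions of this package.

```r
# Hypothetical ROC ratings (illustrative values, not from an included dataset)
fp <- c(1, 2, 2, 3, 1)  # ratings of K1 = 5 non-diseased cases
tp <- c(3, 4, 2, 5, 4)  # ratings of K2 = 5 diseased cases

# Operating point: a case is reported as diseased if its rating exceeds the threshold
OperatingPoint <- function(fp, tp, threshold) {
  c(FPF = mean(fp > threshold), TPF = mean(tp > threshold))
}

# Wilcoxon statistic: P(tp > fp) + 0.5 * P(tp == fp) over all case pairs
WilcoxonAuc <- function(fp, tp) {
  mean(outer(tp, fp, function(x, y) (x > y) + 0.5 * (x == y)))
}

# Trapezoidal area under the empirical ROC curve
TrapezoidalAuc <- function(fp, tp) {
  # Thresholds from the highest rating down to -Inf trace (0,0) to (1,1)
  thr <- c(sort(unique(c(fp, tp)), decreasing = TRUE), -Inf)
  fpf <- sapply(thr, function(z) mean(fp > z))
  tpf <- sapply(thr, function(z) mean(tp > z))
  sum(diff(fpf) * (head(tpf, -1) + tail(tpf, -1)) / 2)
}

print(OperatingPoint(fp, tp, threshold = 2))
print(WilcoxonAuc(fp, tp))
print(TrapezoidalAuc(fp, tp))
```

For these ratings both AUC computations give the same value, which is the equivalence stated in the Empirical AUC entry.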

Dataset

Dataset, an object, can be created by the user or read from an external text or Excel file. The dataset is a list generally containing 8 elements (9 elements for crossed-modality or LROC datasets). Note: -Inf is used to indicate the ratings of unmarked lesions and/or to indicate unavailable array items. An example of the latter would be if the maximum number of NLs in a dataset was 4, but some images had fewer than 4 NLs, in which case the corresponding "empty" positions would be filled with -Infs. Do not use NA to denote a rating.

Note: the word "dataset" used in this package always represents an R object with one of the following structures:

General data structure, example dataset02 and dataset05

  • NL: a floating-point array with dimension c(I, J, K, maxNL) containing the ratings of NL marks. The first K1 locations of the third index correspond to NL marks on non-diseased cases and the remaining locations correspond to NL marks on diseased cases. The 4th dimension allows for the possibility of multiple NL marks on a case. For FROC datasets unavailable NL ratings are assigned -Inf. For ROC datasets FP ratings are assigned to the first K1 elements, NL[,,1:K1,1], and the remaining K2 elements, NL[,,(K1+1):K,1], are set to -Inf. When converting from FROC to ROC data the software assigns -Inf to cases with no marks.

  • LL: a floating-point array with dimension c(I, J, K2, maxLL) that contains the ratings of all LL marks. For ROC datasets TP ratings are assigned to LL[,,1:K2,1].

  • lesionNum: an integer vector with length K2, whose elements indicate the number of lesions in each diseased case.

  • lesionID: an integer array with dimension c(K2, maxLL). Its contents label lesions on diseased cases. For example, dataset05$lesionID[40,] is c(1,2,-Inf), meaning the 40th diseased case in this dataset has two lesions, labeled 1 and 2. The lesionID of an LL in the TP or LL worksheet must correspond to the lesionID for that case in the Truth worksheet. For example, if the lesionID for the 40th diseased case in the TP or LL worksheet is 2, then the associated rating must correspond to the lesion labeled 2 in the Truth worksheet, etc.

  • lesionWeight: a floating point array with dimension c(K2, maxLL), representing the relative importance of detecting each lesion. For each case, the weights sum to unity. If zero is assigned to the Weight field in the Truth worksheet, the software automatically assigns equal weighting, e.g., dataset05$lesionWeight[40,] is c(0.5,0.5), corresponding to equal weights (1/2) to each lesion on an image with two lesions.

  • dataType: a string variable: "ROC", "ROI" or "FROC".

  • modalityID: a string vector of length \(I\), which labels the modalities in the dataset.

  • readerID: a string vector of length \(J\), which labels the readers. For example, NL[1, 2, , ] indicates the NL-rating of the reader identified with the second label in readerID using the modality identified with the first label in modalityID.
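As a rough illustration of the general structure, the following base R sketch assembles a toy ROC list with the eight elements listed above. All sizes, ratings and labels are invented for illustration, and whether the package functions accept a hand-built list like this without further checks is an assumption; the included datasets (e.g., dataset02) remain the authoritative examples.

```r
# Toy sizes (hypothetical): 2 modalities, 3 readers, 4 non-diseased, 5 diseased cases
I <- 2; J <- 3; K1 <- 4; K2 <- 5
K <- K1 + K2

# NL: FP ratings occupy the first K1 cases; the remaining K2 positions stay -Inf
NL <- array(-Inf, dim = c(I, J, K, 1))
NL[, , 1:K1, 1] <- rep_len(c(1, 2, 3, 2, 1), I * J * K1)

# LL: TP ratings of the K2 diseased cases
LL <- array(rep_len(c(3, 4, 2, 5, 4), I * J * K2), dim = c(I, J, K2, 1))

toyDataset <- list(
  NL = NL,
  LL = LL,
  lesionNum = rep(1L, K2),                  # one lesion per diseased case (ROC)
  lesionID = array(1L, dim = c(K2, 1)),
  lesionWeight = array(1, dim = c(K2, 1)),  # weights sum to unity per case
  dataType = "ROC",
  modalityID = c("0", "1"),                 # length I
  readerID = c("0", "1", "2")               # length J
)

# Ratings of the second reader in the first modality, one value per case
print(toyDataset$NL[1, 2, , ])
str(toyDataset, max.level = 1)
```

Note that no NA appears anywhere: unavailable positions hold -Inf, as required by the Dataset note.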

LROC data structure, example datasetCadLroc

  • NL: a floating-point array with dimension c(I, J, K, 1) that contains the ratings of FP marks. For the third index, the first K1 elements contain valid ratings while the rest are filled with -Infs.

  • LLCl: a floating-point array with dimension c(I, J, K2, 1) that contains the ratings of all correct localization (CL) marks. A -Inf indicates a case with no CL mark.

  • LLIl: a floating-point array with dimension c(I, J, K2, 1) that contains the ratings of all incorrect localization (IL) marks. A -Inf indicates a case with no IL mark.

  • lesionNum: same as general case.

  • lesionID: an integer vector with length K2 containing ones.

  • lesionWeight: a floating point array with dimension c(K2, 1) containing ones.

  • dataType: a string variable: "LROC".

  • modalityID: same as general case.

  • readerID: same as general case.

Crossed modality data structure, example datasetCrossedModality

  • NL: a floating-point array with dimension c(I1, I2, J, K, maxNL) that contains the ratings of NL marks. Note the existence of two modality indices.

  • LL: a floating-point array with dimension c(I1, I2, J, K2, maxLL) that contains the ratings of all LL marks. Note the existence of two modality indices.

  • lesionNum: same as general case.

  • lesionID: same as general case.

  • lesionWeight: same as general case.

  • dataType: a string variable: "ROC" or "FROC".

  • modalityID1: same as general case, corresponding to first modality factor.

  • modalityID2: same as general case, corresponding to second modality factor.

  • readerID: same as general case.
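The two modality indices can be pictured with a small base R sketch: a toy ROC-type array pair with the crossed dimensions, an empirical (Wilcoxon) AUC for every combination of the two factors and reader, and averages over the levels of the other factor. The sizes and ratings are invented, the Wilcoxon helper is hypothetical, and the averaging shown is only one plausible reading of the crossed-modality description, not the package's own analysis.

```r
# Toy sizes (hypothetical): 2 x 3 crossed modality levels, 2 readers, 3 + 3 cases
I1 <- 2; I2 <- 3; J <- 2
K1 <- 3; K2 <- 3
K <- K1 + K2

NL <- array(-Inf, dim = c(I1, I2, J, K, 1))
NL[, , , 1:K1, 1] <- rep_len(c(1, 2, 3, 2, 1), I1 * I2 * J * K1)
LL <- array(rep_len(c(3, 4, 2, 5, 3, 4, 1), I1 * I2 * J * K2),
            dim = c(I1, I2, J, K2, 1))

# Empirical AUC (Wilcoxon statistic) for one modality-pair and reader
Wilcoxon <- function(fp, tp) {
  mean(outer(tp, fp, function(x, y) (x > y) + 0.5 * (x == y)))
}

fom <- array(NA_real_, dim = c(I1, I2, J))
for (i1 in seq_len(I1)) {
  for (i2 in seq_len(I2)) {
    for (j in seq_len(J)) {
      fom[i1, i2, j] <- Wilcoxon(NL[i1, i2, j, 1:K1, 1], LL[i1, i2, j, , 1])
    }
  }
}

# Performance in each level of one factor, averaged over the other factor
fomFactor1 <- apply(fom, c(1, 3), mean)  # I1 x J matrix
fomFactor2 <- apply(fom, c(2, 3), mean)  # I2 x J matrix
print(fomFactor1)
print(fomFactor2)
```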

Data file format

The package reads JAFROC, MRMC (ROC data only) and iMRMC (ROC data only) data files. The data can be imported by using the function DfReadDataFile.

  • JAFROC data file format: The JAFROC data file is an Excel file (*.xls and *.xlsx are supported) containing three worksheets: (1) the Truth worksheet, (2) the TP or LL worksheet and (3) the FP or NL worksheet. In the Truth worksheet each case must occur at least once; the number of rows in the other two worksheets is variable.

    1. Truth worksheet consists of

      • CaseID, an integer field uniquely labeling the cases (images). It must occur at least once for each case, and since a case may have multiple lesions, it can occur multiple times, once for each lesion.

      • LesionID, an integer field uniquely labeling the lesions in each case. This field is zero for non-diseased cases.

      • Weight, a floating-point field, which is the relative importance of detecting each lesion. This field is zero for non-diseased cases and for equally weighted lesions; otherwise the weights must sum to unity for each case. Unless a weighted figure of merit is selected, this field is irrelevant.

    2. TP worksheet consists of

      • ReaderID, a string field uniquely labeling the readers (radiologists).

      • ModalityID, a string field uniquely labeling the modalities.

      • CaseID, see Truth worksheet. A non-diseased case in this field will generate an error.

      • LesionID, see Truth worksheet. An entry in this field that does not appear in the Truth worksheet will generate an error. It is the user's responsibility to ensure that the entries in the Truth and TP worksheets correspond to the same physical lesions.

      • TP_Rating, a positive floating-point field denoting the rating assigned to a particular lesion-localization mark, with higher numbers representing greater confidence that the location is actually a lesion.

    3. FP worksheet consists of

      • ReaderID, see TP worksheet.

      • ModalityID, see TP worksheet.

      • CaseID, see TP worksheet.

      • FP_Rating, a positive floating-point field denoting the rating assigned to a particular non-lesion-localization mark, with higher numbers representing greater confidence that the location is actually a lesion.

  • MRMC data file format / LABMRMC format

    • Input format for MRMC. This format is described in the Medical Image Perception Laboratory website, currently http://perception.radiology.uiowa.edu/.

    • LABMRMC data format. The data file includes the following parts. The file must be saved as a plain text file with the *.lrc extension. All items in the file are separated by one or more blank spaces.

      1. The first line is a free text description of the file.

      2. The second line is the name or ID of the first reader.

      3. The third line has the names or IDs of all the modalities. Each name or ID must be enclosed by double quotes (" ").

      4. The fourth line must have the letter (l or s) or word (large or small) for each modality. The letter or word indicates whether larger or smaller ratings represent stronger confidence of presence of disease.

      5. The following lines contain the ratings in all modalities, separated by spaces or tabs, of the non-diseased cases, one case per line. The cases must appear in the same order for all readers. Missing values are not allowed.

      6. After the last non-diseased case insert a line containing the asterisk (*) symbol.

      7. Repeat steps 5 and 6 for the diseased cases.

      8. Repeat steps 2, 5, 6 and 7 for the remaining readers.

      9. The last line of the data file must be a pound symbol (#).

  • iMRMC data format: This format is described in the iMRMC website, currently https://code.google.com/p/imrmc/.
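The consistency rules stated for the three JAFROC worksheets can be sketched in base R by holding each worksheet as a data frame and checking the rules before saving to Excel. The data values are invented and CheckWorksheets is a hypothetical helper, not a package function; the actual import is done by DfReadDataFile, whose own checks may differ.

```r
# Truth worksheet: cases 1-2 non-diseased, case 3 one lesion, case 4 two lesions
truth <- data.frame(
  CaseID   = c(1, 2, 3, 4, 4),
  LesionID = c(0, 0, 1, 1, 2),
  Weight   = c(0, 0, 0, 0.3, 0.7)  # zero means equal weighting
)
# TP (LL) worksheet
tp <- data.frame(
  ReaderID = c("R1", "R1", "R2"), ModalityID = c("M1", "M1", "M1"),
  CaseID = c(3, 4, 4), LesionID = c(1, 2, 1), TP_Rating = c(4, 5, 3)
)
# FP (NL) worksheet
fp <- data.frame(
  ReaderID = c("R1", "R2"), ModalityID = c("M1", "M1"),
  CaseID = c(1, 4), FP_Rating = c(2, 1)
)

# Hypothetical helper: returns a character vector of rule violations
CheckWorksheets <- function(truth, tp, fp) {
  problems <- character(0)
  diseased <- unique(truth$CaseID[truth$LesionID > 0])
  if (any(!(tp$CaseID %in% diseased))) {
    problems <- c(problems, "TP worksheet contains a non-diseased case")
  }
  truthKey <- paste(truth$CaseID, truth$LesionID)
  if (any(!(paste(tp$CaseID, tp$LesionID) %in% truthKey))) {
    problems <- c(problems, "TP worksheet has a LesionID missing from Truth")
  }
  if (any(!(fp$CaseID %in% truth$CaseID))) {
    problems <- c(problems, "FP worksheet has a CaseID missing from Truth")
  }
  if (any(c(tp$TP_Rating, fp$FP_Rating) <= 0)) {
    problems <- c(problems, "ratings must be positive")
  }
  # Per-case weights must be all zero or sum to unity
  w <- tapply(truth$Weight, truth$CaseID, sum)
  if (any(abs(w) > 1e-8 & abs(w - 1) > 1e-8)) {
    problems <- c(problems, "lesion weights do not sum to unity")
  }
  problems
}

print(CheckWorksheets(truth, tp, fp))  # empty when all rules hold
```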

Df: Datafile Related Functions

Fitting Functions

  • FitBinormalRoc: Fit the binormal model to ROC data (R equivalent of ROCFIT or RSCORE).

  • FitCbmRoc: Fit the contaminated binormal model (CBM) to ROC data.

  • FitRsmRoc: Fit the radiological search model (RSM) to ROC data.

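To make the binormal "a" and "b" parameters and the notion of an improper curve concrete, here is a minimal base R sketch of the standard binormal ROC curve, TPF = Φ(a + b Φ⁻¹(FPF)), and its area Φ(a / sqrt(1 + b²)). The helper names BinormalTpf and BinormalAuc and the parameter values are assumptions chosen for illustration; this shows the model's predicted curve only, not the fitting performed by FitBinormalRoc.

```r
# Binormal model ROC curve for given a and b (hypothetical helper)
BinormalTpf <- function(fpf, a, b) pnorm(a + b * qnorm(fpf))

# Area under the binormal ROC curve (hypothetical helper)
BinormalAuc <- function(a, b) pnorm(a / sqrt(1 + b^2))

fpf <- c(0.01, 0.1, 0.5, 0.9, 0.99)

# b = 1: the curve stays above the chance diagonal everywhere
print(BinormalTpf(fpf, a = 1, b = 1))

# b = 0.5: near the upper right corner TPF drops below FPF (the "hook")
print(BinormalTpf(fpf, a = 1, b = 0.5))
print(BinormalTpf(0.99, a = 1, b = 0.5) < 0.99)

print(BinormalAuc(a = 1, b = 0.5))
```

With a = 1 and b = 0.5 the curve crosses the chance diagonal where Φ⁻¹(FPF) = a / (1 - b) = 2, i.e., at FPF of about 0.977, which is the improper behavior that the CBM and RSM fits avoid.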
Plotting Functions

Simulation Functions

Sample size Functions

  • SsFROCPowerGivenJK: Calculate statistical power given numbers of readers J and cases K from a pilot FROC dataset.

  • SsPowerGivenJK: Calculate statistical power given numbers of readers J and cases K.

  • SsPowerTable: Generate a power table.

  • SsSampleSizeKGivenJ: Calculate number of cases K, for specified number of readers J, to achieve desired power for an ROC study.

Significance Testing Functions

Miscellaneous and Utility Functions

Details

Package: RJafroc
Type: Package
Version: 1.0.1
Date: 2017-08-31
License: GPL-3
URL: http://www.devchakraborty.com

References

Basics of ROC

Metz, CE (1978). Basic principles of ROC analysis. In Seminars in nuclear medicine (Vol. 8, pp. 283--298). Elsevier.

Metz, CE (1986). ROC Methodology in Radiologic Imaging. Investigative Radiology, 21(9), 720.

Metz, CE (1989). Some practical issues of experimental design and data analysis in radiological ROC studies. Investigative Radiology, 24(3), 234.

Metz, CE (2008). ROC analysis in medical imaging: a tutorial review of the literature. Radiological Physics and Technology, 1(1), 2--12.

Wagner, RF, Beiden, SV, Campbell, G, Metz, CE, & Sacks, WM (2002). Assessment of medical imaging and computer-assist systems: lessons from recent experience. Academic Radiology, 9(11), 1264--77.

Wagner, RF, Metz, CE, & Campbell, G (2007). Assessment of medical imaging systems and computer aids: a tutorial review. Academic Radiology, 14(6), 723--48.

DBM/OR methods and extensions

Dorfman, DD, Berbaum, KS, & Metz, CE (1992). Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Investigative Radiology, 27(9), 723.

Obuchowski, NA, & Rockette, HE (1995). Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. Communications in Statistics-Simulation and Computation, 24(2), 285--308.

Hillis, SL, Berbaum, KS, & Metz, CE (2008). Recent developments in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis. Academic Radiology, 15(5), 647--61.

Hillis, SL, Obuchowski, NA, & Berbaum, KS (2011). Power Estimation for Multireader ROC Methods: An Updated and Unified Approach. Acad Radiol, 18, 129--142.

Hillis, SL (2007). A comparison of denominator degrees of freedom methods for multiple observer ROC analysis. Statistics in Medicine, 26(3), 596--619.

FROC paradigm

Chakraborty DP. Maximum Likelihood analysis of free-response receiver operating characteristic (FROC) data. Med Phys. 1989;16(4):561--568.

Chakraborty, DP, & Berbaum, KS (2004). Observer studies involving detection and localization: modeling, analysis, and validation. Medical Physics, 31(8), 1--18.

Chakraborty, DP (2006). A search model and figure of merit for observer data acquired according to the free-response paradigm. Physics in Medicine and Biology, 51(14), 3449--62.

Chakraborty, DP (2006). ROC curves predicted by a model of visual search. Physics in Medicine and Biology, 51(14), 3463--82.

Chakraborty, DP (2011). New Developments in Observer Performance Methodology in Medical Imaging. Seminars in Nuclear Medicine, 41(6), 401--418.

Chakraborty, DP (2013). A Brief History of Free-Response Receiver Operating Characteristic Paradigm Data Analysis. Academic Radiology, 20(7), 915--919.

Chakraborty, DP, & Yoon, H.-J. (2008). Operating characteristics predicted by models for diagnostic tasks involving lesion localization. Medical Physics, 35(2), 435.

Thompson JD, Chakraborty DP, Szczepura K, et al. (2016) Effect of reconstruction methods and x-ray tube current-time product on nodule detection in an anthropomorphic thorax phantom: a crossed-modality JAFROC observer study. Medical Physics. 43(3):1265-1274.

Zhai X, Chakraborty DP. (2017) A bivariate contaminated binormal model for robust fitting of proper ROC curves to a pair of correlated, possibly degenerate, ROC datasets. Medical Physics. 2207--2222. doi: 10.1002/mp.12263.

Hillis SL, Chakraborty DP, Orton CG. ROC or FROC? It depends on the research question. Medical Physics. 2017.

Chakraborty DP, Nishikawa RM, Orton CG. Due to potential concerns of bias and conflicts of interest, regulatory bodies should not do evaluation methodology research related to their regulatory missions. Medical Physics. 2017.

Dobbins III JT, McAdams HP, Sabol JM, Chakraborty DP, et al. (2016) Multi-Institutional Evaluation of Digital Tomosynthesis, Dual-Energy Radiography, and Conventional Chest Radiography for the Detection and Management of Pulmonary Nodules. Radiology. 282(1):236-250.

Warren LM, Mackenzie A, Cooke J, et al. Effect of image quality on calcification detection in digital mammography. Medical Physics. 2012;39(6):3202-3213.

Chakraborty DP, Zhai X. On the meaning of the weighted alternative free-response operating characteristic figure of merit. Medical physics. 2016;43(5):2548-2557.

Chakraborty DP. (2017) Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples. Taylor-Francis, LLC.