Calculates statistically significant differences in co-occurrence counts.
corp_coco(A, B, nodes, collocates = NULL, fdr = 0.01)
# Deprecated
coco(A, B, nodes, fdr = 0.01, collocates = NULL)
A data.table of the form:
Classes ‘data.table’ and 'data.frame': 11 variables:
$ x : chr
$ y : chr
$ H_A : int
$ M_A : int
$ H_B : int
$ M_B : int
$ effect_size : num
$ CI_lower : num
$ CI_upper : num
$ p_value : num
$ p_adjusted : num
- attr(*, "sorted")= chr "x" "y"
- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "coco_metadata")=List of 5
..$ nodes : chr
..$ collocates : chr
..$ fdr : num
..$ PACKAGE_VERSION:Classes 'package_version', 'numeric_version'
.. ..$ : int
..$ date : Date, format: "2016-11-01"
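The coco_metadata attribute shown above can be read back with base R's attr(). The sketch below uses a hand-built stand-in object rather than a real result, so the values in it are hypothetical; only the attribute name and its nodes, collocates and fdr fields are taken from the structure listing.

```r
# Hypothetical stand-in for a returned result; a real one would come
# from corp_coco(). Only a few of the columns are mocked here.
res <- data.frame(x = "back", y = "the", p_adjusted = 0.004,
                  stringsAsFactors = FALSE)
attr(res, "coco_metadata") <- list(nodes = "back",
                                   collocates = "the",
                                   fdr = 0.01)

# Retrieve the metadata list and inspect the settings used for the run.
meta <- attr(res, "coco_metadata")
print(names(meta))
print(meta$fdr)
```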
A: A corp_cooccurrence object.
For the deprecated coco function this is a
data.frame of co-occurrence counts as returned by
corp_get_counts.
B: A corp_cooccurrence object.
For the deprecated coco function this is a
data.frame of co-occurrence counts as returned by
corp_get_counts.
nodes: A character vector of node types, or a character string
representing a single node type.
collocates: A character vector of collocate types, or a character string
representing a single collocate type.
The collocates essentially act as a filter on the y column
of the returned data structure. Use collocates to
target the testing; reducing the number of tests reduces the loss
of power from the multiple test correction.
fdr: The desired level at which to control the False Discovery Rate.
Default value is 0.01.
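To illustrate why restricting collocates can help, here is a small sketch using base R's p.adjust with made-up p-values. The numbers and variable names are hypothetical and are not produced by the package. The same raw p-value passes a Benjamini-Hochberg correction at fdr = 0.01 when it is one of 5 tests, but not when it is one of 50.

```r
fdr <- 0.01
signal <- 0.001  # hypothetical raw p-value for the pair of interest

# The same signal embedded in 50 tests versus a targeted set of 5 tests.
# The remaining p-values are made-up, unremarkable values.
p_all      <- c(signal, seq(0.2, 0.9, length.out = 49))
p_targeted <- c(signal, seq(0.2, 0.9, length.out = 4))

# Benjamini-Hochberg adjusted p-value of the signal in each setting.
adj_all      <- p.adjust(p_all, method = "BH")[1]       # about 0.05
adj_targeted <- p.adjust(p_targeted, method = "BH")[1]  # about 0.005

print(c(all = adj_all, targeted = adj_targeted))
print(c(all = adj_all <= fdr, targeted = adj_targeted <= fdr))
```

In this toy case the adjustment multiplies the smallest p-value by the number of tests, so cutting the tests from 50 to 5 is what brings it under the threshold.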
The corp_coco function implements the method introduced in Wiegand et al. (2017a) (described in more detail from a linguistic perspective in Wiegand, 2019).
Because the method carries out a large number of tests, fdr sets the
level at which the False Discovery Rate is controlled.
For a description of the form of FDR used see Benjamini and Hochberg (1995).
For a description of the p_adjusted column in the returned structure see
p.adjust.
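As a rough illustration of what a Benjamini-Hochberg adjusted p-value is, the sketch below compares base R's p.adjust(method = "BH") with a manual calculation on made-up p-values. It is not the package's internal code, and the rule used to decide which rows are kept (p_adjusted not exceeding fdr) is an assumption made for the example.

```r
fdr <- 0.01
p_value <- c(0.041, 0.001, 0.6, 0.008, 0.039)  # hypothetical raw p-values

# Adjustment as provided by base R.
p_adjusted <- p.adjust(p_value, method = "BH")

# The same adjustment by hand: sort ascending, scale each p-value by
# n / rank, then enforce monotonicity working down from the largest.
n <- length(p_value)
o <- order(p_value)
scaled <- p_value[o] * n / seq_len(n)
manual <- numeric(n)
manual[o] <- pmin(1, rev(cummin(rev(scaled))))

# Assumed selection rule: keep tests whose adjusted p-value is within fdr.
keep <- p_adjusted <= fdr

print(data.frame(p_value, p_adjusted, manual, keep))
```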
The returned data structure is a data.table.
A data.table is also a data.frame and will behave exactly
as such if the data.table package is not loaded.
The returned data.table contains details of all the
co-occurrences for which there is evidence of a difference in
co-occurrence between the two supplied data sets.
The effect size is calculated as the log base 2 of the odds ratio.
The effect size and its confidence interval are reported in the
effect_size, CI_lower and CI_upper columns.
The p_value column contains the non-adjusted p-value from the
Fisher's Exact Test.
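The relationship between the counts, the test and the effect size can be sketched for a single (x, y) pair with base R's fisher.test. This is a hedged illustration, not the package's implementation. It assumes that H_* counts the times y co-occurs with x and M_* the times it does not, it puts corpus A in the numerator of the odds ratio (the package may use the opposite direction), and it uses fisher.test's conditional estimate and 95% interval, which may differ from what the package reports.

```r
# Hypothetical counts for one (x, y) pair in corpora A and B.
H_A <- 30L; M_A <- 70L
H_B <- 10L; M_B <- 90L

# 2 x 2 contingency table: rows are hit/miss, columns are the corpora.
tab <- matrix(c(H_A, M_A, H_B, M_B), nrow = 2,
              dimnames = list(c("H", "M"), c("A", "B")))

ft <- fisher.test(tab)

p_value     <- ft$p.value                  # unadjusted p-value
effect_size <- log2(unname(ft$estimate))   # log base 2 of the odds ratio
ci          <- log2(as.numeric(ft$conf.int))

print(c(effect_size = effect_size, CI_lower = ci[1], CI_upper = ci[2],
        p_value = p_value))
```

With the direction assumed here, a positive effect size means the co-occurrence is relatively more frequent in A than in B.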
* Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.
* Wiegand, V., Hennessey, A., Tench, C. R., & Mahlberg, M. (2017a, May 24). Comparing co-occurrences between corpora. 38th ICAME conference, Charles University, Prague.
* Wiegand, V. (2019). A Corpus Linguistic Approach to Meaning-Making Patterns in Surveillance Discourse [PhD, University of Birmingham]. https://etheses.bham.ac.uk/id/eprint/9778