create_synthetic_units

Creates synthetic cases to balance the data used for training with
TextEmbeddingClassifierNeuralNet. This auxiliary function is intended
for use with get_synthetic_cases, where it allows parallel
computation.

In social and educational settings, the use of Artificial
Intelligence (AI) is a challenging task. Relevant data is often only
available in handwritten form, or its use is restricted by
privacy policies. This often leads to small data sets. Furthermore, in the
educational and social sciences, data is often imbalanced in terms of
category frequencies. To support educators as well as educational and social
researchers in using the potential of AI for their work, this package
provides a unified interface for neural nets in 'keras',
'tensorflow', and 'pytorch' to deal with natural language problems. In addition,
the package ships with a shiny app that provides a graphical user interface,
which makes AI usable for people without skills in writing Python/R scripts.
The tools integrate existing mathematical and statistical methods for dealing
with small data sets via pseudo-labeling (e.g. Lee (2013)
<https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks>,
Cascante-Bonilla et al. (2020) <doi:10.48550/arXiv.2001.06001>) and
imbalanced data via the creation of synthetic cases (e.g.
Bunkhumpornpat et al. (2012) <doi:10.1007/s10489-011-0287-y>).
Performance evaluation of AI is connected to measures from content
analysis which educational and social researchers are generally more
familiar with (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>,
Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019)
<doi:10.4135/9781071878781>). Estimation of energy consumption and CO2
emissions during model training is done with the 'python' library
'codecarbon'. Finally, all objects created with this package make it
possible to share trained AI models with other people.

Berding Florian

aifeducation

Artificial Intelligence for Education

Pargmann Julia

Riebenbauer Elisabeth

Rebmann Karin

Slopinski Andreas

create_synthetic_units function

<dl><dt>embedding</dt>
<dd>Named <code>data.frame</code> containing the text embeddings.
In most cases this object is taken from EmbeddedText$embeddings.</dd>
<dt>target</dt>
<dd>Named <code>factor</code> containing the labels/categories of the corresponding cases.</dd>
<dt>k</dt>
<dd><code>int</code> Number of nearest neighbors used during the sampling process.</dd>
<dt>max_k</dt>
<dd><code>int</code> Maximum number of nearest neighbors used during the sampling process.</dd>
<dt>method</dt>
<dd><code>vector</code> of strings naming the requested methods for generating new cases.
Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.</dd>
<dt>cat</dt>
<dd><code>string</code> The category for which new cases should be created.</dd>
<dt>cat_freq</dt>
<dd>Object of class <code>"table"</code> containing the absolute frequencies
of every category/label.</dd></dl>
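As a rough illustration of the argument shapes described above, the following sketch builds toy inputs in base R. The data, the case names, and the values chosen for k and max_k are assumptions made for this example only; the final call is left commented out because it requires the aifeducation and smotefamily packages, and its exact behavior is not confirmed here.

```r
# Toy inputs shaped like the arguments described above (illustrative only).
set.seed(42)

# Six cases with four embedding dimensions; row names identify the cases.
embedding <- as.data.frame(matrix(rnorm(6 * 4), nrow = 6, ncol = 4))
rownames(embedding) <- paste0("case_", 1:6)

# Named factor of labels; the names match the row names of `embedding`.
target <- factor(c("a", "a", "a", "a", "b", "b"))
names(target) <- rownames(embedding)

# Absolute frequency of every category, as an object of class "table".
cat_freq <- table(target)

# The least frequent category is the one that needs synthetic cases.
minority_cat <- names(cat_freq)[which.min(cat_freq)]

# Hypothetical call, not run here; k and max_k are placeholder values.
# new_units <- create_synthetic_units(
#   embedding = embedding,
#   target = target,
#   k = 1,
#   max_k = 1,
#   method = c("smote"),
#   cat = minority_cat,
#   cat_freq = cat_freq
# )
```

Deriving cat_freq with table(target) keeps the frequencies consistent with the labels that are actually passed in, which is presumably what the function expects.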


create_synthetic_units: Create synthetic units
