get_synthetic_cases

This function creates synthetic cases for balancing the data used to train an
object of the class TextEmbeddingClassifierNeuralNet.

In social and educational settings, the use of Artificial
Intelligence (AI) is a challenging task. Relevant data is often only
available in handwritten forms, or the use of data is restricted by
privacy policies. This often leads to small data sets. Furthermore, in the
educational and social sciences, data is often unbalanced in terms of
frequencies. To support educators as well as educational and social
researchers in using the potential of AI for their work, this package
provides a unified interface for neural nets in 'keras',
'tensorflow', and 'pytorch' to deal with natural language problems. In addition,
the package ships with a shiny app that provides a graphical user interface.
This allows people without skills in writing Python or R scripts to use AI.
The tools integrate existing mathematical and statistical methods for dealing
with small data sets via pseudo-labeling (e.g. Lee (2013)
<https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks>,
Cascante-Bonilla et al. (2020) <doi:10.48550/arXiv.2001.06001>) and
imbalanced data via the creation of synthetic cases (e.g.
Bunkhumpornpat et al. (2012) <doi:10.1007/s10489-011-0287-y>).
Performance evaluation of AI is connected to measures from content
analysis which educational and social researchers are generally more
familiar with (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>,
Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019)
<doi:10.4135/9781071878781>). Estimation of energy consumption and CO2
emissions during model training is done with the 'python' library
'codecarbon'. Finally, all objects created with this package can be used to
share trained AI models with other people.

Berding Florian

aifeducation

Artificial Intelligence for Education

Pargmann Julia

Riebenbauer Elisabeth

Rebmann Karin

Slopinski Andreas

get_synthetic_cases function

<dl><dt>embedding</dt>
<dd>Named <code>data.frame</code> containing the text embeddings.
In most cases, this object is taken from EmbeddedText$embeddings.</dd>
<dt>times</dt>
<dd><code>int</code> for the number of sequences/times.</dd>
<dt>features</dt>
<dd><code>int</code> for the number of features within each sequence.</dd>
<dt>target</dt>
<dd>Named <code>factor</code> containing the labels of the corresponding embeddings.</dd>
<dt>method</dt>
<dd><code>vector</code> containing strings of the requested methods for generating new cases.
Currently <code>"smote"</code>, <code>"dbsmote"</code>, and <code>"adas"</code> from the package smotefamily are available.</dd>
<dt>max_k</dt>
<dd><code>int</code> for the maximum number of nearest neighbors used during the sampling process.</dd></dl>
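
The role of the nearest neighbors in SMOTE-style oversampling can be illustrated with a small base-R sketch. This is not the code used by this package or by smotefamily; the function name `smote_sketch` and its arguments `x`, `k`, and `n_new` are hypothetical and chosen only for illustration. It assumes a plain numeric matrix of minority-class cases rather than text embeddings, and it shows the basic idea only: each synthetic case is placed at a random point on the line between a minority case and one of its `k` nearest neighbors.

```r
# Illustrative SMOTE-style oversampling in base R (hypothetical helper,
# not the implementation used by aifeducation or smotefamily).
# x:     numeric matrix of minority-class cases (rows = cases, columns = features)
# k:     number of nearest neighbors to consider (comparable in spirit to max_k)
# n_new: number of synthetic cases to create
smote_sketch <- function(x, k, n_new) {
  stopifnot(is.matrix(x), nrow(x) >= 2, k >= 1)
  k <- min(k, nrow(x) - 1)          # a case has at most nrow(x) - 1 neighbors
  d <- as.matrix(dist(x))           # pairwise Euclidean distances
  diag(d) <- Inf                    # a case is not its own neighbor
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(x))
  for (i in seq_len(n_new)) {
    idx <- sample.int(nrow(x), 1)               # random minority case
    neighbors <- order(d[idx, ])[seq_len(k)]    # indices of its k nearest neighbors
    partner <- neighbors[sample.int(k, 1)]      # one neighbor chosen at random
    gap <- runif(1)                             # position between case and neighbor
    synthetic[i, ] <- x[idx, ] + gap * (x[partner, ] - x[idx, ])
  }
  synthetic
}

set.seed(1)
minority <- matrix(rnorm(5 * 4), nrow = 5, ncol = 4)  # toy data: 5 cases, 4 features
new_cases <- smote_sketch(minority, k = 3, n_new = 10)
dim(new_cases)
```

Because every synthetic case is an interpolation between two existing cases, its values always stay within the range spanned by the original minority cases. A larger `k` spreads the new cases over a wider neighborhood, while a small `k` keeps them close to existing cases.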

Arguments

Create synthetic cases for balancing training data — get_synthetic_cases

