get_train_test_split

This function splits the data into a training and a validation sample using
stratified random sampling. The relative frequency of each category in both samples
matches its relative frequency in the initial data (proportional stratified sampling).

internal

In social and educational settings, the use of Artificial
Intelligence (AI) is a challenging task. Relevant data is often only
available in handwritten forms, or the use of data is restricted by
privacy policies. This often leads to small data sets. Furthermore, in the educational and social sciences,
data is often unbalanced in terms of
frequencies. To support educators as well as educational and social
researchers in using the potential of AI for their work, this package
provides a unified interface for neural nets in 'keras',
'tensorflow', and 'pytorch' to deal with natural language problems. In addition,
the package ships with a shiny app that provides a graphical user interface.
This makes AI usable for people without experience in writing Python or R scripts.
The tools integrate existing mathematical and statistical methods for dealing
with small data sets via pseudo-labeling (e.g. Lee (2013)
<https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks>,
Cascante-Bonilla et al. (2020) <doi:10.48550/arXiv.2001.06001>) and
imbalanced data via the creation of synthetic cases (e.g.
Bunkhumpornpat et al. (2012) <doi:10.1007/s10489-011-0287-y>).
Performance evaluation of AI is connected to measures from content
analysis which educational and social researchers are generally more
familiar with (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>,
Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019)
<doi:10.4135/9781071878781>). Estimation of energy consumption and CO2
emissions during model training is done with the 'python' library
'codecarbon'. Finally, all objects created with this package can be used to
share trained AI models with other people.

Berding Florian

aifeducation

Artificial Intelligence for Education

Pargmann Julia

Riebenbauer Elisabeth

Rebmann Karin

Slopinski Andreas

get_train_test_split function

<dl><dt>embedding</dt>
<dd>Object of class EmbeddedText.</dd>
<dt>target</dt>
<dd>Named <code>factor</code> containing the labels of every case.</dd>
<dt>val_size</dt>
<dd><code>double</code> Value between 0 and 1 giving the proportion of
cases to be used as the validation sample.</dd></dl>
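The proportional stratified sampling described for this function can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name `stratified_split_sketch` and the rounding rule for the per-category validation count are assumptions made for illustration, and the sketch splits only the named `target` factor, not an EmbeddedText object.

```r
# Hypothetical helper for illustration only; not part of 'aifeducation'.
stratified_split_sketch <- function(target, val_size) {
  stopifnot(is.factor(target), !is.null(names(target)))
  stopifnot(val_size > 0, val_size < 1)

  # Group the case names by category (empty categories are dropped).
  cases_by_category <- split(names(target), target, drop = TRUE)

  # Within each category, draw round(n * val_size) cases for validation.
  # Assumption: simple rounding; the real function may treat small
  # categories differently.
  val_names <- unlist(
    lapply(cases_by_category, function(case_names) {
      n_val <- round(length(case_names) * val_size)
      sample(case_names, size = n_val)
    }),
    use.names = FALSE
  )

  # All remaining cases form the training sample.
  list(
    train = target[setdiff(names(target), val_names)],
    val = target[val_names]
  )
}

# Example: 8 cases of "a" and 4 cases of "b", 25 % held out for validation.
target <- factor(rep(c("a", "b"), times = c(8, 4)))
names(target) <- paste0("case_", seq_along(target))
split_result <- stratified_split_sketch(target, val_size = 0.25)

table(split_result$val)    # 2 cases of "a", 1 case of "b"
table(split_result$train)  # 6 cases of "a", 3 cases of "b"
```

Both samples keep the 2:1 ratio of the initial data. Which cases land in each sample varies between runs unless a seed is set with `set.seed()`.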

Arguments

Function for splitting data into a train and validation sample — get_train_test_split


