bow_pp_create_vocab_draft

Function for creating a first draft of a vocabulary

This function tags and lemmatizes raw texts with a udpipe language model and creates a list of the tokens that belong to the requested universal part-of-speech tags (UPOS), together with their corresponding lemmas.
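The core idea of a draft vocabulary (filter tagged tokens by UPOS and keep their lemmas) can be illustrated with a minimal Python sketch. This is not the package's R implementation. It assumes the texts have already been tagged and lemmatized; in the package, that step is done by the udpipe model given in path_language_model. The names TaggedToken and create_vocab_draft are hypothetical, and so is the returned structure (lemma and frequency per token).

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class TaggedToken(NamedTuple):
    """One annotated token (hypothetical shape of a tagger's output)."""
    token: str
    upos: str
    lemma: str


def create_vocab_draft(tagged_texts: Iterable[list[TaggedToken]],
                       upos: set[str]) -> dict[str, dict]:
    """Hypothetical sketch: keep only tokens whose UPOS tag was requested
    and record the lemma and frequency of each kept token."""
    freq: Counter[str] = Counter()
    lemmas: dict[str, str] = {}
    for text in tagged_texts:
        for tok in text:
            if tok.upos in upos:
                freq[tok.token] += 1
                # Keep the first lemma seen; a tagger may lemmatize the same
                # surface form differently in different contexts.
                lemmas.setdefault(tok.token, tok.lemma)
    return {t: {"lemma": lemmas[t], "freq": n} for t, n in freq.items()}


# Usage with pre-tagged toy input (tags and lemmas written by hand).
texts = [
    [TaggedToken("Dogs", "NOUN", "dog"), TaggedToken("bark", "VERB", "bark")],
    [TaggedToken("the", "DET", "the"), TaggedToken("dogs", "NOUN", "dog"),
     TaggedToken("bark", "VERB", "bark")],
]
vocab = create_vocab_draft(texts, upos={"NOUN", "VERB"})
print(vocab)
```

In this toy run, "the" is dropped because DET is not among the requested tags, and "bark" is counted twice. The sketch keeps surface forms as they are ("Dogs" and "dogs" stay separate). How the actual function handles case is not specified here.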

In social and educational settings, the use of Artificial
Intelligence (AI) is a challenging task. Relevant data is often
available only in handwritten form, or its use is restricted by
privacy policies, which often leads to small data sets. Furthermore,
data in the educational and social sciences is often imbalanced in
terms of category frequencies. To support educators as well as
educational and social researchers in using the potential of AI for
their work, this package provides a unified interface for neural nets
in 'keras', 'tensorflow', and 'pytorch' to deal with natural language
problems. In addition, the package ships with a 'shiny' app that
provides a graphical user interface, allowing people without skills
in writing 'python' or R scripts to use AI. The tools integrate
existing mathematical and statistical methods for dealing with small
data sets via pseudo-labeling (e.g. Lee (2013)
<https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks>,
Cascante-Bonilla et al. (2020) <doi:10.48550/arXiv.2001.06001>) and
with imbalanced data via the creation of synthetic cases (e.g.
Bunkhumpornpat et al. (2012) <doi:10.1007/s10489-011-0287-y>).
Performance evaluation of AI is linked to measures from content
analysis, with which educational and social researchers are generally
more familiar (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>,
Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019)
<doi:10.4135/9781071878781>). Energy consumption and CO2 emissions
during model training are estimated with the 'python' library
'codecarbon'. Finally, all objects created with this package make it
possible to share trained AI models with other people.

aifeducation: Artificial Intelligence for Education

Authors: Berding Florian, Pargmann Julia, Riebenbauer Elisabeth, Rebmann Karin, Slopinski Andreas

Arguments

<dl><dt>path_language_model</dt>
<dd><code>string</code> Path to a udpipe language model that
should be used for tagging and lemmatization.</dd>
<dt>data</dt>
<dd><code>vector</code> containing the raw texts.</dd>
<dt>upos</dt>
<dd><code>vector</code> containing the universal part-of-speech tags which
should be used to build the vocabulary.</dd>
<dt>label_language_model</dt>
<dd><code>string</code> Label for the udpipe language model used.</dd>
<dt>language</dt>
<dd><code>string</code> Name of the language (e.g., English, German).</dd>
<dt>chunk_size</dt>
<dd><code>int</code> Number of raw texts that are processed at once.</dd>
<dt>trace</dt>
<dd><code>bool</code> <code>TRUE</code> if information about the progress should be printed to the console.</dd></dl>
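The chunk_size and trace arguments point to a common pattern: the raw texts are annotated in batches rather than all at once, with optional progress output. The following Python sketch illustrates that pattern under stated assumptions. The helpers chunks and process_in_chunks are hypothetical, and whitespace splitting stands in for udpipe annotation. The package's actual progress messages and internal chunking logic are not reproduced here.

```python
from __future__ import annotations

from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices holding at most chunk_size items each."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def process_in_chunks(data: Sequence[T],
                      annotate: Callable[[Sequence[T]], list[R]],
                      chunk_size: int,
                      trace: bool) -> list[R]:
    """Hypothetical sketch: pass at most chunk_size raw texts at a time to
    the annotator and optionally report progress on the console."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    n_chunks = (len(data) + chunk_size - 1) // chunk_size
    results: list[R] = []
    for i, chunk in enumerate(chunks(data, chunk_size), start=1):
        # annotate is assumed to return one result per input text.
        results.extend(annotate(chunk))
        if trace:
            print(f"chunk {i}/{n_chunks}: {len(results)}/{len(data)} texts processed")
    return results


# Usage: whitespace splitting stands in for the udpipe tagger.
raw_texts = ["a b", "c", "d e f", "g", "h"]
tokens = process_in_chunks(raw_texts,
                           annotate=lambda chunk: [t.split() for t in chunk],
                           chunk_size=2, trace=True)
```

Batching typically bounds how many texts the annotator handles at once, which keeps memory use predictable for large corpora at the cost of more annotation calls.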
