occupationMeasurement (version 0.3.2)

preprocess_string: Preprocess a string, removing special characters and handling abbreviations.

Description

Replaces some common characters and character sequences (e.g., Ä, Ü, "DIPL.-ING.") with their uppercase equivalents, and removes punctuation, empty spaces and the word "Diplom".
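
To make the kind of cleaning described here more concrete, the following is a rough base R sketch. It is not the package's implementation: the helper name `sketch_preprocess` is hypothetical, and the exact rules (which sequences are replaced, how abbreviations such as "DIPL.-ING." are treated, whether spaces are collapsed or removed) are assumptions for illustration only.

```r
# Hypothetical helper for illustration; NOT the code used by preprocess_string().
sketch_preprocess <- function(verbatim) {
  # Work on UTF-8 text and upper-case the ASCII letters.
  x <- toupper(enc2utf8(verbatim))

  # toupper() may leave umlauts untouched in some locales,
  # so map lowercase umlauts to uppercase ones explicitly (assumed rule).
  lower <- c("\u00e4", "\u00f6", "\u00fc")
  upper <- c("\u00c4", "\u00d6", "\u00dc")
  for (i in seq_along(lower)) {
    x <- gsub(lower[i], upper[i], x, fixed = TRUE)
  }

  # Drop the standalone word "DIPLOM" (assumed rule).
  x <- gsub("\\bDIPLOM\\b", "", x, perl = TRUE)

  # Replace punctuation with spaces, then collapse repeated whitespace
  # (one possible reading of "removes empty spaces").
  x <- gsub("[[:punct:]]", " ", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)
  trimws(x)
}

sketch_preprocess(c("Industriemechaniker", "Diplom-Kaufmann, Vertrieb"))
```

The output of `preprocess_string()` itself may differ from this sketch; run the package function to see its actual behaviour.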

Usage

preprocess_string(verbatim, lang = "de")

Value

The same character vector after processing.

Arguments

verbatim

The character vector to process.

lang

The language the text is in. Currently only German is supported. Defaults to "de" (German).

Details

charToRaw() helps to find UTF-8 characters by showing the raw bytes that make up a string.
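
As a small illustration of that tip, the base R snippet below inspects the bytes of a string containing "ü". The example string and variable names are chosen here for illustration and are not taken from the package.

```r
# "\u00fc" is the escape for "ü"; enc2utf8() ensures the string is UTF-8 encoded.
x <- enc2utf8("B\u00fccher")

# charToRaw() returns one raw value per byte.
bytes <- charToRaw(x)
print(bytes)
# [1] 42 c3 bc 63 68 65 72

# Bytes above 0x7f belong to multi-byte (non-ASCII) UTF-8 characters;
# here "ü" (U+00FC) is encoded as the two bytes c3 bc.
non_ascii <- which(as.integer(bytes) > 127L)
print(non_ascii)
# [1] 2 3
```

Seeing which byte sequences appear in the input can help when deciding which characters need special handling before matching.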

Examples

data.table::setDTthreads(1)

if (FALSE) {
preprocess_string(c(
  "Verkauf von B\u00fcchern, Schreibwaren",
  "Fach\u00e4rztin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen",
  "Industriemechaniker",
  "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"
))
}