udpipe (version 0.1.1)

udpipe_download_model: Download a UDPipe model provided by the UDPipe community for a specific language of choice

Description

Ready-made models for 50 languages trained on 67 treebanks are provided by UDPipe at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364 in one zip file. You can either download that file manually and use the models for annotation purposes, or use udpipe_download_model to download the model for a specific language of choice.

For your convenience, these models are also made available at https://github.com/jwijffels/udpipe.models.ud.2.0 under the CC-BY-NC-SA license. This function downloads the models from that location, so if you use this function you must comply with that license. If you want to train models for commercial purposes, you can easily do this with udpipe_train.

Usage

udpipe_download_model(language = c("ancient_greek-proiel", "ancient_greek",
  "arabic", "basque", "belarusian", "bulgarian", "catalan", "chinese", "coptic",
  "croatian", "czech-cac", "czech-cltt", "czech", "danish", "dutch-lassysmall",
  "dutch", "english-lines", "english-partut", "english", "estonian",
  "finnish-ftb", "finnish", "french-partut", "french-sequoia", "french",
  "galician-treegal", "galician", "german", "gothic", "greek", "hebrew",
  "hindi", "hungarian", "indonesian", "irish", "italian", "japanese", "kazakh",
  "korean", "latin-ittb", "latin-proiel", "latin", "latvian", "lithuanian",
  "norwegian-bokmaal", "norwegian-nynorsk", "old_church_slavonic", "persian",
  "polish", "portuguese-br", "portuguese", "romanian", "russian-syntagrus",
  "russian", "sanskrit", "slovak", "slovenian-sst", "slovenian",
  "spanish-ancora", "spanish", "swedish-lines", "swedish", "tamil", "turkish",
  "ukrainian", "urdu", "uyghur", "vietnamese"), model_dir = getwd())

Arguments

language

a character string with a language. Possible values are: ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese

model_dir

a path where the model will be downloaded to. Defaults to the current working directory
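
Because model_dir defaults to the current working directory, it can help to check both arguments before downloading. The following is a minimal sketch, not part of udpipe: check_language and prepare_model_dir are hypothetical helpers, known_languages is a shortened copy of the values listed above, and the use of match.arg is an assumption about how such a check could be done, not a description of the package internals.

```r
# Hypothetical helpers for illustration; they are not provided by udpipe.

# Shortened copy of the documented language values (assumption: use the
# full list from the Usage section in real code).
known_languages <- c("dutch", "dutch-lassysmall", "english", "french", "sanskrit")

# Stop with an error if the language is not one of the known values.
check_language <- function(language, choices = known_languages) {
  match.arg(language, choices)
}

# Create the target directory if it does not exist yet and return its path.
prepare_model_dir <- function(model_dir) {
  if (!dir.exists(model_dir)) {
    dir.create(model_dir, recursive = TRUE)
  }
  normalizePath(model_dir)
}

lang <- check_language("dutch-lassysmall")
model_path <- prepare_model_dir(file.path(tempdir(), "udpipe_models"))

# The download itself needs network access, so it is left commented out:
# x <- udpipe_download_model(language = lang, model_dir = model_path)
```

Keeping models in a dedicated directory rather than getwd() avoids scattering model files across project folders.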

Value

A data.frame with 1 row and 2 columns:

  • language: The language as provided by the input parameter language

  • file_model: The path to the file on disk where the model was downloaded to
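
The file_model column is what links this function to udpipe_load_model. A minimal sketch of that hand-off follows, assuming the udpipe package is installed and network access is available; load_downloaded_model is a hypothetical wrapper introduced here for illustration only.

```r
# Hypothetical wrapper: download a model and load it from the returned path.
load_downloaded_model <- function(language, model_dir = tempdir()) {
  dl <- udpipe::udpipe_download_model(language = language, model_dir = model_dir)
  # dl is a data.frame with the columns language and file_model
  model_file <- as.character(dl$file_model)
  # Fail early if the download did not leave a file on disk
  stopifnot(file.exists(model_file))
  udpipe::udpipe_load_model(file = model_file)
}

# Requires network access, so the call is left commented out:
# model <- load_downloaded_model("sanskrit")
```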

Details

Pre-trained Universal Dependencies 2.0 models on all UD treebanks are made available at https://ufal.mff.cuni.cz/udpipe, namely at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364. At the time of writing, this consists of models for 50 languages, namely: ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech, danish, dutch, english, estonian, finnish, french, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin, latvian, lithuanian, norwegian, old_church_slavonic, persian, polish, portuguese, romanian, russian, sanskrit, slovak, slovenian, spanish, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese. Note that these models are made available under the CC BY-NC-SA 4.0 license.

These models are also provided in an R package for your convenience at https://github.com/jwijffels/udpipe.models.ud.2.0

References

https://ufal.mff.cuni.cz/udpipe, https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364

See Also

udpipe_load_model

Examples

# NOT RUN {
x <- udpipe_download_model(language = "sanskrit", model_dir = tempdir())
x
x$file_model
# }
# NOT RUN {
x <- udpipe_download_model(language = "dutch")
x <- udpipe_download_model(language = "dutch-lassysmall")
x <- udpipe_download_model(language = "russian")
x <- udpipe_download_model(language = "french")
x <- udpipe_download_model(language = "english")
x <- udpipe_download_model(language = "german")
x <- udpipe_download_model(language = "spanish")
# }
