udpipe (version 0.3)

udpipe_download_model: Download a UDPipe model provided by the UDPipe community for a specific language of choice

Description

Ready-made models for 52 languages, trained on 69 treebanks, are provided. Some of these models were provided by the UDPipe community; others were built using this R package. You can either download these models manually and use them for annotation, or use udpipe_download_model to download the model for a specific language of choice.

The models provided by the UDPipe community are made available for your convenience at https://github.com/jwijffels/udpipe.models.ud.2.0 under the CC-BY-NC-SA license. This function downloads the models from that location by default, so by using this function you agree to comply with that license.

If you are working in a commercial setting, you can instead download models from https://github.com/bnosac/udpipe.models.ud. That repository contains models built with this R package on open data which allows for commercial usage. The license of these models is mostly CC-BY-SA. Visit that GitHub repository for details on the license that applies to the language of your choice, and contact www.bnosac.be if you need support on these models or require models tuned to your needs.

If you need to train models yourself for commercial purposes, or if you want to improve models, you can do this with udpipe_train, which is explained in detail in the package vignette.

Usage

udpipe_download_model(language = c("afrikaans", "ancient_greek-proiel",
  "ancient_greek", "arabic", "basque", "belarusian", "bulgarian", "catalan",
  "chinese", "coptic", "croatian", "czech-cac", "czech-cltt", "czech", "danish",
  "dutch-lassysmall", "dutch", "english-lines", "english-partut", "english",
  "estonian", "finnish-ftb", "finnish", "french-partut", "french-sequoia",
  "french", "galician-treegal", "galician", "german", "gothic", "greek",
  "hebrew", "hindi", "hungarian", "indonesian", "irish", "italian", "japanese",
  "kazakh", "korean", "latin-ittb", "latin-proiel", "latin", "latvian",
  "lithuanian", "norwegian-bokmaal", "norwegian-nynorsk", "old_church_slavonic",
  "persian", "polish", "portuguese-br", "portuguese", "romanian",
  "russian-syntagrus", "russian", "sanskrit", "serbian", "slovak",
  "slovenian-sst", "slovenian", "spanish-ancora", "spanish", "swedish-lines",
  "swedish", "tamil", "turkish", "ukrainian", "urdu", "uyghur", "vietnamese"),
  model_dir = getwd(),
  udpipe_model_repo = c("jwijffels/udpipe.models.ud.2.0",
  "bnosac/udpipe.models.ud"))

Arguments

language

a character string with a language. Possible values are: afrikaans, ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, serbian, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese. The models are downloaded from the location specified in argument udpipe_model_repo. Namely:

  • udpipe_model_repo 'jwijffels/udpipe.models.ud.2.0' contains models for all languages listed above except afrikaans and serbian

  • udpipe_model_repo 'bnosac/udpipe.models.ud' contains models for the following languages: afrikaans, croatian, czech-cac, dutch, english, finnish, french-sequoia, irish, norwegian-bokmaal, persian, polish, portuguese, romanian, serbian, slovak, spanish-ancora, swedish
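The availability rules above can be sketched as a small lookup. This is only an illustration: `bnosac_languages` and `pick_repo` are hypothetical names that are not part of the udpipe package, and the language vector is copied from the bullet above. The helper does not check that `language` is one of the values accepted by udpipe_download_model.

```r
# Languages listed above as available in 'bnosac/udpipe.models.ud'
bnosac_languages <- c("afrikaans", "croatian", "czech-cac", "dutch", "english",
  "finnish", "french-sequoia", "irish", "norwegian-bokmaal", "persian",
  "polish", "portuguese", "romanian", "serbian", "slovak", "spanish-ancora",
  "swedish")

# Hypothetical helper (not part of udpipe): choose a repository for a language.
# commercial = TRUE restricts the choice to the bnosac repository.
pick_repo <- function(language, commercial = FALSE) {
  in_bnosac <- language %in% bnosac_languages
  if (commercial) {
    if (!in_bnosac) {
      stop("No model for '", language, "' in bnosac/udpipe.models.ud")
    }
    return("bnosac/udpipe.models.ud")
  }
  # afrikaans and serbian are documented as missing from the default repository
  if (language %in% c("afrikaans", "serbian")) {
    return("bnosac/udpipe.models.ud")
  }
  "jwijffels/udpipe.models.ud.2.0"
}

pick_repo("sanskrit")
pick_repo("dutch", commercial = TRUE)
```

The returned string can then be passed as the udpipe_model_repo argument of udpipe_download_model.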

model_dir

a path where the model will be downloaded to. Defaults to the current working directory

udpipe_model_repo

location where the models will be downloaded from. Either 'jwijffels/udpipe.models.ud.2.0' or 'bnosac/udpipe.models.ud'. Defaults to 'jwijffels/udpipe.models.ud.2.0'.

  • 'jwijffels/udpipe.models.ud.2.0' contains models released under the CC-BY-NC-SA license

  • 'bnosac/udpipe.models.ud' contains models mainly released under the CC-BY-SA license

Visit https://github.com/jwijffels/udpipe.models.ud.2.0 and https://github.com/bnosac/udpipe.models.ud for further details.

Value

A data.frame with 1 row and 3 columns:

  • language: The language as provided by the input parameter language

  • file_model: The path to the file on disk where the model was downloaded to

  • url: The URL where the model was downloaded from
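As a minimal sketch of how the returned data.frame might be consumed, the helper below checks for the three documented columns and returns the model path. `model_file` is a hypothetical name, not a udpipe function, and the data.frame built here is a stand-in for a real download result so that the sketch runs without network access.

```r
# Hypothetical helper (not part of udpipe): validate the result and
# return the path of the downloaded model file.
model_file <- function(x) {
  stopifnot(is.data.frame(x), nrow(x) == 1,
            all(c("language", "file_model", "url") %in% names(x)))
  # as.character guards against file_model being stored as a factor
  path <- as.character(x$file_model)
  if (!file.exists(path)) {
    stop("Model file not found: ", path)
  }
  path
}

# Stand-in for the value of udpipe_download_model(); an empty file
# takes the place of a real model so the check can run offline.
demo_path <- tempfile(fileext = ".udpipe")
file.create(demo_path)
x <- data.frame(language = "sanskrit", file_model = demo_path,
                url = NA_character_, stringsAsFactors = FALSE)
model_file(x)
```

With a real download result, the returned path is what you would hand to udpipe_load_model (see See Also).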

Details

Pre-trained Universal Dependencies 2.0 models for all UD treebanks are made available at https://ufal.mff.cuni.cz/udpipe, namely at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364. At the time of writing these cover 50 languages, namely: ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech, danish, dutch, english, estonian, finnish, french, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin, latvian, lithuanian, norwegian, old_church_slavonic, persian, polish, portuguese, romanian, russian, sanskrit, slovak, slovenian, spanish, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese. Note that these models are made available under the CC BY-NC-SA 4.0 license. For your convenience, they are also provided in an R package at https://github.com/jwijffels/udpipe.models.ud.2.0

Pre-trained Universal Dependencies 2.1 models built on UD treebanks that allow for commercial usage (mainly data released under the CC-BY-SA license, though some is released under the GPL-3 and LGPL-LR licenses) are made available at https://github.com/bnosac/udpipe.models.ud. At the time of writing these cover 17 languages, namely: afrikaans, croatian, czech-cac, dutch, english, finnish, french-sequoia, irish, norwegian-bokmaal, persian, polish, portuguese, romanian, serbian, slovak, spanish-ancora, swedish. Visit that repository for more details on the license of each model.

References

https://ufal.mff.cuni.cz/udpipe, https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364, https://github.com/jwijffels/udpipe.models.ud.2.0, https://github.com/bnosac/udpipe.models.ud

See Also

udpipe_load_model

Examples

# NOT RUN {
x <- udpipe_download_model(language = "sanskrit", model_dir = tempdir())
x
x$file_model
# }
# NOT RUN {
x <- udpipe_download_model(language = "dutch")
x <- udpipe_download_model(language = "dutch-lassysmall")
x <- udpipe_download_model(language = "russian")
x <- udpipe_download_model(language = "french")
x <- udpipe_download_model(language = "english")
x <- udpipe_download_model(language = "german")
x <- udpipe_download_model(language = "spanish")

x <- udpipe_download_model(language = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "afrikaans", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "spanish-ancora", 
                           udpipe_model_repo = "bnosac/udpipe.models.ud")
# }
