Ready-made models for 50 languages, trained on 67 treebanks, are provided by UDPipe at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364
in one zip file. You can either download them manually and use them for annotation,
or use udpipe_download_model
to download the model for a specific language of your choice.
For your convenience, these models are also made available at https://github.com/jwijffels/udpipe.models.ud.2.0 under the CC-BY-NC-SA
license. This function downloads the models from that location, so by using it you agree to comply with that license.
Because that license is non-commercial, if you need models for commercial purposes you can train your own with udpipe_train.
udpipe_download_model(language = c("ancient_greek-proiel", "ancient_greek",
"arabic", "basque", "belarusian", "bulgarian", "catalan", "chinese", "coptic",
"croatian", "czech-cac", "czech-cltt", "czech", "danish", "dutch-lassysmall",
"dutch", "english-lines", "english-partut", "english", "estonian",
"finnish-ftb", "finnish", "french-partut", "french-sequoia", "french",
"galician-treegal", "galician", "german", "gothic", "greek", "hebrew",
"hindi", "hungarian", "indonesian", "irish", "italian", "japanese", "kazakh",
"korean", "latin-ittb", "latin-proiel", "latin", "latvian", "lithuanian",
"norwegian-bokmaal", "norwegian-nynorsk", "old_church_slavonic", "persian",
"polish", "portuguese-br", "portuguese", "romanian", "russian-syntagrus",
"russian", "sanskrit", "slovak", "slovenian-sst", "slovenian",
"spanish-ancora", "spanish", "swedish-lines", "swedish", "tamil", "turkish",
"ukrainian", "urdu", "uyghur", "vietnamese"), model_dir = getwd())
language: a character string with a language. Possible values are: ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese
model_dir: a path where the model will be downloaded to. Defaults to the current working directory
A data.frame with 1 row and 2 columns:
language: The language as provided by the input parameter language
file_model: The path to the file on disk where the model was downloaded to
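As a minimal sketch of how the returned data.frame is typically used: the file_model path can be passed to udpipe_load_model, and the loaded model to udpipe_annotate, both from the same package. The example sentence and the use of tempdir() are illustrative assumptions, and the download needs internet access.

```r
library(udpipe)

# Download the Dutch model to a temporary directory (needs internet access)
dl <- udpipe_download_model(language = "dutch", model_dir = tempdir())

# Load the model from the path reported in the file_model column
model <- udpipe_load_model(file = dl$file_model)

# Annotate an illustrative sentence and convert the result to a data.frame
anno <- udpipe_annotate(model, x = "Ik ging op reis en ik nam mee: mijn laptop.")
anno <- as.data.frame(anno)

# Inspect the tokens with their lemma and universal part-of-speech tag
head(anno[, c("token", "lemma", "upos")])
```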
Pre-trained Universal Dependencies 2.0 models on all UD treebanks are made available at https://ufal.mff.cuni.cz/udpipe, namely at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364. At the time of writing, models are available for 50 languages, namely: ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech, danish, dutch, english, estonian, finnish, french, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin, latvian, lithuanian, norwegian, old_church_slavonic, persian, polish, portuguese, romanian, russian, sanskrit, slovak, slovenian, spanish, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese. Note that these models are made available under the CC BY-NC-SA 4.0 license.
These models are also provided in an R package for your convenience at https://github.com/jwijffels/udpipe.models.ud.2.0
https://ufal.mff.cuni.cz/udpipe, https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364
# NOT RUN {
x <- udpipe_download_model(language = "sanskrit", model_dir = tempdir())
x
x$file_model
# }
# NOT RUN {
x <- udpipe_download_model(language = "dutch")
x <- udpipe_download_model(language = "dutch-lassysmall")
x <- udpipe_download_model(language = "russian")
x <- udpipe_download_model(language = "french")
x <- udpipe_download_model(language = "english")
x <- udpipe_download_model(language = "german")
x <- udpipe_download_model(language = "spanish")
# }
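The repeated calls above could also be written as a loop. This is a sketch that assumes each call returns a one-row data.frame, as described for the return value, so that the results can be stacked with rbind. The languages vector and the use of tempdir() are illustrative choices, and the downloads need internet access.

```r
library(udpipe)

languages <- c("dutch", "english", "german")

# Download each model into a temporary directory (needs internet access)
downloads <- lapply(languages, function(lang) {
  udpipe_download_model(language = lang, model_dir = tempdir())
})

# Stack the one-row results into a single data.frame, one row per language
models <- do.call(rbind, downloads)
models
```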