Ready-made models for 52 languages, trained on 69 treebanks, are available.
Some of these models were provided by the UDPipe community; others were built using this R package.
You can either download these models manually and use them for annotation,
or use udpipe_download_model
to download the model for a language of your choice.
The models provided by the UDPipe community are made available for your convenience at https://github.com/jwijffels/udpipe.models.ud.2.0 under the CC-BY-NC-SA license. This function downloads the models from that location by default, so if you use this function you must comply with that license.
If you are working in a commercial setting, you can instead download models from https://github.com/bnosac/udpipe.models.ud. That repository contains models built with this R package on open data that allows commercial usage. These models are mostly licensed under CC-BY-SA. Visit that GitHub repository for details on the license for the language of your choice, and contact www.bnosac.be if you need support with these models or require models tuned to your needs.
If you need to train models yourself for commercial purposes, or want to improve existing models,
you can do so with udpipe_train,
which is explained in detail in the package vignette.
udpipe_download_model(language = c("afrikaans", "ancient_greek-proiel",
"ancient_greek", "arabic", "basque", "belarusian", "bulgarian", "catalan",
"chinese", "coptic", "croatian", "czech-cac", "czech-cltt", "czech", "danish",
"dutch-lassysmall", "dutch", "english-lines", "english-partut", "english",
"estonian", "finnish-ftb", "finnish", "french-partut", "french-sequoia",
"french", "galician-treegal", "galician", "german", "gothic", "greek",
"hebrew", "hindi", "hungarian", "indonesian", "irish", "italian", "japanese",
"kazakh", "korean", "latin-ittb", "latin-proiel", "latin", "latvian",
"lithuanian", "norwegian-bokmaal", "norwegian-nynorsk", "old_church_slavonic",
"persian", "polish", "portuguese-br", "portuguese", "romanian",
"russian-syntagrus", "russian", "sanskrit", "serbian", "slovak",
"slovenian-sst", "slovenian", "spanish-ancora", "spanish", "swedish-lines",
"swedish", "tamil", "turkish", "ukrainian", "urdu", "uyghur", "vietnamese"),
model_dir = getwd(),
udpipe_model_repo = c("jwijffels/udpipe.models.ud.2.0",
"bnosac/udpipe.models.ud"))
language: a character string with a language.
Possible values are: afrikaans, ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, serbian, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese.
The models are downloaded from the location specified in the argument udpipe_model_repo, namely:
udpipe_model_repo = 'jwijffels/udpipe.models.ud.2.0' contains models for all languages enumerated above except afrikaans and serbian.
udpipe_model_repo = 'bnosac/udpipe.models.ud' contains models for the following languages: afrikaans, croatian, czech-cac, dutch, english, finnish, french-sequoia, irish, norwegian-bokmaal, persian, polish, portuguese, romanian, serbian, slovak, spanish-ancora, swedish.
model_dir: a path where the model will be downloaded to. Defaults to the current working directory.
udpipe_model_repo: location where the models will be downloaded from. Either 'jwijffels/udpipe.models.ud.2.0' or 'bnosac/udpipe.models.ud'. Defaults to 'jwijffels/udpipe.models.ud.2.0'.
'jwijffels/udpipe.models.ud.2.0' contains models released under the CC-BY-NC-SA license.
'bnosac/udpipe.models.ud' contains models mainly released under the CC-BY-SA license.
Visit https://github.com/jwijffels/udpipe.models.ud.2.0 and https://github.com/bnosac/udpipe.models.ud for further details.
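As a sketch of how the language lists and licenses above translate into a value for udpipe_model_repo, the hypothetical helper suggest_model_repo below (it is not part of the udpipe package) returns a repository for a given language. It assumes that commercial use rules out the CC-BY-NC-SA models in 'jwijffels/udpipe.models.ud.2.0'; it does not replace checking the license of the individual model in 'bnosac/udpipe.models.ud'.

```r
# Hypothetical helper, NOT part of the udpipe package.
# Language lists copied from the documentation above.
jwijffels_languages <- c(
  "ancient_greek-proiel", "ancient_greek", "arabic", "basque", "belarusian",
  "bulgarian", "catalan", "chinese", "coptic", "croatian", "czech-cac",
  "czech-cltt", "czech", "danish", "dutch-lassysmall", "dutch",
  "english-lines", "english-partut", "english", "estonian", "finnish-ftb",
  "finnish", "french-partut", "french-sequoia", "french", "galician-treegal",
  "galician", "german", "gothic", "greek", "hebrew", "hindi", "hungarian",
  "indonesian", "irish", "italian", "japanese", "kazakh", "korean",
  "latin-ittb", "latin-proiel", "latin", "latvian", "lithuanian",
  "norwegian-bokmaal", "norwegian-nynorsk", "old_church_slavonic", "persian",
  "polish", "portuguese-br", "portuguese", "romanian", "russian-syntagrus",
  "russian", "sanskrit", "slovak", "slovenian-sst", "slovenian",
  "spanish-ancora", "spanish", "swedish-lines", "swedish", "tamil", "turkish",
  "ukrainian", "urdu", "uyghur", "vietnamese")
bnosac_languages <- c(
  "afrikaans", "croatian", "czech-cac", "dutch", "english", "finnish",
  "french-sequoia", "irish", "norwegian-bokmaal", "persian", "polish",
  "portuguese", "romanian", "serbian", "slovak", "spanish-ancora", "swedish")

suggest_model_repo <- function(language, commercial = FALSE) {
  stopifnot(is.character(language), length(language) == 1)
  in_jwijffels <- language %in% jwijffels_languages
  in_bnosac <- language %in% bnosac_languages
  if (commercial) {
    # Assumption: only the (mainly CC-BY-SA) bnosac models are candidates
    if (!in_bnosac) stop("no commercially usable model listed for '", language, "'")
    return("bnosac/udpipe.models.ud")
  }
  # Otherwise prefer the default repository and fall back to bnosac
  if (in_jwijffels) return("jwijffels/udpipe.models.ud.2.0")
  if (in_bnosac) return("bnosac/udpipe.models.ud")
  stop("unknown language '", language, "'")
}
```

Its result can be passed on directly, e.g. udpipe_download_model(language = "afrikaans", udpipe_model_repo = suggest_model_repo("afrikaans")). Preferring 'jwijffels/udpipe.models.ud.2.0' for non-commercial use mirrors the default of the udpipe_model_repo argument.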
A data.frame with 1 row and 3 columns:
language: The language as provided by the input parameter language
file_model: The path to the file on disk where the model was downloaded to
url: The URL where the model was downloaded from
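The returned data.frame can be checked before the model is used for annotation. The minimal sketch below uses a hypothetical helper, is_valid_download (not part of the udpipe package), which only tests the documented shape (1 row; columns language, file_model and url) and whether file_model exists on disk. To stay offline it runs on a mocked result rather than on a real download; the URL in the mock is a placeholder.

```r
# Hypothetical helper, NOT part of the udpipe package: checks the documented
# shape of the value returned by udpipe_download_model and that the model
# file exists on disk.
is_valid_download <- function(x) {
  is.data.frame(x) &&
    nrow(x) == 1 &&
    all(c("language", "file_model", "url") %in% names(x)) &&
    file.exists(x$file_model)
}

# Offline usage with a mocked result: no download takes place here
model_file <- tempfile(fileext = ".udpipe")
invisible(file.create(model_file))
mock <- data.frame(language = "sanskrit",
                   file_model = model_file,
                   url = "https://example.org/placeholder.udpipe",
                   stringsAsFactors = FALSE)
is_valid_download(mock)
```

In practice x would be the result of a call such as udpipe_download_model(language = "sanskrit", model_dir = tempdir()).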
Pre-trained Universal Dependencies 2.0 models on all UD treebanks are made available at https://ufal.mff.cuni.cz/udpipe, namely at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364. At the time of writing, these comprise models for 50 languages, namely: ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech, danish, dutch, english, estonian, finnish, french, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin, latvian, lithuanian, norwegian, old_church_slavonic, persian, polish, portuguese, romanian, russian, sanskrit, slovak, slovenian, spanish, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese. Note that these models are made available under the CC BY-NC-SA 4.0 license. For your convenience, these models are also provided in an R package at https://github.com/jwijffels/udpipe.models.ud.2.0
Pre-trained Universal Dependencies 2.1 models on UD treebanks that allow commercial usage (mainly built on data released under the CC-BY-SA license, with some released under the GPL-3 and LGPL-LR licenses) are made available at https://github.com/bnosac/udpipe.models.ud. At the time of writing, these comprise models for 17 languages, namely: afrikaans, croatian, czech-cac, dutch, english, finnish, french-sequoia, irish, norwegian-bokmaal, persian, polish, portuguese, romanian, serbian, slovak, spanish-ancora, swedish. Visit that repository for more details on their licenses.
https://ufal.mff.cuni.cz/udpipe, https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2364, https://github.com/jwijffels/udpipe.models.ud.2.0, https://github.com/bnosac/udpipe.models.ud
# NOT RUN {
x <- udpipe_download_model(language = "sanskrit", model_dir = tempdir())
x
x$file_model
# }
# NOT RUN {
x <- udpipe_download_model(language = "dutch")
x <- udpipe_download_model(language = "dutch-lassysmall")
x <- udpipe_download_model(language = "russian")
x <- udpipe_download_model(language = "french")
x <- udpipe_download_model(language = "english")
x <- udpipe_download_model(language = "german")
x <- udpipe_download_model(language = "spanish")
x <- udpipe_download_model(language = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "afrikaans", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "spanish-ancora",
udpipe_model_repo = "bnosac/udpipe.models.ud")
# }