set.lang.support(target, value)
Hyphenation support can be registered with set.lang.support("hyphen", list("xyz"="xyz")). However,
this will only work if a) the language support script is a part of the koRpus package itself,
and b) the hyphen pattern is located in its data subdirectory.
For your custom hyphenation patterns to be found automatically,
provide them as the value in the named list, e.g., set.lang.support("hyphen", list("xyz"=hyph.xyz)).
This will directly add the patterns to koRpus' environment,
so they will be found when hyphenation is requested for language "xyz".
If you would like to provide hyphenation as part of a third party language package,
you must name the object hyph., save it to your package's data
subdirectory named hyph.,
and append package=" to the named list; e.g.,
set.lang.support("hyphen", list("xyz"=c("xyz",
package="koRpus.lang.xyz"))). Only then will koRpus look for the pattern object in your package, not its own
data directory.
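The three value forms described above differ only in what is stored under the language name. The following base R sketch shows how each form looks as an R object. The helper describe.entry() and the placeholder hyph.xyz are hypothetical and exist only for illustration; koRpus resolves these values internally, and a real hyph.xyz would be an actual pattern object.

```r
# Placeholder standing in for a real hyphenation pattern object (assumption).
hyph.xyz <- structure(list(), class = "placeholder.pattern")

# The three forms of the named list discussed in the text.
by.name    <- list("xyz" = "xyz")
by.object  <- list("xyz" = hyph.xyz)
by.package <- list("xyz" = c("xyz", package = "koRpus.lang.xyz"))

# c() with one named argument yields a partially named character vector.
names(by.package[["xyz"]])         # "" "package"
by.package[["xyz"]][["package"]]   # "koRpus.lang.xyz"

# Hypothetical helper: reports which of the three forms an entry uses.
describe.entry <- function(entry) {
  if (!is.character(entry)) {
    return("pattern object supplied directly")
  }
  if ("package" %in% names(entry)) {
    return(paste0("pattern name ", entry[[1]],
                  ", to be looked up in package ", entry[["package"]]))
  }
  paste0("pattern name ", entry[[1]],
         ", to be looked up in koRpus' own data directory")
}

describe.entry(by.name[["xyz"]])
describe.entry(by.object[["xyz"]])
describe.entry(by.package[["xyz"]])
```

Because the package name travels inside the same character vector as the pattern name, the named list keeps one entry per language regardless of which form is used.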
As you can see, you will also have to add a global word class and an explanation for each tag. The former is especially important for further steps like frequency analysis.
Please have a look at the existing language support files in the package sources; most of it should be almost self-explanatory.
To add full new language support, say for Xyzedish, you basically have to call this function three times with different values, and provide respective hyphenation patterns. If you would like to re-use this language support, you should consider making it a package.
Be it a package or a script, it should contain all three calls to this function. If it succeeds, it will fill an internal environment with the information you have defined.
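To make the idea of filling an internal environment more concrete, here is a minimal base R sketch of such a registry. It is a toy stand-in, not koRpus code: the names toy.lang.env and toy.set.lang.support() are hypothetical, and the real function may validate and merge its input differently.

```r
# Toy registry environment; NOT the environment koRpus uses internally.
toy.lang.env <- new.env()

# Hypothetical stand-in for set.lang.support(): stores a named list
# under the given target, adding to whatever was registered before.
toy.set.lang.support <- function(target, value) {
  stopifnot(is.character(target), length(target) == 1)
  stopifnot(is.list(value), !is.null(names(value)))
  current <- if (exists(target, envir = toy.lang.env, inherits = FALSE)) {
    get(target, envir = toy.lang.env)
  } else {
    list()
  }
  # Entries with the same language name are overwritten, new ones appended.
  current[names(value)] <- value
  assign(target, current, envir = toy.lang.env)
  invisible(NULL)
}

# Two separate calls accumulate in the same target slot.
toy.set.lang.support("hyphen", list("xyz" = "xyz"))
toy.set.lang.support("hyphen", list("abc" = "abc"))
names(get("hyphen", envir = toy.lang.env))   # "xyz" "abc"
```

Keeping one slot per target means a later lookup only needs the target and the language name, which matches the target-plus-named-list shape of the calls.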
The function set.lang.support() gets called three times because there are three functions of koRpus that need language support: treetag(), kRp.POS.tags(), and hyphen().
All the calls follow the same pattern -- first,
you name one of the three targets explained above, and second, you provide a named list as the value for the
respective target function.
set.lang.support("hyphen",
list("xyz"="xyz")
)