koRpus
set.lang.support(target, value)
set.lang.support("hyphen", list("xyz"="xyz")). However, this will only work if a) the language support script is a part of the koRpus package itself, and b) the hyphen pattern is located in its data subdirectory.
For your custom hyphenation patterns to be found automatically, provide them as the value in the named list, e.g., set.lang.support("hyphen", list("xyz"=hyph.xyz)). This will directly add the patterns to koRpus' environment, so they will be found when hyphenation is requested for language "xyz".
If you would like to provide hyphenation as part of a third party language package, you must name the object hyph.<lang> (for example, hyph.xyz), save it under that name in your package's data subdirectory, and append package="<your package>" to the named list; e.g., set.lang.support("hyphen", list("xyz"=c("xyz", package="koRpus.lang.xyz"))). Only then will koRpus look for the pattern object in your package, not its own data directory.
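The two shapes of the "hyphen" value described above can be compared with plain base R, without loading koRpus. This is only a sketch: "xyz" and "koRpus.lang.xyz" are the placeholder names from the text, and pattern_source() is a hypothetical helper written for illustration, not a koRpus function.

```r
# Value for a pattern that ships with koRpus itself (name mapping only).
internal <- list("xyz" = "xyz")

# Value for a pattern that lives in a third party package: the package
# name is appended as a named element of the character vector.
external <- list("xyz" = c("xyz", package = "koRpus.lang.xyz"))

# Hypothetical helper (NOT part of koRpus): report which package the
# pattern object would be looked up in, falling back to koRpus when no
# "package" element is present.
pattern_source <- function(entry) {
  if ("package" %in% names(entry)) entry[["package"]] else "koRpus"
}

pattern_source(internal[["xyz"]])
pattern_source(external[["xyz"]])
```

Either list would then be passed as the value argument of set.lang.support("hyphen", ...).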
As you can see, you will also have to add a global word class and an explanation for each tag. The former is especially important for further steps like frequency analysis.
Please have a look at the existing language support files in the package sources; most of it should be almost self-explanatory.
To add full new language support, say for Xyzedish, you basically have to call this function three times with different values, and provide respective hyphenation patterns. If you would like to re-use this language support, you should consider making it a package.
Be it a package or a script, it should contain all three calls to this function. If it succeeds, it will fill an internal environment with the information you have defined.
The function set.lang.support() gets called three times because there are three functions of koRpus that need language support: treetag(), kRp.POS.tags(), and hyphen().
All the calls follow the same pattern: first, you name one of the three targets explained above, and second, you provide a named list as the value for the respective target function.
set.lang.support("hyphen",
list("xyz"="xyz")
)