Symmetrization computes source-to-target and target-to-source word alignments using IBM Model 1 and combines them into a symmetric word alignment by intersection, union, or grow-diag.
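The first step, directional alignment with IBM Model 1, can be illustrated with a minimal base-R sketch of the model trained by expectation maximization on a toy corpus. `ibm1_train` and `ibm1_align` are hypothetical helpers written for illustration, not functions exported by word.alignment, and the sketch assumes already-tokenized input. Calling `ibm1_train(src, trg)` trains one direction; swapping the arguments trains the other. The two resulting alignments are what a symmetrization method then combines.

```r
# Hypothetical sketch of IBM Model 1 (EM training); not the word.alignment code.
# src, trg: lists of tokenized sentences (character vectors), one pair per index.
ibm1_train <- function(src, trg, iter = 4) {
  # Prepend the empty word "NULL" to each target sentence, as in Model 1
  trg <- lapply(trg, function(e) c("NULL", e))
  f_vocab <- unique(unlist(src))
  e_vocab <- unique(unlist(trg))
  # t_fe[f, e] holds t(f | e); start from a uniform table
  t_fe <- matrix(1 / length(f_vocab), nrow = length(f_vocab),
                 ncol = length(e_vocab), dimnames = list(f_vocab, e_vocab))
  for (it in seq_len(iter)) {
    count <- t_fe * 0                      # expected counts c(f, e)
    for (k in seq_along(src)) {
      e <- trg[[k]]
      for (fj in src[[k]]) {
        probs <- t_fe[fj, e]               # t(f_j | e_i) for every target position
        post <- probs / sum(probs)         # E-step: alignment posterior for f_j
        for (i in seq_along(e)) {
          count[fj, e[i]] <- count[fj, e[i]] + post[i]
        }
      }
    }
    # M-step: renormalize each column so that the sum over f of t(f | e) is 1
    t_fe <- sweep(count, 2, colSums(count), "/")
  }
  t_fe
}

# Viterbi alignment: link each source word to its most probable target word;
# a source word whose best link is "NULL" is left unaligned.
ibm1_align <- function(t_fe, f, e) {
  e0 <- c("NULL", e)
  best <- vapply(f, function(fj) as.integer(which.max(t_fe[fj, e0])) - 1L,
                 integer(1), USE.NAMES = FALSE)
  keep <- best > 0L
  cbind(src = which(keep), trg = best[keep])
}

src <- list(c("das", "haus"), c("das", "buch"), c("ein", "buch"))
trg <- list(c("the", "house"), c("the", "book"), c("a", "book"))
t_fe <- ibm1_train(src, trg, iter = 4)
ibm1_align(t_fe, c("das", "haus"), c("the", "house"))  # das -> the, haus -> house
```

Even on three sentence pairs, a few EM iterations are enough for co-occurrence statistics to separate "das"/"the" from "haus"/"house".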
Symmetrization(file_train1, file_train2,
method = c('union', 'intersection', 'grow-diag'),
nrec = -1, encode.sorc = 'unknown', encode.trgt = 'unknown',
iter = 4, minlen = 5, maxlen = 40, removePt = TRUE,
all = FALSE, f1 = 'fa', e1 = 'en')
# S3 method for symmet
print(x, ...)
file_train1: the name of the source-language file in the training set.
file_train2: the name of the target-language file in the training set.
method: a character string specifying the symmetric word alignment method (union, intersection, or grow-diag).
nrec: the number of sentences to be read. If -1, all sentences are read.
encode.sorc: the encoding to be assumed for the source language. If the value is "latin1" or "UTF-8", it is used to mark character strings as known to be in Latin-1 or UTF-8. For more details, see the scan function.
encode.trgt: the encoding to be assumed for the target language. If the value is "latin1" or "UTF-8", it is used to mark character strings as known to be in Latin-1 or UTF-8. For more details, see the scan function.
iter: the number of iterations for IBM Model 1.
minlen: the minimum length of sentences.
maxlen: the maximum length of sentences.
removePt: logical. If TRUE, all punctuation marks are removed.
all: logical. If TRUE, the third argument of the culf function is set to lower = TRUE.
f1: a notation for the source language (default = 'fa').
e1: a notation for the target language (default = 'en').
x: an object of class 'symmet'.
...: further arguments passed to or from other methods.
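The preprocessing arguments (nrec, the two encodings, removePt, minlen and maxlen) can be pictured with the following base-R sketch. `read_pairs` is a hypothetical helper, not part of the package, and it rests on two assumptions: sentence length is counted in whitespace-separated tokens, and both sides of a pair must satisfy `minlen` and `maxlen`. It uses `readLines`, whose `encoding` argument marks strings as Latin-1 or UTF-8 in the same way as the one in `scan`.

```r
# Hypothetical helper (not part of word.alignment) showing one plausible reading
# of nrec, encode.sorc/encode.trgt, removePt, minlen and maxlen.
read_pairs <- function(file_train1, file_train2, nrec = -1,
                       encode.sorc = "unknown", encode.trgt = "unknown",
                       minlen = 5, maxlen = 40, removePt = TRUE) {
  # n = -1 reads every line; `encoding` only marks strings, it does not re-encode
  src <- readLines(file_train1, n = nrec, encoding = encode.sorc, warn = FALSE)
  trg <- readLines(file_train2, n = nrec, encoding = encode.trgt, warn = FALSE)
  stopifnot(length(src) == length(trg))   # the two files must be line-aligned
  tokenize <- function(lines) {
    if (removePt) lines <- gsub("[[:punct:]]", "", lines)
    toks <- strsplit(trimws(lines), "[[:space:]]+")
    lapply(toks, function(x) x[x != ""])
  }
  src <- tokenize(src)
  trg <- tokenize(trg)
  # Assumption: length is counted in tokens and both sides must be in range
  ns <- lengths(src)
  nt <- lengths(trg)
  keep <- ns >= minlen & ns <= maxlen & nt >= minlen & nt <= maxlen
  list(source = src[keep], target = trg[keep])
}

f_src <- tempfile()
f_trg <- tempfile()
writeLines(c("das ist ein kleines haus .", "ja",
             "das buch ist sehr gut , oder ?"), f_src)
writeLines(c("this is a small house .", "yes",
             "the book is very good , right ?"), f_trg)
pairs <- read_pairs(f_src, f_trg, minlen = 3)
lengths(pairs$source)  # 5 6: the one-word pair "ja"/"yes" is dropped
```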
Symmetrization returns an object of class 'symmet'.
An object of class 'symmet' is a list containing the following components:
the computation time, as a number (in seconds, minutes, or hours).
the symmetric word alignment method (union, intersection, or grow-diag).
a list of alignments, one for each sentence pair.
a vector of source sentences.
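Because `print(x, ...)` is an S3 method, any list whose class attribute is 'symmet' is printed through it. The sketch below builds such a list by hand. The constructor `new_symmet`, the component names and the printed layout are illustrative assumptions, since the component names are not given on this page. It also assumes that the time component is a `difftime`, which would explain the "second/minute/hour" units: subtracting two `Sys.time()` values picks those units automatically.

```r
# Hypothetical constructor and print method for an object of class 'symmet'.
# The component names below are illustrative assumptions.
new_symmet <- function(time, method, alignment, sentences) {
  structure(list(time = time, method = method,
                 alignment = alignment, sentences = sentences),
            class = "symmet")
}

# print(x, ...) dispatches here for any object whose class is 'symmet'
print.symmet <- function(x, ...) {
  cat("Method: ", x$method, "\n", sep = "")
  cat("Sentence pairs: ", length(x$alignment), "\n", sep = "")
  cat("Time: ", format(x$time), "\n", sep = "")
  invisible(x)
}

# Assumption: the time is a difftime; its units ("secs", "mins", "hours")
# are chosen automatically when two Sys.time() values are subtracted.
t0 <- Sys.time()
al <- list(cbind(src = 1:2, trg = 1:2))
s <- new_symmet(Sys.time() - t0, "union", al, "das haus")
print(s)
```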
Here, word alignment is not merely a mapping from the target language to the source language; it is a symmetric alignment obtained by union, intersection, or grow-diag.
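The three methods can be sketched on alignments stored as two-column matrices of (source position, target position) links. `symmetrize` is a hypothetical helper, not the package's implementation. Its grow-diag branch follows the usual heuristic described by Koehn (2010): start from the intersection, then repeatedly add union links that neighbour an existing link (horizontally, vertically or diagonally) and cover a word that is still unaligned.

```r
# Hypothetical helper (not the word.alignment code) combining two directional
# alignments. s2t, t2s: two-column matrices of (source position, target position).
symmetrize <- function(s2t, t2s, method = c("union", "intersection", "grow-diag")) {
  method <- match.arg(method)
  key <- function(m) paste(as.integer(m[, 1]), as.integer(m[, 2]), sep = "-")
  from_key <- function(k) {
    if (length(k) == 0) return(matrix(integer(0), ncol = 2))
    m <- do.call(rbind, lapply(strsplit(k, "-", fixed = TRUE), as.integer))
    m[order(m[, 1], m[, 2]), , drop = FALSE]
  }
  uni <- union(key(s2t), key(t2s))
  cur <- intersect(key(s2t), key(t2s))
  if (method == "union") return(from_key(uni))
  if (method == "intersection") return(from_key(cur))
  # grow-diag: the 8 neighbouring offsets (horizontal, vertical, diagonal)
  nb <- matrix(c(-1L, 0L, 0L, -1L, 1L, 0L, 0L, 1L,
                 -1L, -1L, -1L, 1L, 1L, -1L, 1L, 1L), ncol = 2, byrow = TRUE)
  repeat {
    added <- FALSE
    pts <- from_key(cur)
    for (r in seq_len(nrow(pts))) {
      for (d in seq_len(nrow(nb))) {
        i <- pts[r, 1] + nb[d, 1]
        j <- pts[r, 2] + nb[d, 2]
        k <- paste(i, j, sep = "-")
        if (k %in% uni && !(k %in% cur)) {
          aligned <- from_key(cur)
          # add the link only if it covers a still-unaligned word
          if (!(i %in% aligned[, 1]) || !(j %in% aligned[, 2])) {
            cur <- c(cur, k)
            added <- TRUE
          }
        }
      }
    }
    if (!added) break
  }
  from_key(cur)
}

s2t <- rbind(c(1, 1), c(2, 2), c(3, 2), c(5, 1))
t2s <- rbind(c(1, 1), c(2, 2), c(2, 3))
nrow(symmetrize(s2t, t2s, "intersection"))  # 2
nrow(symmetrize(s2t, t2s, "union"))         # 5
nrow(symmetrize(s2t, t2s, "grow-diag"))     # 4: link (5, 1) has no aligned neighbour
```

Intersection keeps only links both directions agree on (high precision), union keeps every link (high recall), and grow-diag sits between them by growing only from the agreed links.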
Koehn, P. (2010). Statistical Machine Translation. Cambridge University Press, New York.
# NOT RUN {
# Extracting bg-en.tgz from the Europarl corpus is time-consuming, so the
# unzipped files have been temporarily exported to
# http://www.um.ac.ir/~sarmad/... .
library(word.alignment)
# Default symmetrization method on the first 200 sentence pairs
S1 <- Symmetrization('http://www.um.ac.ir/~sarmad/word.a/euro.bg',
                     'http://www.um.ac.ir/~sarmad/word.a/euro.en',
                     nrec = 200, encode.sorc = 'UTF-8')
# Same data, grow-diag symmetrization
S2 <- Symmetrization('http://www.um.ac.ir/~sarmad/word.a/euro.bg',
                     'http://www.um.ac.ir/~sarmad/word.a/euro.en',
                     nrec = 200, encode.sorc = 'UTF-8', method = 'grow-diag')
print(S2)
# }