moranajp (version 0.9.7)

moranajp_all: Morphological analysis for a specific column in a data frame

Description

Runs morphological analysis on one column of a data frame using 'MeCab' (or another analyzer selected with method). The other columns of the data frame are kept.
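Keeping the other columns means every token row can be traced back to the row of the input it came from. A minimal base R sketch of that idea, assuming the analyzer output already carries a `text_id` column equal to the row number of the input (the Value section says a "text_id" column is added). The token table below is made up for illustration and is not real analyzer output; this is not moranajp's own code.

```r
# Input data frame: one text column plus another column to keep
tbl <- data.frame(
  doc  = c("d1", "d2"),
  text = c("text one", "text two"),
  stringsAsFactors = FALSE
)
tbl$text_id <- seq_len(nrow(tbl))  # row number identifies each text

# Made-up analyzer output: one row per token, tagged with text_id
tokens <- data.frame(
  text_id = c(1L, 1L, 2L, 2L),
  token   = c("text", "one", "text", "two"),
  stringsAsFactors = FALSE
)

# Attach the kept column (`doc`) to every token; the raw text is not carried over
result <- merge(tokens, tbl[, c("text_id", "doc")], by = "text_id")
result
```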

Usage

moranajp_all(
  tbl,
  bin_dir = "",
  method = "mecab",
  text_col = "text",
  option = "",
  iconv = "",
  col_lang = "jp"
)

moranajp(tbl, bin_dir, method, text_col, option = "", iconv = "", col_lang)

remove_linebreaks(tbl, text_col)

separate_cols_ginza(tbl, col_lang)

make_input(tbl, text_col, iconv, brk = "BPMJP ")

make_cmd(method, bin_dir, option = "")

make_cmd_mecab(option = "")

out_cols_mecab(col_lang = "jp")

out_cols_ginza(col_lang = "jp")

out_cols_sudachi(col_lang = "jp")

out_cols_jp()

out_cols_en()

out_cols()

mecab_all(tbl, text_col = "text", bin_dir = "")

mecab(tbl, bin_dir)
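`remove_linebreaks(tbl, text_col)` is listed above without further explanation. MeCab processes its input line by line and emits an "EOS" marker after each line, so line breaks inside a cell can change how the token stream is split. The sketch below is an assumption about what such a helper does, not the package's code: the hypothetical `remove_linebreaks_sketch()` deletes CR/LF characters from the named column with base R `gsub()`, which suits Japanese text where no space is needed at the join.

```r
# Hypothetical stand-in for remove_linebreaks(): strip CR/LF from one column
remove_linebreaks_sketch <- function(tbl, text_col = "text") {
  tbl[[text_col]] <- gsub("[\r\n]+", "", tbl[[text_col]])
  tbl
}

tbl <- data.frame(text = c("line one\nline two", "no break"),
                  stringsAsFactors = FALSE)
remove_linebreaks_sketch(tbl)$text
```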

Value

A tibble containing the output of morphological analysis, with an added column "text_id".

A string

A string

A string

A character vector

A character vector

A character vector

A character vector

A character vector

A data.frame

Arguments

tbl

A tibble or data.frame.

bin_dir

A string. Directory of the MeCab executable (the examples also pass the Sudachi directory here).

method

A string. Method to use: "mecab", "ginza", "sudachi_a", "sudachi_b", "sudachi_c", or "chamame". For Sudachi, "a", "b", and "c" specify the split mode: "a" splits into the shortest units, "b" into middle-length units, and "c" into the longest units. See https://github.com/WorksApplications/Sudachi for details. "chamame" uses the web service https://chamame.ninjal.ac.jp/ through rvest.
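For the Sudachi variants, the suffix of `method` carries the split mode. One way to decompose the string, sketched with a hypothetical helper `parse_method()`: it validates the value with `match.arg()` and returns the analyzer name plus the mode in the upper-case form ("A", "B", "C") accepted by Sudachi's `-m` option. How moranajp itself builds the Sudachi command is not documented here, so treat this as an illustration only.

```r
# Hypothetical helper: split `method` into analyzer and Sudachi split mode
parse_method <- function(method = c("mecab", "ginza", "sudachi_a",
                                    "sudachi_b", "sudachi_c", "chamame")) {
  method <- match.arg(method)  # errors on unsupported values
  if (startsWith(method, "sudachi_")) {
    # "sudachi_a" -> "A" (shortest units) ... "sudachi_c" -> "C" (longest)
    list(analyzer = "sudachi", mode = toupper(sub("^sudachi_", "", method)))
  } else {
    list(analyzer = method, mode = NA_character_)
  }
}

parse_method("sudachi_b")
```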

text_col

A string. Name of the column to analyze.

option

A string. Options passed to MeCab. The "-b" option is already set by moranajp. To list the available options, run "mecab -h" in the command prompt (Windows) or terminal (Mac).
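MeCab's `-b` flag sets the input buffer size, which matters when long texts are passed at once. A sketch of how a command line could be assembled from `bin_dir` and `option`; the helper name `build_mecab_cmd()` and the buffer size are assumptions for illustration, not what `make_cmd_mecab()` actually produces.

```r
# Hypothetical helper: assemble a MeCab command line
# `bufsize` is an assumed value; moranajp chooses its own -b setting
build_mecab_cmd <- function(bin_dir = "", option = "", bufsize = 1e6) {
  # With an empty bin_dir, rely on mecab being on the PATH
  exe <- if (nzchar(bin_dir)) file.path(bin_dir, "mecab") else "mecab"
  cmd <- paste(exe, "-b", format(bufsize, scientific = FALSE), option)
  trimws(cmd)  # drop the trailing space left by an empty `option`
}

build_mecab_cmd("d:/pf/mecab/bin")    # "d:/pf/mecab/bin/mecab -b 1000000"
build_mecab_cmd(option = "-Owakati")  # "mecab -b 1000000 -Owakati"
```

A string built this way could be run with base R's `system()`, for example `system(cmd, intern = TRUE, input = text)`, which feeds `text` to the command's standard input and captures its output.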

iconv

A string. Converts the encoding of MeCab output. Default (""): no conversion.
"CP932_UTF-8": iconv(output, from = "Shift-JIS", to = "UTF-8")
"EUC_UTF-8": iconv(output, from = "eucjp", to = "UTF-8")
The setting is also used to convert the input text before running MeCab. With "CP932_UTF-8": iconv(input, from = "UTF-8", to = "Shift-JIS")
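With "CP932_UTF-8" the two conversions described above form a round trip: the input goes from UTF-8 to Shift-JIS before MeCab runs, and the output comes back to UTF-8. A base R sketch of that round trip using `iconv()`; the encoding names follow the text above, and whether "Shift-JIS" is accepted depends on the platform's iconv implementation (an alias such as "CP932" may be needed).

```r
# Round trip mirroring the "CP932_UTF-8" setting described above
text_utf8 <- "\u543e\u8f29\u306f\u732b\u3067\u3042\u308b"  # "吾輩は猫である"

# Before MeCab: UTF-8 -> Shift-JIS
text_sjis <- iconv(text_utf8, from = "UTF-8", to = "Shift-JIS")

# After MeCab: Shift-JIS -> UTF-8 (iconv returns NA if a conversion fails)
back <- iconv(text_sjis, from = "Shift-JIS", to = "UTF-8")

back == text_utf8
```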

col_lang

A string. Language of the output column names: "jp" (Japanese) or "en" (English).
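For MeCab with the IPADIC dictionary, each token line holds a surface form followed by nine comma-separated features (part of speech, three POS subcategories, conjugation type, conjugation form, base form, reading, pronunciation). The sketch below pairs those ten fields with Japanese or English labels according to `col_lang`. The labels and the helper `name_fields()` are assumptions; the names actually returned by `out_cols_mecab()` may differ, and lines for unknown words can have fewer fields.

```r
# Assumed labels for the 10 fields of MeCab + IPADIC output
mecab_cols <- list(
  jp = c("表層形", "品詞", "品詞細分類1", "品詞細分類2", "品詞細分類3",
         "活用型", "活用形", "原形", "読み", "発音"),
  en = c("surface", "pos", "pos_sub1", "pos_sub2", "pos_sub3",
         "conj_type", "conj_form", "base_form", "reading", "pronunciation")
)

# Hypothetical helper: pick labels by language and name the parsed fields
name_fields <- function(fields, col_lang = c("jp", "en")) {
  col_lang <- match.arg(col_lang)
  stats::setNames(fields, mecab_cols[[col_lang]])
}

# One MeCab output line for the token "猫", split on tab and commas
line <- "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ"
fields <- strsplit(line, "[\t,]")[[1]]
name_fields(fields, "en")
```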

brk

A string used as a break-point marker between texts (default "BPMJP ").
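One plausible reading of `brk`, sketched under that assumption: the texts are joined into a single analyzer input with the marker between them, and after analysis each token gets a `text_id` by counting the marker tokens seen so far, after which the markers are dropped. The helpers `join_texts()` and `assign_text_id()` are hypothetical and do not reproduce moranajp's internals; the token vector is made up.

```r
# Hypothetical: join texts into one analyzer input, separated by the marker
join_texts <- function(texts, brk = "BPMJP ") {
  paste(texts, collapse = brk)
}

# Hypothetical: number tokens by text and drop the marker tokens
assign_text_id <- function(tokens, marker = "BPMJP") {
  is_brk <- tokens == marker
  text_id <- cumsum(is_brk) + 1L  # increments after each marker
  data.frame(text_id = text_id[!is_brk], token = tokens[!is_brk],
             stringsAsFactors = FALSE)
}

join_texts(c("first text", "second text"))  # "first textBPMJP second text"
# Tokens as an analyzer might return them (made up for illustration)
assign_text_id(c("first", "text", "BPMJP", "second", "text"))
```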

Examples

# \donttest{
  library(moranajp)
  # sample data of Japanese sentences
  data(neko)
  neko <-
    neko |>
    unescape_utf()
  # chamame (web service; needs an internet connection)
  neko |>
    moranajp_all(method = "chamame") |>
    print(n = 100)
# }
if (FALSE) {
  # Requires 'mecab', 'ginza', or 'sudachi' to be installed on the local PC

  # mecab
  bin_dir <- "d:/pf/mecab/bin"
  iconv <- "CP932_UTF-8"
  neko |>
    moranajp_all(text_col = "text", bin_dir = bin_dir, iconv = iconv) |>
    print(n = 100)

  # ginza
  neko |>
    moranajp_all(text_col = "text", method = "ginza") |>
    print(n = 100)

  # sudachi
  bin_dir <- "d:/pf/sudachi"
  iconv <- "CP932_UTF-8"
  neko |>
    moranajp_all(text_col = "text", bin_dir = bin_dir,
                 method = "sudachi_a", iconv = iconv) |>
    print(n = 100)
}