
Note: a newer version (3.1.0) of this package is available.

cleanNLP (version 1.10.0)

A Tidy Data Model for Natural Language Processing

Description

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may choose either a Python back end built on 'spaCy' or the Java back end 'CoreNLP'. A minimal back end with no external dependencies is also provided. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and word embeddings. Summary statistics for token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses.
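As a rough illustration of the "normalized tables" idea and the unigram frequency summaries mentioned above, the base R sketch below builds a token table with one row per token and then counts unigrams. It does not use cleanNLP: the naive punctuation-stripping, whitespace-splitting tokenizer is an assumption standing in for a real back end, and the column names `id`, `tid`, and `word` are hypothetical and may differ from the package's own schema.

```r
# Toy corpus (hypothetical): one string per document, named by document id
corpus <- c(doc1 = "The cat sat. The cat slept.",
            doc2 = "A dog barked.")

# Naive tokenizer (an assumption, not a cleanNLP back end):
# lowercase, drop punctuation, split on runs of whitespace
tokenize <- function(txt) {
  words <- strsplit(tolower(gsub("[[:punct:]]", "", txt)), "\\s+")[[1]]
  words[nzchar(words)]
}

# Normalized token table: one row per token, keyed by document id and token index
token_table <- do.call(rbind, lapply(names(corpus), function(doc) {
  words <- tokenize(corpus[[doc]])
  data.frame(id = doc, tid = seq_along(words), word = words,
             stringsAsFactors = FALSE)
}))

# Unigram frequencies across the whole corpus, most frequent first
unigram_freq <- sort(table(token_table$word), decreasing = TRUE)

print(token_table)
print(unigram_freq)
```

Keeping every token in a single long table, rather than one object per document, is what makes corpus-wide summaries like this a one-line aggregation.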

Install

install.packages('cleanNLP')

Monthly Downloads

517

Version

1.10.0

License

LGPL-2

Maintainer

Taylor Arnold

Last Published

July 1st, 2017

Functions in cleanNLP (1.10.0)

download_core_nlp

Download java files needed for CoreNLP

extract_documents

Extract documents from an annotation object

get_coreference

Access coreferences from an annotation object

Access dependencies from an annotation object

Access tokens from an annotation object

Access word embedding vector from an annotation object

cleanNLP-package

cleanNLP: A Tidy Data Model for Natural Language Processing

combine_documents

Combine a set of annotations

Access sentence-level annotations

Construct the TF-IDF Matrix from Annotation or Data Frame

read_annotation

Read annotation files from disk

Run the annotation pipeline on a set of documents

Interface for initializing the coreNLP backend

Interface for initializing the spaCy backend

Universal Part of Speech Code Frequencies

print.annotation

Print a summary of an annotation object

Universal Dependency Frequencies

Reset document ids

init_tokenizers

Interface for initializing the tokenizers backend

Reads a CoNLL-U or CoNLL-X File

One Table Summary of an Annotation Object

Access document metadata from an annotation object

Access named entities from an annotation object

Annotation of Barack Obama's State of the Union Addresses

Compute Principal Components and store as a Data Frame

Returns a CoNLL-U Document

Most frequent English words

write_annotation

Write annotation files to disk
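The index above includes entries for constructing a TF-IDF matrix and for computing principal components stored as a data frame. The base R sketch below shows one common way to do both, starting from a hypothetical tidy token table with columns `id` and `word`. It uses the weighting tf = count / document length and idf = log(N / document frequency); this is an assumed textbook formulation, not necessarily the one cleanNLP uses, and the scores come from `stats::prcomp` rather than from the package.

```r
# Hypothetical tidy token table: one row per token (id = document, word = token)
tokens <- data.frame(
  id   = c("d1", "d1", "d1", "d2", "d2", "d3", "d3", "d3"),
  word = c("cat", "sat", "cat", "dog", "sat", "cat", "dog", "ran"),
  stringsAsFactors = FALSE
)

# Term counts: rows are documents, columns are vocabulary terms
counts <- unclass(table(tokens$id, tokens$word))

# Term frequency: divide each row of counts by that document's token total
tf <- counts / rowSums(counts)

# Inverse document frequency: log(N / number of documents containing the term)
idf <- log(nrow(counts) / colSums(counts > 0))

# TF-IDF: scale each term column of tf by its idf weight
tfidf <- sweep(tf, 2, idf, `*`)

# Principal components of the centered TF-IDF matrix, kept as a data frame
pca <- prcomp(tfidf, center = TRUE, scale. = FALSE)
pca_df <- data.frame(id = rownames(tfidf),
                     PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2],
                     row.names = NULL)

print(round(tfidf, 3))
print(pca_df)
```

Under this weighting, a term that occurs in every document receives an idf of zero, so its TF-IDF column is all zeros and contributes nothing to the principal components.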