A newer version (3.1.0) of this package is available.

cleanNLP (version 1.10.0)

A Tidy Data Model for Natural Language Processing

Description

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may use a Python back end with 'spaCy' or the Java back end 'CoreNLP'. A minimal back end with no external dependencies is also provided. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and word embeddings. Summary statistics on token unigram, part of speech tag, and dependency type frequencies are also included to assist with analyses.

Install

install.packages('cleanNLP')

Monthly Downloads

452

Version

1.10.0

License

LGPL-2

Maintainer

Taylor Arnold

Last Published

July 1st, 2017

Functions in cleanNLP (1.10.0)

download_core_nlp

Download Java files needed for CoreNLP

extract_documents

Extract documents from an annotation object

get_coreference

Access coreferences from an annotation object

get_dependency

Access dependencies from an annotation object

get_token

Access tokens from an annotation object

get_vector

Access word embedding vector from an annotation object

cleanNLP-package

cleanNLP: A Tidy Data Model for Natural Language Processing

combine_documents

Combine a set of annotations

get_sentence

Access sentence-level annotations

get_tfidf

Construct the TF-IDF Matrix from Annotation or Data Frame

read_annotation

Read annotation files from disk

run_annotators

Run the annotation pipeline on a set of documents

init_coreNLP

Interface for initializing the coreNLP backend

init_spaCy

Interface for initializing the spaCy backend

pos_frequency

Universal Part of Speech Code Frequencies

print.annotation

Print a summary of an annotation object

dep_frequency

Universal Dependency Frequencies

doc_id_reset

Reset document ids

init_tokenizers

Interface for initializing the tokenizers backend

from_CoNLL

Reads a CoNLL-U or CoNLL-X File

get_combine

One Table Summary of an Annotation Object

get_document

Access document metadata from an annotation object

get_entity

Access named entities from an annotation object

obama

Annotation of Barack Obama's State of the Union Addresses

tidy_pca

Compute Principal Components and store as a Data Frame

to_CoNNL

Returns a CoNLL-U Document

word_frequency

Most frequent English words

write_annotation

Write annotation files to disk
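
Example

As a rough sketch of how the accessor functions listed above fit together, the following loads the bundled obama annotation and pulls out its token and document tables. It assumes cleanNLP 1.10.0 is installed and that the accessors return data frames. The upos column name is an assumption about the token table, so the code checks for it before use.

```r
library(cleanNLP)

# Load the bundled annotation of the State of the Union addresses.
data(obama)

# Each accessor returns one normalized table from the annotation object.
tokens <- get_token(obama)
docs <- get_document(obama)

# Inspect the tables rather than assuming their exact layout.
print(names(tokens))
print(nrow(docs))

# Assumed column: `upos` (universal part of speech tag).
# The check keeps the sketch from failing if the column is named differently.
if ("upos" %in% names(tokens)) {
  print(sort(table(tokens$upos), decreasing = TRUE))
}
```

Because obama ships with the package, this runs without initializing a spaCy or CoreNLP back end. Annotating new text would instead go through one of the init_* functions followed by run_annotators.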