
Note: there is a newer version (3.1.0) of this package.

cleanNLP (version 2.3.0)

A Tidy Data Model for Natural Language Processing

Description

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of the 'udpipe' back end, which has no external dependencies, a Python back end with 'spaCy', or the Java back end 'CoreNLP'. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and word embeddings. Summary statistics regarding token unigram, part of speech tag, and dependency type frequencies are also included to assist with analyses.

Install

install.packages('cleanNLP')

Monthly Downloads

518

Version

2.3.0

License

LGPL-2

Maintainer

Last Published

November 18th, 2018

Functions in cleanNLP (2.3.0)

cnlp_annotate

Run the annotation pipeline on a set of documents
cnlp_get_vector

Access word embedding vector from an annotation object
cnlp_init_corenlp

Interface for initializing the corenlp backend
cnlp_write_csv

Write annotation files to disk
cnlp_write_conll

Returns a CoNLL-U Document
cnlp_init_udpipe

Interface for initializing the udpipe backend
cnlp_quick

Quickly Compute Data Frame of Annotations
cnlp_utils_pca

Compute Principal Components and store as a Data Frame
cnlp_init_spacy

Interface for initializing the spacy backend
cnlp_utils_tfidf

Construct the TF-IDF Matrix from Annotation or Data Frame
cnlp_read_conll

Reads a CoNLL-U or CoNLL-X File
cnlp_init_tokenizers

Interface for initializing the tokenizers backend
cnlp_read_csv

Read annotation files from disk
pos_frequency

Universal Part of Speech Code Frequencies
cnlp_get_sentence

Access sentence-level annotations
print.annotation

Print a summary of an annotation object
renamed

Renamed functions
cnlp_get_token

Access tokens from an annotation object
un

Universal Declaration of Human Rights
dep_frequency

Universal Dependency Frequencies
obama

Annotation of Barack Obama's State of the Union Addresses
word_frequency

Most frequent English words
cleanNLP-package

cleanNLP: A Tidy Data Model for Natural Language Processing
cnlp_download_udpipe

Download model files needed for udpipe
cnlp_extract_documents

Extract documents from an annotation object
cnlp_get_dependency

Access dependencies from an annotation object
cnlp_get_coreference

Access coreferences from an annotation object
cnlp_get_document

Access document meta data from an annotation object
cnlp_get_entity

Access named entities from an annotation object
cnlp_combine_documents

Combine a set of annotations
cnlp_download_corenlp

Download java files needed for CoreNLP
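For readers unfamiliar with the TF-IDF matrix that cnlp_utils_tfidf constructs, the following base-R sketch builds one from three toy documents. It does not call cleanNLP. The toy documents, the whitespace tokenizer, and the weighting (raw term count multiplied by log(N / document frequency)) are assumptions chosen for illustration and may differ from the package's own defaults.

```r
# A minimal base-R TF-IDF sketch; this is NOT the cleanNLP implementation.
# Assumed weighting: raw term count * log(N / document frequency).
docs <- c(
  d1 = "the cat sat on the mat",
  d2 = "the dog sat",
  d3 = "a cat and a dog"
)

# Tokenize on whitespace (a simplifying assumption; cleanNLP back ends
# use proper tokenizers)
tokens <- strsplit(docs, "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per document, one column per term
tf <- t(vapply(tokens,
               function(tok) as.numeric(table(factor(tok, levels = vocab))),
               numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency: log(N / number of documents containing the term)
doc_freq <- colSums(tf > 0)
idf <- log(nrow(tf) / doc_freq)

# Scale each column (term) of the term-frequency matrix by its idf weight
tfidf <- sweep(tf, 2, idf, `*`)
print(round(tfidf, 3))
```

Terms that appear in every document would receive an idf of zero under this weighting, which is why common function words tend to be down-weighted; the package's actual options for weighting and vocabulary filtering should be checked in its reference manual.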