deepMOU (version 0.1.1)

CNAE2: CNAE dataset on classes 4 and 9

Description

The data set CNAE2 is a subset of the original CNAE-9 data, which comprises 1080 documents of free-text business descriptions of Brazilian companies, categorized into 9 topics.

Specifically, CNAE2 contains only the documents belonging to topics "4" and "9". The data set is already pre-processed and provides the bag-of-words representation of the documents; columns with all-zero counts have been removed, leaving a matrix of 240 documents over a vocabulary of 357 terms. This data set is highly sparse (98

Class labels are stored in cl_CNAE.

Usage

data(CNAE2)

Format

A matrix for the bag-of-words representation of the CNAE2 dataset.

Examples

# NOT RUN {
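library(deepMOU)
# data() loads CNAE2 into the workspace and returns only the object's name,
# so inspect CNAE2 itself rather than the value returned by data()
data(CNAE2)
print(head(CNAE2))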
# }
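
The description mentions two pre-processing facts: all-zero columns were dropped and the resulting matrix is highly sparse. The sketch below illustrates both ideas on a small, hypothetical count matrix (`toy` and its values are made up for illustration and are not taken from CNAE2). It uses only base R and is not the package authors' pre-processing code.

```r
# Hypothetical document-term count matrix: 3 documents, 4 vocabulary terms
toy <- matrix(
  c(2, 0, 0, 1,
    0, 0, 3, 0,
    0, 0, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), paste0("term", 1:4))
)

# Drop terms that never occur in any document (all-zero columns)
toy_reduced <- toy[, colSums(toy) > 0, drop = FALSE]

# Sparsity: proportion of entries equal to zero
sparsity <- mean(toy_reduced == 0)

dim(toy_reduced)
sparsity
```

Assuming CNAE2 is a numeric count matrix, as the Format section states, the same expression `mean(CNAE2 == 0)` should give its proportion of zero entries once the data have been loaded with `data(CNAE2)`.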