wizard_read_data: a simplified function for reading data from files.

Description

A simple interface for reading in data from files and creating a corpus all in one step.

Usage

wizard_read_data(filename, tablename = NULL, filetype = "csv",
                 virgin = FALSE, textColumns, codeColumn, trainSize, testSize, ...)

Arguments

filename

Character string giving the name of the file. Include the path if the file is not located in the working directory.

tablename

The name of the table to read from the database. Used for Microsoft Access databases only.

filetype

Character vector specifying the file type. Options are "csv" for .csv files, "tab" for tab-delimited text files, and "accdb" or "mdb" for Access databases.

virgin

A logical (TRUE or FALSE) specifying whether to treat the classification data as virgin data, that is, data with no known labels. Defaults to FALSE, meaning the classification data is labeled.

textColumns

A cbind() of the column(s) to use for training the algorithms (e.g. cbind(data$Title)).

codeColumn

A factor or vector of labels corresponding to each document in the matrix.
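
As a minimal sketch in base R of the shapes these two arguments describe, assuming a hypothetical data frame named data with Title, Subject, and Topic.Code columns (the values below are invented for illustration and are not taken from the NYTimes data set):

```r
# Hypothetical toy data; the column names mirror the example at the end of
# this page, but the values are made up for illustration.
data <- data.frame(
  Title = c("Budget vote delayed", "Team wins final", "New vaccine approved"),
  Subject = c("Congress and budget", "Sports championship", "Health and medicine"),
  Topic.Code = c(1, 29, 3),
  stringsAsFactors = FALSE
)

# Combine one or more text columns into a character matrix,
# one row per document and one column per text field.
text_columns <- cbind(data$Title, data$Subject)

# Labels as a factor, one label per document (row).
code_column <- as.factor(data$Topic.Code)

dim(text_columns)      # rows = documents, columns = text fields
length(code_column)    # should equal the number of documents
```

The point of the sketch is only that the text matrix and the label vector must line up row for row.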

trainSize

A range (e.g. 1:1000) specifying the number of documents to use for training the models.

testSize

A range (e.g. 1001:2000) specifying the number of documents to use for classification.
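
The two ranges above can be sanity-checked in base R before they are passed in. This is a sketch only: n_docs, train_size, and test_size are hypothetical names, and whether wizard_read_data performs such checks itself is not stated on this page.

```r
# Assumed corpus size, for illustration only.
n_docs <- 2000

# Ranges in the form described for trainSize and testSize.
train_size <- 1:1000
test_size <- 1001:2000

# The two ranges should not share any documents ...
stopifnot(length(intersect(train_size, test_size)) == 0)

# ... and should stay within the number of documents available.
stopifnot(max(train_size, test_size) <= n_docs)

length(train_size)   # number of training documents
length(test_size)    # number of documents to classify
```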

...

Other parameters to be passed on to create_matrix.

Value

A corpus of class matrix_container that can be passed into other functions such as train_model, train_models, classify_model, classify_models, wizard_train_classify, and create_analytics.

Examples

library(RTextTools)

# Read the NYTimes sample data shipped with RTextTools and build a corpus in one step.
corpus <- wizard_read_data(system.file("data/NYTimes.csv.gz", package = "RTextTools"),
                           textColumns = c("Title", "Subject"), codeColumn = "Topic.Code",
                           trainSize = 75, testSize = 25, virgin = FALSE)