read.xls: Read Excel files

Description

Read a Microsoft Excel file into a data frame

Usage

read.xls(xls, sheet=1, verbose=FALSE, pattern, na.strings=c("NA","#DIV/0!"),
         ..., method=c("csv","tsv","tab"), perl="perl")
xls2csv(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ..., perl="perl")
xls2tab(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ..., perl="perl")
xls2tsv(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ..., perl="perl")
xls2sep(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ...,
        method=c("csv","tsv","tab"), perl="perl")

Arguments

xls

path to the Microsoft Excel file. Supports "http://", "https://", and "ftp://" URLS.

sheet

name or number of the worksheet to read

verbose

logical flag indicating whether details should be printed as the file is processed.

pattern

if specified, them skip all lines before the first containing this string

perl

name of the perl executable to be called.

method

intermediate file format, "csv" for comma-separated and "tab" for tab-separated

na.strings

a character vector of strings which are to be interpreted as 'NA' values. See read.table for details.

blank.lines.skip

logical flag indicating whether blank lines in the orginal file should be ignored.

...

additional arguments to read.table. The defaults for read.csv() are used.

Value

"read.xls" returns a data frame.

"xls2sep" returns a temporary file in the specified format. "xls2csv" and "xls2tab" are simply wrappers for "xls2sep" specifying method as "csv" or "tab", respectively.

Details

This function works translating the named Microsoft Excel file into a temporary .csv or .tab file, using the xls2csv or xls2tab Perl script installed as part of this (gdata) package.

Caution: In the conversion to csv, strings will be quoted. This can be problem if you are trying to use the comment.char option of read.table since the first character of all lines (including comment lines) will be "\"" after conversion.

If you have quotes in your data which confuse the process you may wish to use read.xls(..., quote = ''). This will cause the quotes to be regarded as data and you will have to then handle the quotes yourself after reading the file in.

Caution: If you call "xls2csv" directly, is your responsibility to close and delete the file after using it.

References

http://www.analytics.washington.edu/statcomp/downloads/xls2csv

Examples

Run this code

# NOT RUN {
   # iris.xls is included in the gregmisc package for use as an example
   xlsfile <- file.path(path.package('gdata'),'xls','iris.xls')
   xlsfile

   iris <- read.xls(xlsfile) # defaults to csv format
   iris <- read.xls(xlsfile,method="csv") # specify csv format
   iris <- read.xls(xlsfile,method="tab") # specify tab format

   head(iris)  # look at the top few rows

   
# }
# NOT RUN {

  
# }
# NOT RUN {
   ## Example specifying exact Perl path for default MS-Windows install of
   ## ActiveState perl
   iris <- read.xls(xlsfile, perl="C:/perl/bin/perl.exe")
   
# }
# NOT RUN {
   
# }
# NOT RUN {
   ## Example specifying exact Perl path for Unix systems
   iris <- read.xls(xlsfile, perl="/usr/bin/perl")

   ## finding perl 
   ## (read.xls automatically calls findPerl so this is rarely needed)
   perl <- gdata:::findPerl("perl")
   iris <- read.xls(xlsfile, perl=perl)
   
# }
# NOT RUN {
   
# }
# NOT RUN {
   ## read xls file from net
   nba.url <- "http://mgtclass.mgt.unm.edu/Bose/Excel/Tutorial.05/Cases/NBA.xls"
   nba <- read.xls(nba.url)
   
# }
# NOT RUN {
   
# }
# NOT RUN {
   ## read xls file ignoring all lines prior to first containing State
   crime.url <- "http://www.jrsainfo.org/jabg/state_data2/Tribal_Data00.xls"
   crime <- read.xls(crime.url, pattern = "State")

   ## use of xls2csv - open con, print two lines, close con
   con <- xls2csv(crime.url)
   print(readLines(con, 2)) 
   file.remove(summary(con)$description)
   
# }
# NOT RUN {
   ## Examples demonstrating selection of specific 'sheets'
   ## from the example XLS file 'ExampleExcelFile.xls'
   exampleFile <- file.path(path.package('gdata'),'xls',
                            'ExampleExcelFile.xls')
   exampleFile2007 <- file.path(path.package('gdata'),'xls',
                            'ExampleExcelFile.xlsx')

   ## see the number and names of sheets:
   sheetCount(exampleFile)
   if( 'XLSX' %in% xlsFormats() )  ## if XLSX is supported..
     sheetCount(exampleFile2007)


   ## show names of shets in the file
   sheetNames(exampleFile)
   if( 'XLSX' %in% xlsFormats() )  ## if XLSX is supported..
   sheetNames(exampleFile2007)

   data <- read.xls(exampleFile)          ## default is first worksheet
   data <- read.xls(exampleFile, sheet=2) ## second worksheet by number
   data <- read.xls(exampleFile, sheet="Sheet Second",v=TRUE) ## and by name

   ## load the third worksheet, skipping the first two non-data lines...
   if( 'XLSX' %in% xlsFormats() )  ## if XLSX is supported..
     data <- read.xls(exampleFile2007, sheet="Sheet with initial text", skip=2)

   ## load a file containing data and column names using latin-1
   ## characters
   latinFile <- file.path(path.package('gdata'),'xls','latin-1.xls')
   latin1 <- read.xls(latinFile, fileEncoding="latin1")
   colnames(latin1)


# }

Run the code above in your browser using DataLab