Ecfun (version 0.1-4)

asNumericDF: Coerce to numeric dropping commas and info after a blank

Description

Delete a leading dollar sign and commas (thousand separators), drop information after a blank, then coerce to numeric. For a data.frame, apply this to all columns, drop non-numeric columns, and order the rows by orderBy.

Some Excel imports include commas as thousand separators; these are replaced with the empty string, ''. Similarly, if "%" is the last character in a field, the percent sign is dropped and the numeric result is divided by 100 to convert it to a proportion. Some character data also includes footnote references following the year.

Table F-1 from the US Census Bureau needs all three of these features: it needs orderBy because the most recent year appears first, the opposite of most data sets, where the most recent year appears last; it has footnote references following the character string indicating the year; and it includes commas as thousand separators.
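The dollar-sign and percent rules described above can be sketched in base R. This is only an illustration under stated assumptions, not the Ecfun source: pctToProportion is a hypothetical helper name, and the handling of whitespace is a guess.

```r
# Hypothetical helper, not part of Ecfun: illustrates the "$" and "%" rules only.
pctToProportion <- function(x) {
  x <- trimws(x)
  isPct <- grepl("%$", x)                 # TRUE where the field ends in "%"
  stripped <- gsub("[$,]", "", x)         # drop dollar signs and thousand separators
  num <- as.numeric(sub("%$", "", stripped))
  ifelse(isPct, num / 100, num)           # percents become proportions
}

res <- pctToProportion(c("1%", "2%", "$1,234"))
```

With this input, the two percent fields are converted to proportions and the dollar amount to a plain number.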

Usage

asNumericChar(x)
asNumericDF(x, keep=function(x)any(!is.na(x)), orderBy=NA)

Arguments

x
For asNumericChar, this is a character vector to be converted to numeric after gsub(',', '', x). For asNumericDF, this is a data.frame with all character columns to be converted to numeric.
keep
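Something to indicate which columns to keep. The default, function(x) any(!is.na(x)), keeps each column that has at least one non-missing value.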
orderBy
Which columns to order the rows of x[, keep] by. Default is to keep the input order.
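As a rough illustration of how keep and orderBy might act, the base R sketch below assumes that keep is applied to each column as a predicate and that orderBy indexes the kept columns, which the x[, keep] wording suggests. The data and the variable names df, kept, and sorted are made up for the example; this is not the package's implementation.

```r
# Illustrative data, already numeric; 'duh' is entirely NA.
df <- data.frame(yr = c(1948, 1947), q1 = c(1234, NA),
                 duh = c(NA_real_, NA_real_))

# Same predicate as the default shown under Usage.
keep <- function(x) any(!is.na(x))
kept <- df[, vapply(df, keep, logical(1)), drop = FALSE]

# Order the rows by the first two kept columns (assumed meaning of orderBy = 1:2).
orderBy <- 1:2
sortKeys <- unname(as.list(kept[orderBy]))
sorted <- kept[do.call(order, sortKeys), , drop = FALSE]
```

Here the all-NA column duh is dropped, and the rows end up sorted by yr, with q1 breaking any ties.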

Value

  • all numeric data.frame

Details

1. Replace commas by nothing.
2. strsplit on ' ' and take only the first part, thereby eliminating the footnote references.
3. Replace any blanks with NAs.
4. as.numeric.
5. lapply(x, 1-4), that is, apply steps 1 through 4 to each column of x.
6. Order the rows.
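The steps above can be sketched in base R as follows. This is a minimal sketch, not the actual Ecfun code: cleanChar is a hypothetical name, the sample data is invented, and the sketch covers only the comma, footnote, and blank handling listed in the steps.

```r
# Hypothetical helper (not the Ecfun source): steps 1-4 for one character vector.
cleanChar <- function(x) {
  x <- gsub(",", "", as.character(x))            # 1. drop commas
  parts <- strsplit(x, " ", fixed = TRUE)        # 2. split on blanks ...
  x <- vapply(parts,                             #    ... and keep the first piece
              function(p) if (length(p) > 0) p[1] else "",
              character(1))
  x[!is.na(x) & x == ""] <- NA                   # 3. empty fields become NA
  as.numeric(x)                                  # 4. coerce to numeric
}

raw <- data.frame(yr = c("1948", "1947 (1)"), q1 = c("1,234", ""),
                  stringsAsFactors = FALSE)
num <- as.data.frame(lapply(raw, cleanChar))     # 5. apply steps 1-4 to each column
num <- num[order(num$yr), , drop = FALSE]        # 6. order the rows (here by yr)
```

The footnote marker "(1)" is discarded because only the text before the first blank survives step 2, and the empty q1 field becomes NA rather than triggering a coercion warning.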

See Also

scan gsub Quotes

Examples

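##
## 1.  simple example 
##
library(Ecfun)  # provides asNumericDF and asNumericChar

# Character input with a footnote after the year, thousand separators,
# an all-NA column, a dollar sign, and percents
fakeF1 <- data.frame(yr=c('1948', '1947 (1)'),
                     q1=c('1,234', ''), duh=rep(NA, 2), 
                     dol=c('$1,234', ''), 
                     pct=c('1%', '2%'))
nF1 <- asNumericDF(fakeF1)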

# The same result built column by column with asNumericChar
nF1. <- data.frame(yr=asNumericChar(fakeF1$yr),
                   q1=asNumericChar(fakeF1$q1), 
                   dol=asNumericChar(fakeF1$dol), 
                   pct=c(.01, .02))

# The expected values written out by hand
nF1c <- data.frame(yr=1948:1947, q1=c(1234, NA), 
                   dol=c(1234, NA), pct=c(.01, .02))

stopifnot(
all.equal(nF1, nF1.)
)
stopifnot(
all.equal(nF1., nF1c)
)

##
## 2.  orderBy=1:2
##
nF. <- asNumericDF(fakeF1, orderBy=1:2)

stopifnot(
all.equal(nF., nF1c[2:1,])
)
