fillNASVD: Function to Use SVD to Impute the Missing Values in the Training Dataset

Description

Singular value decomposition (SVD) is used to impute the missing values in the training dataset. For each monitoring location, the multivariate time series of the selected columns at that location is used to impute its missing values with SVD.
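This page does not specify the package's exact procedure (choice of rank, scaling, stopping rule). As a rough illustration of the general technique, the sketch below implements iterative low-rank SVD imputation (often called "hard impute") on one location's multivariate time series and applies it location by location. The helper names `svd_impute_matrix` and `impute_by_site` and the parameters `k`, `max_iter` and `tol` are hypothetical and are not part of the package API. The sketch assumes every column has at least one observed value at a site. It ignores the date column because the SVD does not depend on row order.

```r
# Hypothetical sketch of iterative low-rank SVD imputation; NOT the package's
# actual implementation. k, max_iter and tol are illustrative assumptions.
svd_impute_matrix <- function(X, k = 1, max_iter = 100, tol = 1e-6) {
  miss <- is.na(X)
  if (!any(miss)) return(X)
  # An all-NA column cannot be standardised; leave such a site unchanged
  if (any(colSums(!miss) == 0)) return(X)
  # Standardise columns so variables on different scales contribute comparably
  mu <- colMeans(X, na.rm = TRUE)
  s <- apply(X, 2, sd, na.rm = TRUE)
  s[is.na(s) | s == 0] <- 1
  Z <- sweep(sweep(X, 2, mu, "-"), 2, s, "/")
  # Start from the column means (0 in standardised units)
  Z[miss] <- 0
  # k should stay below ncol(X) so the fit borrows information across variables
  k <- min(k, nrow(Z), ncol(Z))
  for (i in seq_len(max_iter)) {
    dec <- svd(Z, nu = k, nv = k)
    # Rank-k reconstruction U_k D_k V_k^T
    fit <- dec$u %*% diag(dec$d[seq_len(k)], k, k) %*% t(dec$v)
    change <- sqrt(sum((fit[miss] - Z[miss])^2))
    Z[miss] <- fit[miss]  # overwrite only the missing cells
    if (change < tol) break
  }
  # Undo the standardisation; observed cells get back their original values
  # (up to floating-point rounding)
  sweep(sweep(Z, 2, s, "*"), 2, mu, "+")
}

# Hypothetical wrapper: impute each location's rows separately
impute_by_site <- function(df, cols, id_col, k = 1) {
  for (rows in split(seq_len(nrow(df)), df[[id_col]], drop = TRUE)) {
    X <- as.matrix(df[rows, cols, drop = FALSE])
    df[rows, cols] <- as.data.frame(svd_impute_matrix(X, k = k))
  }
  df
}
```

Standardising before the SVD matters here: without it, the variable with the largest variance would dominate the leading singular vectors and the other variables would contribute little to the reconstruction.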

Usage

fillNASVD(dset, cols, idF, dateF)

Arguments

dset

A data frame containing the missing values to be imputed.

cols

A character vector of the column names used to impute the missing values, including the columns that contain missing values.

idF

Name of the column holding the unique location identifier (e.g. "siteid").

dateF

Name of the date column, if any (e.g. "date").

Value

A data frame based on the input dset, with the missing values filled.

Examples


# NOT RUN {
# Use the covariates for the PM2.5 data as an example:

data("trainsample")
cols=c("ndvi","aod","wnd_avg","monthAv")
n=nrow(trainsample)
p=0.05                     # proportion of values to mask in each column
pn=as.integer(p*n)
set.seed(1)                # make the random masking reproducible
trainsample2missed=trainsample
# Randomly set pn values of each selected column to NA
for(col in cols){
  index=sample(n,pn)
  trainsample2missed[index,col]=NA
}
trainsample2filled=fillNASVD(trainsample2missed,cols,"siteid","date")

# Examine the accuracy on the artificially masked values:
for(col in cols){
  index=which(is.na(trainsample2missed[,col]))
  obs=trainsample[index,col]
  missed=trainsample2missed[index,]
  # Look up the same site/date records in the filled output,
  # in case its row order differs from the input
  sindex=match(interaction(missed$siteid,missed$date),
               interaction(trainsample2filled$siteid,trainsample2filled$date))
  pre=trainsample2filled[sindex,col]
  print(paste0(col," missing value correlation: ",round(cor(obs,pre),2)))
  print(paste0(col," missing value RMSE: ",round(sqrt(mean((obs-pre)^2)),2)))
}

# }
