
GSODR (version 0.1.9)

get_GSOD: Download, Clean, Reformat and Generate New Variables From GSOD Weather Data

Description

This function automates downloading, cleaning and reformatting of data from the Global Surface Summary of the Day (GSOD) provided by the US National Climatic Data Center (NCDC), https://data.noaa.gov/dataset/global-surface-summary-of-the-day-gsod, and calculates three new variables: Saturation Vapour Pressure (ES), Actual Vapour Pressure (EA) and relative humidity (RH). Each station is checked for its number of missing days to assure data quality; stations with too many missing observations are omitted, and stations with a latitude of < -90 or > 90 or a longitude of < -180 or > 180 are removed. All units are converted to International System of Units (SI), e.g., Fahrenheit to Celsius and inches to millimetres. Alternative elevation values are supplied where the reported elevation is missing or found to be questionable. These are based on the Consultative Group for International Agricultural Research's Consortium for Spatial Information (CGIAR-CSI) Shuttle Radar Topography Mission 90 metre (SRTM 90m) digital elevation data, which are derived from NASA's original SRTM 90m data. Further information on these data and methods can be found in GSODR's GitHub repository: https://github.com/adamhsparks/GSODR/blob/master/data-raw/fetch_isd-history.md
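The cleaning steps described above can be illustrated with a minimal sketch. The helper names (f_to_c, in_to_mm) and the toy station table are hypothetical and are not part of GSODR; the sketch only shows the kind of unit conversion and coordinate check the description refers to.

```r
# Hypothetical helpers for illustration only; not GSODR functions.
f_to_c <- function(f) (f - 32) * 5 / 9        # Fahrenheit to Celsius
in_to_mm <- function(inches) inches * 25.4    # inches to millimetres

# Made-up station table: station B has an invalid latitude,
# station C has an invalid longitude.
stations <- data.frame(
  STNID = c("A", "B", "C"),
  LAT = c(-27.55, 95.00, 14.60),
  LON = c(151.92, 10.00, 190.00),
  stringsAsFactors = FALSE
)

# Keep only stations whose coordinates fall inside the valid ranges.
valid <- stations[stations$LAT >= -90 & stations$LAT <= 90 &
                    stations$LON >= -180 & stations$LON <= 180, ]

# Convert example values and round as in the output fields
# (temperature to tenths, precipitation to hundredths).
temp_c <- round(f_to_c(c(32, 68.5)), 1)
prcp_mm <- round(in_to_mm(0.25), 2)
```

Only station A survives the coordinate check in this toy table.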

Usage

get_GSOD(years = NULL, station = NULL, country = NULL, path = "", max_missing = 5, agroclimatology = FALSE, shapefile = FALSE, CSV = TRUE, merge_station_years = FALSE)

Arguments

years
Year(s) of weather data to download.
station
Specify a single station for which to retrieve, check and clean weather data.
country
Specify a country of interest for which to retrieve weather data, using its full name. For locales that have an ISO code, the 2- or 3-letter ISO code can be used instead, if known. See country_list for a full list of the country names and ISO codes available.
path
Path entered by user indicating where to store resulting output file. Defaults to the current working directory.
max_missing
The maximum number of days allowed to be missing from a station's data before it is excluded from the final file output. Defaults to five days. If a single station is specified, this option is ignored and any data available from NCDC, even an empty file, will be returned.
agroclimatology
Only clean data for stations between latitudes 60 and -60 for agroclimatology work. Defaults to FALSE. Set to TRUE to include only stations within the confines of these latitudes.
shapefile
If set to TRUE, create an ESRI shapefile of vector type, points, of the data for use in a GIS. Defaults to FALSE, no shapefile created.
CSV
If set to TRUE, create a comma-separated value (CSV) file of the data. Defaults to TRUE, a CSV file is created.
merge_station_years
If set to TRUE, merge output files into one output file for all years when selecting a single station. Defaults to FALSE.
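As a rough illustration of the max_missing rule, the sketch below counts how many calendar days of a year have no observation and compares that count with the threshold. The function n_missing_days and the toy date vectors are hypothetical; this is an assumption about how such a check could work, not GSODR's own code.

```r
# Hypothetical helper, not part of GSODR: count the calendar days in
# `year` that have no observation in `dates`.
n_missing_days <- function(dates, year) {
  all_days <- seq(as.Date(paste0(year, "-01-01")),
                  as.Date(paste0(year, "-12-31")), by = "day")
  sum(!all_days %in% unique(as.Date(dates)))
}

max_missing <- 5  # same default as get_GSOD()

# Made-up observation dates for two stations in 2010.
all_2010 <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
station_a <- all_2010[-(1:3)]    # 3 days missing
station_b <- all_2010[-(1:10)]   # 10 days missing

# A station is kept only if it does not exceed the threshold.
keep_a <- n_missing_days(station_a, 2010) <= max_missing
keep_b <- n_missing_days(station_b, 2010) <= max_missing
```

Under this reading of the rule, station_a would be kept and station_b would be dropped.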

Details

Due to the size of the resulting data, output is saved as a comma-separated value (CSV) file (default) or an ESRI shapefile, either in a directory specified by the user or, by default, in the current working directory. The files summarize each year by station and include vapour pressure and relative humidity variables calculated from existing data in GSOD. Optionally, when selecting a single station, all years queried may be merged into one final output file (CSV or shapefile) using the merge_station_years option, because the file sizes are much smaller.

All missing values in resulting files are represented as -9999 regardless of which field they occur in.
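When reading an output file back into R, the -9999 sentinel usually needs to become NA before any statistics are computed. The sketch below uses a small made-up CSV fragment in place of a real output file; the column subset and the values are illustrative only.

```r
# Made-up CSV fragment standing in for a get_GSOD() output file.
csv_text <- "STNID,YEARMODA,TEMP,PRCP
955510-99999,2010-01-01,24.3,-9999
955510-99999,2010-01-02,-9999,1.52
955510-99999,2010-01-03,22.8,0"

# Treat the -9999 sentinel as NA while reading.
gsod <- read.csv(text = csv_text, na.strings = "-9999",
                 stringsAsFactors = FALSE)

# Summaries must then skip the NA values explicitly.
mean_temp <- mean(gsod$TEMP, na.rm = TRUE)
```

For a real file, replace the text argument with the path to the CSV. If the sentinel is written in another form, such as -9999.0, na.strings will not match it; replacing the values after reading with gsod[gsod == -9999] <- NA is an alternative.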

Be sure to have enough free disk space and to allow enough time for this to run; the process is intensive in time, processor use and disk input/output and space. This function was largely based on T. Hengl's "getGSOD.R" script, available from http://spatial-analyst.net/book/system/files/getGSOD.R, with enhancements to be cross-platform, faster and more flexible. For more information, see the description of the data provided by NCDC, http://www7.ncdc.noaa.gov/CDO/GSOD_DESC.txt.

The CSV or ESRI format shapefile in the respective year-directory will contain the following fields/values:

STNID
Station number (WMO/DATSAV3 number) for the location

WBAN
Historical "Weather Bureau Air Force Navy" (WBAN) number, where applicable

STN.NAME
Unique text string identifier

CTRY
Country

LAT
Latitude

LON
Longitude

ELEV.M
Station reported elevation (metres to tenths)

ELEV.M.SRTM.90m
Corrected station elevation in whole metres, derived from Jarvis et al. (2008) and extracted from the DEM using the reported LAT/LON values

YEARMODA
Date in YYYY-MM-DD format

YEAR
The year

MONTH
The month

DAY
The day

YDAY
Sequential day of year (not in original GSOD)

TEMP
Mean daily temperature converted to degrees C to tenths. Missing = -9999

TEMP.CNT
Number of observations used in calculating mean daily temperature

DEWP
Mean daily dew point converted to degrees C to tenths. Missing = -9999

DEWP.CNT
Number of observations used in calculating mean daily dew point

SLP
Mean sea level pressure in millibars to tenths. Missing = -9999

SLP.CNT
Number of observations used in calculating mean sea level pressure

STP
Mean station pressure for the day in millibars to tenths. Missing = -9999

STP.CNT
Number of observations used in calculating mean station pressure

VISIB
Mean visibility for the day converted to kilometres to tenths. Missing = -9999

VISIB.CNT
Number of observations used in calculating mean daily visibility

WDSP
Mean daily wind speed value converted to metres/second to tenths. Missing = -9999

WDSP.CNT
Number of observations used in calculating mean daily wind speed

MXSPD
Maximum sustained wind speed reported for the day converted to metres/second to tenths. Missing = -9999

GUST
Maximum wind gust reported for the day converted to metres/second to tenths. Missing = -9999

MAX
Maximum temperature reported during the day converted to Celsius to tenths--time of maximum temperature report varies by country and region, so this will sometimes not be the maximum for the calendar day. Missing = -9999

MAX.FLAG
Blank indicates maximum temperature was taken from the explicit maximum temperature report and not from the 'hourly' data. " * " indicates maximum temperature was derived from the hourly data (i.e., highest hourly or synoptic-reported temperature)

MIN
Minimum temperature reported during the day converted to Celsius to tenths--time of minimum temperature report varies by country and region, so this will sometimes not be the minimum for the calendar day. Missing = -9999

MIN.FLAG
Blank indicates minimum temperature was taken from the explicit minimum temperature report and not from the 'hourly' data. " * " indicates minimum temperature was derived from the hourly data (i.e., lowest hourly or synoptic-reported temperature)

PRCP
Total precipitation (rain and/or melted snow) reported during the day, converted to millimetres to hundredths. The reporting period will usually not end with the midnight observation, i.e., it may include the latter part of the previous day. ".00" indicates no measurable precipitation (includes a trace). Missing = -9999. Note: Many stations do not report '0' on days with no precipitation; therefore, '-9999' will often appear on these days. For example, a station may only report a 6-hour amount for the period during which rain fell. See the PRCP.FLAG column for the source of the data

PRCP.FLAG
A = 1 report of 6-hour precipitation amount
B = Summation of 2 reports of 6-hour precipitation amount
C = Summation of 3 reports of 6-hour precipitation amount
D = Summation of 4 reports of 6-hour precipitation amount
E = 1 report of 12-hour precipitation amount
F = Summation of 2 reports of 12-hour precipitation amount
G = 1 report of 24-hour precipitation amount
H = Station reported '0' as the amount for the day (e.g., from 6-hour reports), but also reported at least one occurrence of precipitation in hourly observations--this could indicate a trace occurred, but should be considered as incomplete data for the day
I = Station did not report any precipitation data for the day and did not report any occurrences of precipitation in its hourly observations. It's still possible that precipitation occurred but was not reported

SNDP
Snow depth in millimetres to tenths. Missing = -9999

I.FOG
Fog, (1 = yes, 0 = no/not reported) for the occurrence during the day

I.RAIN_DZL
Rain or drizzle, (1 = yes, 0 = no/not reported) for the occurrence during the day

I.SNW_ICE
Snow or ice pellets, (1 = yes, 0 = no/not reported) for the occurrence during the day

I.HAIL
Hail, (1 = yes, 0 = no/not reported) for the occurrence during the day

I.THUNDER
Thunder, (1 = yes, 0 = no/not reported) for the occurrence during the day

I.TDO_FNL
Tornado or funnel cloud, (1 = yes, 0 = no/not reported) for the occurrence during the day
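The PRCP.FLAG codes listed above can be turned into readable labels with a named lookup vector. This is an illustrative sketch: prcp_flag_key and the sample flags are hypothetical, and the label text is shortened from the field descriptions.

```r
# Lookup table built from the PRCP.FLAG descriptions (labels shortened).
prcp_flag_key <- c(
  A = "1 report of 6-hour precipitation amount",
  B = "Summation of 2 reports of 6-hour precipitation amount",
  C = "Summation of 3 reports of 6-hour precipitation amount",
  D = "Summation of 4 reports of 6-hour precipitation amount",
  E = "1 report of 12-hour precipitation amount",
  F = "Summation of 2 reports of 12-hour precipitation amount",
  G = "1 report of 24-hour precipitation amount",
  H = "Reported 0 for the day, but precipitation seen in hourly data",
  I = "No precipitation data and no occurrences reported"
)

# Made-up flag column; NA stands for a record with no flag.
flags <- c("G", "A", "I", NA)

# Index the lookup by flag value; unmatched or NA flags give NA.
decoded <- unname(prcp_flag_key[flags])
```

The same lookup can be applied to a data frame column, for example prcp_flag_key[gsod$PRCP.FLAG], assuming the column holds the single-letter codes.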

Values calculated by this package and included in final output:

ea
Mean daily actual vapour pressure

es
Mean daily saturation vapour pressure

RH
Mean daily relative humidity
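This page does not state the formulas used for ea, es and RH. The sketch below assumes a common Tetens-type approximation for saturation vapour pressure (in kPa) applied to TEMP and DEWP; the function sat_vp, its constants and the sample values are assumptions for illustration and may differ from what the package actually computes.

```r
# Assumed Tetens-type approximation (kPa); the constants are an assumption,
# not taken from the GSODR documentation.
sat_vp <- function(t_c) 0.61078 * exp((17.2694 * t_c) / (t_c + 237.3))

# Made-up mean daily temperature and dew point in degrees C.
# Convert any -9999 values to NA before applying the formula.
TEMP <- c(25.0, 10.0)
DEWP <- c(20.0, 10.0)

es <- sat_vp(TEMP)    # saturation vapour pressure at air temperature
ea <- sat_vp(DEWP)    # actual vapour pressure, evaluated at the dew point
RH <- ea / es * 100   # relative humidity in percent
```

When the dew point equals the air temperature, as in the second sample value, the relative humidity under this approximation is 100 percent.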

References

Jarvis, A, HI Reuter, A Nelson, E Guevara, 2008, Hole-filled SRTM for the globe Version 4, available from the CGIAR-CSI SRTM 90m Database http://srtm.csi.cgiar.org

Examples

## Not run: 
# # Download weather station data for Toowoomba, Queensland for 2010 and save
# # the resulting file, GSOD-955510-99999-2010.csv, in the user's home directory.
# 
# get_GSOD(years = 2010, station = "955510-99999", path = "~/")
# 
# # Download data for the Philippines for year 2010 and generate a yearly
# # summary file, GSOD-PHL-2010.csv, in the user's home directory with a
# # maximum of five missing days per station allowed.
# 
# get_GSOD(years = 2010, country = "Philippines", path = "~/")
# 
# # Download global GSOD data for agroclimatology work for years 2010 and 2011
# # and generate yearly summary files, GSOD-agroclimatology-2010.csv and
# # GSOD-agroclimatology-2011.csv, in the user's home directory with a maximum
# # of five missing days per weather station allowed.
# 
# get_GSOD(years = 2010:2011, path = "~/", agroclimatology = TRUE)
# ## End(Not run)

