observation: Get observation table

Description

Download data from the observation ("observacao") table of one or more datasets contained in the Free Brazilian Repository for Open Soil Data -- febr, http://www.ufsm.br/febr. This includes spatial coordinates, observation date, and variables such as geology, land use and vegetation, local topography, and more. Use header to check which variables the observation table of a dataset contains before downloading it.

Usage

observation(dataset, variable, stack = FALSE, missing = list(coord =
  "keep", time = "keep", data = "keep"), standardization = list(crs =
  NULL, time.format = NULL, units = FALSE, round = FALSE),
  harmonization = list(harmonize = FALSE, level = 2), progress = TRUE,
  verbose = TRUE)

Arguments

dataset

Character vector indicating one or more datasets. Identification codes should be as recorded in http://www.ufsm.br/febr/catalog/. Use dataset = "all" to download all datasets.

variable

(optional) Character vector indicating one or more variables. Accepts only general identification codes, e.g. "ferro" and "carbono". If missing, then a set of standard identification variables is downloaded. Use variable = "all" to download all variables. See ‘Details’ for more information.
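General identification codes such as "carbono" stand for a whole family of more specific variable codes. The sketch below shows one way such matching could work, assuming that the general code is the first "_"-delimited level of a full identification code. The helper match_general_code() and the variable codes are hypothetical and made up for illustration; this is not the febr implementation.

```r
# Hypothetical helper, not part of febr: keep the variable codes whose
# first "_"-delimited level equals one of the requested general codes.
match_general_code <- function(codes, general) {
  first_level <- vapply(strsplit(codes, "_", fixed = TRUE),
                        function(parts) parts[1], character(1))
  codes[first_level %in% general]
}

# Made-up variable codes, for illustration only
codes <- c("observacao_id", "carbono_aaa_bbb", "ferro_ccc_ddd", "taxon_eee")
match_general_code(codes, c("carbono", "ferro"))
```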

stack

(optional) Logical value indicating if tables from different datasets should be stacked into a single table for output. Requires standardization = list(units = TRUE) -- see below. Defaults to stack = FALSE, in which case the output is a list of tables.
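Stacking amounts to binding the rows of the per-dataset tables into one table, which is why the columns must first share names and measurement units. A minimal base R sketch of the idea, assuming two per-dataset tables that are already standardized; the dataset codes and values are made up, and this is not the package's internal code.

```r
# Two made-up per-dataset tables with identical columns and units,
# standing in for the list output obtained with stack = FALSE
tables <- list(
  ctb0001 = data.frame(dataset_id = "ctb0001", observacao_id = c("1", "2"),
                       coord_x = c(-53.7, -53.8), stringsAsFactors = FALSE),
  ctb0002 = data.frame(dataset_id = "ctb0002", observacao_id = "1",
                       coord_x = -47.9, stringsAsFactors = FALSE)
)

# Stack: bind all rows into a single data frame (unname() keeps plain row names)
stacked <- do.call(rbind, unname(tables))
stacked
```

Because rows are simply appended, a column holding values in different units across datasets would silently mix them, hence the requirement on units = TRUE.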

missing

(optional) List with named sub-arguments indicating what should be done with an observation that is missing spatial coordinates (coord), the date of observation (time), or data on the variables (data). Options are "keep" (default) and "drop".
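The effect of "drop" on the coord and time sub-arguments can be pictured as filtering out rows with missing values in the corresponding columns. The sketch below is a hypothetical helper, drop_missing(), not part of febr; it assumes that coordinates live in coord_x/coord_y and dates in observacao_data (as in the standard identification variables listed under Details) and leaves the data sub-argument out. The observations are made up.

```r
# Hypothetical helper, not part of febr: mimic missing = list(coord = "drop")
# and missing = list(time = "drop") by removing incomplete observations.
drop_missing <- function(obs, coord = "keep", time = "keep") {
  keep <- rep(TRUE, nrow(obs))
  if (coord == "drop") {
    keep <- keep & !is.na(obs$coord_x) & !is.na(obs$coord_y)
  }
  if (time == "drop") {
    keep <- keep & !is.na(obs$observacao_data)
  }
  obs[keep, , drop = FALSE]
}

# Made-up observations: #2 lacks a date, #3 lacks an x-coordinate
obs <- data.frame(
  observacao_id = c("1", "2", "3"),
  observacao_data = c("31-12-2001", NA, "15-06-1998"),
  coord_x = c(-53.7, -53.8, NA),
  coord_y = c(-29.7, -29.6, -29.5),
  stringsAsFactors = FALSE
)
drop_missing(obs, coord = "drop")  # keeps observations 1 and 2
drop_missing(obs, time = "drop")   # keeps observations 1 and 3
```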

standardization

(optional) List with named sub-arguments indicating how to perform data standardization.

crs Character string indicating the EPSG code of the coordinate reference system (CRS) to which spatial coordinates should be transformed. For example, crs = "EPSG:4674", i.e. SIRGAS 2000, the standard CRS for Brazil -- see more at http://spatialreference.org/ref/epsg/. Defaults to crs = NULL, i.e. no transformation is performed.
time.format Character string indicating how to format dates. For example, time.format = "%d-%m-%Y", i.e. dd-mm-yyyy such as in 31-12-2001. Defaults to time.format = NULL, i.e. no formatting is performed. See as.Date for more details.
units Logical value indicating if the measurement units of the continuous variable(s) should be converted to the standard measurement unit(s). Defaults to units = FALSE, i.e. no conversion is performed. See standard for more information.
round Logical value indicating if the values of the continuous variable(s) should be rounded to the standard number of decimal places. Requires units = TRUE. Defaults to round = FALSE, i.e. no rounding is performed. See standard for more information.
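The time.format string uses the same conversion codes as as.Date and format. The base R sketch below shows what such a re-formatting does to dates stored as dd-mm-yyyy text, followed by a unit conversion with rounding. The percent-to-g/kg conversion and the single decimal place are assumptions chosen for illustration; the actual standard units and decimal places are those documented in standard.

```r
# Dates stored as dd-mm-yyyy text, as in observacao_data
raw <- c("31-12-2001", "15-06-1998")
dates <- as.Date(raw, format = "%d-%m-%Y")

# Re-format, e.g. to ISO 8601 (yyyy-mm-dd)
iso <- format(dates, "%Y-%m-%d")
iso  # "2001-12-31" "1998-06-15"

# Hypothetical standardization: convert % to g/kg (x 10), then round
# to an assumed standard of 1 decimal place
carbon_pct <- c(1.234, 0.0567)
carbon_gkg <- round(carbon_pct * 10, 1)
carbon_gkg  # 12.3 0.6
```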

harmonization

(optional) List with named sub-arguments indicating if and how to perform data harmonization.

harmonize Logical value indicating if data should be harmonized. Defaults to harmonize = FALSE, i.e. no harmonization is performed.
level Integer value indicating the number of levels of the identification code of the variable(s) that should be considered for harmonization. Defaults to level = 2. See ‘Details’ for more information.

progress

(optional) Logical value indicating if a download progress bar should be displayed.

verbose

(optional) Logical value indicating if informative messages should be displayed. Generally useful to identify datasets with inconsistent data. Please report any issues you find to febr-forum@googlegroups.com.

Value

A list of data frames (one per dataset) or, if stack = TRUE, a single data frame with data on the chosen variable(s) of the chosen dataset(s).

Details

Standard identification variables

Standard identification variables and their content are as follows:

dataset_id. Identification code of the dataset in febr to which an observation belongs.
observacao_id. Identification code of an observation in febr.
sisb_id. Identification code of an observation in the Brazilian Soil Information System maintained by the Brazilian Agricultural Research Corporation (EMBRAPA) at https://www.bdsolos.cnptia.embrapa.br/consulta_publica.html.
ibge_id. Identification code of an observation in the database of the Brazilian Institute of Geography and Statistics (IBGE) at http://www.downloads.ibge.gov.br/downloads_geociencias.htm#.
observacao_data. Date (dd-mm-yyyy) on which an observation was made.
coord_sistema. EPSG code of the coordinate reference system.
coord_x. Longitude (°) or easting (m).
coord_y. Latitude (°) or northing (m).
coord_precisao. Precision with which x- and y-coordinates were determined (m).
coord_fonte. Source of the x- and y-coordinates.
pais_id. Country code (ISO 3166-1 alpha-2).
estado_id. Code of the Brazilian federative unit where an observation was made.
municipio_id. Name of the Brazilian county where an observation was made.
amostra_tipo. Type of sample taken.
amostra_quanti. Number of samples taken.
amostra_area. Sampling area.

Further details about the content of the standard identification variables can be found at http://www.ufsm.br/febr/book/ (in Portuguese).

Harmonization

Data harmonization consists of converting the values of a variable determined using some method B so that they are (approximately) equivalent to the values that would have been obtained if the standard method A had been used instead. For example, converting carbon content values obtained using a wet digestion method to the standard dry combustion method is data harmonization.

A heuristic data harmonization procedure is implemented in the febr package. It consists of grouping variables based on a chosen number of levels of their identification code. For example, consider a variable with an identification code composed of four levels, aaa_bbb_ccc_ddd, where aaa is the first level and ddd is the fourth level. Now consider a related variable, aaa_bbb_eee_fff. If the harmonization is to consider all four coding levels (level = 4), then these two variables will remain coded as separate variables. But if level = 2, then both variables will be re-coded as aaa_bbb, thus becoming the same variable.
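The re-coding step of this heuristic can be sketched by truncating each "_"-delimited identification code to its first level levels. The helper harmonize_code() below is hypothetical, not the febr implementation, and covers only the re-coding; how the package then combines the values of variables that end up sharing a code is not shown.

```r
# Hypothetical helper, not part of febr: truncate "_"-delimited
# identification codes to their first `level` levels.
harmonize_code <- function(codes, level = 2) {
  vapply(strsplit(codes, "_", fixed = TRUE),
         function(parts) {
           paste(parts[seq_len(min(level, length(parts)))], collapse = "_")
         },
         character(1))
}

codes <- c("aaa_bbb_ccc_ddd", "aaa_bbb_eee_fff")
harmonize_code(codes, level = 4)  # "aaa_bbb_ccc_ddd" "aaa_bbb_eee_fff"
harmonize_code(codes, level = 2)  # "aaa_bbb" "aaa_bbb"
```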

Examples


# NOT RUN {
res <- observation(dataset = "ctb0013", variable = "taxon")
str(res)
# }
