galah_filter uses non-standard evaluation (NSE), and is designed to be as compatible as possible with dplyr::filter() syntax.
All statements passed to galah_filter() (except the profile argument) take the form of field - logical - value. Permissible examples include:
= or == (e.g. year = 2020)
!= (e.g. year != 2020)
> or >= (e.g. year >= 2020)
< or <= (e.g. year <= 2020)
OR statements (e.g. year == 2018 | year == 2020)
AND statements (e.g. year >= 2000 & year <= 2020)
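As a rough illustration of the operators listed above, the sketch below attaches a few filters to a query started with galah_call(). It assumes the galah package is installed and that the field name year (taken from the examples above) is available in the configured atlas; depending on the package version, building a filter may also contact the atlas to validate field names. The object names recent, either_year and in_range are illustrative only.

```r
library(galah)

# Records from 2020 onwards
recent <- galah_call() |>
  galah_filter(year >= 2020)

# Records from either 2018 or 2020 (an OR statement)
either_year <- galah_call() |>
  galah_filter(year == 2018 | year == 2020)

# Records from 2000 to 2020 inclusive (an AND statement)
in_range <- galah_call() |>
  galah_filter(year >= 2000 & year <= 2020)
```

Passing any of these objects on to atlas_counts() should then return the number of matching records.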
In some cases R will fail to parse inputs with a single equals sign (=), particularly where statements are separated by & or |. This problem can be avoided by using a double-equals (==) instead.
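The parsing problem can be reproduced in base R without galah. In the minimal sketch below, parses() is a hypothetical helper written for this example, and f stands in for any function call such as galah_filter(); the code is only parsed, never evaluated.

```r
# Report whether a string of R code can be parsed (it is not evaluated).
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# `f` is a placeholder for any function call, e.g. galah_filter().
single_equals <- parses("f(year = 2018 | year = 2020)")
double_equals <- parses("f(year == 2018 | year == 2020)")

single_equals  # FALSE: the second `=` is a syntax error inside a call
double_equals  # TRUE
```

The first `=` is read as naming an argument, so the second one has no valid interpretation inside the call; `==` is an ordinary comparison operator and parses anywhere.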
Notes on behaviour
Separating statements with a comma is equivalent to an AND statement; ergo galah_filter(year >= 2010 & year < 2020) is the same as galah_filter(year >= 2010, year < 2020).
All statements must include the field name; so galah_filter(year == 2010 | year == 2021) works, as does galah_filter(year == c(2010, 2021)), but galah_filter(year == 2010 | 2021) fails.
It is possible to use an object to specify required values, e.g. year_value <- 2010; galah_filter(year > year_value)
Solr supports range queries on text as well as numbers, so this is valid: galah_filter(cl22 >= "Tasmania")
It is possible to filter by 'assertions', which are statements about data validity, e.g. to remove records lacking critical spatial or taxonomic data: galah_filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")). Valid assertions can be found using show_all(assertions).
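The notes above can be put together in one short sketch. It assumes the galah package is installed and that the atlas can be reached, since show_all() (and, in some versions, field validation inside galah_filter()) queries the atlas. The names filter_and, filter_comma, filter_or, filter_object and filter_assertions are illustrative only.

```r
library(galah)

# A comma and `&` express the same AND condition
filter_and   <- galah_filter(year >= 2010 & year < 2020)
filter_comma <- galah_filter(year >= 2010, year < 2020)

# The field name is repeated on both sides of `|`
filter_or <- galah_filter(year == 2010 | year == 2021)

# Values may be supplied through an object
year_value <- 2010
filter_object <- galah_filter(year > year_value)

# Exclude records flagged with either assertion
filter_assertions <- galah_filter(
  assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")
)

# List the assertions that can be used in such a filter
show_all(assertions)
```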