googleAnalyticsR (version 1.1.0)

google_analytics: Get Google Analytics v4 data

Description

Fetch Google Analytics data using the v4 API. For the v3 API use google_analytics_3; for GA4's Data API use ga_data. See the website help for more examples: Google Analytics Reporting API v4 in R

Usage

google_analytics(
  viewId,
  date_range = NULL,
  metrics = NULL,
  dimensions = NULL,
  dim_filters = NULL,
  met_filters = NULL,
  filtersExpression = NULL,
  order = NULL,
  segments = NULL,
  pivots = NULL,
  cohorts = NULL,
  max = 1000,
  samplingLevel = c("DEFAULT", "SMALL", "LARGE"),
  metricFormat = NULL,
  histogramBuckets = NULL,
  anti_sample = FALSE,
  anti_sample_batches = "auto",
  slow_fetch = FALSE,
  useResourceQuotas = NULL,
  rows_per_call = 10000L
)

google_analytics_4(...)

Value

A Google Analytics data.frame, with attributes showing row totals, sampling etc.

Arguments

viewId

viewId of data to get.

date_range

Character or date vector of the form c(start, end), or c(start1, end1, start2, end2) for two date ranges.

metrics

Metric(s) to fetch as a character vector. You do not need to supply the "ga:" prefix. See meta for a list of dimensions and metrics the API supports. Also supports your own calculated metrics.

dimensions

Dimension(s) to fetch as a character vector. You do not need to supply the "ga:" prefix. See meta for a list of dimensions and metrics the API supports.

dim_filters

A filter_clause_ga4 wrapping dim_filter

met_filters

A filter_clause_ga4 wrapping met_filter

filtersExpression

A v3 API style simple filter string. Not used with other filters.

order

An order_type object

segments

List of segments as created by segment_ga4

pivots

Pivots of the data as created by pivot_ga4

cohorts

Cohorts created by make_cohort_group

max

Maximum number of rows to fetch. Defaults to 1000. Use -1 to fetch all results. Ignored when anti_sample=TRUE.

samplingLevel

Sampling level: one of "DEFAULT", "SMALL" or "LARGE".

metricFormat

If supplying calculated metrics, specify the metric type

histogramBuckets

For numeric dimensions such as hour, a list of buckets of data.

anti_sample

If TRUE will split up the call to avoid sampling.

anti_sample_batches

"auto" default, or set to number of days per batch. 1 = daily.

slow_fetch

For large, complicated API requests this bypasses some API hacks that may result in 500 errors. For smaller queries, leave this as FALSE for quicker data fetching.

useResourceQuotas

If using GA360, access increased sampling limits. Default NULL, set to TRUE or FALSE if you have access to this feature.

rows_per_call

Set how many rows are requested by the API per call, up to a maximum of 100000.

...

Arguments passed to google_analytics

Row requests

By default the API call uses v4 batching, which splits a request into 5 separate calls of 10k rows each. rows_per_call can go up to 100k, so up to 500k rows can be fetched per API call. However, the API servers will fail with a 500 error if the query is too complicated, as the processing time at Google's end gets too long. In this case, you may want to lower the rows_per_call argument, or fall back to slow_fetch = TRUE, which sends API requests one at a time. If fetching data via scheduled scripts, slow_fetch = TRUE is recommended as the default.
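The two fallbacks described above might look like the sketch below. The viewId, dates and dimensions are placeholders, and the API calls sit inside if (FALSE) because they need an authenticated session. slow_fetch = TRUE is used here on the basis of the slow_fetch argument description.

```r
# With the default rows_per_call, one batched API call carries
# 5 sub-requests of 10,000 rows each.
rows_per_call <- 10000L
rows_per_batched_call <- 5L * rows_per_call

# The calls below need authentication, so they are not run here.
if (FALSE) {
  library(googleAnalyticsR)
  ga_id <- 123456  # placeholder viewId

  # Option 1: request fewer rows per call for a heavy query
  heavy <- google_analytics(ga_id,
                            date_range = c("2017-01-01", "2017-03-01"),
                            metrics = "sessions",
                            dimensions = c("date", "pagePath"),
                            max = -1,
                            rows_per_call = 5000L)

  # Option 2: send the API requests one at a time
  heavy <- google_analytics(ga_id,
                            date_range = c("2017-01-01", "2017-03-01"),
                            metrics = "sessions",
                            dimensions = c("date", "pagePath"),
                            max = -1,
                            slow_fetch = TRUE)
}
```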

Anti-sampling

When anti_sample = TRUE, max is ignored: the API call is split over days to mitigate the sampling session limit, so a row limit cannot be applied. Take the top rows of the result yourself instead, e.g. head(ga_data_unsampled, 50300)

Setting anti_sample = TRUE also sets samplingLevel = 'LARGE' to minimise the number of calls.
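A minimal sketch of applying the row limit yourself. The data frame is a made-up stand-in for an unsampled result, and sorting before head() is an optional extra step assumed here for illustration, not something the package requires.

```r
# Made-up stand-in for a result fetched with anti_sample = TRUE
ga_data_unsampled <- data.frame(
  pagePath = c("/a", "/b", "/c", "/d"),
  sessions = c(12, 40, 7, 25),
  stringsAsFactors = FALSE
)

# max is ignored when anti-sampling, so sort and trim the rows locally
sorted <- ga_data_unsampled[order(-ga_data_unsampled$sessions), ]
top_rows <- head(sorted, 2)

# The real fetch needs authentication, so it is not run here
if (FALSE) {
  library(googleAnalyticsR)
  ga_data_unsampled <- google_analytics(123456,  # placeholder viewId
                                        date_range = c("2017-01-01", "2017-03-01"),
                                        metrics = "sessions",
                                        dimensions = "pagePath",
                                        anti_sample = TRUE)
}
```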

Resource Quotas

If you are on GA360 and have access to resource quotas, set useResourceQuotas = TRUE and set the Google Cloud client ID to the project that has resource quotas activated, via gar_set_client or options.

Caching

By default local caching is turned on for v4 API requests. This means that repeating a request already made in this session reads the result from memory instead of making an API call. You can also set the cache to disk via the ga_cache_call function. This can be useful when running RMarkdown reports that use the data.
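A sketch of switching the cache to disk. The folder name is hypothetical, and passing the folder as the first argument of ga_cache_call is an assumption about its usage, so check the function's own help page before relying on it.

```r
# Hypothetical cache folder; any writable directory should do
cache_dir <- file.path(tempdir(), "ga_cache")

# Needs the package and an authenticated session, so it is not run here
if (FALSE) {
  library(googleAnalyticsR)
  dir.create(cache_dir, showWarnings = FALSE)

  # Assumed usage: point the cache at a folder instead of memory
  ga_cache_call(cache_dir)

  # The first call hits the API; an identical repeat should be served from the cache
  first  <- google_analytics(123456, date_range = c("2017-01-01", "2017-03-01"),
                             metrics = "sessions", dimensions = "date")
  repeat_call <- google_analytics(123456, date_range = c("2017-01-01", "2017-03-01"),
                                  metrics = "sessions", dimensions = "date")
}
```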

Metrics

Metrics support calculated metrics like ga:users / ga:sessions if you supply them in a named vector.

Unlike normal metrics, calculated metrics must include the correct 'ga:' prefix.

You can mix calculated and normal metrics like so:

customMetric <- c(sessionPerVisitor = "ga:sessions / ga:visitors", "bounceRate", "entrances")

You can optionally supply a metricFormat parameter, which must be the same length as metrics. Each element can be: METRIC_TYPE_UNSPECIFIED, INTEGER, FLOAT, CURRENCY, PERCENT, TIME.

All metrics are currently converted with as.numeric once in R.
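A sketch of pairing calculated and normal metrics with metricFormat. The particular format chosen for each metric is an assumption made for illustration; the only rule taken from the text above is that metricFormat has one entry per metric.

```r
# One calculated metric (named, with ga: prefixes) plus two normal metrics
customMetric <- c(sessionPerVisitor = "ga:sessions / ga:visitors",
                  "bounceRate",
                  "entrances")

# One format per metric, in the same order (choices here are illustrative)
metricFormat <- c("FLOAT", "PERCENT", "INTEGER")
stopifnot(length(metricFormat) == length(customMetric))

# Needs authentication, so it is not run here
if (FALSE) {
  library(googleAnalyticsR)
  google_analytics(123456,  # placeholder viewId
                   date_range = c("2017-01-01", "2017-03-01"),
                   metrics = customMetric,
                   metricFormat = metricFormat,
                   dimensions = "date")
}
```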

Dimensions

Supply a character vector of dimensions, with or without ga: prefix.

Optionally, for numeric dimension types such as ga:hour, ga:browserVersion, ga:sessionsToTransaction, etc., supply histogram buckets suitable for histogram plots.

If non-empty, dimension values are placed into buckets after conversion from string to int64. Dimension values that are not the string representation of an integral value will be converted to zero. The bucket values have to be in increasing order. Each bucket is closed on the lower end, and open on the upper end. The "first" bucket includes all values less than the first boundary, the "last" bucket includes all values up to infinity. Dimension values that fall in a bucket get transformed to a new dimension value. For example, if one gives a list of "0, 1, 3, 4, 7", then we return the following buckets:

  • bucket #1: values < 0, dimension value "<0"

  • bucket #2: values in [0,1), dimension value "0"

  • bucket #3: values in [1,3), dimension value "1-2"

  • bucket #4: values in [3,4), dimension value "3"

  • bucket #5: values in [4,7), dimension value "4-6"

  • bucket #6: values >= 7, dimension value "7+"
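The bucketing rule above can be reproduced in base R to preview which label a value would receive. This is only an illustrative sketch: bucket_label is a hypothetical helper, not part of googleAnalyticsR, and the real bucketing happens on the API side.

```r
# Hypothetical helper (not part of googleAnalyticsR): returns the bucket
# label for each value, given strictly increasing integer boundaries.
bucket_label <- function(values, boundaries) {
  stopifnot(length(boundaries) >= 2L,
            !is.unsorted(boundaries, strictly = TRUE))

  # Strings that are not integral values are converted to zero
  ints <- rep(0L, length(values))
  ok <- grepl("^-?[0-9]+$", values)
  ints[ok] <- as.integer(values[ok])

  n <- length(boundaries)
  lower <- boundaries[-n]       # closed lower ends
  upper <- boundaries[-1] - 1   # last integer inside each half-open bucket
  middle <- ifelse(lower == upper,
                   as.character(lower),
                   paste0(lower, "-", upper))
  labels <- c(paste0("<", boundaries[1]), middle, paste0(boundaries[n], "+"))

  # findInterval() gives 0 below the first boundary and n at or above the last
  labels[findInterval(ints, boundaries) + 1L]
}

bucket_label(c("-2", "0", "2", "3", "5", "9", "abc"), c(0, 1, 3, 4, 7))
```

With the boundaries from the example, "2" falls in the "1-2" bucket and the non-integral string "abc" is treated as 0.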

Examples


if (FALSE) {
library(googleAnalyticsR)

## authenticate, or use the RStudio Addin "Google API Auth" with analytics scopes set

ga_auth()

## get your accounts

account_list <- ga_account_list()

## account_list will have a column called "viewId"
account_list$viewId

## View account_list and pick the viewId you want to extract data from
ga_id <- 123456

# examine the meta table to see metrics and dimensions you can query
meta

## simple query to test connection
google_analytics(ga_id, 
                 date_range = c("2017-01-01", "2017-03-01"), 
                 metrics = "sessions", 
                 dimensions = "date")

## change the quotaUser to fetch under
google_analytics(1234567, date_range = c("30daysAgo", "yesterday"), metrics = "sessions")

options("googleAnalyticsR.quotaUser" = "test_user")
google_analytics(1234567, date_range = c("30daysAgo", "yesterday"), metrics = "sessions")

}
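The dim_filters, met_filters and order arguments are described only briefly in the Arguments section. The sketch below shows one way they might be combined; the viewId, regular expression and threshold are placeholders, and the call is not run because it needs an authenticated session.

```r
# Placeholder date range shared by the query below
date_range <- c("2017-01-01", "2017-03-01")

if (FALSE) {
  library(googleAnalyticsR)
  ga_id <- 123456  # placeholder viewId

  # Keep rows whose source matches a regular expression
  source_filter <- dim_filter("source", "REGEXP", "google")

  # Keep rows with more than 10 sessions
  sessions_filter <- met_filter("sessions", "GREATER_THAN", 10)

  # Each filter_clause_ga4 wraps a list of filters of one kind
  google_analytics(ga_id,
                   date_range = date_range,
                   metrics = "sessions",
                   dimensions = c("date", "source"),
                   dim_filters = filter_clause_ga4(list(source_filter)),
                   met_filters = filter_clause_ga4(list(sessions_filter)),
                   order = order_type("sessions", "DESCENDING"))
}
```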
