
fetch (version 0.1.5)

catalog: Create a data source catalog

Description

The catalog function returns a data catalog for a data source. A data catalog is like a collection of data dictionaries, one for each dataset in the data source. The catalog allows you to examine the datasets in the data source without yet loading anything into memory. Once you decide which data item you want, use the fetch function to load that item into memory.
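The catalog-then-fetch workflow described above might look like the following sketch. It assumes the fetch package is installed and that a catalog item can be passed directly to fetch(), as in fetch(ct$ADEX); the ADEX item name comes from the sample data shipped with the package. The requireNamespace() guard is only there so the snippet does nothing where the package is missing.

```r
# Sketch only: skipped entirely when the fetch package is not installed
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)

  # Sample data directory shipped with the package
  pkg <- system.file("extdata", package = "fetch")

  # Build the catalog so the datasets can be examined before loading any
  ct <- catalog(pkg, engines$csv)

  # Print the catalog summary
  print(ct)

  # Assumption: passing a catalog item to fetch() loads that item into memory
  dat <- fetch(ct$ADEX)
  print(dim(dat))
}
```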

Usage

catalog(source, engine, pattern = NULL, where = NULL, import_specs = NULL)

Value

The loaded data catalog, as an object of class "dcat". The catalog is a list of data dictionaries, and each data dictionary is a tibble.

Arguments

source

The source for the data. This parameter is required. Normally the source is passed as a full or relative path.

engine

The data engine to use for this data source. This parameter is required. The available data engines are listed on the engines enumeration. For example, engines$csv specifies the CSV engine, and engines$rdata specifies the RDATA engine.

pattern

A pattern to use when loading data items from the data source. The pattern can be a name or a vector of names. Names also accept wildcards. The supplied pattern is used to filter which data items are loaded into the catalog. For example, pattern = "AD*" will load only datasets whose names start with "AD".
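To give a feel for how a name or wildcard pattern narrows the list of data items, here is a small base R sketch. It does not use the fetch package: match_items is a hypothetical helper, and glob2rx() is a stand-in for whatever matching the package performs internally, which may differ in detail. The item names are the ones shown in the Examples section.

```r
# Item names taken from the sample catalog in the Examples section
items <- c("ADAE", "ADEX", "ADPR", "ADPSGA", "ADSL", "ADVS")

# Hypothetical helper: keep the names that match any of the supplied patterns
match_items <- function(items, pattern) {
  # glob2rx() converts wildcards such as "ADP*" into regular expressions
  rx <- utils::glob2rx(pattern)
  # One logical vector per pattern, combined with OR
  keep <- Reduce(`|`, lapply(rx, grepl, x = items))
  items[keep]
}

# A single wildcard pattern
match_items(items, "ADP*")
# [1] "ADPR"   "ADPSGA"

# A vector mixing an exact name and a wildcard
match_items(items, c("ADSL", "ADV*"))
# [1] "ADSL" "ADVS"
```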

where

A where expression to use when fetching the data. This expression will apply to all fetch operations on this catalog. The where expression should be defined with the Base R expression function. The condition inside the expression call is unquoted and can use any Base R operators or functions.
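The following base R sketch shows what a where expression is and how such an object can be evaluated against a dataset. It is only an illustration of the expression mechanism, not the package's own filtering code; the adex data frame and its values are made up for the example.

```r
# Small stand-in for a dataset; the values are illustrative only
adex <- data.frame(
  SUBJID = c("049", "049", "050", "051"),
  AVAL   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# The where clause is built with expression(); the condition is unquoted
w <- expression(SUBJID == "049")

# Evaluate the condition with the data frame columns in scope
keep <- eval(w[[1]], envir = adex)

# Keep only the rows where the condition holds
filtered <- adex[keep, ]
nrow(filtered)
# [1] 2

# Any base R operators can be combined in the condition
w2 <- expression(SUBJID == "049" & AVAL > 10)
adex[eval(w2[[1]], envir = adex), "AVAL"]
# [1] 20
```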

import_specs

The import specs to use for any fetch operation on this catalog. The import spec can be used to control the data types on the incoming columns. You can create separate import specs for each dataset, or one import spec to use for all datasets. See the import_spec and specs functions for more information about this capability.
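Why controlling incoming column types matters can be seen with a plain base R import. This sketch deliberately uses read.csv and its colClasses argument rather than the import_spec and specs functions, whose exact arguments are not shown on this page; the CSV text is made up for illustration.

```r
# Tiny CSV held in a string; the values are illustrative only
csv_text <- "SUBJID,AVAL\n049,1.5\n050,2.0"

# Default import: SUBJID is guessed as integer, so the leading zero is lost
guessed <- read.csv(text = csv_text)
class(guessed$SUBJID)
# [1] "integer"

# Declaring the column type up front keeps the identifier intact
typed <- read.csv(text = csv_text, colClasses = c(SUBJID = "character"))
typed$SUBJID
# [1] "049" "050"
```

An identifier such as SUBJID is a typical case where a declared type is preferable to a guessed one, since a filter like SUBJID == "049" only behaves as intended when the column is read as character.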

See Also

The fetch function to retrieve data from the catalog, and the import_spec function to create import specifications.

Examples

# Get data directory
pkg <- system.file("extdata", package = "fetch")

# Create catalog
ct <- catalog(pkg, engines$csv)

# Example 1: Catalog all rows

# View catalog
ct
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
  # data item 'ADAE': 56 cols 150 rows
  # data item 'ADEX': 17 cols 348 rows
  # data item 'ADPR': 37 cols 552 rows
  # data item 'ADPSGA': 42 cols 695 rows
  # data item 'ADSL': 56 cols 87 rows
  # data item 'ADVS': 37 cols 3617 rows

# View catalog item
ct$ADEX
# data item 'ADEX': 17 cols 348 rows
# - Engine: csv
# - Size: 70.7 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format NAs MaxChar
# 1  ADEX  STUDYID character       NA   0       3
# 2  ADEX  USUBJID character       NA   0      10
# 3  ADEX   SUBJID character       NA   0       3
# 4  ADEX   SITEID character       NA   0       2
# 5  ADEX     TRTP character       NA   8       5
# 6  ADEX    TRTPN   numeric       NA   8       1
# 7  ADEX     TRTA character       NA   8       5
# 8  ADEX    TRTAN   numeric       NA   8       1
# 9  ADEX   RANDFL character       NA   0       1
# 10 ADEX    SAFFL character       NA   0       1
# 11 ADEX   MITTFL character       NA   0       1
# 12 ADEX  PPROTFL character       NA   0       1
# 13 ADEX    PARAM character       NA   0      45
# 14 ADEX  PARAMCD character       NA   0       8
# 15 ADEX   PARAMN   numeric       NA   0       1
# 16 ADEX     AVAL   numeric       NA  16       4
# 17 ADEX AVALCAT1 character       NA  87      10


# Example 2: Catalog with where expression
ct <- catalog(pkg, engines$csv, where = expression(SUBJID == '049'))

# View catalog item - Now only 4 rows
ct$ADEX
# data item 'ADEX': 17 cols 4 rows
# - Where: SUBJID == "049"
# - Engine: csv
# - Size: 4.5 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format NAs MaxChar
# 1  ADEX  STUDYID character       NA   0       3
# 2  ADEX  USUBJID character       NA   0      10
# 3  ADEX   SUBJID character       NA   0       3
# 4  ADEX   SITEID character       NA   0       2
# 5  ADEX     TRTP character       NA   0       5
# 6  ADEX    TRTPN   numeric       NA   0       1
# 7  ADEX     TRTA character       NA   0       5
# 8  ADEX    TRTAN   numeric       NA   0       1
# 9  ADEX   RANDFL character       NA   0       1
# 10 ADEX    SAFFL character       NA   0       1
# 11 ADEX   MITTFL character       NA   0       1
# 12 ADEX  PPROTFL character       NA   0       1
# 13 ADEX    PARAM character       NA   0      45
# 14 ADEX  PARAMCD character       NA   0       8
# 15 ADEX   PARAMN   numeric       NA   0       1
# 16 ADEX     AVAL   numeric       NA   0       4
# 17 ADEX AVALCAT1 character       NA   1      10
