
esmtools (version 1.0.1)

dataInfo: Display information regarding the dataset in a succinct way.

Description

The 'dataInfo()' function displays detailed information about a dataset in a style similar to 'sessionInfo()'. It reports details such as file size, creation and update times, number of columns and rows, number of participants, variable names, and more. This information is useful for reproducibility, for tracking the dataset, and for ensuring transparency in data analysis workflows.

Usage

dataInfo(
  file_path = NULL,
  read_fun = NULL,
  idvar = NULL,
  timevar = NULL,
  validvar = NULL,
  citation = NULL,
  URL = NULL,
  DOI = NULL,
  path = TRUE,
  variables = TRUE
)

Value

The 'dataInfo()' function displays detailed information about the dataset. The result can also be stored as a list in a variable.

A kable object that summarizes the information on the data, the current R session, and the article or document associated with the script.

Arguments

file_path

The path or URL of the dataset file.

read_fun

The function used to read the dataset file.

idvar

The identifier variable(s) in the dataset, represented as a character vector.

timevar

The name(s) of the time variable(s) in the dataset. The sent timestamp variable (the time when the beep was sent to the participant) is preferred.

validvar

The name of the validation variable in the dataset; the variable itself is a numerical vector. If NULL, the function does not display compliance rate information.

citation

A character string citing the article or document associated with the script.

URL

The URL associated with the dataset (or with the associated article), represented as a character string. If NULL, the function will not display the URL information.

DOI

The Digital Object Identifier (DOI) of the dataset, if applicable. If NULL, the function will not display the DOI information.

path

A logical value. If TRUE, the function will display the path information. The default is TRUE.

variables

A logical value indicating whether to display the names of the dataset's variables. Set to TRUE to display variable information, and FALSE to omit it. The default is TRUE.

Details

The 'dataInfo()' function provides a comprehensive summary of information about the dataset. The information returned includes:

  • Size: The size of the dataset file in bytes (octets).

  • File extension

  • Creation and Update Times: The date and time when the data file was created and last updated.

  • Number of Columns and Rows

  • Number of Participants

  • Average Observations per Participant

  • Compliance Mean: The mean compliance value for the dataset.

  • Data Collection Period: The duration or period during which the data was collected.

  • Path: The path or URL of the dataset file.

  • Variable Names: The names of the variables in the dataset.

  • Associated Links: Any associated URL, DOI, or citation links for the dataset.
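
Several of the quantities listed above can be approximated with base R alone. The following is a minimal sketch under stated assumptions, not the esmtools implementation: the simulated data frame and its column names ('id', 'sent', 'valid') are hypothetical, and the summary list is only an illustration of how such values might be derived.

```r
# Hypothetical ESM data: 3 participants, 4 daily beeps each.
esm <- data.frame(
  id    = rep(1:3, each = 4),
  sent  = as.POSIXct("2023-01-01 09:00:00", tz = "UTC") +
            rep(0:3, times = 3) * 24 * 3600,
  valid = c(1, 1, 0, 1,  1, 0, 0, 1,  1, 1, 1, 1)  # 1 = valid answer
)

# Write to a temporary file so that file-level metadata exists.
path <- tempfile(fileext = ".csv")
write.csv(esm, path, row.names = FALSE)

# File metadata (size in bytes, modification time).
meta <- file.info(path)

n_participants <- length(unique(esm$id))

summary_info <- list(
  size_bytes     = meta$size,
  extension      = tools::file_ext(path),
  last_modified  = meta$mtime,
  n_rows         = nrow(esm),
  n_cols         = ncol(esm),
  n_participants = n_participants,
  avg_obs        = nrow(esm) / n_participants,     # observations per participant
  compliance     = mean(esm$valid, na.rm = TRUE),  # share of valid observations
  period         = range(esm$sent)                 # first and last beep sent
)

str(summary_info)
```

Here the compliance mean is taken as the mean of a 0/1 validity indicator, and the data collection period as the range of the sent timestamps; the package may define either quantity differently.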

Examples

library(dplyr)

# Locate the example dataset shipped with the package
file_path <- system.file("extdata", "esmdata_sim.csv", package = "esmtools")

# Create a function that reads the data and parses the 'sent' timestamps
read_fun <- function(x) read.csv2(x) %>% 
    mutate(sent = as.POSIXct(as.character(sent), format="%Y-%m-%d %H:%M:%S"))

# Get data information
dataInfo(
  file_path = file_path, read_fun = read_fun,
  idvar = "id", timevar = "sent"
)
