injurytools (version 2.0.1)

prepare_data: Prepare data in a standardized format

Description

These are the data preprocessing functions provided by the injurytools package, which involve:

  1. putting exposure data and injury/illness data into a standardized format, and

  2. integrating both sources of data into an adequate data structure.

prepare_inj() and prepare_exp() set standardized names and proper classes for the key columns in injury/illness and exposure data, respectively. prepare_all() integrates the standardized injury and exposure data sets and converts them into an injd S3 object that has an adequate structure for further statistical analyses. See the Prepare Sports Injury Data vignette for details.

Usage

prepare_inj(
  df_injuries0,
  person_id = "person_id",
  date_injured = "date_injured",
  date_recovered = "date_recovered"
)

prepare_exp(
  df_exposures0,
  person_id = "person_id",
  date = "date",
  time_expo = "time_expo"
)

prepare_all(
  data_exposures,
  data_injuries,
  exp_unit = c("minutes", "hours", "days", "matches_num", "matches_minutes",
    "activity_days", "seasons")
)

Value

prepare_inj() returns a data frame in which the key columns in injury/illness data are standardized and have a proper format.

prepare_exp() returns a data frame in which the key columns in exposure data are standardized and have a proper format.

prepare_all() returns the injd S3 object that contains all the necessary information and a proper data structure to perform further statistical analyses (e.g. calculate injury summary statistics, visualize injury data).

  • If exp_unit is "minutes" (the default), the columns tstart_min and tstop_min are created. They hold the time-to-event (injury) values: the starting and stopping time of each interval, respectively. That is, they record the training time in minutes during which the sportsperson has been at risk until an injury/illness (or censoring) occurs. For other choices, the corresponding tstart_x and tstop_x columns are created according to the exp_unit indicated (x is one of: min, h, match, minPlay, d, acd or s). These columns are useful for survival analysis routines. See Note section.

  • It also creates a days_lost column, computed as the difference in days between date_recovered and date_injured. If a days_lost column already exists in the raw data, it is overwritten.
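The days_lost rule described above can be sketched in base R. This is a minimal illustration, not the package's own code: the toy data frame and its values are made up, and it assumes a plain date difference with no inclusive-day adjustment.

```r
# Toy injury records using the standardized column names.
# All values here are invented for illustration.
toy_injuries <- data.frame(
  person_id      = c("A", "B"),
  date_injured   = as.Date(c("2018-01-10", "2018-03-01")),
  date_recovered = as.Date(c("2018-01-24", "2018-03-08")),
  days_lost      = c(99, 99)  # pre-existing values that get overwritten
)

# Subtracting two Date vectors gives a difftime in days;
# as.numeric() drops the units. Assumption: no +1 inclusive counting.
toy_injuries$days_lost <- as.numeric(
  toy_injuries$date_recovered - toy_injuries$date_injured
)

toy_injuries$days_lost
```

Because the column is assigned unconditionally, any days_lost values present in the input are replaced, which mirrors the overwrite behaviour stated above.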

Arguments

df_injuries0

A data frame containing injury or illness information, with columns referring to the athlete name/id, date of injury/illness and date of recovery (as minimal data).

person_id

Character referring to the column name storing sportsperson (player, athlete) identification information.

date_injured

Character referring to the column name where the information about the date of injury or illness is stored.

date_recovered

Character referring to the column name where the information about the date of recovery is stored.

df_exposures0

A data frame containing exposure information, with columns referring to the sportsperson's name/id, date of exposure and the total time of exposure of the corresponding data entry (as minimal data).

date

Character referring to the column name where the exposure date information is stored. The column must be of class Date or integer/numeric. If it is integer/numeric, it should refer to the year in which the season started (e.g. date = 2015 to refer to the 2015/2016 season).

time_expo

Character referring to the column name where the time of exposure on the corresponding date is stored.

data_exposures

Exposure data frame with standardized column names, as returned by prepare_exp().

data_injuries

Injury data frame with standardized column names, as returned by prepare_inj().

exp_unit

Character defining the unit of exposure time; one of "minutes" (the default), "hours", "days", "matches_num", "matches_minutes", "activity_days" or "seasons".
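Two of the constraints above, the class of the date column and the allowed exp_unit values, can be checked before calling the package functions. The helper below is hypothetical and not part of injurytools. It assumes exp_unit is resolved with match.arg(), which the Usage signature suggests but this page does not confirm.

```r
# Hypothetical helper (not an injurytools function): checks the two
# argument constraints described in the Arguments section.
check_exposure_inputs <- function(date_col,
                                  exp_unit = c("minutes", "hours", "days",
                                               "matches_num", "matches_minutes",
                                               "activity_days", "seasons")) {
  # match.arg() returns the first choice when exp_unit is not supplied
  # and raises an error for a value outside the allowed set.
  exp_unit <- match.arg(exp_unit)

  # The date column must be Date, or integer/numeric season start years.
  date_kind <- if (inherits(date_col, "Date")) {
    "Date"
  } else if (is.numeric(date_col)) {
    "season start year"
  } else {
    stop("`date` column must be of class Date or integer/numeric")
  }

  list(exp_unit = exp_unit, date_kind = date_kind)
}

# Season start years, default exposure unit
res_default <- check_exposure_inputs(c(2015L, 2016L))

# Calendar dates, explicit exposure unit
res_dates <- check_exposure_inputs(as.Date("2015-08-23"), exp_unit = "hours")
```

A character date column such as "2015-08-23" fails this check, so it would need converting with as.Date() first.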

Examples

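library(injurytools)

# raw_df_injuries and raw_df_exposures are example data sets shipped with injurytools
df_injuries <- prepare_inj(df_injuries0   = raw_df_injuries,
                           person_id      = "player_name",
                           date_injured   = "from",
                           date_recovered = "until")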

df_exposures <- prepare_exp(df_exposures0 = raw_df_exposures,
                            person_id     = "player_name",
                            date          = "year",
                            time_expo     = "minutes_played")

injd <- prepare_all(data_exposures = df_exposures,
                    data_injuries  = df_injuries,
                    exp_unit       = "matches_minutes")
head(injd)
class(injd)
str(injd, 1)