- data
The dataset to clean: case notifications (infections), hospital admissions, or vaccinations. A vaccination dataset must be pre-specified as such. Make sure dates are in date format.
- id_var
Any format, as long as it is unique to each individual. This ID variable is critical. For case data, ensure there is only one row per person (or the first infection only). It identifies the multiple rows associated with a person who has multiple vaccines, admissions or infections. It cannot have missing values, or the observation will be lost in the linking process.
- event_id_var
Any format, as long as it is unique across the whole dataset. This is the ID of the vaccination event or the hospitalization event, which MUST be distinct. A person (id_var) can have multiple events (event_id_var). Check for duplicates first: some datasets contain multiple entries for the same admission.
- drop_eggs
Drops the variables that are not being used, which gives a lean dataset. Useful in the early stages of an analysis; turn it off if you need lots of extra information.
- data_type
Three options: "vaccination", "hospital", or "cases". The key information required is the information used for linkage and the vaccination events. No age or age categories will be calculated if it is a vaccination dataset.
- lie_nest_flat
Takes a long vaccination dataset (like the Australian Immunisation Register, with one or more rows per person) and turns it into a wide dataset with one row per person.
- drop_the_na_vax
Drops (removes) vaccination records that have no vaccine name listed.
- keep_vars
Vector of variable names to keep. Supply the names as quoted strings in a vector, as they will be used in a select statement.
- diagnosis
Character format. The column that lists the infectious disease diagnosis, e.g. COVID-19, SARS-CoV-2, RSV, Influenza.
- lettername1
Character format. First name variable. If there is a second first name (in some cases this might be a middle name), it will be removed during cleaning. All non-alphanumeric characters will be removed and everything is converted to lower case.
- lettername2
Character format. Last name variable. All non-alphanumeric characters will be removed and everything is converted to lower case. Two-part last names will be kept.
- dob
Date format. The date of birth (make sure dates are in date format).
- age
Numeric format. Include age only if it has been pre-specified in the dataset, and you don't want it re-calculated.
- medicare
Numeric format. Medicare number. Versions of the Medicare number with 9, 10 and 11 digits will be created. In Australia, the 10th digit represents the card ID and the 11th digit represents the person ID. A family or individual gets a new card ID (10th digit) every time their card expires.
- postcode
Numeric format. Postcode of the person, with no restriction on the number of digits.
- gender
Character format. Make sure gender is coded in the same way across the datasets being linked (for example, "F" vs "0" vs "Female"). This cleaning is left to the user.
- fn
Character format. First Nations Status.
- latitude
Numeric format. Latitude of address. Not explicitly required for linkage.
- longitude
Numeric format. Longitude of address. Not explicitly required for linkage.
- onset_date
Date format. Onset date of the illness. Commonly the date of diagnosis (date of the lab test or date of the first symptom). Must be in date format.
- vax_type
Character format. Variable that indicates the vaccine type, brand, or antigen.
- vax_date
Date format. Variable that indicates the vaccination event date. Make sure it is in date format and sorted in the order in which the dates should appear once the data is reshaped to wide format. For example, if the data is not sorted, vax_date_1 (an output variable) may hold the latest vaccination date instead of the first.
- lag
Numeric format. Number of days to add to the vaccination event date. Useful to define when a person reaches peak immunity post-vaccination. For COVID-19 this is often thought to be 14 days. Default lag is zero days.
- admission_date
Date format. Admission date variable. Typically, this should be later than the date of onset, but there are times when the disease is diagnosed in hospital.
- discharge_date
Date format. Discharge date variable. This date should be later than the date of admission.
- hospital
Hospital identifier. Typically the name of the hospital.
- icd_code
Character format. ICD code variable for the admission. No pre-specified format required.
- diagnosis_description
Character format. Written description of the ICD code. Included to make the ICD codes easier to interpret; not a critical variable.
- drg
Character format. Diagnosis related group (DRG) variable for the admission. No pre-specified format required.
- icu_date
Date format. Preferably the ICU admission date. Typically, this should be later than the date of onset and admission, but there are times when the disease is diagnosed in ICU.
- icu_hours
Numeric format. Hours spent in ICU.
- dialysis
Dialysis indicator (0/1).
- genomics
Character format. Genomics variable. Can be the variant of SARS-CoV-2 or, similarly, of Hepatitis A.
- dod
Date format. Variable representing date of death. Choose only one date of death variable, from either the diagnosis dataset or the hospitalization dataset, not both. If the dod selected is from the hospitalization dataset, it will be deleted for persons without an admission.
- died
Variable representing death; best coded as 0 and 1.
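
Some of the cleaning behaviour described in the list above can be illustrated with a small sketch: name normalisation (lettername1, lettername2), the lag added to vax_date, and the long-to-wide reshaping that produces outputs such as vax_date_1. The sketch is written in Python purely for illustration and is not the package's own code. The function names `clean_first_name`, `clean_last_name` and `to_wide`, the `vax_type_1`-style output names, and the exact cleaning rules (splitting on whitespace to drop a second first name, joining two-part last names without a separator) are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def _alnum_lower(text):
    # Keep letters and digits only, then lower-case the result.
    return "".join(ch for ch in text if ch.isalnum()).lower()


def clean_first_name(name):
    # Assumption: a second first name is whatever follows the first whitespace.
    parts = name.split()
    return _alnum_lower(parts[0]) if parts else ""


def clean_last_name(name):
    # Assumption: both parts of a two-part last name are kept and joined.
    return _alnum_lower(name)


def to_wide(rows, lag=0):
    """Reshape long vaccination rows into one record per person.

    rows: iterable of dicts with keys id_var, vax_type, vax_date.
    lag:  number of days added to every vaccination date (default 0).
    """
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["id_var"]].append(row)

    wide = {}
    for person, events in by_person.items():
        # Sort by date so that vax_date_1 is the earliest event.
        ordered = sorted(events, key=lambda r: r["vax_date"])
        record = {}
        for n, event in enumerate(ordered, start=1):
            record[f"vax_date_{n}"] = event["vax_date"] + timedelta(days=lag)
            record[f"vax_type_{n}"] = event["vax_type"]  # hypothetical output name
        wide[person] = record
    return wide


# Made-up example rows; person p1 is deliberately out of date order.
long_rows = [
    {"id_var": "p1", "vax_type": "B", "vax_date": date(2021, 8, 1)},
    {"id_var": "p1", "vax_type": "A", "vax_date": date(2021, 5, 1)},
    {"id_var": "p2", "vax_type": "A", "vax_date": date(2021, 6, 15)},
]
wide = to_wide(long_rows, lag=14)
```

Unlike the behaviour described for vax_date, this sketch sorts each person's events itself. Without that sort, `vax_date_1` for `p1` would hold the later date, which is exactly the pitfall the vax_date entry warns about.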