Rcan (version 1.3.91)

csu_merge_cases_pop: csu_merge_cases_pop

Description

csu_merge_cases_pop merges registry data and population data, grouped by year and other user-defined variables (sex, registry, etc.).

Usage

csu_merge_cases_pop(df_cases, 
	df_pop,
	var_age,
	var_cases="cases",
	var_py=NULL,
	group_by=NULL)

Value

Returns a data frame.

Arguments

df_cases

Registry data grouped by 5-year age group (must be an R data.frame; see the examples for importing a csv file).

df_pop

Population data grouped by 5-year age group (must be an R data.frame; see the examples for importing a csv file).

var_age

Age variable. Several formats are accepted; each column below shows one of them:

1     0-4      0
2     5-9      5
3     10-14    10
...   ...      ...
17    80-84    80
18    85+      85

This variable must have the same column name in both datasets (df_cases and df_pop).
Ages >= 85 in the df_pop dataset are aggregated as 85+.
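
The accepted codings describe the same 18 five-year age groups. As a rough illustration only (base R, not the package's internal code; the helper name age_codings is hypothetical), the correspondence between the group index, the label and the lower bound could be built like this:

```r
# Hypothetical helper: the three age codings for the 18 five-year groups.
age_codings <- function() {
  lower <- seq(0, 85, by = 5)                     # lower bounds: 0, 5, ..., 85
  label <- ifelse(lower < 85,
                  paste0(lower, "-", lower + 4),  # "0-4", "5-9", ..., "80-84"
                  "85+")                          # open-ended last group
  data.frame(index = seq_along(lower),            # 1, 2, ..., 18
             label = label,
             lower = lower,
             stringsAsFactors = FALSE)
}

codings <- age_codings()
head(codings, 3)
```

Whichever coding is used, it has to be the same in df_cases and df_pop so that the age groups line up.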

var_cases

Cases variable in the df_cases dataset.

var_py

(Optional) If the population data is in long format, the name of the population variable in the df_pop dataset.
If the population data is in wide format (see details), var_py must be NULL.

group_by

(Optional) A vector of variables used to define the different populations (sex, country, etc.).
Each variable must have the same column name in both datasets (df_cases and df_pop).
Do not include the "year" variable since it is automatically detected (see details).
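
To see why the grouping variables need the same column name in both datasets, here is a conceptual sketch using base R's merge(). It is not the package's implementation, and the two small data frames contain made-up values for illustration:

```r
# Made-up case counts and population, sharing the columns age, year and sex.
cases <- data.frame(
  age = c(1, 2, 1, 2), year = 2005, sex = c(1, 1, 2, 2),
  cases = c(3, 5, 2, 4)
)
pop <- data.frame(
  age = c(1, 2, 1, 2), year = 2005, sex = c(1, 1, 2, 2),
  pop = c(1000, 1200, 900, 1100)
)

group_by <- c("sex")                  # user-defined grouping variables
keys <- c("age", "year", group_by)    # join keys must exist in both tables

# Each row of cases is matched to the population with the same key values.
merged <- merge(cases, pop, by = keys)
merged
```

If a key column were named differently in one of the two tables, merge() would stop with an error, which is the situation the naming requirement avoids.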

Author

Mathieu Laversanne

Details

This function merges registry data and population for further analysis.
Both datasets must be grouped by 5-year age group.
If present, the year information in format "yyyy" will be detected automatically.
Two formats are accepted for population data.
Long format (year and population are two variables):

sex   age   pop      year
1     1     116128   2005
1     2     130995   2005
1     3     137556   2005
...   ...   ...      ...
2     16    27171    2007
2     17    13585    2007
2     18    13585    2007

Wide format (one column per year and no population variable; the "yyyy" year must be included in the column names):

sex   age     Y2013    Y2014    Y2015
1     0-4     215607   237346   247166
1     5-9     160498   152190   152113
1     10-14   175676   171794   165406
...   ...     ...      ...      ...
2     75-79   20625    20868    23434
2     80-84   7187     7276     7620
2     85+     2551     2597     2617
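
The two layouts carry the same information. The following base R sketch reshapes a small wide-format table into the long layout, using three rows from the wide example. It only illustrates how the formats relate and how a "yyyy" year can be read from a column name; it is not how the package itself processes the data, and the object names are hypothetical:

```r
# Three rows in wide format: one population column per year.
pop_wide <- data.frame(
  sex   = c(1, 1, 2),
  age   = c("0-4", "5-9", "85+"),
  Y2013 = c(215607, 160498, 2551),
  Y2014 = c(237346, 152190, 2597),
  Y2015 = c(247166, 152113, 2617),
  stringsAsFactors = FALSE
)

# Columns whose name contains a four-digit year.
year_cols <- grep("[0-9]{4}", names(pop_wide), value = TRUE)

# Stack one block of rows per year column to obtain the long format.
pop_long <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(
    sex  = pop_wide$sex,
    age  = pop_wide$age,
    pop  = pop_wide[[col]],
    year = as.integer(sub(".*([0-9]{4}).*", "\\1", col)),
    stringsAsFactors = FALSE
  )
}))

pop_long
```

With data in the long layout, the population column would be passed through var_py (for example var_py = "pop"); with the wide layout, var_py stays NULL.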

See Also

csu_group_cases csu_asr csu_cumrisk csu_eapc csu_ageSpecific csu_ageSpecific_top csu_bar_top csu_time_trend csu_trendCohortPeriod

Examples

# you can import your data from a csv file using read.csv:
# mydata <-  read.csv("mydata.csv", sep=",")

data(ICD_group_GLOBOCAN)
data(data_individual_file)
data(data_population_file)

# Group individual data by:
#   - 5-year age group
#   - ICD grouping from the data frame ICD_group_GLOBOCAN
#   - year (extracted from the date of incidence)

df_data_year <- csu_group_cases(data_individual_file,
  var_age="age",
  group_by=c("sex", "regcode", "reglabel"),
  df_ICD = ICD_group_GLOBOCAN,
  var_ICD  ="site",
  var_year = "doi")     

# Merge 5-year age-grouped data with population by year (detected automatically) and sex

df_data <- csu_merge_cases_pop(
	df_data_year, 
	data_population_file, 
	var_age = "age_group",
	var_cases = "cases",
	var_py = "pop",
	group_by = c("sex"))


# you can export your result as a csv file using write.csv:
# write.csv(df_data, file="result.csv")
				  		  
