Simple projections of the annual 2% samples of Australian Taxation Office tax returns.
project(
sample_file,
h = 0L,
fy.year.of.sample.file = NULL,
WEIGHT = 50L,
excl_vars = NULL,
forecast.dots = list(estimator = "mean", pred_interval = 80),
wage.series = NULL,
lf.series = NULL,
use_age_pop_forecast = FALSE,
.recalculate.inflators = NA,
.copyDT = TRUE,
check_fy_sample_file = TRUE,
differentially_uprate_Sw = NA,
r_super_balance = 1.05,
r_generic = NULL
)
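A minimal usage sketch, assuming the grattan, taxstats, and data.table packages are installed (taxstats may need to be installed from outside CRAN). It projects the 2013-14 sample file three years ahead. The object names and the package guard are illustrative, and the call is skipped if any package is missing.

```r
# Sketch only: requires grattan, taxstats and data.table to be installed.
pkgs <- c("grattan", "taxstats", "data.table")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

projected <- NULL
if (have_pkgs) {
  # Copy first so the packaged data set is never modified.
  s1314 <- data.table::copy(taxstats::sample_file_1314)
  # Project three financial years ahead of 2013-14.
  projected <- grattan::project(s1314,
                                h = 3L,
                                fy.year.of.sample.file = "2013-14")
}
```

If the call runs, `projected` should have one row per row of the input, as described in the return value.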
A sample file with the same number of rows as sample_file but with inflated values as a forecast for the sample file in to_fy. If WEIGHT is not already a column of sample_file, it will be added and its sum will be the predicted number of taxpayers in to_fy.
sample_file: A data.table matching a 2% sample file from the ATO. See package taxstats for an example.
h: An integer. How many years should the sample file be projected?
fy.year.of.sample.file: The financial year of sample_file. If NULL, the default, the financial year is inferred from the number of rows of sample_file to be one of 2012-13, 2013-14, 2014-15, 2015-16, or 2016-17.
WEIGHT: The sample weight for the sample file. (So a 2% file has WEIGHT = 50.)
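The arithmetic behind the default weight can be sketched as follows. The row count is a made-up figure used purely for illustration, not the size of any actual ATO sample file.

```r
# A 2% sample represents the population at 1 / 0.02 = 50 people per row.
sampling_fraction <- 0.02
weight <- 1 / sampling_fraction

# Hypothetical row count, for illustration only.
n_rows <- 250000L

# Summing a constant weight over all rows estimates the population size.
estimated_population <- n_rows * weight
```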
excl_vars: A character vector of column names in sample_file that should not be inflated. Columns not present in the 2013-14 sample file are not inflated, nor are the columns Ind, Gender, age_range, Occ_code, Partner_status, Region, Lodgment_method, and PHI_Ind.
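As a rough illustration of how the exclusions combine, the sketch below removes both the fixed categorical columns and any user-supplied excl_vars from a set of column names. The helper `cols_to_inflate` and the example columns are hypothetical and not part of the package. The rule about columns absent from the 2013-14 sample file is not modelled here.

```r
# Columns the documentation lists as never inflated.
never_inflated <- c("Ind", "Gender", "age_range", "Occ_code",
                    "Partner_status", "Region", "Lodgment_method", "PHI_Ind")

# Hypothetical helper: which columns remain candidates for inflation?
cols_to_inflate <- function(col_names, excl_vars = NULL) {
  setdiff(col_names, c(never_inflated, excl_vars))
}

# Illustrative column names only.
cols <- c("Ind", "Gender", "Sw_amt", "Net_rent_amt", "Net_CG_amt")
remaining <- cols_to_inflate(cols, excl_vars = "Net_rent_amt")
```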
forecast.dots: A list containing parameters to be passed to generic_inflator.
wage.series: See wage_inflator. Note that Sw_amt will be uprated by differentially_uprate_wage (if requested).
lf.series: See lf_inflator_fy.
use_age_pop_forecast: Should the inflation of the number of taxpayers be moderated by the number of resident persons born in a certain year? If TRUE, younger ages will grow at a slightly higher rate beyond 2018 than older ages.
.recalculate.inflators: (logical, default: NA) Should generic_inflator() or CG_inflator be called to project the other variables? Adds time. Default NA means TRUE if the pre-calculated inflators are available, FALSE otherwise.
.copyDT: (logical, default: TRUE) Should a copy() of sample_file be made? If set to FALSE, will update sample_file in place, which may be necessary when memory is constrained, but is dangerous as it modifies the original data and its projection. (So if you run the same code twice you may end up with a projection 2h years ahead, not h years.)
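The hazard of updating in place can be illustrated with a small data.table example. This is a sketch of by-reference semantics in general, not of the package internals. `inflate_in_place` and the `amount` column are hypothetical stand-ins for a projection step.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Hypothetical stand-in for a projection step that updates by reference.
  inflate_in_place <- function(dt, factor) {
    dt[, amount := amount * factor]
    invisible(dt)
  }

  original <- data.table(amount = c(100, 200))

  # Working on a copy leaves `original` untouched.
  safe <- inflate_in_place(copy(original), 1.1)

  # Without a copy, each run modifies `original`, so two runs compound.
  inflate_in_place(original, 1.1)
  inflate_in_place(original, 1.1)

  # The ratio is about 1.1 rather than 1: the second run inflated twice.
  print(original$amount / safe$amount)
}
```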
check_fy_sample_file: (logical, default: TRUE) Should fy.year.of.sample.file be checked against sample_file? By default, TRUE, an error is raised if the base is not 2012-13, 2013-14, 2014-15, 2015-16, 2016-17, or 2017-18, and a warning is raised if the number of rows in sample_file is different to the known number of rows in the sample files.
differentially_uprate_Sw: (logical, default: NA) Should the salary and wage column (Sw_amt) be differentially uprated using differentially_uprate_wage? The default of NA means differential uprating is used when fy.year.of.sample.file <= "2016-17". It is known that the Treasury stopped using differential uprating by 2019. Selecting TRUE for fy.year.of.sample.file > "2016-17" is an error as the precalculated values are not available.
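The rule for resolving this argument can be sketched as a small function. `resolve_differential_uprating` is a hypothetical helper written for illustration, not the package's own code. It relies on financial years in the form "2016-17" sorting correctly as strings.

```r
# Hypothetical helper mirroring the documented rule for differentially_uprate_Sw.
resolve_differential_uprating <- function(differentially_uprate_Sw, fy) {
  if (is.na(differentially_uprate_Sw)) {
    # Default: use differential uprating only up to and including 2016-17.
    return(fy <= "2016-17")
  }
  if (differentially_uprate_Sw && fy > "2016-17") {
    stop("Precalculated values are not available after 2016-17.")
  }
  differentially_uprate_Sw
}

resolve_differential_uprating(NA, "2015-16")     # TRUE
resolve_differential_uprating(NA, "2017-18")     # FALSE
resolve_differential_uprating(FALSE, "2017-18")  # FALSE
```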
r_super_balance: The factor to inflate super balances by (annualized). Set to 1.05 for backwards compatibility. The annual superannuation bulletin of June 2019 from APRA reported 7.3% growth of funds with more than fund members over the previous 5 years and 7.9% growth over the previous ten years.
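To show what an annualized factor implies over several years, the sketch below assumes the factor is compounded once per projected year. That compounding is an assumption about how the argument is applied, and `project_balance` is a hypothetical helper rather than a package function.

```r
# Hypothetical helper: compound an annual factor over h years.
project_balance <- function(balance, h, r_super_balance = 1.05) {
  balance * r_super_balance^h
}

# Default 5% p.a. over three years: 100000 * 1.05^3, about 115,762.5.
default_growth <- project_balance(100000, h = 3L)

# The 7.3% five-year figure quoted from APRA, used as an alternative factor.
apra_growth <- project_balance(100000, h = 3L, r_super_balance = 1.073)
```

Under this assumption the choice of factor matters more as h grows, since the gap between the two projections widens each year.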
r_generic: (Present from version 2024.1.0) The factor to inflate other columns. Subject to change in future versions. If NULL, the default, an internal factor is used.
Currently components of taxable income are individually inflated based on their historical trends in the ATO sample files, with the exception of:
inflated using differentially_uprate_wage: Sw_amt
inflated using wage_inflator: Alow_ben_amt, ETP_txbl_amt, Rptbl_Empr_spr_cont_amt, Non_emp_spr_amt, MCS_Emplr_Contr, MCS_Prsnl_Contr, MCS_Othr_Contr
inflated using cpi_inflator: WRE_car_amt, WRE_trvl_amt, WRE_uniform_amt, WRE_self_amt, WRE_other_amt
inflated using lf_inflator_fy: WEIGHT
inflated using CG_inflator: Net_CG_amt, Tot_CY_CG_amt
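The exceptions listed above can be expressed as a lookup from inflator to columns. This is only a sketch for reading the list. `inflator_for`, `which_inflator`, and the fallback label "historical trend" are hypothetical names, not objects from the package.

```r
# Columns that are inflated by a specific inflator, per the list above.
inflator_for <- list(
  differentially_uprate_wage = "Sw_amt",
  wage_inflator = c("Alow_ben_amt", "ETP_txbl_amt", "Rptbl_Empr_spr_cont_amt",
                    "Non_emp_spr_amt", "MCS_Emplr_Contr", "MCS_Prsnl_Contr",
                    "MCS_Othr_Contr"),
  cpi_inflator = c("WRE_car_amt", "WRE_trvl_amt", "WRE_uniform_amt",
                   "WRE_self_amt", "WRE_other_amt"),
  lf_inflator_fy = "WEIGHT",
  CG_inflator = c("Net_CG_amt", "Tot_CY_CG_amt")
)

# Hypothetical helper: name the inflator for a column, or fall back to
# the historical-trend treatment described for all other components.
which_inflator <- function(column) {
  hit <- names(inflator_for)[vapply(inflator_for,
                                    function(x) column %in% x,
                                    logical(1))]
  if (length(hit) == 0L) "historical trend" else hit
}

which_inflator("WRE_car_amt")   # "cpi_inflator"
which_inflator("Net_rent_amt")  # "historical trend"
```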
Superannuation balances are inflated by a fixed rate of 5% p.a.
We recommend you use sample_file_1213 over sample_file_1314, unless you need the superannuation variables, as the latter suggests lower-than-recorded tax collections. However, more recent data is of course preferable.