cerardat: cerardat

Description

The methodology is based on a statistical and visual approach that uses two estimated density curves to date each archaeological context. The statistical procedure has two steps, each leading to the construction of a density curve. The first estimates a date corresponding to the terminus post quem of the context, a cursor reflecting an event dated in calendar time. The second estimates the accumulation time of the context (see Details). This statistical tool allows the archaeologist to easily visualise and analyse chronological patterns.

Usage

cerardat(df, col.sup, date, nf = NULL, confidence = 0.95, graph = T)

Value

prediction: Estimated date for archaeological context (event: dateEV and accumulation: dateAC) with confidence interval.
date_gt: Estimated date for technical groups with confidence interval. (use for dateAC)
lm: Linear model on the components of the correspondence analysis.
predict_obj_row: date prediction of archaeological contexts (rows) using predict.lm.
predict_obj_col: date prediction of technical groups (columns) using predict.lm.
cont_gt: Contingency table of the reference corpus.
statistical.summary: Statistical summary of the model:
Adjusted R-squared
R-squared
sigma (Residual standard error)
The Shapiro-Wilk test is used to verify the normality of the residuals.
The Durbin-Watson test checks for first order autocorrelation.
The Breusch-Pagan test checks for heteroscedasticity.
obs_ca_eval: Quality of row representation in the correspondence analysis.
check_ref: Plot of estimated dates (with confidence interval) and real dates of reference data. Only when the real date is known.
check_sup: Plot of estimated dates (with confidence interval) and real dates of supplementary data. Only when the real date is known.
Shapiro_Wilks: Summary of the Shapiro-Wilk test. See shapiro.test.
Durbin_Watson: Summary of the Durbin-Watson test. See dwtest.
Breusch_Pagan: Summary of the Breusch-Pagan test. See bptest.
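
The quantities listed under statistical.summary can be reproduced on any linear model. The following is a minimal sketch on simulated data, not the package's internal code: the objects x, y and fit are hypothetical, and it assumes that dwtest and bptest refer to the functions of the lmtest package (they are only called if that package is installed).

```r
set.seed(1)
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)        # hypothetical data, not from the package
fit <- lm(y ~ x)

# R-squared, adjusted R-squared and residual standard error
s <- summary(fit)
stats <- c(r.squared = s$r.squared,
           adj.r.squared = s$adj.r.squared,
           sigma = s$sigma)

# Normality of the residuals (base R, stats package)
sw <- shapiro.test(residuals(fit))

# First-order autocorrelation and heteroscedasticity (lmtest package)
if (requireNamespace("lmtest", quietly = TRUE)) {
  dw <- lmtest::dwtest(fit)
  bp <- lmtest::bptest(fit)
}
```

A small p-value in any of these tests suggests that the corresponding assumption of the linear model may not hold.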

Arguments

df: The data (data.frame) is a contingency table with the technical groups in the rows and the observations in the columns.
col.sup: Index of supplementary columns in df (vector).
date: The dates of each observation or NA (vector).
nf: An integer giving the number of axes retained in the correspondence analysis. If NULL, the number of axes is chosen automatically so that they account for at least 60% of the inertia.
confidence: The desired confidence interval (0.95 for 95%).
graph: Logical; whether to display the plots.
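
The 60% rule described for nf can be illustrated in base R. This is a sketch under assumptions, not the package's implementation: the table tab is a hypothetical toy contingency table, and the correspondence analysis is computed by a singular value decomposition of the standardized residuals.

```r
# Hypothetical toy contingency table (counts); not data from the package.
tab <- matrix(c(20,  5,  1,  0,
                10, 12,  4,  1,
                 3, 15, 10,  4,
                 1,  6, 14, 12,
                 0,  2,  8, 18),
              nrow = 5, byrow = TRUE)

# Correspondence analysis via SVD of the standardized residuals
P  <- tab / sum(tab)                      # correspondence matrix
rw <- rowSums(P)                          # row masses
cw <- colSums(P)                          # column masses
S  <- diag(1 / sqrt(rw)) %*% (P - rw %o% cw) %*% diag(1 / sqrt(cw))

eig <- svd(S)$d^2                         # principal inertias (eigenvalues)
eig <- eig[seq_len(min(dim(tab)) - 1)]    # drop the trivial null dimension

# Smallest number of axes whose cumulative inertia reaches 60%
cum_inertia <- cumsum(eig) / sum(eig)
nf <- which(cum_inertia >= 0.60)[1]
```

Passing an explicit integer for nf bypasses this automatic choice.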

Author

A. COULON

L. BELLANGER

P. HUSI

Details

The corpus data is a contingency table with the technical groups in the rows and the observations in the columns. There are two types of observations: the reference corpus observations and the supplementary observations. The supplementary columns (observations) are identified by the argument `col.sup`.

Step 1: modelling events dated in calendar time (dateEv)
This step involves estimating the date of an event recorded in the ground (an archaeological context for the archaeologist) from the pottery assemblage of which it is composed, by fitting a regression model that relates a known date in calendar time, such as the date of issue of a coin, to its pottery profile. The reference corpus is used to fit the regression model. The fitted model is then used to calculate a predicted value for contexts not included in the reference corpus, sometimes stratigraphically separated or poorly documented, with a 95% confidence interval for the predicted date.

A correspondence analysis (CA) was carried out to summarize the information in the reference corpus data. We then kept only the first factorial axes. In this way, our contingency table becomes a reduced size table, an incomplete reconstruction of the data. This principle is used in many factor analysis techniques to reduce the number of explanatory variables in the linear regression model.

After estimating the beta parameters of the model using the classical results of multiple regression analysis and checking that the model fits the data correctly, we can deduce the estimated date of an observation and also predict the date of another observation that has no coins and is therefore not dated.
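
Step 1 can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of cerardat: the table tab, the dates and the choice nf = 2 are hypothetical, the supplementary contexts are projected with the standard CA transition formula, and the helper coords() is introduced only for this example.

```r
# Hypothetical toy data: 4 technical groups (rows) x 8 contexts (columns)
tab <- matrix(c(30, 22, 15,  8,  3,  1, 18,  2,
                10, 14, 16, 12,  6,  3, 15,  5,
                 2,  6, 12, 18, 15, 10,  9, 14,
                 0,  1,  4, 10, 20, 28,  3, 22),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("GT", 1:4), paste0("ctx", 1:8)))
col_sup <- 7:8                                  # supplementary (undated) contexts
date    <- c(800, 850, 900, 960, 1010, 1050)    # hypothetical dates of ctx1..ctx6

# Correspondence analysis of the reference corpus only
ref <- tab[, -col_sup]
P   <- ref / sum(ref)
rw  <- rowSums(P)
cw  <- colSums(P)
sv  <- svd(diag(1 / sqrt(rw)) %*% (P - rw %o% cw) %*% diag(1 / sqrt(cw)))

nf <- 2                                         # number of axes kept (assumed)
row_std <- (diag(1 / sqrt(rw)) %*% sv$u)[, 1:nf, drop = FALSE]

# Transition formula: context coordinates = column profiles %*% row standard coordinates
coords <- function(m) {
  out <- as.data.frame(t(sweep(m, 2, colSums(m), "/")) %*% row_std)
  names(out) <- paste0("F", 1:nf)
  out
}
F_ref <- coords(ref)                            # reference contexts
F_sup <- coords(tab[, col_sup, drop = FALSE])   # supplementary contexts

# Regression of the known dates on the retained axes, then prediction
fit  <- lm(date ~ ., data = cbind(date = date, F_ref))
pred <- predict(fit, newdata = F_sup, interval = "prediction", level = 0.95)
```

Here pred holds, for each supplementary context, the predicted date (fit) with the lower and upper bounds of the interval (lwr, upr).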

Step 2: from event time (dateEv) to accumulation time (dateAc)
We used the results of the first step and the properties of the CA to obtain an estimate of the date of each fabric. We could then define the archaeological time represented as dateAc, in other words the accumulation time of a context, as the weighted sum of the fabric dates, the weights being the proportions of MINVC of each fabric in the context. Assuming that the random variables dateEv_j are independent, the distribution of the accumulation time of each context can be approximated by a Gaussian mixture. In this way, for each context, we obtained a multimodal density curve representing the estimated distribution of accumulation time, based on the mixture of normal densities (one for the date of each fabric). By definition, the area under the density curve has a value of 1 (i.e. 100%).
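
The mixture density of step 2 can be sketched in a few lines of base R. All values below are hypothetical (they are not package output), and dens_ac() is a helper written for this illustration only.

```r
# Hypothetical fabric estimates for one context (not package output)
mu     <- c(820, 900, 1010)      # estimated date of each fabric
sigma  <- c(25, 30, 40)          # standard error of each estimate
counts <- c(12, 5, 8)            # MINVC of each fabric in the context
w      <- counts / sum(counts)   # mixture weights: proportions summing to 1

# Accumulation-time density: weighted sum of normal densities
dens_ac <- function(t) {
  Reduce(`+`, lapply(seq_along(mu),
                     function(k) w[k] * dnorm(t, mean = mu[k], sd = sigma[k])))
}

# The area under the curve should be (numerically) 1
area <- integrate(dens_ac, lower = 500, upper = 1400)$value

# Draw the density curve over the period of interest
curve(dens_ac, from = 700, to = 1150, xlab = "date", ylab = "density")
```

Each fabric contributes one normal component, so a context whose fabrics have well-separated dates yields a curve with several modes.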

date
In order to estimate a date for the context, it is essential to refer to objects that have been dated by another source, such as coins. These contexts were selected on a very strict basis for their chronostratigraphic reliability, level of domestic occupation or enclosures with long urban stratigraphic sequences, thereby minimising any bias associated with the disparity between the date of the coin and that of the context.

References

Bellanger L. and Husi P. (2012) Statistical tool for dating and interpreting archaeological contexts using pottery. Journal of Archaeological Science, Elsevier, 39 (4), pp.777-790. doi:10.1016/j.jas.2011.06.031.

Examples


library(SPARTAAS)
data("datacerardat")

resultat = cerardat(df = datacerardat$df,
           col.sup = datacerardat$col.sup,
           date = datacerardat$date,
           nf = NULL,
           confidence = 0.95,
           graph = TRUE
        )

resultat
# The Shapiro-Wilk test is used to verify the normality of the residuals.
# The Durbin-Watson test checks for first order autocorrelation.
# The Breusch-Pagan test checks for heteroscedasticity.



#See the first plot
plot(resultat,
     which = 1,
     col1=rgb(0.93,0.23,0.23,0.5),
     col2="black",
     xlim=NULL,
     ylim=c(0,0.03)
    )

#See the first ten plots
#plot(resultat,
#     which = 1:10,
#     col1=rgb(0.93,0.23,0.23,0.5),
#     col2="black",
#     xlim=NULL,
#     ylim=c(0,0.03)
#    )

#See all plots
#plot(resultat,
#     which = NULL,
#     col1=rgb(0.93,0.23,0.23,0.5),
#     col2="black",
#     xlim=NULL,
#     ylim=c(0,0.03)
#    )

# With the extract_results() function you can extract the plots;
# they are then found in this directory:
paste0(getwd(),"/figures")
