
The package provides utilities for estimating probability of informed trading measures: the original PIN, as introduced by Easley and O'Hara (1992) and Easley et al. (1996); the multilayer PIN (MPIN), as introduced by Ersan (2016); the adjusted PIN (AdjPIN) model, as introduced by Duarte and Young (2009); and the volume-synchronized PIN (VPIN), as introduced by Easley et al. (2011) and Easley et al. (2012). Estimation of the PIN, MPIN, and AdjPIN models is subject to floating-point exceptions and is sensitive to the choice of initial values. Researchers have therefore developed factorizations of the model likelihood functions, as well as algorithms for determining initial parameter sets for maximum likelihood estimation (MLE henceforth).
As for the factorizations, the package includes three factorizations of the PIN likelihood function: fact_pin_eho(), as in Easley et al. (2010); fact_pin_lk(), as in Lin and Ke (2011); and fact_pin_e(), as in Ersan (2016). It also includes one factorization of the MPIN likelihood function, fact_mpin(), as in Ersan (2016), and one factorization of the AdjPIN likelihood function, fact_adjpin(), as in Ersan and Ghachem (2022b).
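To illustrate why a factorized likelihood matters, the base R sketch below evaluates the PIN log-likelihood of a single day in two ways: naively, by summing products of raw Poisson probabilities, and in a stabilized form that works with log-densities and the log-sum-exp trick. This is a generic stabilization, not any of the specific factorizations cited above; the helper names loglik_naive() and loglik_stable() and all parameter values are assumptions made for illustration.

```r
# Single-day PIN likelihood: a mixture of three Poisson regimes
# (no information, good information, bad information).
# theta = c(alpha, delta, mu, eps_b, eps_s); b, s = daily buys and sells.
loglik_naive <- function(theta, b, s) {
  a <- theta[1]; d <- theta[2]; mu <- theta[3]; eb <- theta[4]; es <- theta[5]
  lik <- (1 - a)     * dpois(b, eb)      * dpois(s, es) +
         a * (1 - d) * dpois(b, eb + mu) * dpois(s, es) +
         a * d       * dpois(b, eb)      * dpois(s, es + mu)
  log(lik)  # the products can underflow to 0, which yields log(0)
}

loglik_stable <- function(theta, b, s) {
  a <- theta[1]; d <- theta[2]; mu <- theta[3]; eb <- theta[4]; es <- theta[5]
  terms <- c(
    log(1 - a) +
      dpois(b, eb, log = TRUE) + dpois(s, es, log = TRUE),
    log(a) + log(1 - d) +
      dpois(b, eb + mu, log = TRUE) + dpois(s, es, log = TRUE),
    log(a) + log(d) +
      dpois(b, eb, log = TRUE) + dpois(s, es + mu, log = TRUE)
  )
  m <- max(terms)                # factor out the largest term
  m + log(sum(exp(terms - m)))   # log-sum-exp
}

theta <- c(0.3, 0.5, 4000, 2000, 2000)  # illustrative parameter set

# A day whose trade counts are far from every regime's rates.
ll_naive  <- loglik_naive(theta, b = 9000, s = 500)
ll_stable <- loglik_stable(theta, b = 9000, s = 500)

# A day close to the no-information regime, where both versions work.
ok_naive  <- loglik_naive(theta, b = 2100, s = 1950)
ok_stable <- loglik_stable(theta, b = 2100, s = 1950)
```

On the extreme day the naive version underflows while the stabilized version stays finite; on the ordinary day the two agree, so the stabilization changes the arithmetic but not the likelihood being evaluated.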
The package implements three algorithms that generate initial parameter sets for the MLE of the PIN model: initials_pin_yz() for the algorithm of Yan and Zhang (2012), initials_pin_gwj() for the algorithm of Gan et al. (2015), and initials_pin_ea() for the algorithm of Ersan and Alici (2016). For the MPIN model, the function initials_mpin() implements a multilayer extension of the algorithm of Ersan and Alici (2016). Finally, three functions generate initial parameter sets for the MLE of the AdjPIN model: initials_adjpin() for the algorithm of Ersan and Ghachem (2022b), initials_adjpin_cl() for the algorithm of Cheng and Lai (2021), and initials_adjpin_rnd() for randomly generated initial parameter sets.
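The role of initial parameter sets can be sketched in base R as a multi-start optimization: draw several starting points, maximize the likelihood from each, and keep the best fit. The sketch uses random starting points only, so it does not reproduce any of the published algorithms named above. The helper names (negloglik(), random_inits()), the simulated data, and the choice of Nelder-Mead on a transformed scale are all assumptions made for illustration.

```r
set.seed(42)

# Simulated daily buys (b) and sells (s) from a PIN model (assumed values).
n <- 60
alpha <- 0.4; delta <- 0.5; mu <- 300; eps_b <- 200; eps_s <- 200
info <- runif(n) < alpha
bad  <- info & (runif(n) < delta)
good <- info & !bad
b <- rpois(n, eps_b + mu * good)
s <- rpois(n, eps_s + mu * bad)

# Negative log-likelihood on an unconstrained scale:
# z = (logit alpha, logit delta, log mu, log eps_b, log eps_s).
negloglik <- function(z, b, s) {
  a <- plogis(z[1]); d <- plogis(z[2])
  mu <- exp(z[3]); eb <- exp(z[4]); es <- exp(z[5])
  l0 <- log(1 - a) +
    dpois(b, eb, log = TRUE) + dpois(s, es, log = TRUE)
  lg <- log(a) + log(1 - d) +
    dpois(b, eb + mu, log = TRUE) + dpois(s, es, log = TRUE)
  lb <- log(a) + log(d) +
    dpois(b, eb, log = TRUE) + dpois(s, es + mu, log = TRUE)
  m <- pmax(l0, lg, lb)  # log-sum-exp, day by day
  -sum(m + log(exp(l0 - m) + exp(lg - m) + exp(lb - m)))
}

# Draw k random initial parameter sets, scaled by the average trade counts.
random_inits <- function(k, b, s) {
  cbind(alpha = runif(k, 0.05, 0.95),
        delta = runif(k, 0.05, 0.95),
        mu    = runif(k, 0.10, 1) * mean(b + s),
        eps_b = runif(k, 0.25, 1) * mean(b),
        eps_s = runif(k, 0.25, 1) * mean(s))
}

inits <- random_inits(10, b, s)

# One optimization per initial parameter set.
fits <- lapply(seq_len(nrow(inits)), function(i) {
  p  <- inits[i, ]
  z0 <- c(qlogis(p[1:2]), log(p[3:5]))
  optim(z0, function(z) negloglik(z, b, s),
        method = "Nelder-Mead", control = list(maxit = 2000))
})

# Keep the run with the smallest negative log-likelihood.
values <- vapply(fits, function(f) f$value, numeric(1))
best   <- fits[[which.min(values)]]
estimate <- c(plogis(best$par[1:2]), exp(best$par[3:5]))
names(estimate) <- c("alpha", "delta", "mu", "eps_b", "eps_s")
```

Comparing the entries of values shows how much the attained likelihood can depend on the starting point, which is the motivation for the dedicated algorithms listed above.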
The initial parameter sets can be chosen either by calling a function that implements MLE for the PIN model with a specific algorithm, such as pin_yz(), pin_gwj(), or pin_ea(), or through the argument initialsets of the generic functions implementing MLE for the MPIN and AdjPIN models, namely mpin_ml() and adjpin(). In addition, the PIN, MPIN, and AdjPIN models can be estimated using custom initial parameter set(s) supplied by the user through the argument initialsets of the functions pin(), mpin_ml(), and adjpin(). Through the function get_posteriors(), the package also allows users to compute, for each day in the sample, the posterior probabilities that the day is a no-information day, a good-information day, or a bad-information day.
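The posterior probabilities of the three day types follow from Bayes' rule applied to the mixture components of the PIN likelihood. The base R sketch below shows the calculation for the standard PIN model; it is not the implementation of get_posteriors(), and the function name posterior_day_types(), the parameter values, and the example trade counts are assumptions made for illustration.

```r
# Posterior probability of each day type given daily buys b and sells s.
# Priors: (1 - alpha) no information, alpha * (1 - delta) good information,
# alpha * delta bad information.
posterior_day_types <- function(b, s, alpha, delta, mu, eps_b, eps_s) {
  logw <- cbind(
    no_info = log(1 - alpha) +
      dpois(b, eps_b, log = TRUE) + dpois(s, eps_s, log = TRUE),
    good = log(alpha) + log(1 - delta) +
      dpois(b, eps_b + mu, log = TRUE) + dpois(s, eps_s, log = TRUE),
    bad = log(alpha) + log(delta) +
      dpois(b, eps_b, log = TRUE) + dpois(s, eps_s + mu, log = TRUE)
  )
  # Subtract each row's maximum before exponentiating to avoid underflow.
  w <- exp(logw - apply(logw, 1, max))
  w / rowSums(w)
}

# Three hypothetical days: balanced, buy-heavy, and sell-heavy.
post <- posterior_day_types(
  b = c(200, 510, 190), s = c(205, 195, 490),
  alpha = 0.4, delta = 0.5, mu = 300, eps_b = 200, eps_s = 200
)
most_likely <- colnames(post)[apply(post, 1, which.max)]
```

Each row of post sums to one, and most_likely labels the balanced day as no_info, the buy-heavy day as good, and the sell-heavy day as bad.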
As an alternative to the standard maximum likelihood estimation, estimation via the expectation-conditional maximization (ECM) algorithm is suggested in Ghachem and Ersan (2022), and is implemented through the function mpin_ecm() for the MPIN model, and the function adjpin() for the AdjPIN model.
Datasets of daily aggregated numbers of buys and sells, with a user-determined number of information layers, can be simulated with the function generatedata_mpin() for the MPIN (PIN) model, and with generatedata_adjpin() for the AdjPIN model. The output of these functions contains the theoretical parameters used in the data generation, the empirical parameters computed from the generated data, and the generated data itself. The data simulation functions allow for broad customization to produce data that fit the user's preferences, so simulated data series can be used to compare the applied methods across different scenarios. Alternatively, the user can use two example datasets preloaded in the package: dailytrades, representative of quarterly trade data with daily buys and sells; and hfdata, a simulated high-frequency dataset comprising 100 000 trades.
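The data-generating process of the standard (single-layer) PIN model can be sketched in a few lines of base R: draw a day type, then draw Poisson buys and sells at the corresponding rates. The sketch mirrors the idea of returning theoretical and empirical parameters next to the data, but it is not the implementation of generatedata_mpin(); the names simulate_pin() and pin_value() and the parameter values are assumptions made for illustration.

```r
# PIN implied by a parameter set: alpha * mu / (alpha * mu + eps_b + eps_s).
pin_value <- function(alpha, mu, eps_b, eps_s) {
  alpha * mu / (alpha * mu + eps_b + eps_s)
}

simulate_pin <- function(days, alpha, delta, mu, eps_b, eps_s) {
  # Day type: no information, good information, or bad information.
  type <- sample(c("none", "good", "bad"), days, replace = TRUE,
                 prob = c(1 - alpha, alpha * (1 - delta), alpha * delta))
  # Informed traders buy on good-information days and sell on bad ones.
  b <- rpois(days, eps_b + mu * (type == "good"))
  s <- rpois(days, eps_s + mu * (type == "bad"))
  info <- type != "none"
  list(
    data = data.frame(b = b, s = s),
    theoretical = c(alpha = alpha, delta = delta, mu = mu,
                    eps_b = eps_b, eps_s = eps_s,
                    pin = pin_value(alpha, mu, eps_b, eps_s)),
    # Realized shares of information days and of bad days among them.
    empirical = c(alpha = mean(info),
                  delta = if (any(info)) mean(type[info] == "bad") else NA_real_)
  )
}

set.seed(123)
sim <- simulate_pin(days = 60, alpha = 0.4, delta = 0.5,
                    mu = 300, eps_b = 200, eps_s = 200)
head(sim$data)
```

With these assumed parameters the theoretical PIN is 120 / 520. The empirical alpha and delta generally differ from the theoretical ones because only 60 days are drawn.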
Finally, the package provides two functions to deal with high-frequency data. First, the function vpin() estimates and provides detailed output on the order flow toxicity metric, the volume-synchronized probability of informed trading, as developed in Easley et al. (2011) and Easley et al. (2012). Second, the function aggregate_trades() aggregates high-frequency trade data into daily data using several trade classification algorithms, namely the tick algorithm, the quote algorithm, the LR algorithm (Lee and Ready, 1991), and the EMO algorithm (Ellis et al., 2000).
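As an illustration of trade classification, the base R sketch below applies the tick rule to a short price series and aggregates the result into daily buys and sells. The tick rule needs only trade prices: an uptick is classified as a buy, a downtick as a sell, and a zero tick inherits the previous classification. This is not the implementation of aggregate_trades(), and the function name tick_rule() and the example trades are assumptions made for illustration; the quote, LR, and EMO algorithms additionally use bid and ask quotes and are not covered here.

```r
# Hypothetical trades over two days (timestamps and prices only).
trades <- data.frame(
  timestamp = as.POSIXct(c(
    "2024-01-02 09:30:00", "2024-01-02 09:30:05", "2024-01-02 09:30:09",
    "2024-01-02 09:31:00", "2024-01-02 09:31:30",
    "2024-01-03 09:30:00", "2024-01-03 09:30:02"
  ), tz = "UTC"),
  price = c(10.00, 10.01, 10.01, 10.00, 10.00, 10.02, 10.01)
)

# Tick rule: +1 = buy, -1 = sell, NA = unclassified.
tick_rule <- function(price) {
  side <- c(NA, sign(diff(price)))  # the first trade has no predecessor
  for (i in seq_along(side)) {
    # Zero tick: reuse the classification of the previous trade.
    if (i > 1 && !is.na(side[i]) && side[i] == 0) side[i] <- side[i - 1]
  }
  side
}

trades$side <- tick_rule(trades$price)
trades$day  <- as.Date(trades$timestamp, tz = "UTC")

# Daily numbers of buys (b) and sells (s); unclassified trades are ignored.
daily <- data.frame(
  day = sort(unique(trades$day)),
  b = as.vector(tapply(trades$side == 1, trades$day, sum, na.rm = TRUE)),
  s = as.vector(tapply(trades$side == -1, trades$day, sum, na.rm = TRUE))
)
daily
```

The first trade of the series stays unclassified, and the first trade of the second day is compared with the last price of the previous day, since the series is treated as continuous.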
The package provides fast, compact, and precise utilities to tackle the sophisticated, error-prone, and time-consuming estimation of informed trading, using only raw trade-level data. Ghachem and Ersan (2022b) provide a comprehensive overview of the package: they first detail the underlying theoretical background and give a thorough description of the functions, before using them to tackle relevant research questions.
adjpin estimates the adjusted probability of informed trading (AdjPIN) of the model of Duarte and Young (2009).
aggregate_trades aggregates the trading data per day using different trade classification algorithms.
detectlayers_e detects the number of information layers present in the trade-data using the algorithm in Ersan (2016).
detectlayers_eg detects the number of information layers present in the trade-data using the algorithm in Ersan and Ghachem (2022a).
detectlayers_ecm detects the number of information layers present in the trade-data using the expectation-conditional maximization algorithm in Ghachem and Ersan (2022).
fact_adjpin returns the AdjPIN factorization of the likelihood function by Ersan and Ghachem (2022b) evaluated at the provided data and parameter sets.
fact_pin_e returns the PIN factorization of the likelihood function by Ersan (2016) evaluated at the provided data and parameter sets.
fact_pin_eho returns the PIN factorization of the likelihood function by Easley et al. (2010) evaluated at the provided data and parameter sets.
fact_pin_lk returns the PIN factorization of the likelihood function by Lin and Ke (2011) evaluated at the provided data and parameter sets.
fact_mpin returns the MPIN factorization of the likelihood function by Ersan (2016) evaluated at the provided data and parameter sets.
generatedata_adjpin generates a dataset object, or a list of dataset objects, according to the assumptions of the AdjPIN model.
generatedata_mpin generates a dataset object, or a list of dataset objects, according to the assumptions of the MPIN model.
get_posteriors computes, for each day in the sample, the posterior probabilities that it is a no-information day, good-information day and bad-information day respectively.
initials_adjpin generates the initial parameter sets for the ML/ECM estimation of the adjusted probability of informed trading using the algorithm of Ersan and Ghachem (2022b).
initials_adjpin_cl generates the initial parameter sets for the ML/ECM estimation of the adjusted probability of informed trading using an extension of the algorithm of Cheng and Lai (2021).
initials_adjpin_rnd generates random parameter sets for the estimation of the AdjPIN model.
initials_mpin generates initial parameter sets for the maximum likelihood estimation of the multilayer probability of informed trading (MPIN) using the Ersan (2016) generalization of the algorithm in Ersan and Alici (2016).
initials_pin_ea generates the initial parameter sets for the maximum likelihood estimation of the probability of informed trading (PIN) using the algorithm of Ersan and Alici (2016).
initials_pin_gwj generates the initial parameter set for the maximum likelihood estimation of the probability of informed trading (PIN) using the algorithm of Gan et al. (2015).
initials_pin_yz generates the initial parameter sets for the maximum likelihood estimation of the probability of informed trading (PIN) using the algorithm of Yan and Zhang (2012).
mpin_ecm estimates the multilayer probability of informed trading (MPIN) using the expectation-conditional maximization (ECM) algorithm as in Ghachem and Ersan (2022).
mpin_ml estimates the multilayer probability of informed trading (MPIN) using the layer detection algorithms in Ersan (2016) and Ersan and Ghachem (2022a), and standard maximum likelihood estimation.
pin estimates the probability of informed trading (PIN) using custom initial parameter set(s) provided by the user.
pin_bayes estimates the probability of informed trading (PIN) using the Bayesian approach in Griffin et al. (2021).
pin_ea estimates the probability of informed trading (PIN) using the initial parameter sets from the algorithm of Ersan and Alici (2016).
pin_gwj estimates the probability of informed trading (PIN) using the initial parameter set from the algorithm of Gan et al. (2015).
pin_yz estimates the probability of informed trading (PIN) using the initial parameter sets from the grid-search algorithm of Yan and Zhang (2012).
vpin estimates the volume-synchronized probability of informed trading (VPIN).
dailytrades A dataframe representative of quarterly (60 trading days) data of simulated daily buys and sells.
hfdata A dataframe containing simulated high-frequency trade-data on 100 000 timestamps with the variables {timestamp, price, volume, bid, ask}.
estimate.adjpin-class The class estimate.adjpin stores the estimation results of the function adjpin().
estimate.mpin-class The class estimate.mpin stores the estimation results of the MPIN model as estimated by the function mpin_ml().
estimate.mpin.ecm-class The class estimate.mpin.ecm stores the estimation results of the MPIN model as estimated by the function mpin_ecm().
estimate.pin-class The class estimate.pin stores the estimation results of the following PIN functions: pin(), pin_yz(), pin_gwj(), and pin_ea().
estimate.vpin-class The class estimate.vpin stores the estimation results of the VPIN model as estimated by the function vpin().
dataset-class The class dataset stores the result of the simulation of aggregate daily trading data.
data.series-class The class data.series stores a list of dataset objects.
Montasser Ghachem montasser.ghachem@pinstimation.com
Department of Economics at Stockholm University, Stockholm, Sweden.
Oguz Ersan oguz.ersan@pinstimation.com
Department of International Trade and Finance at Kadir Has University, Istanbul, Turkey.