PINstimation (version 0.1.2)

PINstimation-package: An R package for estimating the probability of informed trading

Description

The package provides utilities for the estimation of probability of informed trading measures: the original PIN model (PIN) as introduced by Easley and O'Hara (1992) and Easley et al. (1996), the multilayer PIN (MPIN) model as introduced by Ersan (2016), the adjusted PIN (AdjPIN) model as introduced by Duarte and Young (2009), and the volume-synchronized PIN (VPIN) as introduced by Easley et al. (2011) and Easley et al. (2012). Estimations of PIN, MPIN, and AdjPIN are prone to floating-point exception errors and are sensitive to the choice of initial values. Researchers have therefore developed factorizations of the model likelihood functions, as well as algorithms for determining initial parameter sets for the maximum likelihood estimation (MLE henceforth).


As for the factorizations, the package includes three different factorizations of the PIN likelihood function: fact_pin_eho() as in Easley et al. (2010), fact_pin_lk() as in Lin and Ke (2011), and fact_pin_e() as in Ersan (2016); one factorization of the MPIN likelihood function: fact_mpin() as in Ersan (2016); and one factorization of the AdjPIN likelihood function: fact_adjpin() as in Ersan and Ghachem (2022b).
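To illustrate why a factorization is needed at all, here is a minimal base-R sketch. It does not reproduce any of the published factorizations listed above; it only contrasts a naive evaluation of the PIN daily likelihood with a generic log-sum-exp rearrangement. The function names and parameter values are hypothetical and are not part of the package.

```r
# Naive daily likelihood of the PIN model: a mixture of three Poisson pairs.
# For realistic trade counts, lambda^k and factorial(k) both overflow.
pin_lik_naive <- function(b, s, alpha, delta, mu, eb, es) {
  pois <- function(k, lambda) exp(-lambda) * lambda^k / factorial(k)
  (1 - alpha) * pois(b, eb) * pois(s, es) +            # no-information day
    alpha * delta * pois(b, eb) * pois(s, es + mu) +   # bad-information day
    alpha * (1 - delta) * pois(b, eb + mu) * pois(s, es)  # good-information day
}

# Same quantity on the log scale: work with log-densities and factor out
# the largest term before exponentiating (generic log-sum-exp trick).
pin_loglik_stable <- function(b, s, alpha, delta, mu, eb, es) {
  terms <- c(
    log(1 - alpha) + dpois(b, eb, log = TRUE) + dpois(s, es, log = TRUE),
    log(alpha * delta) + dpois(b, eb, log = TRUE) + dpois(s, es + mu, log = TRUE),
    log(alpha * (1 - delta)) + dpois(b, eb + mu, log = TRUE) + dpois(s, es, log = TRUE)
  )
  m <- max(terms)
  m + log(sum(exp(terms - m)))
}

# One hypothetical trading day with 500 buys and 450 sells.
naive  <- suppressWarnings(pin_lik_naive(500, 450, 0.3, 0.4, 200, 400, 380))
stable <- pin_loglik_stable(500, 450, 0.3, 0.4, 200, 400, 380)

is.finite(naive)   # the naive value is not usable
is.finite(stable)  # the log-scale value is
```

The published factorizations rearrange the likelihood analytically rather than relying on dpois(), but they address the same overflow problem.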

The package implements three algorithms to generate initial parameter sets for the MLE of the PIN model: initials_pin_yz() for the algorithm of Yan and Zhang (2012), initials_pin_gwj() for the algorithm of Gan et al. (2015), and initials_pin_ea() for the algorithm of Ersan and Alici (2016). For the MLE of the MPIN model, the function initials_mpin() implements a multilayer extension of the algorithm of Ersan and Alici (2016). Finally, three functions generate initial parameter sets for the MLE of the AdjPIN model: initials_adjpin() for the algorithm in Ersan and Ghachem (2022b), initials_adjpin_cl() for the algorithm of Cheng and Lai (2021), and initials_adjpin_rnd() for randomly generated initial parameter sets. The initial parameter sets can be chosen either directly, by calling the specific functions implementing MLE for the PIN model, such as pin_yz(), pin_gwj(), and pin_ea(); or through the argument initialsets in the generic functions implementing MLE for the MPIN and AdjPIN models, namely mpin_ml() and adjpin(). In addition, the PIN, MPIN, and AdjPIN models can be estimated using custom initial parameter set(s) provided by the user through the argument initialsets of the functions pin(), mpin_ml(), and adjpin(). Through the function get_posteriors(), the package also allows users to compute, for each day in the sample, the posterior probabilities that the day is a no-information day, a good-information day, or a bad-information day.
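The role of initial parameter sets can be sketched in base R as follows. This is not any of the published algorithms named above: the three initial sets are hand-picked, hypothetical guesses, the data are simulated, and the optimizer is the generic optim() routine. The sketch only shows the general pattern of running the MLE from several starting points and keeping the run with the highest likelihood.

```r
set.seed(1)

# Simulate 60 days from a PIN model with hypothetical parameters.
days  <- 60
truth <- c(alpha = 0.4, delta = 0.5, mu = 300, eb = 500, es = 500)
info  <- rbinom(days, 1, truth["alpha"])          # 1 = information day
bad   <- info * rbinom(days, 1, truth["delta"])   # 1 = bad-information day
good  <- info - bad                               # 1 = good-information day
buys  <- rpois(days, truth["eb"] + truth["mu"] * good)
sells <- rpois(days, truth["es"] + truth["mu"] * bad)

# Negative log-likelihood on an unconstrained scale:
# theta = (logit alpha, logit delta, log mu, log eb, log es).
neg_loglik <- function(theta) {
  a  <- plogis(theta[1]); d  <- plogis(theta[2])
  mu <- exp(theta[3]);    eb <- exp(theta[4]); es <- exp(theta[5])
  t_none <- log1p(-a) + dpois(buys, eb, log = TRUE) + dpois(sells, es, log = TRUE)
  t_bad  <- log(a) + log(d) + dpois(buys, eb, log = TRUE) +
    dpois(sells, es + mu, log = TRUE)
  t_good <- log(a) + log1p(-d) + dpois(buys, eb + mu, log = TRUE) +
    dpois(sells, es, log = TRUE)
  m <- pmax(t_none, t_bad, t_good)
  value <- -sum(m + log(exp(t_none - m) + exp(t_bad - m) + exp(t_good - m)))
  if (is.finite(value)) value else 1e10  # penalize non-evaluable points
}

# Hand-picked initial sets (alpha, delta, mu, eb, es); purely illustrative.
initial_sets <- list(
  c(0.1, 0.5,   50, mean(buys),   mean(sells)),
  c(0.5, 0.5,  200, median(buys), median(sells)),
  c(0.9, 0.2, 1000, min(buys),    min(sells))
)
to_free <- function(p) c(qlogis(p[1:2]), log(p[3:5]))

# Run the optimizer once per initial set and keep the best run.
fits <- lapply(initial_sets, function(p) {
  suppressWarnings(optim(to_free(p), neg_loglik, control = list(maxit = 2000)))
})
values <- sapply(fits, function(f) f$value)
best   <- fits[[which.min(values)]]

est <- c(plogis(best$par[1:2]), exp(best$par[3:5]))
names(est) <- c("alpha", "delta", "mu", "eb", "es")

# PIN = alpha * mu / (alpha * mu + eb + es)
pin <- unname(est["alpha"] * est["mu"] /
                (est["alpha"] * est["mu"] + est["eb"] + est["es"]))
print(values)
print(pin)
```

Comparing the entries of values shows whether the runs ended at the same optimum. The initial-set algorithms cited above aim to generate starting points that make reaching the global maximum more likely.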

As an alternative to standard maximum likelihood estimation, estimation via the expectation-conditional maximization (ECM) algorithm is suggested in Ghachem and Ersan (2022a), and is implemented through the function mpin_ecm() for the MPIN model and the function adjpin() for the AdjPIN model.
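The general shape of an expectation-conditional maximization loop can be sketched for the single-layer PIN model in base R. This is an assumption-laden illustration and not the algorithm of the cited paper: the data are simulated, the starting values are arbitrary, alpha and delta are updated in closed form, and the three rates are updated numerically with optim() in a second conditional step.

```r
set.seed(42)

# Simulated data: each day is a no-information, good-information or
# bad-information day (hypothetical parameter values).
days  <- 60
state <- sample(c("none", "good", "bad"), days, replace = TRUE,
                prob = c(0.6, 0.2, 0.2))
buys  <- rpois(days, 500 + 300 * (state == "good"))
sells <- rpois(days, 500 + 300 * (state == "bad"))

# Log joint density of each day under each day type (days x 3 matrix).
log_joint <- function(alpha, delta, mu, eb, es) {
  cbind(
    none = log(1 - alpha) + dpois(buys, eb, log = TRUE) +
      dpois(sells, es, log = TRUE),
    good = log(alpha * (1 - delta)) + dpois(buys, eb + mu, log = TRUE) +
      dpois(sells, es, log = TRUE),
    bad  = log(alpha * delta) + dpois(buys, eb, log = TRUE) +
      dpois(sells, es + mu, log = TRUE)
  )
}

par <- list(alpha = 0.5, delta = 0.5, mu = 100,
            eb = mean(buys), es = mean(sells))
loglik <- numeric(0)

for (iter in 1:50) {
  # E-step: posterior probability of each day type for each day.
  lj     <- do.call(log_joint, par)
  mx     <- apply(lj, 1, max)
  day_ll <- mx + log(rowSums(exp(lj - mx)))
  post   <- exp(lj - day_ll)
  loglik <- c(loglik, sum(day_ll))

  # CM-step 1: closed-form updates of alpha and delta.
  informed  <- post[, "good"] + post[, "bad"]
  par$alpha <- mean(informed)
  par$delta <- sum(post[, "bad"]) / sum(informed)

  # CM-step 2: numerical update of the rates, posteriors held fixed.
  q_rates <- function(x) {
    mu <- exp(x[1]); eb <- exp(x[2]); es <- exp(x[3])
    value <- -sum(
      post[, "none"] * (dpois(buys, eb, log = TRUE) + dpois(sells, es, log = TRUE)) +
      post[, "good"] * (dpois(buys, eb + mu, log = TRUE) + dpois(sells, es, log = TRUE)) +
      post[, "bad"]  * (dpois(buys, eb, log = TRUE) + dpois(sells, es + mu, log = TRUE))
    )
    if (is.finite(value)) value else 1e10
  }
  fit <- suppressWarnings(optim(log(c(par$mu, par$eb, par$es)), q_rates))
  par$mu <- exp(fit$par[1]); par$eb <- exp(fit$par[2]); par$es <- exp(fit$par[3])
}

print(unlist(par))
print(range(diff(loglik)))  # log-likelihood changes between iterations
```

The matrix post computed in the E-step holds, for each day, the posterior probabilities of the three day types, which is the kind of quantity that get_posteriors() is described as returning.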

Dataset(s) of daily aggregated numbers of buys and sells with a user-determined number of information layers can be simulated with the function generatedata_mpin() for the MPIN (PIN) model, and with generatedata_adjpin() for the AdjPIN model. The output of these functions contains the theoretical parameters used in the data generation, the empirical parameters computed from the generated data, and the generated data itself. The data simulation functions allow for broad customization to produce data that fit the user's preferences. Simulated data series can therefore be used in comparative analyses of the applied methods in different scenarios. Alternatively, the user can use two example datasets preloaded in the package: dailytrades, representative of quarterly trade data with daily buys and sells; and hfdata, a simulated high-frequency dataset comprising 100 000 trades.
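The idea behind such a simulation can be sketched in base R. The function simulate_mpin() below is hypothetical and is not the package's generatedata_mpin(); its arguments and the structure of its output are assumptions chosen to mirror the description above (theoretical parameters, empirical parameters, and the data).

```r
# Simulate daily buys (b) and sells (s) with several information layers.
# alpha, delta and mu have one entry per layer; eb and es are scalars.
simulate_mpin <- function(days, alpha, delta, mu, eb, es) {
  layers <- length(alpha)
  stopifnot(length(delta) == layers, length(mu) == layers, sum(alpha) <= 1)

  # Layer 0 is a no-information day; layers 1..J are information days.
  layer <- sample(0:layers, days, replace = TRUE,
                  prob = c(1 - sum(alpha), alpha))
  bad    <- layer > 0 & runif(days) < c(0, delta)[layer + 1]
  good   <- layer > 0 & !bad
  excess <- c(0, mu)[layer + 1]   # informed trading rate of the day's layer

  data <- data.frame(
    b = rpois(days, eb + excess * good),
    s = rpois(days, es + excess * bad)
  )

  # Empirical counterparts computed from the realized draws.
  empirical <- list(
    alpha = tabulate(layer, nbins = layers) / days,
    delta = sapply(seq_len(layers), function(j) mean(bad[layer == j])),
    eb    = mean(data$b[!good]),
    es    = mean(data$s[!bad])
  )

  list(
    theoretical = list(alpha = alpha, delta = delta, mu = mu, eb = eb, es = es),
    empirical   = empirical,
    data        = data
  )
}

set.seed(123)
sim <- simulate_mpin(days = 60, alpha = c(0.3, 0.2), delta = c(0.5, 0.5),
                     mu = c(200, 600), eb = 400, es = 400)
head(sim$data)
str(sim$empirical)
```

With only 60 days, the empirical parameters can differ noticeably from the theoretical ones, which is why both are worth keeping alongside the data.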

Finally, the package provides two functions to deal with high-frequency data. First, the function vpin() estimates, and provides detailed output on, the order flow toxicity metric known as the volume-synchronized probability of informed trading, as developed in Easley et al. (2011) and Easley et al. (2012). Second, the function aggregate_trades() aggregates high-frequency trade data into daily data using several trade classification algorithms, namely the tick algorithm, the quote algorithm, the LR algorithm (Lee and Ready, 1991), and the EMO algorithm (Ellis et al., 2000).
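As an illustration of trade classification followed by daily aggregation, the following base-R sketch applies a simple version of the tick rule to a toy dataset. The helper tick_rule() and the toy data are hypothetical; this is not the implementation behind aggregate_trades(). A trade is labeled a buy if its price is above the previous trade price, a sell if below, and inherits the previous label when the price is unchanged; the first trade has no reference price and stays unclassified.

```r
# Classify trades with a simple tick rule.
tick_rule <- function(price) {
  d <- sign(diff(price))
  # Zero ticks inherit the direction of the previous price change.
  for (i in seq_along(d)) {
    if (i > 1 && d[i] == 0) d[i] <- d[i - 1]
  }
  c(NA, ifelse(d > 0, "buy", ifelse(d < 0, "sell", NA)))
}

# Toy high-frequency data spanning two days (hypothetical values).
trades <- data.frame(
  timestamp = as.POSIXct(c("2020-01-02 09:30:00", "2020-01-02 09:30:05",
                           "2020-01-02 09:31:00", "2020-01-03 09:30:00",
                           "2020-01-03 09:30:02", "2020-01-03 09:30:09"),
                         tz = "UTC"),
  price = c(10, 10.1, 10.1, 10.0, 10.0, 10.2)
)

trades$side <- tick_rule(trades$price)
trades$day  <- as.Date(trades$timestamp, tz = "UTC")

# Aggregate into daily numbers of buys and sells.
daily <- data.frame(
  day   = sort(unique(trades$day)),
  buys  = as.vector(tapply(trades$side == "buy",  trades$day, sum, na.rm = TRUE)),
  sells = as.vector(tapply(trades$side == "sell", trades$day, sum, na.rm = TRUE))
)
print(daily)
```

In this sketch the first trade of the second day is compared with the last trade of the first day; whether classification should carry across days is a design choice that the sketch does not settle.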

The package provides fast, compact, and precise utilities to tackle the sophisticated, error-prone, and time-consuming estimation procedure of informed trading, using only raw trade-level data. Ghachem and Ersan (2022b) provide a comprehensive overview of the package: they first detail the underlying theoretical background and give a thorough description of the functions, before using them to tackle relevant research questions.

Functions

  • adjpin estimates the adjusted probability of informed trading (AdjPIN) of the model of Duarte and Young (2009).

  • aggregate_trades aggregates the trading data per day using different trade classification algorithms.

  • detectlayers_e detects the number of information layers present in the trade-data using the algorithm in Ersan (2016).

  • detectlayers_eg detects the number of information layers present in the trade-data using the algorithm in Ersan and Ghachem (2022a).

  • detectlayers_ecm detects the number of information layers present in the trade-data using the expectation-conditional maximization algorithm in Ghachem and Ersan (2022a).

  • fact_adjpin returns the AdjPIN factorization of the likelihood function by Ersan and Ghachem (2022b) evaluated at the provided data and parameter sets.

  • fact_pin_e returns the PIN factorization of the likelihood function by Ersan (2016) evaluated at the provided data and parameter sets.

  • fact_pin_eho returns the PIN factorization of the likelihood function by Easley et al. (2010) evaluated at the provided data and parameter sets.

  • fact_pin_lk returns the PIN factorization of the likelihood function by Lin and Ke (2011) evaluated at the provided data and parameter sets.

  • fact_mpin returns the MPIN factorization of the likelihood function by Ersan (2016) evaluated at the provided data and parameter sets.

  • generatedata_adjpin generates a dataset object or a list of dataset objects generated according to the assumptions of the AdjPIN model.

  • generatedata_mpin generates a dataset object or a list of dataset objects generated according to the assumptions of the MPIN model.

  • get_posteriors computes, for each day in the sample, the posterior probabilities that it is a no-information day, good-information day and bad-information day respectively.

  • initials_adjpin generates the initial parameter sets for the ML/ECM estimation of the adjusted probability of informed trading using the algorithm of Ersan and Ghachem (2022b).

  • initials_adjpin_cl generates the initial parameter sets for the ML/ECM estimation of the adjusted probability of informed trading using an extension of the algorithm of Cheng and Lai (2021).

  • initials_adjpin_rnd generates random parameter sets for the estimation of the AdjPIN model.

  • initials_mpin generates initial parameter sets for the maximum likelihood estimation of the multilayer probability of informed trading (MPIN) using the Ersan (2016) generalization of the algorithm in Ersan and Alici (2016).

  • initials_pin_ea generates the initial parameter sets for the maximum likelihood estimation of the probability of informed trading (PIN) using the algorithm of Ersan and Alici (2016).

  • initials_pin_gwj generates the initial parameter set for the maximum likelihood estimation of the probability of informed trading (PIN) using the algorithm of Gan et al. (2015).

  • initials_pin_yz generates the initial parameter sets for the maximum likelihood estimation of the probability of informed trading (PIN) using the algorithm of Yan and Zhang (2012).

  • mpin_ecm estimates the multilayer probability of informed trading (MPIN) using the expectation-conditional maximization algorithm (ECM) as in Ghachem and Ersan (2022a).

  • mpin_ml estimates the multilayer probability of informed trading (MPIN) using the layer detection algorithms in Ersan (2016) and Ersan and Ghachem (2022a), and standard maximum likelihood estimation.

  • pin estimates the probability of informed trading (PIN) using custom initial parameter set(s) provided by the user.

  • pin_bayes estimates the probability of informed trading (PIN) using the Bayesian approach in Griffin et al. (2021).

  • pin_ea estimates the probability of informed trading (PIN) using the initial parameter sets from the algorithm of Ersan and Alici (2016).

  • pin_gwj estimates the probability of informed trading (PIN) using the initial parameter set from the algorithm of Gan et al. (2015).

  • pin_yz estimates the probability of informed trading (PIN) using the initial parameter sets from the grid-search algorithm of Yan and Zhang (2012).

  • vpin estimates the volume-synchronized probability of informed trading (VPIN).
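
To give a rough idea of what a volume-synchronized measure involves, here is a simplified base-R sketch. It is not the procedure implemented in vpin(): the time bars are simulated, the bar volumes are split into buys and sells with the normal distribution function of standardized price changes (an assumed bulk-classification step), and the bucket size V and window length n are arbitrary illustrative choices.

```r
set.seed(7)

# Simulated time bars: price change and traded volume per bar.
bars <- data.frame(
  dp     = rnorm(200, sd = 0.05),
  volume = rpois(200, 1000)
)

# Share of each bar's volume treated as buyer-initiated.
bars$buy_share <- pnorm(bars$dp / sd(bars$dp))

# Expand bars into unit-volume trades so buckets can have equal volume.
unit_buy <- rep(bars$buy_share, times = bars$volume)

V  <- 5000                        # volume per bucket (illustrative)
nb <- length(unit_buy) %/% V      # number of complete buckets
bucket  <- rep(seq_len(nb), each = V)
buy_vol <- as.vector(tapply(unit_buy[seq_len(nb * V)], bucket, sum))

# Order imbalance per bucket: |buy volume - sell volume|.
imbalance <- abs(2 * buy_vol - V)

# Rolling average of the imbalance over the last n buckets, scaled by V.
n <- 10
vpin_series <- as.numeric(stats::filter(imbalance, rep(1 / n, n), sides = 1)) / V
print(round(tail(vpin_series, 5), 3))
```

The first n - 1 entries of vpin_series are NA because a full window of buckets is not yet available; trailing volume that does not fill a bucket is dropped.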

Datasets

  • dailytrades A dataframe representative of quarterly (60 trading days) data of simulated daily buys and sells.

  • hfdata A dataframe containing simulated high-frequency trade-data on 100 000 timestamps with the variables {timestamp, price, volume, bid, ask}.

Estimation results

  • estimate.adjpin-class The class estimate.adjpin stores the estimation results of the function adjpin().

  • estimate.mpin-class The class estimate.mpin stores the estimation results of the MPIN model as estimated by the function mpin_ml().

  • estimate.mpin.ecm-class The class estimate.mpin.ecm stores the estimation results of the MPIN model as estimated by the function mpin_ecm().

  • estimate.pin-class The class estimate.pin stores the estimation results of the following PIN functions: pin(), pin_yz(), pin_gwj(), and pin_ea().

  • estimate.vpin-class The class estimate.vpin stores the estimation results of the VPIN model using the function vpin().

Data simulation

  • dataset-class The class dataset stores the result of simulation of the aggregate daily trading data.

  • data.series-class The class data.series stores a list of dataset objects.

Author

Montasser Ghachem montasser.ghachem@pinstimation.com
Department of Economics at Stockholm University, Stockholm, Sweden.

Oguz Ersan oguz.ersan@pinstimation.com
Department of International Trade and Finance at Kadir Has University, Istanbul, Turkey.

References