The $h$-index, proposed by J.E. Hirsch (2005), is among the most popular scientific impact indicators. An author who has published $n$ papers has a Hirsch index equal to $H$ if $H$ of his papers were each cited at least $H$ times, and each of the remaining $n-H$ papers was cited no more than $H$ times. This simple bibliometric tool quickly received much attention in the academic community and became a subject of intensive research. It was noted that, contrary to earlier approaches (e.g. publication count or citation count), this measure concerns both the productivity and the impact of an individual. In a broader perspective, this issue is a special case of the so-called Producer Assessment Problem (Gagolewski, Grzegorzewski, 2010b).
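The definition above can be sketched in a few lines of base R. This is only a minimal illustration: `hirsch_index` is a hypothetical helper name introduced here, not a function of the package, and it assumes the input is a vector of non-negative citation counts.

```r
# Hirsch index computed straight from the definition: the largest H such
# that H papers have at least H citations each.
# NOTE: `hirsch_index` is a hypothetical name used for illustration only.
hirsch_index <- function(citations) {
  stopifnot(is.numeric(citations), all(citations >= 0))
  if (length(citations) == 0L) return(0L)
  sorted <- sort(citations, decreasing = TRUE)
  # `sorted` is non-increasing while the rank i increases, so the condition
  # sorted[i] >= i holds exactly for i = 1, ..., H; counting the TRUEs gives H.
  sum(sorted >= seq_along(sorted))
}

cites <- c(10, 8, 5, 4, 3, 0)
h <- hirsch_index(cites)
```

For the sample vector, four papers have at least four citations each and the remaining two have no more than four, so `h` is 4.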
Consider a producer (e.g. a writer, scientist, artist, craftsman) and a nonempty set of his products (e.g. books, papers, works, goods). Suppose that each product is given a rating (of quality, popularity, etc.) which is a single number in $I=[a,b]$, where $a$ denotes the lowest admissible valuation. We typically choose $I=[0,\infty]$ (an interval in the extended real line). Some instances of such a situation are listed below.
Each possible state of a producer's activity can be described by a point in $I^n$ for some $n$. The Producer Assessment Problem (PAP) involves constructing and analyzing, both theoretically and empirically, aggregation operators (see Grabisch et al, 2009) which can be used for rating producers. A family of such functions should take into account the following two aspects of a producer's quality:
(1) Given a numeric vector, the first class of functions computes the values of certain impact functions. Among them we have:
- Hirsch's $h$-index (see index.h),
- Egghe's $g$-index (see index.g),
- the $r_p$- and $l_p$-indices (see index.rp and index.lp), which generalize the $h$-index, the $w$-index (Woeginger, 2008), and the MAXPROX-index (Kosmulski, 2007),
- S-statistics (see Sstat and Sstat2), which generalize the OWMax operators (Dubois et al, 1988) and the $h$- and $r_\infty$-indices.
(2)
To preprocess and analyze bibliometric data (cf. Gagolewski, 2011) retrieved e.g. from Elsevier's SciVerse Scopus, we first create a Local Bibliometric Storage (LBS) with the lbsCreate function.
The data frames Scopus_ASJC and Scopus_SourceList contain various information on the current source coverage of SciVerse Scopus. They may be needed during the creation of the LBS; see lbsCreate for more details.
License information: these data are publicly available
and hence no special permission is needed to redistribute them
(information from Elsevier).
Publication data exported from Scopus as CSV files can be imported with Scopus_ReadCSV. Note that the output limit in Scopus is 2000 entries per file. Therefore, to perform bibliometric research we often need to divide the query results into many parts.
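When a query is split into several exports, the parts have to be stacked again and repeated records dropped. The following base R sketch illustrates that idea on toy data frames. It is not the package's own import routine: `merge_parts` and the column names `Title` and `Citations` are assumptions made up for this example.

```r
# Hypothetical helper (not part of the package): stack several partial
# exports and keep only the first occurrence of each key value.
merge_parts <- function(parts, key) {
  stopifnot(is.list(parts), length(parts) > 0L)
  merged <- do.call(rbind, parts)            # stack all parts row-wise
  merged[!duplicated(merged[[key]]), , drop = FALSE]
}

# Two toy "exports" that overlap in one record (title "B").
part1 <- data.frame(Title = c("A", "B"), Citations = c(3, 1),
                    stringsAsFactors = FALSE)
part2 <- data.frame(Title = c("B", "C"), Citations = c(1, 7),
                    stringsAsFactors = FALSE)

merged <- merge_parts(list(part1, part2), key = "Title")
```

Matching on a single key column is the simplest possible choice; real bibliographic records usually need a more careful notion of identity.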
The data may be accessed via functions such as lbsDescriptiveStats (basic description of the whole sample or its subsets), lbsGetCitations (gathering citation sequences of selected authors), and lbsAssess (mass computation of impact function values for given citation sequences).
There are also some helpful functions (in **EXPERIMENTAL** stage) for finding possible duplicates: lbsFindDuplicateTitles and lbsFindDuplicateAuthors.
(3) Additionally, a set of functions dealing with stochastic aspects of S-statistics (generalized OWMax operators), the $h$-index and the Pareto type-II statistical models is included (Gagolewski, Grzegorzewski, 2010a). We have the following.
- psstat, dsstat: for computing the distribution of S-statistics generated by some control function,
- phirsch, dhirsch: for computing the distribution of the Hirsch index,
- rho.get: for computing the so-called $\rho$-index ($\rho_\kappa$), which is a particular location characteristic of a given probability distribution depending on a control function $\kappa$,
- ppareto2, dpareto2, qpareto2, rpareto2: general functions dealing with the Pareto distribution of the second kind, including the c.d.f., p.d.f., quantiles and random deviates,
- pareto2.phirsch, pareto2.dhirsch: for computing the distribution of the Hirsch index (much faster than the above general versions),
- pareto2.htest: two-sample $h$-test for equality of shape parameters based on the difference of $h$-indices,
- pareto2.htest.approx: two-sample asymptotic (approximate) $h$-test,
- pareto2.ftest: two-sample exact F-test for equality of shape parameters,
- pareto2.zsestimate: estimation of parameters using the Bayesian method (MMSE) developed by Zhang and Stevens (2009),
- pareto2.mlekestimate, pareto2.mleksestimate: estimation of parameters using the MLE,
- discrpareto2.mlekestimate, discrpareto2.mleksestimate: estimation of parameters of the Discretized Pareto-type II distribution using the MLE,
- pareto2.goftest, discrpareto2.goftest: goodness-of-fit tests,
- pareto2.confint.rho, pareto2.confint.rho.approx: exact and approximate (asymptotic) confidence intervals for the $\rho$-index based on S-statistics,
- pareto2.confint.h: exact confidence intervals for the theoretical $h$-index.
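To make the role of the c.d.f., quantile and random-deviate functions concrete, here is a minimal base R sketch of a Pareto type II distribution. It assumes the common parameterization with shape $k>0$ and scale $s>0$, i.e. $F(x)=1-(s/(s+x))^k$ for $x\ge 0$; the package's own parameterization and argument names may differ. The names `p2_cdf`, `p2_quantile` and `p2_random` are hypothetical and are not the package's functions.

```r
# Assumed parameterization: F(x) = 1 - (s / (s + x))^k for x >= 0.
# All three names below are hypothetical helpers for illustration.

# Cumulative distribution function; negative arguments are mapped to 0.
p2_cdf <- function(x, k, s = 1) {
  stopifnot(k > 0, s > 0)
  1 - (s / (s + pmax(x, 0)))^k
}

# Quantile function, obtained by solving F(x) = p for x.
p2_quantile <- function(p, k, s = 1) {
  stopifnot(k > 0, s > 0, all(p >= 0 & p <= 1))
  s * ((1 - p)^(-1 / k) - 1)
}

# Random deviates via inverse transform sampling.
p2_random <- function(n, k, s = 1) {
  p2_quantile(runif(n), k, s)
}

set.seed(1)
sample_x <- p2_random(1000, k = 2, s = 1)
```

Inverse transform sampling is used here because the quantile function has a closed form, which keeps the sketch short and easy to check against the c.d.f.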
(4)
Moreover, we have implemented some simple graphical methods that may be used to illustrate various aspects of the data being analyzed; see plot.citfun, curve.add.rp, and curve.add.lp.
Please feel free to send any comments and suggestions (e.g. to include some new bibliometric impact indices) to the author.
For a complete list of functions, use library(help="CITAN").
Keywords: Hirsch's h-index, Egghe's g-index, L-statistics,
S-statistics, bibliometrics, scientometrics, informetrics,
webometrics, aggregation operators, impact functions, impact assessment.