# bipartiteGibbs

##### Gibbs Sampler Used for Beta Record Linkage

Run a Gibbs sampler to explore the posterior distribution of bipartite matchings that represent the linkage of the datafiles in beta record linkage.

##### Usage

`bipartiteGibbs(cd, nIter = 1000, a = 1, b = 1, aBM = 1, bBM = 1, seed = 0)`

##### Arguments

- cd
a list with the same structure as the output of the function `compareRecords`, containing:
  - `comparisons`: matrix with `n1*n2` rows, where the comparison pattern for record pair \((i,j)\) appears in row `(j-1)*n1+i`, for \(i\) in \(\{1,\dots,n1\}\) and \(j\) in \(\{1,\dots,n2\}\). A comparison field with \(L+1\) levels of disagreement is represented by \(L+1\) columns of TRUE/FALSE indicators. Missing comparisons are coded as FALSE, which is justified under an assumption of ignorability of the missing comparisons; see Sadinle (2017).
  - `n1`, `n2`: the datafile sizes, `n1 = nrow(df1)` and `n2 = nrow(df2)`.
  - `nDisagLevs`: a vector containing the number of levels of disagreement per comparison field.
  - `compFields`: a data frame containing the names of the fields in the datafiles used in the comparisons and the types of comparison.

- nIter
number of iterations of the Gibbs sampler. Default is `nIter = 1000`.

- a, b
hyper-parameters of the Dirichlet priors for the \(m\) and \(u\) parameters in the model for the comparison data among matches and non-matches, respectively. These can be vectors with as many entries as disagreement levels among all comparison fields. If specified as positive constants, they get recycled to the required length. If not specified, flat priors are taken.

- aBM, bBM
hyper-parameters of the beta prior on bipartite matchings. Default is `aBM=bBM=1`.

- seed
seed to be used for pseudo-random number generation. Default is `seed=0`.
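
The layout of `comparisons` described under `cd` can be illustrated with a small base R sketch. It does not call the package: the sizes, the helper names `pairRow` and `rowPair`, and the observed disagreement levels are hypothetical, and the assumption that levels are coded `1` to `L+1` with `NA` for a missing comparison is made only for illustration.

```r
# Hypothetical datafile sizes, for illustration only.
n1 <- 3
n2 <- 2

# Row of `comparisons` that holds the pattern for record pair (i, j).
pairRow <- function(i, j, n1) (j - 1) * n1 + i

# Inverse mapping: recover (i, j) from a row index r.
rowPair <- function(r, n1) c(i = (r - 1) %% n1 + 1, j = (r - 1) %/% n1 + 1)

# All row indices, arranged with i along rows and j along columns.
rows <- outer(seq_len(n1), seq_len(n2), pairRow, n1 = n1)

# One comparison field with 3 levels of disagreement (L + 1 = 3).
# Assumed coding: levels 1..3, NA when the comparison is missing.
levelsObs <- c(1, 3, NA, 2)
nLevels <- 3

# One TRUE/FALSE indicator column per level.
indicators <- outer(levelsObs, seq_len(nLevels), "==")

# Missing comparisons become a row of all FALSE.
indicators[is.na(indicators)] <- FALSE
```

With these definitions, `pairRow(2, 2, n1)` and `rowPair(5, n1)` are inverses of each other, and the column-major ordering means all pairs involving the first record of datafile 2 come before those involving the second.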

##### Value

a list containing:

- Z
matrix with `n2` rows and `nIter` columns containing the chain of bipartite matchings. A number less than or equal to `n1` in row `j` indicates the record in datafile 1 to which record `j` in datafile 2 is linked at that iteration; otherwise the entry is `n1+j`, meaning record `j` is not linked.

- m, u
chains of the \(m\) and \(u\) parameters in the model for the comparison data among matches and non-matches, respectively.
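
The encoding of `Z` can be read back with a few lines of base R. The sketch below uses a hand-made toy matrix rather than actual output of `bipartiteGibbs`; the values, the variable names `isLinked`, `linkFreq`, and `linkProb`, and the choice to keep every iteration (no burn-in is discarded) are assumptions made for illustration.

```r
# Hypothetical sizes and a toy chain: 2 records in datafile 2, 4 iterations.
n1 <- 3
n2 <- 2
Z <- matrix(c(1, 1, 1, 4,
              5, 2, 5, 5),
            nrow = n2, byrow = TRUE)

# An entry <= n1 is a link to that record in datafile 1;
# an entry equal to n1 + j marks record j as unlinked.
isLinked <- Z <= n1

# Fraction of iterations in which each record of datafile 2 is linked.
linkFreq <- rowMeans(isLinked)

# Relative frequency of each (i, j) link across iterations.
linkProb <- matrix(0, n1, n2)
for (j in seq_len(n2)) {
  linked <- Z[j, isLinked[j, ]]
  counts <- table(factor(linked, levels = seq_len(n1)))
  linkProb[, j] <- as.numeric(counts) / ncol(Z)
}
```

Here `linkProb[i, j]` is the share of iterations in which record `j` of datafile 2 is linked to record `i` of datafile 1, so each column sums to the corresponding entry of `linkFreq`.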

##### References

Mauricio Sadinle (2017). Bayesian Estimation of Bipartite Matchings for Record Linkage. *Journal of the American Statistical Association* 112(518), 600-612.

##### Examples

```
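library(BRL)

# Load the example datafiles df1 and df2
data(twoFiles)

# Compare records on four fields: "lv" (Levenshtein-based) for names, "bi" (binary) for the rest
myCompData <- compareRecords(df1, df2, flds = c("gname", "fname", "age", "occup"),
                             types = c("lv", "lv", "bi", "bi"))

# Run the Gibbs sampler with the default settings
chain <- bipartiteGibbs(myCompData)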
```

*Documentation reproduced from package BRL, version 0.1.0, License: GPL-3*