fedmatch

Any views expressed here do not reflect those of the Federal Reserve Board or Federal Reserve System.

The goal of fedmatch is to match unlinked datasets, i.e. datasets that do not share a common unique identifier. It provides a variety of tools that let a user build a custom matching algorithm for a specific application. To get started, see the “Introduction to fedmatch” vignette.

You can view all the vignettes, and the rest of the documentation, on the fedmatch website.

Features

  • String cleaning tools
  • Fuzzy matching with standard string distance metrics from the package stringdist
  • A new fuzzy matching method which we call a Weighted Jaccard metric
  • Numeric matching using a trained logit model
  • A system to sequentially execute many different types of match algorithms
  • A system for evaluating matches post hoc
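
The Weighted Jaccard metric is fedmatch's own method, and this page does not spell out its weighting scheme. The sketch below is a minimal illustration of the general idea, not the package's implementation. Each name is split into words, every word gets a weight, and the score is the total weight of the shared words divided by the total weight of all words in either name. The helper names build_word_weights and weighted_jaccard are hypothetical. The weight 1 / log(1 + count) is an assumed choice that makes common words such as "inc" count for less. A distance, as in wgt_jaccard_distance, would be 1 minus this score.

```r
# Hypothetical helper (not part of fedmatch): weight each word by its
# rarity in a corpus of names. Assumption: weight = 1 / log(1 + count),
# so common words such as "inc" count for less than rare ones.
build_word_weights <- function(corpus) {
  words <- unlist(strsplit(tolower(trimws(corpus)), "\\s+"))
  counts <- table(words)
  setNames(1 / log(1 + as.numeric(counts)), names(counts))
}

# Hypothetical helper: weighted Jaccard similarity between two names.
# Score = total weight of shared words / total weight of all words.
weighted_jaccard <- function(a, b, weights) {
  tokens <- function(s) unique(strsplit(tolower(trimws(s)), "\\s+")[[1]])
  total <- function(ws) {
    w <- weights[ws]
    w[is.na(w)] <- 1  # assumption: words absent from the corpus get weight 1
    sum(w)
  }
  wa <- tokens(a)
  wb <- tokens(b)
  denom <- total(union(wa, wb))
  if (denom == 0) return(0)
  total(intersect(wa, wb)) / denom
}

corpus <- c("acme holdings inc", "acme corp", "globex inc",
            "initech inc", "umbrella corp")
weights <- build_word_weights(corpus)

weighted_jaccard("Acme Holdings Inc", "ACME Holdings", weights)  # ~0.765
weighted_jaccard("Acme Holdings Inc", "Globex Inc", weights)     # ~0.160
```

Compare this with an unweighted Jaccard score. Dropping the common word "inc" costs less than 1/3 (the score stays above 2/3). Sharing only "inc" earns less than 1/4. This is the behaviour a frequency-weighted metric is meant to provide.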

Installation

You can install ‘fedmatch’ from CRAN with:

install.packages("fedmatch")

You can install the development version from GitHub with:

install.packages("devtools")
devtools::install_github("seunglee98/fedmatch", build_vignettes = TRUE)

Alternatively, download the repository from GitHub (either by cloning it or by downloading and unzipping the archive), then run:

devtools::install("path_to_fedmatch", build_vignettes = TRUE)

License and Citation

This package is licensed under the terms of the MIT license. See the LICENSE.md file for details.

If you use this package for your research, please cite the technical paper:

Gregory J. Cohen, Jacob Dice, Melanie Friedrichs, Kamran Gupta, William Hayes, Isabel Kitschelt, Seung Jung Lee, W. Blake Marsh, Nathan Mislang, Maya Shaton, Martin Sicilian, Chris Webster. “The U.S. Syndicated Loan Market: Matching Data.” Journal of Financial Research, 2021.

Version: 2.0.6
License: MIT + file LICENSE
Maintainer: Chris Webster
Last Published: May 20th, 2024

Functions in fedmatch (2.0.6)

  • wgt_jaccard_distance: Compute weighted Jaccard distance
  • fund_words
  • word_frequency: Compute frequency of words in a corpus
  • fuzzy_match: Use string distances to match on names
  • corp_data2
  • corporate_words
  • build_corpus: Calculate word corpus for weighted Jaccard matching
  • calculate_weights: Calculate weights for computing match score
  • build_clean_settings: Build settings for string cleaning
  • build_score_settings: Build settings for scoring
  • State_FIPS
  • build_fuzzy_settings: Build settings for fuzzy matching
  • build_tier: Build settings for a tier
  • build_multivar_settings: Build settings for multivar matching
  • clean_strings: String cleaning for easier matching
  • match_evaluate: Evaluate a matched dataset
  • multivar_match: Match by computing multivar scores based on several variables
  • corp_data1
  • articles
  • %>%: Pipe operator
  • World_Bank_Codes
  • sp_char_words
  • merge_plus: Merge two datasets by exact, fuzzy, or multivar-based matching
  • tier_match: Perform an iterative match by tier
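
merge_plus and tier_match combine several match types and run them one after another. The sketch below shows that tiered idea in base R only. It is not the fedmatch implementation, and clean_name and tiered_match are hypothetical helpers. Tier 1 is an exact match on cleaned names. Tier 2 takes the records still unmatched and accepts the nearest name by Levenshtein distance (utils::adist) when that distance is at most max_dist. This Levenshtein step stands in for the stringdist metrics fedmatch uses for fuzzy matching.

```r
# Hypothetical cleaning step: lowercase, turn punctuation into spaces,
# squeeze repeated whitespace, and trim the ends.
clean_name <- function(x) {
  x <- gsub("[[:punct:]]", " ", tolower(x))
  trimws(gsub("\\s+", " ", x))
}

# Hypothetical tiered matcher (not fedmatch::tier_match).
# Tier 1: exact match on cleaned names.
# Tier 2: for rows still unmatched, nearest cleaned name by Levenshtein
#         distance, accepted only if the distance is <= max_dist.
tiered_match <- function(left, right, max_dist = 2) {
  l <- clean_name(left)
  r <- clean_name(right)
  idx <- match(l, r)                    # NA where no exact match exists
  tier <- rep(NA_integer_, length(l))
  tier[!is.na(idx)] <- 1L
  todo <- which(is.na(idx))
  if (length(todo) > 0 && length(r) > 0) {
    d <- utils::adist(l[todo], r)       # rows: unmatched left, cols: right
    best <- apply(d, 1, which.min)      # closest right record per row
    best_d <- d[cbind(seq_along(todo), best)]
    ok <- best_d <= max_dist
    idx[todo[ok]] <- best[ok]
    tier[todo[ok]] <- 2L
  }
  data.frame(left = left, right = right[idx], tier = tier,
             stringsAsFactors = FALSE)
}

left  <- c("Acme Corp.", "Globex Inc", "Initech LLC", "Umbrella Co")
right <- c("ACME CORP", "Globex, Inc.", "Inetech LLC", "Hooli")
tiered_match(left, right)
```

Running only the cheaper, stricter tier first means the fuzzy tier sees just the leftovers. A unique-match constraint is not enforced here, so one right-hand record could be matched to several left-hand records.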