suffix_extract


Extract the suffix from domain names

Domain names have suffixes: common endings under which people can (or could) register domains. These include simple top-level domains like ".org", but also multi-part suffixes like ".edu.co". As a result, a plain list of top-level domains probably won't cut it.

suffix_extract takes the list of public suffixes maintained by Mozilla (see suffix_dataset) and a vector of domain names, and produces a data.frame containing the suffix each domain uses and the remaining fragments.

Usage
suffix_extract(domains, suffixes = NULL)
Arguments
domains

a vector of domains, from domain or url_parse. Alternately, full URLs can be provided; these will be run through domain internally.

suffixes

a dataset of suffixes. By default this is NULL, and the function relies on suffix_dataset. Optionally, if you want more up-to-date suffix data, you can pass the result of suffix_refresh for this parameter.

Value

a data.frame of four columns: "host", "subdomain", "domain" and "suffix". "host" is what was passed in; "subdomain" is the subdomain portion, if any, that precedes the registered domain; "domain" is the part of the domain name that came before the matched suffix; and "suffix" is the matched suffix itself.
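For instance, a single Wikipedia hostname decomposes into those four columns as follows (a minimal sketch, assuming the urltools package is installed):

```r
library(urltools)

# Decompose one hostname; the result is a one-row data.frame
# with columns "host", "subdomain", "domain" and "suffix".
result <- suffix_extract("en.wikipedia.org")

# Inspect the pieces, e.g. host = "en.wikipedia.org",
# subdomain = "en", domain = "wikipedia", suffix = "org"
result$subdomain
result$suffix
```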

See Also

suffix_dataset for the dataset of suffixes.

Aliases
  • suffix_extract
Examples
# NOT RUN {
# Using url_parse
domain_name <- url_parse("http://en.wikipedia.org")$domain
suffix_extract(domain_name)

# Using domain()
domain_name <- domain("http://en.wikipedia.org")
suffix_extract(domain_name)

# }
# NOT RUN {
# Relying on a fresh version of the suffix dataset
suffix_extract(domain("http://en.wikipedia.org"), suffix_refresh())
# }
Documentation reproduced from package urltools, version 1.7.3, License: MIT + file LICENSE
