urltools (version 1.2.0)

suffix_extract: extract the suffix from domain names

Description

Domain names have suffixes - common endings under which people can (or could once) register domains. These include simple entries like ".org", but also multi-part suffixes like ".edu.co". A plain Top Level Domain list, as a result, probably won't cut it.

suffix_extract takes the list of public suffixes, as maintained by Mozilla (see suffix_dataset) and a vector of domain names, and produces a data.frame containing the suffix that each domain uses, and the remaining fragment.

Usage

suffix_extract(domains)

Arguments

domains
a vector of domains, obtained from domain or url_parse. Alternately, full URLs can be provided; these will first be run through domain internally.

Value

  • a data.frame of two columns, "domain_body" and "suffix". "domain_body" contains the part of the domain name that came before the matched suffix, and "suffix" contains the matched suffix itself. If a suffix cannot be extracted, domain_body will contain the entire domain, and suffix the string "Invalid".
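
To illustrate the return value described above, a minimal sketch (assuming the urltools package is installed and the columns behave as documented for this version):

```r
library(urltools)

# A registrable domain alongside a hostname with no public suffix.
domains <- c("en.wikipedia.org", "localhost")
result <- suffix_extract(domains)

# Per the documentation above, "domain_body" holds the part before the
# matched suffix; "suffix" holds the match itself. For "localhost", no
# suffix can be matched, so the row falls into the "Invalid" case.
str(result)
```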

See Also

suffix_dataset for the dataset of suffixes, and suffix_refresh for refreshing it.

Examples

# Using url_parse
domain_name <- url_parse("http://en.wikipedia.org")$domain
suffix_extract(domain_name)

# Using domain()
domain_name <- domain("http://en.wikipedia.org")
suffix_extract(domain_name)

# Using internal parsing
suffix_extract("http://en.wikipedia.org")
