Extract the suffix from domain names
Domain names have suffixes: common endings that people can (or could) register domains under. These include single labels like ".org", but also multi-label endings like ".edu.co". As a result, a simple list of top-level domains (TLDs) probably won't cut it: it would report ".co" as the suffix of "example.edu.co" rather than ".edu.co".
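To see why a plain TLD list falls short, here is a minimal base-R sketch comparing a naive "last label" rule with a longest-match lookup. The suffix vector `toy_suffixes` and the helpers `naive_suffix()` and `longest_suffix()` are hypothetical names made up for illustration. The real list maintained by Mozilla is far larger and also contains wildcard and exception rules that this sketch ignores.

```r
# Hypothetical, hand-picked subset of public suffixes (illustration only).
toy_suffixes <- c("org", "co", "edu.co", "uk", "co.uk")

# Naive rule: treat the last dot-separated label as the suffix.
naive_suffix <- function(host) {
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  labels[length(labels)]
}

# Longest-match rule: try trailing runs of labels from longest to shortest
# and return the first one found in the suffix list (NA if none matches).
longest_suffix <- function(host, suffixes) {
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  n <- length(labels)
  for (start in seq_len(n)) {
    candidate <- paste(labels[start:n], collapse = ".")
    if (candidate %in% suffixes) return(candidate)
  }
  NA_character_
}

naive_suffix("example.edu.co")                  # returns "co"
longest_suffix("example.edu.co", toy_suffixes)  # returns "edu.co"
```

Because the longest candidate is tried first, "bbc.co.uk" resolves to "co.uk" even though "uk" is also in the list.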
suffix_extract takes the list of public suffixes, as maintained by Mozilla (see …),
and a vector of domain names, and produces a data.frame containing the
suffix that each domain uses and the remaining fragment.
A data.frame of four columns: "host", "subdomain", "domain" and "suffix".
"host" is what was passed in.
"subdomain" is everything to the left of the domain (e.g. "en" in "en.wikipedia.org").
"domain" is the label immediately before the matched suffix (e.g. "wikipedia").
"suffix" is, well, the matched suffix (e.g. "org").
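The split described above can be sketched in base R roughly as follows. This is not urltools' implementation: `toy_suffix_extract()`, `split_host()` and the tiny `toy_suffixes` vector are hypothetical, the sketch expects bare host names rather than URLs, and it skips the wildcard and exception rules of the real public suffix list. It takes the longest matching suffix, uses the label just before it as the domain, and treats anything further left as the subdomain.

```r
# Hypothetical subset of public suffixes (illustration only).
toy_suffixes <- c("org", "com", "co", "edu.co", "uk", "co.uk")

# Split one host into c(subdomain, domain, suffix); all NA if no suffix matches.
split_host <- function(host, suffixes) {
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  n <- length(labels)
  # Start at the second label so at least one label is left for the domain;
  # candidates shrink from the left, so the first hit is the longest suffix.
  for (start in seq_len(n)[-1]) {
    suffix <- paste(labels[start:n], collapse = ".")
    if (suffix %in% suffixes) {
      domain <- labels[start - 1]
      sub_labels <- labels[seq_len(start - 2)]
      subdomain <- if (length(sub_labels) > 0) {
        paste(sub_labels, collapse = ".")
      } else {
        NA_character_
      }
      return(c(subdomain, domain, suffix))
    }
  }
  rep(NA_character_, 3)
}

# Apply split_host() to every host and assemble the four documented columns.
toy_suffix_extract <- function(hosts, suffixes) {
  parts <- vapply(hosts, split_host, character(3),
                  suffixes = suffixes, USE.NAMES = FALSE)
  data.frame(host = hosts,
             subdomain = parts[1, ],
             domain = parts[2, ],
             suffix = parts[3, ],
             stringsAsFactors = FALSE)
}

# Row 1: subdomain "en", domain "wikipedia", suffix "org".
toy_suffix_extract(c("en.wikipedia.org", "www.example.edu.co"), toy_suffixes)
```

Trying suffixes from longest to shortest is what makes "www.example.edu.co" resolve to "edu.co" rather than "co"; a host with no matching suffix, such as "localhost", gets NA in every column except "host".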
See suffix_dataset for the dataset of suffixes.
library(urltools)

# Using url_parse
domain_name <- url_parse("http://en.wikipedia.org")$domain
suffix_extract(domain_name)

# Using domain()
domain_name <- domain("http://en.wikipedia.org")
suffix_extract(domain_name)

# Using internal parsing
suffix_extract("http://en.wikipedia.org")