apsahtml2csv: Read and parse APSA eJobs html files and write their content to a .csv file

Description

Reads American Political Science Association (APSA) "eJobs" html files, parses their content into a format that muRL can read, and writes that content to a .csv file.

Usage

apsahtml2csv(directory, file.name, file.ext = ".htm")

Arguments

Value

An R data frame is created and a .csv file is written. Both contain columns for the APSA job listing ID number, the date the job advertisement was posted, the type of institution, the title of the position, the start date, the salary, the region, and the name of the institution and department. They also contain the name, address, city, state, ZIP code, and phone number of the individual to contact, the web address of the department or institution, and a full paragraph description of the position.
The full paragraph description is stored in a column named desc. Because of the current parsing strategy, this field may include some excess characters from the APSA html page.
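
Excess characters in desc can often be removed after the fact. The following is a minimal sketch of one way to do so, not part of muRL: the data frame contents are invented for illustration, clean_desc is a hypothetical helper, and the kinds of debris it targets (html tags, non-breaking space entities, stray whitespace) are assumptions about what may survive parsing.

```r
# Invented rows standing in for apsahtml2csv output. Only the column name
# desc comes from the documentation; the contents are made up.
jobs <- data.frame(
  id = c(1001, 1002),
  desc = c("<p>The Department invites applications.&nbsp;</p>",
           "  Review begins <b>October 1</b> and continues.\r\n"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strips common html residue from a character vector.
clean_desc <- function(x) {
  x <- gsub("<[^>]+>", " ", x)                # replace anything shaped like an html tag
  x <- gsub("&nbsp;", " ", x, fixed = TRUE)   # replace non-breaking space entities
  x <- gsub("[[:space:]]+", " ", x)           # collapse runs of whitespace
  trimws(x)                                   # drop leading and trailing blanks
}

jobs$desc <- clean_desc(jobs$desc)
```

Tags are replaced with a space rather than removed outright so that text from adjacent html elements is not fused together; the later whitespace collapse tidies up the extra blanks.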

Details

After logging in to eJobs, the job announcement site of the American Political Science Association (APSA), the user can search for and open the APSA web page announcing a single job listing. The user can download the html from several such pages (usually with a simple "Save As" command, depending on the operating system). apsahtml2csv then parses the html code from these pages and sorts and stores the relevant content. A .csv file containing this content is written.

If the user downloads the APSA webpages with a file extension other than the default ".htm" (or with no extension), that extension (or "") should be specified using the file.ext argument. Because apsahtml2csv uses the value of file.ext in a grep command, we strongly recommend that the directory specified by directory contain only the downloaded webpages, and no other files or directories.
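
To see why stray files matter, consider the sketch below. It does not reproduce the internals of apsahtml2csv, which are not shown here; it only assumes that file names are matched against file.ext with an unanchored grep. The directory name, file names, and variable names are all invented for illustration.

```r
# Build a throwaway directory holding two saved pages and one stray file.
page_dir <- file.path(tempdir(), "apsa_pages_demo")
dir.create(page_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(page_dir, c("job1.htm", "job2.htm", "notes_htm_todo.txt"))
))

file_ext <- ".htm"
all_files <- list.files(page_dir)

# Unanchored regular expression: "." matches any character, so the stray
# file whose name merely contains "_htm" is selected as well.
loose <- all_files[grep(file_ext, all_files)]

# Stricter alternative: keep only names that end with the literal extension.
strict <- all_files[endsWith(all_files, file_ext)]

# Remove the demonstration directory again.
unlink(page_dir, recursive = TRUE)
```

Here loose picks up all three files while strict keeps only the two saved pages. Keeping the directory free of other files avoids the question entirely.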

Institutions are inconsistent in how they enter the names of their jobs' contact representatives. Thus, some tweaking of the output of apsahtml2csv may be required in order to create a .csv file that can be read seamlessly by read.murl. Specifically, the user may have to take the single column of the output called contact and create from it columns called title, fname, and lname.
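
One possible way to derive these columns is sketched below. It is a naive heuristic under stated assumptions, not a muRL function: the contact strings are invented, split_contact is a hypothetical helper, and the rule (a leading word ending in a period is a title, the last word is the last name) will misfire on entries that are not personal names.

```r
# Invented contact strings; real eJobs entries vary more than this.
jobs <- data.frame(
  contact = c("Dr. Jane Smith", "Prof. John Q. Public", "Search Committee Chair"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one contact string into title, fname, lname.
split_contact <- function(x) {
  words <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  # Treat a leading word that ends in a period (Dr., Prof., Ms.) as a title.
  has_title <- length(words) > 1 && grepl("^[[:alpha:]]+\\.$", words[1])
  title <- if (has_title) words[1] else ""
  rest <- if (has_title) words[-1] else words
  lname <- rest[length(rest)]                           # last word
  fname <- paste(rest[-length(rest)], collapse = " ")   # everything before it
  c(title = title, fname = fname, lname = lname)
}

# One row per contact, with columns named title, fname, and lname.
parts <- do.call(rbind, lapply(jobs$contact, split_contact))
jobs <- cbind(jobs, as.data.frame(parts, stringsAsFactors = FALSE))
```

The third row shows the limitation: "Search Committee Chair" is split into fname "Search Committee" and lname "Chair", so the result should be inspected and corrected by hand before it is passed to read.murl.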

See Also