ratings: An Inconvenient Sequel

Description

The raw data behind the story "Al Gore's New Movie Exposes The Big Flaw In Online Movie Ratings": https://fivethirtyeight.com/features/al-gores-new-movie-exposes-the-big-flaw-in-online-movie-ratings/

Usage

ratings

Format

Because of R package size restrictions, only a preview of the first 10 rows of this dataset is included; to obtain the entire dataset (80,053 rows) see Examples below. The preview is a data frame with 10 rows representing movie ratings and 27 variables:

timestamp: The date at which the rating was recorded.
respondents: The number of respondents in a category associated with a given timestamp.
category: The subgroups of respondents differentiated by demographics like gender, age, and nationality.
link: The website associated with a given category's responses.
average: The average rating reported by a given category.
mean: The mean rating reported by a given category.
median: The median rating reported by a given category.
votes_1: The number of votes for a rating of one.
votes_2: The number of votes for a rating of two.
votes_3: The number of votes for a rating of three.
votes_4: The number of votes for a rating of four.
votes_5: The number of votes for a rating of five.
votes_6: The number of votes for a rating of six.
votes_7: The number of votes for a rating of seven.
votes_8: The number of votes for a rating of eight.
votes_9: The number of votes for a rating of nine.
votes_10: The number of votes for a rating of ten.
pct_1: The percentage of votes for a rating of one.
pct_2: The percentage of votes for a rating of two.
pct_3: The percentage of votes for a rating of three.
pct_4: The percentage of votes for a rating of four.
pct_5: The percentage of votes for a rating of five.
pct_6: The percentage of votes for a rating of six.
pct_7: The percentage of votes for a rating of seven.
pct_8: The percentage of votes for a rating of eight.
pct_9: The percentage of votes for a rating of nine.
pct_10: The percentage of votes for a rating of ten.
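
As a rough illustration of how the votes_* and pct_* columns might relate, the sketch below builds a one-row toy data frame and derives percentages and a plain mean rating from the vote counts. The numbers are invented, not taken from the dataset. It assumes that pct_* is the share of all votes falling on each rating and that mean is the vote-weighted arithmetic mean of the ratings 1 to 10; this page does not confirm either definition.

```r
# Hypothetical single row of vote counts (values invented for illustration)
toy <- data.frame(
  votes_1 = 10, votes_2 = 0, votes_3 = 0, votes_4 = 0, votes_5 = 0,
  votes_6 = 0, votes_7 = 0, votes_8 = 0, votes_9 = 0, votes_10 = 30
)

vote_cols <- paste0("votes_", 1:10)
counts <- unlist(toy[1, vote_cols])  # named numeric vector of length 10
total <- sum(counts)

# Assumed definition: percentage of all votes that fall on each rating
pct <- 100 * counts / total

# Assumed definition: mean of the ratings 1..10, weighted by vote count
mean_rating <- sum(1:10 * counts) / total
```

Comparing values computed this way against the pct_* and mean columns of the real data would be one way to check whether these assumed definitions hold.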

Examples

# NOT RUN {
# To obtain the entire dataset, run the following code:
library(readr)
library(dplyr)
ratings <- 
  "https://github.com/fivethirtyeight/data/raw/master/inconvenient-sequel/ratings.csv" %>%
  read_csv() %>%
  mutate(category = as.factor(category)) %>% 
  rename(
    votes_1 = `1_votes`, votes_2 = `2_votes`, votes_3 = `3_votes`, 
    votes_4 = `4_votes`, votes_5 = `5_votes`, votes_6 = `6_votes`,
    votes_7 = `7_votes`, votes_8 = `8_votes`, votes_9 = `9_votes`,
    votes_10 = `10_votes`,
    pct_1 = `1_pct`, pct_2 = `2_pct`, pct_3 = `3_pct`, pct_4 = `4_pct`,
    pct_5 = `5_pct`, pct_6 = `6_pct`, pct_7 = `7_pct`, pct_8 = `8_pct`,
    pct_9 = `9_pct`, pct_10 = `10_pct`
  )

# To convert data frame to tidy data (long) format, run:
library(dplyr)
library(tidyr)
ratings_tidy <- ratings %>%
  gather(votes, count, -c(timestamp, respondents, category, link, average, mean, median)) %>%
  arrange(timestamp)
# }
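
The gather() call in the example stacks both the votes_* and pct_* columns into a single count column. If keeping counts and percentages in separate columns is preferable, one possible alternative is tidyr::pivot_longer() (available in tidyr 1.0.0 and later) with the ".value" sentinel. The sketch below uses a small hypothetical data frame with only two rating levels and an invented category label; it is not the package's own method.

```r
library(tidyr)

# Hypothetical stand-in for the full data: one category, two rating levels
toy <- data.frame(
  category = "Males",
  votes_1 = 10, votes_2 = 30,
  pct_1 = 25, pct_2 = 75
)

# Split each column name at "_": the first piece ("votes" or "pct") becomes
# an output column via ".value", the second piece becomes the rating level.
toy_long <- pivot_longer(
  toy,
  cols = matches("^(votes|pct)_"),
  names_to = c(".value", "rating"),
  names_sep = "_"
)
```

The result has one row per category and rating level, with votes and pct side by side. The rating column comes out as character and can be converted with as.integer() if needed.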
