Instructions

Solutions to the exercises of Homework 5 should, just as for HW1-HW4, be written in an R Markdown document with output: github_document. Both the R Markdown document (.Rmd file) and the compiled Markdown document (.md file), as well as any figures needed to properly render the Markdown file on GitHub, should be uploaded to your Homework repository in a HW5 folder. Code should be written clearly and in a consistent style; see, in particular, Hadley Wickham’s tidyverse style guide. For example, code should be easily readable and avoid unnecessary repetition of variable names.

Open the hw_data project and pull the most recent changes. If this does not work, delete the folder and clone the repository again as a new R project.

Exercise 1: Lööf vs. Löfven

The file ../hw_data/LoofLofvenTweets.Rdata contains tibbles Loof and Lofven of tweets from the period 2018-11-20 to 2018-11-30 mentioning “Lööf” and “Löfven”, respectively. The data were fetched using the R package rtweet, which provides a convenient R interface to the Twitter API. Load the data using the R function load.
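For instance, assuming the relative path above resolves from your working directory, loading the data amounts to:

# Loads the tibbles Loof and Lofven into the current environment.
load("../hw_data/LoofLofvenTweets.Rdata")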

Tasks

  1. Construct a tibble tweets that joins the two tibbles and contains a variable Person identifying whether the observation comes from the “Lööf” or “Löfven” table. Tweets common to both tibbles should not be included in the join (a sketch of one possible approach is given after this list).

  2. Illustrate how the intensity of tweets containing the word “statsminister” (or “Statsminister”) has evolved over time for the two Persons, using, e.g., barplots with time on the x-axis (see the second sketch after this list).

  3. Compute and plot the daily average sentiment of words in the tweet texts for the two Persons. We define the average sentiment as the average strength of words common to the text and the sentiment lexicon at https://svn.spraakdata.gu.se/sb-arkiv/pub/lmf/sentimentlex/sentimentlex.csv. Note that the function separate_rows can be useful for splitting the text into words (see the third sketch after this list).
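For Task 1, a minimal sketch; the column name status_id, used below to identify tweets common to both tables, is an assumption about the rtweet output and should be checked against the actual tibbles:

library(tidyverse)

# Ids of tweets appearing in both tables; these are excluded from the join.
common_ids <- intersect(Loof$status_id, Lofven$status_id)

tweets <- bind_rows(
  Loof   %>% mutate(Person = "Lööf"),
  Lofven %>% mutate(Person = "Löfven")
) %>%
  filter(!(status_id %in% common_ids))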
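For Task 2, one possible sketch of a barplot of daily counts; the column names created_at (a timestamp) and text are again assumptions about the rtweet output:

# Daily counts of tweets mentioning "statsminister", split by Person.
tweets %>%
  filter(str_detect(text, regex("statsminister", ignore_case = TRUE))) %>%
  mutate(day = as.Date(created_at)) %>%
  ggplot(aes(x = day, fill = Person)) +
  geom_bar(position = "dodge") +
  labs(x = "Day", y = "Number of tweets mentioning statsminister")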
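For Task 3, a sketch along the following lines could work; the lexicon column names word and strength, as well as the file being comma separated, are assumptions that you should verify after downloading the file:

# Sentiment lexicon: assumed to contain one word per row with a numeric strength.
sentimentlex <- read_csv(
  "https://svn.spraakdata.gu.se/sb-arkiv/pub/lmf/sentimentlex/sentimentlex.csv"
)

tweets %>%
  mutate(day = as.Date(created_at)) %>%
  separate_rows(text, sep = "\\s+") %>%        # one row per word
  mutate(word = str_to_lower(str_remove_all(text, "[^[:alpha:]]"))) %>%
  filter(word != "") %>%
  inner_join(sentimentlex, by = "word") %>%    # keep words found in the lexicon
  group_by(Person, day) %>%
  summarise(avg_sentiment = mean(strength), .groups = "drop") %>%
  ggplot(aes(x = day, y = avg_sentiment, colour = Person)) +
  geom_line() +
  labs(x = "Day", y = "Average sentiment")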

Exercise 2: Nobel API v2

The 2022 Nobel lectures take place on 5–10 December. The Nobel Foundation maintains an API for looking up information about the Nobel Laureates. We will use version 2 of this API.

Tasks

  1. Fetch a list in JSON format with information on the Nobel Prizes in Literature from the Nobel Prize API version 2, choosing a range of years to fetch data for. The API follows the OpenAPI standard and the documentation can be found here. A large part of this task is figuring out how to read and work with the OpenAPI documentation (a sketch of such a request is given at the end of this exercise).

  2. Extract all the prize motivations from the JSON list, convert them into a character vector of words, remove stop words, and visualize the frequencies of the remaining words in a word cloud (see the second sketch at the end of this exercise). R packages for plotting word clouds include, e.g., wordcloud, wordcloud2 and ggwordcloud, and a list of stop words can be fetched by

library(readr)
# Read the stop word list (one word per line) into a one-column tibble.
stop_words_url <- "https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.txt"
stopwords <- read_table(stop_words_url, col_names = "words")
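For Task 1 of this exercise, a sketch using httr and jsonlite; the endpoint, the category code "lit", and the year parameters reflect my reading of the version 2 documentation and should be double-checked against the OpenAPI specification, and the year range below is just an example:

library(httr)
library(jsonlite)

# Request Literature prizes for a chosen range of years from the v2 API.
response <- GET(
  "https://api.nobelprize.org/2.1/nobelPrizes",
  query = list(
    nobelPrizeCategory = "lit",
    nobelPrizeYear = 1990,
    yearTo = 2022
  )
)
prizes <- fromJSON(content(response, as = "text", encoding = "UTF-8"))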
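For Task 2 of this exercise, a sketch of the word frequency and word cloud steps, using the wordcloud package and the stopwords tibble from the snippet above; it assumes you have already extracted the motivations into a character vector motivations, which requires inspecting the structure of the parsed JSON list (e.g. with str(prizes)):

library(tidyverse)
library(wordcloud)

# motivations is assumed to be a character vector of prize motivations.
word_freq <- tibble(text = motivations) %>%
  separate_rows(text, sep = "\\s+") %>%
  mutate(word = str_to_lower(str_remove_all(text, "[[:punct:]]"))) %>%
  filter(word != "", !(word %in% stopwords$words)) %>%
  count(word, sort = TRUE)

wordcloud(words = word_freq$word, freq = word_freq$n)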