Solutions to the exercises of Homework 5 should, just as for HW1-HW4,
be written in an R-Markdown document with output: github_document.
Both the R-Markdown document (.Rmd file) and the compiled Markdown
document (.md file), as well as any figures needed for properly
rendering the Markdown file on GitHub, should be uploaded to your
Homework repository as part of a HW5 folder. Code should be written
clearly in a consistent style; see in particular Hadley Wickham’s
tidyverse style guide. As an example, code should be easily readable
and avoid unnecessary repetition of variable names.
Open the project hw_data and pull the most recent changes. If this
does not work, delete the folder and clone a new version through an
R project.
The file ../hw_data/LoofLofvenTweets.Rdata contains tibbles Loof and
Lofven of tweets during the period from 2018-11-20 to 2018-11-30
mentioning “Lööf” and “Löfven”, respectively. The data were fetched
from the Twitter API using the R package rtweet, which provides
convenient access to the API from R. Load the data using the R
function load.
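As a reminder of how load behaves, here is a minimal sketch using a temporary file and toy data frames in place of the real tibbles. The column names status_id and text are assumptions made for illustration only.

```r
# Toy stand-ins for the two tibbles; the real data live in
# ../hw_data/LoofLofvenTweets.Rdata. Column names are assumptions.
Loof   <- data.frame(status_id = c("1", "2"), text = c("a", "b"))
Lofven <- data.frame(status_id = c("2", "3"), text = c("b", "c"))

# Save them to a temporary .Rdata file so the example is self-contained,
# then remove them from the workspace.
tmp <- tempfile(fileext = ".Rdata")
save(Loof, Lofven, file = tmp)
rm(Loof, Lofven)

# load() restores the objects under their original names and invisibly
# returns a character vector with those names.
loaded <- load(tmp)
print(loaded)
```

For the homework itself, the path passed to load would be the file named above rather than a temporary file.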
Construct a tibble tweets that joins the two tibbles and contains a
variable Person identifying whether the observation comes from the
“Lööf” or “Löfven” table. Tweets common to both tibbles should not be
included in the join.
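One possible approach, sketched on toy data, is to drop the shared tweets with anti_join and then stack the remainders with bind_rows. The key column status_id is an assumption here; check which column uniquely identifies a tweet in the real tibbles.

```r
library(dplyr)

# Toy data: the tweet with status_id "3" appears in both tibbles.
# The column name status_id is an assumption.
Loof   <- tibble(status_id = c("1", "2", "3"), text = c("a", "b", "c"))
Lofven <- tibble(status_id = c("3", "4"),      text = c("c", "d"))

# anti_join() keeps the rows of the first table that have no match in the
# second; bind_rows() with .id labels each row by the argument name.
tweets <- bind_rows(
  Loof   = anti_join(Loof, Lofven, by = "status_id"),
  Lofven = anti_join(Lofven, Loof, by = "status_id"),
  .id = "Person"
)

print(tweets)
```

Other joins can achieve the same result; this is only one way to express "in one table but not in both".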
Illustrate how the intensity of tweets containing the word “statsminister” (or “Statsminister”) has evolved over time for the two Persons using, e.g., barplots with time on the x-axis.
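A minimal sketch of one way to do this, on toy data: filter with a regular expression, count tweets per Person and day, and draw the counts as bars. The columns created_at and text are assumptions about the layout of tweets, and the daily resolution is only one possible choice.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Toy tweets; column names created_at and text are assumptions.
tweets <- tibble(
  Person = c("Loof", "Loof", "Lofven", "Lofven"),
  created_at = as.POSIXct(
    c("2018-11-20 10:00:00", "2018-11-21 09:00:00",
      "2018-11-20 12:00:00", "2018-11-20 18:30:00"),
    tz = "UTC"
  ),
  text = c("Ny statsminister?", "Inget nytt",
           "Statsminister i dag", "statsminister igen")
)

# Keep tweets mentioning the word (either capitalisation) and count
# them per Person and calendar day.
daily_counts <- tweets %>%
  filter(str_detect(text, "[Ss]tatsminister")) %>%
  count(Person, day = as.Date(created_at))

# Bar plot with time on the x-axis, one bar per Person and day.
p <- ggplot(daily_counts, aes(x = day, y = n, fill = Person)) +
  geom_col(position = "dodge") +
  labs(x = "Day", y = "Number of tweets")

print(daily_counts)
```

Printing p (or calling print(p)) renders the figure.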
Compute and plot the daily average sentiment of words in the
tweet texts for the two Persons. We define the average sentiment as the
average strength of words common to the text and the sentiment lexicon
at https://svn.spraakdata.gu.se/sb-arkiv/pub/lmf/sentimentlex/sentimentlex.csv.
Note that the function separate_rows can be useful in splitting the
text into words.
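The definition above can be sketched on toy data as follows: split each text into words, keep only the words found in the lexicon, and average their strength per Person and day. The toy lexicon and its column names word and strength are assumptions; inspect the real CSV file before relying on them.

```r
library(dplyr)
library(tidyr)

# Toy tweets and a toy lexicon; all column names are assumptions.
tweets <- tibble(
  Person = c("Loof", "Loof", "Lofven"),
  created_at = as.POSIXct(
    c("2018-11-20 10:00:00", "2018-11-21 09:00:00", "2018-11-20 12:00:00"),
    tz = "UTC"
  ),
  text = c("bra bra beslut", "dalig dag", "Bra och dalig")
)
lexicon <- tibble(word = c("bra", "dalig"), strength = c(1, -1))

daily_sentiment <- tweets %>%
  mutate(day = as.Date(created_at), word = tolower(text)) %>%
  separate_rows(word, sep = "\\s+") %>%    # one row per word
  inner_join(lexicon, by = "word") %>%     # keep only lexicon words
  group_by(Person, day) %>%
  summarise(avg_sentiment = mean(strength), .groups = "drop")

print(daily_sentiment)
```

Splitting on whitespace is a simplification; real tweets contain punctuation, URLs and hashtags that may need a more careful separator. The resulting table can be passed to ggplot2 for plotting.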
The 2022 Nobel lectures take place on 5–10 December. The Nobel Foundation maintains an API for looking up information about the Nobel Laureates. We are going to use version 2 of this API.
Fetch a list in JSON format with information on the Nobel prizes in Literature from the Nobel Prize API version 2. Choose a range of years to fetch data for. The API follows the OpenAPI standard and the documentation can be found here. A large part of this question is to figure out how to read and work with the OpenAPI documentation.
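As a starting point, the sketch below only builds a request URL; the helper nobel_prizes_url is hypothetical. The base URL, the version path "2.1", the endpoint nobelPrizes and the query parameter names are all assumptions that should be verified against the OpenAPI documentation before use.

```r
# Hypothetical helper that builds a query URL for the Nobel Prize API.
# ASSUMPTIONS (verify in the OpenAPI documentation): base URL, version
# path, endpoint name and the names of the query parameters.
nobel_prizes_url <- function(year_from, year_to,
                             category = "lit", version = "2.1") {
  sprintf(
    paste0(
      "https://api.nobelprize.org/%s/nobelPrizes",
      "?nobelPrizeYear=%d&yearTo=%d&nobelPrizeCategory=%s&format=json"
    ),
    version, as.integer(year_from), as.integer(year_to), category
  )
}

url <- nobel_prizes_url(2010, 2020)
print(url)

# Fetching needs network access and the jsonlite package, so it is left
# commented out here:
# prizes <- jsonlite::fromJSON(url, simplifyVector = FALSE)
```

Using simplifyVector = FALSE keeps the result as a nested list, which may be easier to walk through than the data frames jsonlite would otherwise try to build.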
Extract all the prize motivations from the JSON list, convert them
into a character vector of words, remove stop words and visualize the
frequencies of the remaining words in a word cloud. R packages for
plotting word clouds include, e.g., wordcloud, wordcloud2 and
ggwordcloud, and a list of stop words can be fetched by
library(readr)  # provides read_table()
stop_words_url <- "https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.txt"
stopwords <- read_table(stop_words_url, col_names = "words")
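The remaining steps can be sketched in base R on a toy list. The nested structure used here (nobelPrizes, laureates, motivation, en) is an assumption about the parsed JSON, and toy_stop_words is a tiny stand-in for the fetched list; with the real data one would use the words column of stopwords instead.

```r
# Toy list shaped like the parsed JSON; the field names are assumptions.
prizes <- list(nobelPrizes = list(
  list(laureates = list(
    list(motivation = list(en = "for the art of the novel"))
  )),
  list(laureates = list(
    list(motivation = list(en = "for the novel and the poem"))
  ))
))

# Collect one English motivation per laureate.
motivations <- unlist(lapply(prizes$nobelPrizes, function(prize) {
  vapply(prize$laureates, function(l) l$motivation$en, character(1))
}))

# Lower-case, split on anything that is not a-z, drop empty strings.
# Note: this simple split also discards non-ASCII letters.
words <- unlist(strsplit(tolower(motivations), "[^a-z]+"))
words <- words[nzchar(words)]

# Remove stop words (tiny stand-in list for illustration).
toy_stop_words <- c("for", "the", "of", "and")
words <- words[!words %in% toy_stop_words]

# Word frequencies, most frequent first.
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)

# Draw the cloud only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(word_freq), as.integer(word_freq), min.freq = 1)
}
```

The packages wordcloud2 and ggwordcloud take the same kind of word and frequency input, each through its own interface.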