group_by and summarise, more ggplot2
Read R4DS chapters 5.6-5.7, 3.7-3.10.
Solve the Grouping and Summarizing and Types of visualizations chapters of the Introduction to the Tidyverse course at DataCamp.
Open your class_files project and “Pull Branch” (under Tools > Version control in RStudio) to make sure you have the files ready and updated.
The script class_files/SR_music.R contains a simple function, get_SR_music, for grabbing music played on Swedish Radio channels from their open API. Load it by
source("class_files/SR_music.R")
and grab e.g. the songs played on P3 (channel 164) on a recent date, here 2022-10-31, by
get_SR_music(channel = 164, date = "2022-10-31") %>% select(title, artist, start_time) %>% head(n = 2)
## title artist start_time
## 1 Body Paint Arctic Monkeys 2022-10-31 23:58:41
## 2 Take It Personal Ella Tiritiello 2022-10-31 23:53:56
If you want multiple dates, the map functions from the purrr package (included in the tidyverse) are convenient (more about these later on in the course). Grabbing music played, e.g., in the last week of October (2022-10-25 to 2022-10-31) into music is done by
days <- seq(as.Date("2022-10-25"), as.Date("2022-10-31"), "days")
music <- map_df(days, get_SR_music, channel = 164)
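To see what the map_df call does without querying the API, here is a minimal sketch. The function fake_music is a hypothetical stand-in for get_SR_music and is assumed to take the same two arguments, channel and date; the real function may return different columns.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for get_SR_music: returns a one-row data frame
# per call instead of contacting the Swedish Radio API.
fake_music <- function(channel, date) {
  tibble(channel = channel, date = as.Date(date), title = paste("Song on", date))
}

days <- seq(as.Date("2022-10-25"), as.Date("2022-10-27"), "days")

# channel is passed by name, so each element of days fills the remaining
# argument (date). The per-day data frames are bound together by row.
music <- map_df(days, fake_music, channel = 164)
music
```

The result should have one row per day, since the stand-in returns a single row for each call.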
Note: The data is not entirely clean and the same artist/song may be coded in multiple ways (e.g. Cherrie & Z.E., Cherrie, Z.e and Cherrie, Z.E). You may ignore this for now.
Visualise how the start_times are distributed over the day. Repeat for another channel, e.g. P2 (channel 163). You can grab components of a date-time (POSIXct) object with format, as in
as.POSIXct("2019-01-01 23:57:04 CET") %>% format("%H:%M")
## [1] "23:57"
for extracting the hour and minute; see ?format.POSIXct for more examples. Note that the above code returns a value of character type, so you may want to convert it further to numeric format (e.g. minutes or hours after midnight) before plotting. The tidyverse package for date formatting is called lubridate and has functions to extract hours and minutes as well, e.g.
as.POSIXct("2019-01-01 23:57:04 CET") %>% lubridate::hour()
## [1] 23
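One possible way to get a numeric time of day is to combine lubridate's hour and minute into hours after midnight. The sketch below uses a made-up data frame, toy_music, with a start_time column; the variable names and the one-hour bin width are illustrative choices only.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Made-up start times standing in for the real music data.
toy_music <- tibble::tibble(
  start_time = as.POSIXct(
    c("2022-10-31 23:58:41", "2022-10-31 12:30:00", "2022-10-31 06:15:30"),
    tz = "UTC"
  )
)

# Numeric time of day: whole hours plus minutes as a fraction of an hour.
toy_music <- toy_music %>%
  mutate(hours_after_midnight = hour(start_time) + minute(start_time) / 60)

# Histogram with one bin per hour of the day.
p <- ggplot(toy_music, aes(x = hours_after_midnight)) +
  geom_histogram(binwidth = 1, boundary = 0) +
  labs(x = "Hours after midnight", y = "Number of songs")
p
```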
Kammarkollegiet is a public agency that, among other things, issues insurance. The file class_files/claims.csv contains data on claims from one of their personal insurances. Each claim has a unique Claim id, a Claim date, a Closing date and a number of Payments disbursed at Payment dates. If the claim is not closed (there may be more payments coming), Closing date is given the value NA. Null claims, i.e. claims that have been closed without payment, are not included.
Read the data by
claim_data <- read_csv("class_files/claims.csv")
[...] Claim id should only be counted once!).
Actuaries are very fond of loss triangles. A loss triangle is a table where the value on row \(i\), column \(j\) is the sum of all payments on claims with Claim date in year \(i\) that are disbursed until the \(j\):th calendar year after the year of the claim/accident. The table will be a triangle since future payments are not available.
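The definition can be illustrated on a tiny made-up data set. The sketch below assumes the column names given in the description of claims.csv and reads "disbursed until the \(j\):th calendar year" as cumulative, i.e. each cell holds everything paid up to and including development year \(j\). Both are assumptions, and the numbers are invented.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up payments; column names follow the description of claims.csv.
toy_claims <- tibble::tibble(
  `Claim id`     = c(1, 1, 2, 3),
  `Claim date`   = as.Date(c("2019-03-01", "2019-03-01", "2019-07-15", "2020-02-10")),
  `Payment date` = as.Date(c("2019-05-01", "2020-01-20", "2020-08-01", "2020-06-30")),
  Payment        = c(100, 50, 200, 80)
)

triangle <- toy_claims %>%
  mutate(
    claim_year = year(`Claim date`),
    dev_year   = year(`Payment date`) - claim_year  # j = 0, 1, ...
  ) %>%
  group_by(claim_year, dev_year) %>%
  summarise(paid = sum(Payment), .groups = "drop_last") %>%
  mutate(cum_paid = cumsum(paid)) %>%  # cumulative within each claim year
  ungroup() %>%
  select(-paid) %>%
  pivot_wider(names_from = dev_year, values_from = cum_paid)
triangle
```

Cells for future payments come out as NA after pivot_wider. In real data, a claim year with no payments in some development year would also leave a gap, which would need to be filled in separately (e.g. with tidyr::complete) before taking the cumulative sum.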
Construct the loss triangle for claim_data and present it using knitr::kable. Try to do it in a single sequence of pipes. If future payments are coded as NA, using
options(knitr.kable.NA = '')
will result in a nicer looking table.
A list of all political parties participating in the 2022 Swedish elections can be downloaded from Valmyndigheten by
party_url = "https://www.val.se/download/18.75995f7b17f5a986a4eebb/1664362507785/deltagande-partier.csv"
parties_2022 <- read_delim(party_url, delim = ";", locale = locale("sv", encoding = "UTF-8"))
There is a warning about parsing issues when reading the data set. Can you find out where the problem is coming from?
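A generic way to inspect readr parsing warnings is the problems function, which lists the rows and columns where parsing deviated from what was expected. The sketch below uses made-up inline text with one malformed row. The column name PARTI is hypothetical, and the actual issue in the Valmyndigheten file may be of a different kind.

```r
library(readr)

# Made-up semicolon-separated text; the last row has one field too many.
toy_csv <- I("PARTI;VALTYP\nA;RD\nB;KF;extra\n")

toy <- read_delim(toy_csv, delim = ";", show_col_types = FALSE)

# One row per parsing issue, with the row, the column, what was expected
# and what was actually found.
issues <- problems(toy)
issues
```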
How many unique parties participated in each of the three elections (VALTYP equals RD for Riksdag, RF for Regionfullmäktige and KF for Kommunfullmäktige)? Note that the same party may appear multiple times (based on e.g. multiple reasons of inclusion in DELTAGANDEGRUND).
How many local parties (parties only participating within a single VALKRETSKOD) participated in the Kommunalval (VALTYP equals KF)?
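Both questions come down to counting distinct values within groups. The sketch below shows the pattern on a made-up table. PARTI is a hypothetical name for the party column (check the real column names in parties_2022), and the election-type codes in the toy data are only illustrative.

```r
library(dplyr)

# Made-up rows; PARTI is a hypothetical column name for the party.
toy_parties <- tibble::tibble(
  PARTI       = c("A", "A", "B", "A", "A", "C", "C"),
  VALTYP      = c("RD", "RD", "RD", "KF", "KF", "KF", "KF"),
  VALKRETSKOD = c("00", "00", "00", "0114", "0115", "0180", "0180")
)

# Distinct parties per election type. n() counts rows, so it also counts
# parties that are listed more than once.
parties_per_type <- toy_parties %>%
  group_by(VALTYP) %>%
  summarise(n_rows = n(), n_parties = n_distinct(PARTI))

# Parties that appear in exactly one district within one election type.
single_district <- toy_parties %>%
  filter(VALTYP == "KF") %>%
  group_by(PARTI) %>%
  summarise(n_districts = n_distinct(VALKRETSKOD)) %>%
  filter(n_districts == 1)
```

In the toy data, party C is listed twice but only in one district, so it is the only row left in single_district.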
As in the last class, load Systembolaget’s assortment and select the regular product range.
[...] Varugrupp)? Use filter and is.na to filter out beverages where Varugrupp is not available.
[...] PrisPerLiter for each vintage and visualise using ggplot.
[...] PrisPerLiter) in each Varugrupp.
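The general pattern of dropping rows with a missing group and then summarising per group can be sketched on made-up data. The values below are invented, and the mean is an arbitrary choice of summary, not necessarily the statistic the exercises ask for.

```r
library(dplyr)

# Made-up products; the last row has no Varugrupp.
toy_products <- tibble::tibble(
  Varugrupp    = c("Vin", "Vin", "Whisky", NA),
  PrisPerLiter = c(100, 140, 500, 99)
)

price_summary <- toy_products %>%
  filter(!is.na(Varugrupp)) %>%  # keep only rows where Varugrupp is available
  group_by(Varugrupp) %>%
  summarise(n = n(), mean_price = mean(PrisPerLiter))
price_summary
```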