dplyr and ggplot2.
Read R4DS chapters 3.1-3.6, 5.1-5.5.
Complete the assignments Data wrangling and Data visualization (the first two chapters of Introduction to the Tidyverse) at DataCamp.
Start by creating a new R project “Classroom” that you will use for
your class activities. Some activities will require data or scripts from
the repo Class_files, so we recommend that you clone it into a subfolder
of your Classroom directory. Now create a new R Markdown document,
Class1.Rmd, where you will do your work for this class.
Systembolaget’s assortment of beverages from 2019-10-30 is available in
the file Class_files/systembolaget2019-10-30.csv. It was downloaded from
Systembolaget’s public API and saved in CSV format by the script
Class_files/Systembolaget.R.
Unfortunately, Systembolaget changed how they share information via APIs
(Nov 1, 2022) because the product-related information had been used in
ways that went against the purpose of Swedish alcohol policy and
Systembolaget’s mission. Load the data with:
# Define date when scraping took place - can then be easily changed.
date_systembolaget_scrape <- "2019-10-30"
library(tidyverse)
file_name <- paste0("systembolaget",date_systembolaget_scrape,".csv")
Sortiment_hela <- read_csv(file.path("Class_files", file_name))
Data wrangling (arrange, filter, mutate, select, %>%)
The variable Alkoholhalt (alcohol by volume) has been classified as
character by read_csv, since it contains a percent sign. Convert it to
numeric using mutate by first removing the percent sign (e.g. with gsub)
and then transforming the result with as.numeric.
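The conversion described above can be sketched on a small made-up tibble. The data below are hypothetical and not taken from the Systembolaget file, where the values of Alkoholhalt may be formatted differently.

```r
library(dplyr)

# Hypothetical toy data; the real file may format Alkoholhalt differently.
toy <- tibble(
  Namn = c("A", "B", "C"),
  Alkoholhalt = c("12.5%", "5%", "40%")
)

toy_num <- toy %>%
  # Remove the literal percent sign, then convert the remaining text to numeric
  mutate(Alkoholhalt = as.numeric(gsub("%", "", Alkoholhalt, fixed = TRUE)))

toy_num
```

Using fixed = TRUE treats the pattern as a plain string rather than a regular expression, which is enough for a single literal character.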
A few wines are labelled as Röda - lägre alkoholhalt and
Vita - lägre alkoholhalt instead of Rött vin (red wine) and Vitt vin
(white wine), respectively, in the Varugrupp (group of products) column.
Merge these wines into Rött vin and Vitt vin, respectively, e.g. by
using mutate and ifelse.
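One possible way to merge the labels with mutate and ifelse is sketched below. The tibble is a hypothetical stand-in that only reuses the group labels mentioned above; the extra group "Öl" is included just to show that other values are left untouched.

```r
library(dplyr)

# Hypothetical toy data using the group labels from the exercise text.
toy <- tibble(
  Namn = c("A", "B", "C", "D"),
  Varugrupp = c("Rött vin", "Röda - lägre alkoholhalt",
                "Vita - lägre alkoholhalt", "Öl")
)

toy_merged <- toy %>%
  mutate(
    # Replace each low-alcohol label with its main group, keep everything else
    Varugrupp = ifelse(Varugrupp == "Röda - lägre alkoholhalt", "Rött vin", Varugrupp),
    Varugrupp = ifelse(Varugrupp == "Vita - lägre alkoholhalt", "Vitt vin", Varugrupp)
  )

# Check the result by counting rows per group
count(toy_merged, Varugrupp)
```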
What beverage has the highest PrisPerLiter? Display the answer (the Namn
of the beverage) dynamically, using inline R code in the text body of
your .Rmd document.
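In R Markdown, a value computed in a code chunk can be embedded in running text with inline code of the form `r expression`. The sketch below uses a hypothetical toy tibble in place of Sortiment_hela, and the object name most_expensive is chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for Sortiment_hela.
toy <- tibble(
  Namn = c("A", "B", "C"),
  PrisPerLiter = c(120, 9500, 310)
)

# Name of the beverage with the highest price per litre
most_expensive <- toy %>%
  arrange(desc(PrisPerLiter)) %>%  # most expensive first
  slice(1) %>%                     # keep the top row
  pull(Namn)                       # extract the name as a character vector

# In the text body of the .Rmd file one could then write, for example:
#   The most expensive beverage per litre is `r most_expensive`.
```

If several beverages share the highest price, this sketch reports only the first of them.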
Create a new data frame Sortiment_ord with the regular product range
(where SortimentText equals Ordinarie sortiment). Make a table (with
kable from the knitr package) of the 10 most expensive beverages (by
PrisPerLiter) from this range. Use select to choose suitable columns for
the table.
If you have not already done so, write the code from the previous
exercise using a sequence of pipes (%>%).
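A piped version of the table exercise could look roughly like the sketch below. The tibble is hypothetical: only the column names and the value "Ordinarie sortiment" come from the exercise text, and the second SortimentText value is made up. The toy data has four rows, so the sketch keeps the top 2 rather than the top 10.

```r
library(dplyr)
library(knitr)

# Hypothetical toy data; "Other (made-up value)" is not a real category.
toy <- tibble(
  Namn = c("A", "B", "C", "D"),
  SortimentText = c("Ordinarie sortiment", "Ordinarie sortiment",
                    "Other (made-up value)", "Ordinarie sortiment"),
  PrisPerLiter = c(120, 9500, 20000, 310)
)

toy_top <- toy %>%
  filter(SortimentText == "Ordinarie sortiment") %>%  # keep the regular range
  arrange(desc(PrisPerLiter)) %>%                      # most expensive first
  head(2) %>%                                          # top 2 here; 10 in the exercise
  select(Namn, PrisPerLiter)                           # columns for the table

kable(toy_top)
```

Each step takes the result of the previous one as its first argument, so the pipeline reads top to bottom in the order the operations are applied.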
Data visualization (ggplot, geom_point, geom_line, facet_wrap)
For the regular product range in Sortiment_ord:
Plot PrisPerLiter against Alkoholhalt, color the points by Varugrupp and
consider using a log-scale for PrisPerLiter.
Plot PrisPerLiter (possibly on a log-scale) against Varugrupp. Consider
coord_flip to improve readability.
For the groups c("Vitt vin", "Rött vin", "Rosévin", "Mousserande vin")
of vintage (Argang) 2010-2019, plot PrisPerLiter against Argang. Try
both using a facet for each group and coloring by group in the same
facet.
The Stockholm international film festival takes place in early November
each year. In Class_files/Film_events_2018-11-07.csv you will find their
event schedule for the 2018 edition.
Data wrangling (arrange, filter, mutate, select, %>%)
The file Class_files/Winter_medals2022-11-03.csv contains the number of
medals per country and Olympic year at the Winter Olympics since 1980,
together with the total population of the country. The data set was
scraped from Wikipedia using the script Class_files/Winter_medals.R,
which contains more information, in particular on countries that have
been split or joined during the period.
Load the file using
winter_medals <- read_csv("Class_files/Winter_medals2022-11-03.csv")
Data wrangling (arrange, filter, mutate, select, %>%)
Create a new variable medals_per_mill, the number of medals per million
inhabitants.
[...] medals_per_mill, during the 2022 Winter Olympics.
Data visualization (ggplot, geom_point, geom_line, facet_wrap)
[...] (see ?geom_point for a list of aesthetics geom_point understands).
[...] “facet” for each of Sweden, Norway and Finland.
Use ggplot to recreate (static versions of) some figures from Hans
Rosling’s talks. Data is available in the package gapminder.
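As a starting point, the sketch below draws one static bubble chart in the style of those talks. It assumes the gapminder package is installed and uses its gapminder data frame (columns country, continent, year, lifeExp, pop and gdpPercap). The choice of year, aesthetics and labels is only one possibility.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumed to be installed; provides the gapminder data frame

p <- gapminder %>%
  filter(year == 2007) %>%                 # a single year gives a static picture
  ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.6) +                # semi-transparent bubbles
  scale_x_log10() +                        # income on a log scale
  labs(x = "GDP per capita", y = "Life expectancy",
       size = "Population", color = "Continent")

p
```

Replacing the filter step with facet_wrap(~ year) would show several years side by side instead of a single one.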