Do this before class

Read R4DS chapter 7 about exploratory data analysis.

Complete the chapters Exploring Categorical Data and Exploring Numerical Data in the Exploratory Data Analysis course on DataCamp.

During class

Systembolaget’s assortment

  • Use filter to extract the product groups (Varugrupp) c("Vitt vin", "Rött vin", "Rosévin", "Mousserande vin") with vintage (Argang) 2011-2018. Then compare the following bar charts:

    • ggplot with aes(x = Argang) and geom_bar()
    • ggplot with aes(x = Argang), geom_bar() and facet_wrap(~ Varugrupp) (try adding scales = "free_y" to facet_wrap)
    • ggplot with aes(x = Argang, fill = Varugrupp) and
      • geom_bar()
      • geom_bar(position = "dodge")
      • geom_bar(position = "fill")
  • Recreate the following plot (red wines in the regular range):

  • Make a box plot (geom_boxplot) of PrisPerLiter on a log scale, with x = Varugrupp. Try coord_flip() to improve readability.
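The wine exercises above can be sketched as a few small helper functions. This is one possible approach, not the official solution. It assumes the assortment data frame has the columns Varugrupp, Argang (numeric vintage year) and PrisPerLiter; the helper names filter_wines, wine_bar_charts and price_boxplot are hypothetical.

```r
library(tidyverse)

# Product groups named in the exercise; assumes columns Varugrupp (text)
# and Argang (numeric vintage year) in the assortment data
wine_groups <- c("Vitt vin", "Rött vin", "Rosévin", "Mousserande vin")

# Hypothetical helper: keep the four wine groups with vintage 2011-2018
# (between() is inclusive; rows with a missing Argang are dropped)
filter_wines <- function(assortment) {
  assortment %>%
    filter(Varugrupp %in% wine_groups, between(Argang, 2011, 2018))
}

# Hypothetical helper: the bar chart variants from the list, as a named list
wine_bar_charts <- function(wines) {
  by_year  <- ggplot(wines, aes(x = Argang))
  by_group <- ggplot(wines, aes(x = Argang, fill = Varugrupp))
  list(
    count  = by_year + geom_bar(),
    facets = by_year + geom_bar() + facet_wrap(~ Varugrupp, scales = "free_y"),
    stack  = by_group + geom_bar(),
    dodge  = by_group + geom_bar(position = "dodge"),
    fill   = by_group + geom_bar(position = "fill")
  )
}

# Hypothetical helper: price per liter by product group on a log10 axis
price_boxplot <- function(assortment) {
  ggplot(assortment, aes(x = Varugrupp, y = PrisPerLiter)) +
    geom_boxplot() +
    scale_y_log10() +
    coord_flip()  # horizontal boxes keep long group names readable
}
```

For example, `wine_bar_charts(filter_wines(assortment))$dodge` draws the dodged chart, where `assortment` stands for whatever name you gave the loaded data. If Argang is stored as text, convert it with as.numeric() before filtering.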

Winter medals

The following code transforms the medals data to “long” format (more about this next time!) which is easier to work with in ggplot:

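library(tidyverse)  # read_csv, select, pivot_longer and %>%

# Drop the Total column and stack Gold/Silver/Bronze into Medal/Number columns
medal_long <- read_csv("../class_files/Winter_medals2019-10-30.csv") %>% 
    select(-Total) %>% 
    pivot_longer(cols = c("Gold", "Silver", "Bronze"),
                 names_to = "Medal", 
                 values_to = "Number")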
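To see what the reshaping does, here is the same pipeline applied to a tiny made-up table with two countries. The column name Country and the numbers are invented for illustration; the real file may use different identifier columns.

```r
library(tidyverse)

# Two made-up rows in a wide layout like the medals file:
# one column per medal type plus a Total column (Country is a hypothetical name)
wide <- tibble(
  Country = c("A", "B"),
  Gold = c(1, 0), Silver = c(2, 1), Bronze = c(0, 3),
  Total = c(3, 4)
)

# Same steps as the medals code: drop Total, then stack the three medal columns
long <- wide %>%
  select(-Total) %>%
  pivot_longer(cols = c("Gold", "Silver", "Bronze"),
               names_to = "Medal",
               values_to = "Number")

long  # 6 rows, one per Country and Medal, with columns Country, Medal, Number
```

Each wide row becomes three long rows, which is what lets ggplot map Medal to an aesthetic such as fill.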

Check the result with glimpse(medal_long). Use group_by and summarise to compute the total number of medals of each type (Gold/Silver/Bronze) for each country. Illustrate the relative proportions of the medal types, e.g. with geom_bar(stat = "identity", position = "fill").
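A possible sketch of the aggregation and the proportion plot follows. It assumes medal_long has a country column called Country; that name is a guess, so check glimpse(medal_long) and substitute the real one. The helper names medal_totals and medal_share_plot are hypothetical.

```r
library(tidyverse)

# Hypothetical helper: total medals per country and medal type.
# Assumes columns Country (a guessed name), Medal and Number.
medal_totals <- function(medal_long) {
  medal_long %>%
    group_by(Country, Medal) %>%
    summarise(Number = sum(Number, na.rm = TRUE), .groups = "drop") %>%
    # fixed level order so the bars stack Gold, Silver, Bronze
    mutate(Medal = factor(Medal, levels = c("Gold", "Silver", "Bronze")))
}

# Hypothetical helper: position = "fill" scales each country's bar to 1,
# so the segments show the share of each medal type
medal_share_plot <- function(totals) {
  ggplot(totals, aes(x = Country, y = Number, fill = Medal)) +
    geom_bar(stat = "identity", position = "fill") +
    coord_flip()
}
```

Usage: `medal_share_plot(medal_totals(medal_long))`. Note that geom_col() is shorthand for geom_bar(stat = "identity").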

First math course

The file class_files/MM2001_results.csv contains the age, sex, and grade in the course Matematik I (MM2001) for 3201 students aged 18-40. An NA in the grade column means that the student has registered for the course but not yet completed it.

Use ggplot to explore relations between the variables.
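One possible starting point is sketched below, after reading the file with read_csv() as for the medals data. The column names Age, Sex and Grade are assumptions (check names() on the real data), and the helpers prepare_results, grade_by_sex and age_by_grade are hypothetical. Since NA means "registered but not completed", the sketch keeps it as its own category instead of dropping those students.

```r
library(tidyverse)

# Hypothetical column names Age, Sex and Grade.
# Recode a missing grade as its own category so these students stay in the plots.
prepare_results <- function(results) {
  results %>%
    mutate(Grade = replace_na(as.character(Grade), "Not completed"))
}

# Number of students per grade, bars side by side for each sex
grade_by_sex <- function(results) {
  ggplot(results, aes(x = Grade, fill = Sex)) +
    geom_bar(position = "dodge")
}

# Age distribution within each grade category
age_by_grade <- function(results) {
  ggplot(results, aes(x = Grade, y = Age)) +
    geom_boxplot()
}
```

Swapping position = "dodge" for position = "fill" in grade_by_sex shows proportions instead of counts, which makes groups of different sizes easier to compare.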