From the first homework assignment, you should now have a Homework folder on your computer containing a subfolder HW1 with the first assignment. For the coming assignments, you will need data that is made available in the repo at https://github.com/mt4007-ht22/HW_data. Clone this repo (by creating a new R-project) into a subfolder HW_data of Homework. We recommend that you start a fresh R-project in Homework/HW2, but communicate with GitHub through the project in Homework (i.e. open this project when you need to commit/push).

- The Homework folder is connected to your HW_username repository on GitHub. When you want to push your work to GitHub, open the R-project in this folder and commit and push. It has a subfolder Homework/HW_data and one subfolder for each homework (Homework/HW1, Homework/HW2, …). It also contains the README.md file where you insert links to your homeworks.
- The Homework/HW_data folder is connected to https://github.com/mt4007-ht22/HW_data. When a new homework is issued, open the R-project in this folder and pull the new data from GitHub. You should never change the files in this folder. If you do so by mistake, delete the folder and make a new clone.
- The Homework/HW[1-6] folders are where you keep your R-Markdown and Markdown documents for each homework. You should keep a separate R-project in each, but these need not be under version control.

You should not push the Homework/HW_data folder to GitHub. To avoid this, add a line HW_data to the file .gitignore (a list of files that git ignores) and save the file.

Solutions to the following tasks should be presented in an R-Markdown document with output: github_document. You need to push the following as part of the HW2 subdirectory:

- the R-Markdown document (.Rmd file);
- the compiled Markdown document (.md file);
- any figures needed to render the Markdown file properly on GitHub.

Code should be written clearly in a consistent style; see in particular Hadley Wickham’s tidyverse style guide. For example, code should be easy to read and avoid unnecessary repetition of variable names.
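The following minimal sketch illustrates the last point. It uses the built-in mtcars data set, which we chose for illustration and which is not part of the assignment. The first version repeats the data frame name in every assignment. The second names the data once and lets mutate() refer to the columns directly. Both produce the same new columns.

```r
library(dplyr)

# Repetitive style: the data frame name appears in every expression
cars_base <- mtcars
cars_base$kpl <- cars_base$mpg * 0.425144  # miles per US gallon -> km per litre
cars_base$heavy <- cars_base$wt > 3.5      # wt is in units of 1000 lbs

# Tidyverse style: the data frame is named once, and mutate() refers to
# its columns directly
cars_tidy <- mtcars %>%
  mutate(
    kpl = mpg * 0.425144,
    heavy = wt > 3.5
  )
```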

Your submitted code should be self-contained, and the results should be reproducible for anyone with access to the HW_data repo directory. Once you are ready to submit, and before the deadline, use the same procedure as for HW1: open an issue in your HW_<username> repository with the title “HW2 ready for grading!”.
The file ../HW_data/booli_sold.csv
contains sales data on 158 apartments in Ekhagen (next to Lappis).
… geom_boxplot).

The file ../HW_data/Folkhalsomyndigheten_Covid19.xlsx contains data on COVID-19 cases in Sweden. The data was obtained from Folkhälsomyndigheten’s webpage on the 1st of October 2020. Because we downloaded the file manually on a specific date, reproducibility might be an issue: the published case counts may have been updated since then.
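You can try the readxl functions named in the questions on an example workbook that ships with readxl before applying them to the course file. This sketch uses readxl_example("datasets.xlsx") as a stand-in, so the sheet names and row counts belong to that example and not to the COVID-19 data. It does three things:

- lists the sheets with excel_sheets();
- reads every sheet into a named list of tibbles with read_excel();
- shows how n_max limits the number of data rows read.

```r
library(readxl)

# Example workbook bundled with readxl, used here as a stand-in for
# ../HW_data/Folkhalsomyndigheten_Covid19.xlsx
path <- readxl_example("datasets.xlsx")

# Names of all sheets in the workbook
sheets <- excel_sheets(path)
print(sheets)

# Read every sheet into a tibble; naming the list by sheet lets you
# write all_sheets[["iris"]] instead of remembering positions
all_sheets <- lapply(setNames(sheets, sheets), function(sheet) {
  read_excel(path, sheet = sheet)
})

# Show the first rows of one sheet as a table
knitr::kable(head(all_sheets[["mtcars"]]))

# n_max caps the number of data rows read (the header row is not
# counted), so asking for one row fewer than the sheet has drops its last row
n_iris <- nrow(all_sheets[["iris"]])
iris_trimmed <- read_excel(path, sheet = "iris", n_max = n_iris - 1)
```

A purrr::map() call over the sheet names would work just as well as lapply(). Base R is used here only to keep the example's dependencies to readxl and knitr.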
Answer the listed questions below.
… excel_sheets.

Using an appropriate read_* function from the readxl package, read all sheets in the .xlsx file and store them as tibbles (data.frames). In the following questions, this read_* function is simply referred to as the “read function”. When you read these sheets, you should see a lot of warning messages; we will investigate those in the following questions.

… knitr::kable and head. What are the column names? Does anything seem strange? Using the argument n_max in the read function, remove the last row.

… read_* function parsed for the column Statsdel? Read the documentation of the appropriate function, and explain why this happens and how to fix it.

… tot_antal_fall and nya_fall_vecka. What is the type of these variables, and why have they been parsed as such? Correct them (in some way) so that they become numeric variables.

…, summarise, and across functions, reproduce the number of COVID-19 cases for each region as well as the total (here named Totalt_antal_fall) based on sheet 1 of the Excel file. Note that this information can also be found in another part of the Excel file. What is the total number of cases? Which region has had the most cases so far? Which has had the fewest? Argue why raw counts might be misleading when comparing regions.

… cumsum function inside of mutate).

… Antal_fall_vecka from sheet 7. Plot it as a barplot against the week number veckonummer using geom_col().
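You can try the summarise()/across(), cumsum()-inside-mutate() and geom_col() patterns mentioned above on a small made-up table before applying them to the real sheets. The data frames daily and weekly and all their column names are hypothetical, chosen only for illustration. They do not reflect the layout of the Folkhälsomyndigheten file.

```r
library(dplyr)
library(ggplot2)

# Hypothetical daily case counts in wide format: one row per day and
# one column per region (all names and numbers are made up)
daily <- tibble(
  date = as.Date("2020-03-01") + 0:4,
  Region_A = c(1, 0, 3, 2, 5),
  Region_B = c(0, 2, 1, 4, 0)
)

# Total per region: summarise() collapses the rows, across() applies
# sum() to every column whose name starts with "Region"
totals <- daily %>%
  summarise(across(starts_with("Region"), sum))
print(totals)

# Grand total over all regions
grand_total <- sum(unlist(totals))

# Running totals per region: cumsum() inside mutate() keeps one row per
# day; .names stores the results in new columns cum_Region_A, cum_Region_B
cumulative <- daily %>%
  mutate(across(starts_with("Region"), cumsum, .names = "cum_{.col}"))

# Hypothetical weekly counts drawn as bars against the week number;
# geom_col() uses the y values as given (unlike geom_bar(), it does not count rows)
weekly <- tibble(week = 10:14, cases = c(3, 8, 15, 12, 6))
p <- ggplot(weekly, aes(x = week, y = cases)) +
  geom_col()
print(p)
```

The last row of cumulative equals the per-region totals, which is a quick consistency check between the two computations.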