The final part of the course involves an individual project, taking
shape as a data blog post. This should illustrate an issue/problem using
a unique data set collected by yourself and at the same time illustrate
the use of tools taught in this course. Deadline for the project is
2023-01-10 at 18:00. Hand-in occurs by raising an issue
with the title “Project ready for grading” in your
PR_<github_username>
repository. Shortly after the
deadline, we will clone all project repos with such an issue to a local
installation. This version will count as your hand-in version. On
2023-01-12 the projects will be presented orally (5-minute
presentation). Presence is compulsory for the session you are
presenting in, so please reserve that day in your calendar.
The blog posts below can serve as inspiration and give a rough idea of the amount of work expected in the projects.
Look for more inspiration at, e.g., R Weekly or R-Bloggers.
Data sources: During the course, you were introduced to a lot of possible data sources. Additional public web-based data sources could, e.g., be the Stockholm Open Data Portal or an API to query data from Sweden’s national data portal. Another example of a contemporary website for relevant data is the COVID-19 data page by the Swedish Folkhälsomyndigheten.
The project work has the following elements:
#rstats
post, something which might interest your fellow
students. Your post can be about a serious matter, but it can also be a
not so serious matter. However, make it clear before writing who your
intended readership is (general public, fellow B.Sc. students, R users,
ornithologists, …)
wordcountaddin
for RStudio to count the words in your report.
The biggest challenge of the project will be to be realistic about what you can achieve within the given deadline. Once you have an estimate of how much that could be, take 50% of that and you are still likely to be busy. Make sure you have a working project early on and then scale up iteratively, so you’re always ready. Start early.
For every student we will create a private
PR_<username>
GitHub repository as part of the MT4007-HT22 organisation,
which only you and the teachers of the course have read/write access to
(similar setup as your HW_<username>
repo) and which
follows a generic template. At the project deadline we will pull all
repos that have an issue “Project work submitted”.
At submission, your repo should at least contain the following files:
PR_jensjensen/Report/report.Rmd
PR_jensjensen/Report/report.html
PR_jensjensen/Presentation/presentation.Rmd
PR_jensjensen/Presentation/presentation.html
where jensjensen
is to be substituted with your GitHub
user name. Note: It’s important to use exactly the
filenames as above, since we will extract report and presentation
automatically from your repository. Furthermore, ensure that any support
files (data files, graphics, etc.) needed to compile the
.Rmd documents are part of the repository. As when creating R
packages, this means putting R preprocessing files in the folder
R and data in a directory data. One exception
is data acquired using private API keys. For such data, write an
R/query_data.R script, which imports the API key stored
somewhere outside the git repository, does all the work and finally stores the data
using save in the data directory. Your R
Markdown report should then access the data by using load.
The data remain private as part of the
PR_<username>
repository, but your project report
HTML-file will be made accessible to the teachers and students of the
course.
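The R/query_data.R workflow described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the course's template: the environment variable name MY_API_KEY, the helper names fetch_from_api() and query_data(), and the file name project_data.RData are all hypothetical and should be adapted to your project.

```r
# R/query_data.R (sketch; all names below are illustrative assumptions)

# Hypothetical placeholder: replace the body with your actual API request.
fetch_from_api <- function(api_key) {
  stop("Replace fetch_from_api() with your own API query.")
}

# Read the API key from an environment variable (set outside the git
# repository, e.g. in ~/.Renviron), query the API and store the result
# with save() so that the report can restore it with load().
query_data <- function(fetch = fetch_from_api,
                       key_var = "MY_API_KEY",
                       out_file = file.path("data", "project_data.RData")) {
  api_key <- Sys.getenv(key_var)
  if (!nzchar(api_key)) {
    stop("Environment variable ", key_var, " is not set.")
  }
  project_data <- fetch(api_key)
  dir.create(dirname(out_file), showWarnings = FALSE, recursive = TRUE)
  save(project_data, file = out_file)
  invisible(out_file)
}

# Run once locally, with the key available:
#   query_data()
# In report.Rmd the stored object is then restored with:
#   load("data/project_data.RData")
```

Keeping the key in an environment variable means it never appears in a committed file, while the saved data file in the data directory lets the report compile without the key.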
In contrast to the homework exercises, we use HTML as backend for the project report; see, e.g., Section 3.1 of R Markdown: The Definitive Guide for options to customize. In order to ensure portability of the HTML reports, please use the following as part of your YAML header:
---
title: The snappy title of your project
author: Jens Jensen <jens.jensen@student.su.se>
date: 2021-01-15
output:
  html_document:
    self_contained: true
    toc: true
    toc_depth: 2
---
The template in your repo already contains these options.
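Since the report and presentation are extracted automatically by file name, it may help to check that the four required files exist before raising the hand-in issue. The following is only a sketch; check_submission() is a hypothetical helper and not part of the course template.

```r
# Required files, relative to the root of the PR_<username> repository.
required_files <- c(
  file.path("Report", "report.Rmd"),
  file.path("Report", "report.html"),
  file.path("Presentation", "presentation.Rmd"),
  file.path("Presentation", "presentation.html")
)

# Hypothetical helper: returns (invisibly) the required files that are
# absent from repo_dir and warns if there are any.
check_submission <- function(repo_dir = ".") {
  present <- file.exists(file.path(repo_dir, required_files))
  absent <- required_files[!present]
  if (length(absent) > 0) {
    warning("Missing files: ", paste(absent, collapse = ", "))
  }
  invisible(absent)
}

# Usage, from the root of your repository:
#   check_submission()
```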
Final note: For the presentation the data wrangling steps do not need
to be repeated. One can either import all data generated by
report.Rmd
using load
or use some other type
of caching.
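As one possible form of "some other type of caching", the sketch below computes an object once, stores it on disk and reuses it afterwards, so that presentation.Rmd does not have to repeat the wrangling done for the report. The helper name cached() and the file path in the usage comment are assumptions made for illustration.

```r
# Hypothetical helper: return the object stored at `path` if it exists;
# otherwise call compute(), store the result at `path` and return it.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Usage in both report.Rmd and presentation.Rmd (wrangle_data() stands for
# your own wrangling function):
#   wrangled <- cached("data/wrangled.rds", function() wrangle_data())
```

Note that the cache file has to be deleted by hand whenever the wrangling code changes, since this sketch does not detect stale results.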
The project will be graded based on the following five dimensions, which have equal weight:
Lycka till!
If you want to use SCB data extractable by their web
interface, please use the pxweb
package for this instead.
Examples: using a targets
pipeline (Sect. 10.1), visualizing with an interactive Shiny app, parallel computing, …