From http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Mathematical statistics courses usually focus entirely on the Model stage; this course aims to touch on the other stages as well.
Pic from: http://r4ds.had.co.nz/introduction.html
Provides basic training and preparation for class. Not graded, but we assume that you have worked through the material.
NA/“missing values”?
Reproducibility is the ability to get the same research results or inferences, based on the raw data and computer programs provided by researchers. (Wikipedia)
Cf. Replicability, the ability to arrive at the same conclusion based on independent data/analysis.
You can never guarantee that you did everything “right”, but you can at least document what you did.
It can also be difficult to guarantee that everything works the same across operating systems.
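One way to document what you did is to record the computing environment alongside the results. The following is a minimal sketch using base R only; the seed value and the variable names are illustrative assumptions, not part of the course material.

```r
# Record the R version, operating system and attached packages
env_info <- sessionInfo()
print(env_info)

# Individual pieces that are often worth writing into a report
r_version <- R.version.string
os_name <- Sys.info()[["sysname"]]
cat("R version:", r_version, "\n")
cat("Operating system:", os_name, "\n")

# Fix the random seed so that simulated results can be regenerated
# (2024 is an arbitrary, illustrative seed)
set.seed(2024)
draw_1 <- rnorm(5)
set.seed(2024)
draw_2 <- rnorm(5)
identical(draw_1, draw_2)  # TRUE
```

Printing the session information at the end of a report lets a reader see which versions produced the results, even if they cannot match them exactly.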
Everything written in code (no clicking or cutting/pasting results/tables/figures)
Portable (the code must be executable, not just on your computer today)
Accessible (others should be able to easily access and reproduce your analysis)
Automated from raw data to report (a button press should be enough to generate the final product)
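The four criteria above can be sketched as a single script that goes from raw data to a result file without manual steps. This is a hedged illustration only: the folder layout, the file names and the function `run_analysis()` are hypothetical, and a temporary directory stands in for a real project folder.

```r
# Stand-in for a project folder; in a real project this would be the
# project root, and all paths would be given relative to it
project_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "output"), showWarnings = FALSE)

# Stand-in raw data (the built-in mtcars data written to disk)
write.csv(mtcars, file.path(project_dir, "data", "raw.csv"), row.names = TRUE)

# Hypothetical analysis function: reads the raw data, computes a table
# and writes it to disk, with no clicking or copy/paste involved
run_analysis <- function(project_dir) {
  raw <- read.csv(file.path(project_dir, "data", "raw.csv"), row.names = 1)
  result <- aggregate(mpg ~ cyl, data = raw, FUN = mean)
  out_path <- file.path(project_dir, "output", "mpg_by_cyl.csv")
  write.csv(result, out_path, row.names = FALSE)
  out_path
}

# One call regenerates the output from the raw data
out_path <- run_analysis(project_dir)
print(read.csv(out_path))
```

Building paths with `file.path()` rather than hard-coded absolute paths is one simple way to keep the script executable on another computer.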
summary(mtcars$mpg)
summary(mtcars$"mpg")
summary(mtcars[, "mpg"])
summary(mtcars["mpg"])
summary(mtcars[["mpg"]])
summary(mtcars[1])
summary(mtcars[, 1])
summary(mtcars[[1]])
with(mtcars, summary(mpg))
attach(mtcars); summary(mpg)
summary(subset(mtcars, select=mpg))
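The calls above all summarise the same column, but they do not all pass the same kind of object to `summary()`. The following sketch, using only base R and the built-in `mtcars` data, checks which forms extract a plain numeric vector and which keep a one-column data frame. The variable names are illustrative.

```r
# Forms that extract the column as a numeric vector
by_dollar <- mtcars$mpg
by_double <- mtcars[["mpg"]]
by_comma  <- mtcars[, "mpg"]
by_index  <- mtcars[[1]]

# All four give exactly the same vector
same_vector <- identical(by_dollar, by_double) &&
  identical(by_dollar, by_comma) &&
  identical(by_dollar, by_index)
same_vector  # TRUE

# Forms that keep a one-column data frame instead
by_single <- mtcars["mpg"]
by_subset <- subset(mtcars, select = mpg)

class(by_dollar)  # "numeric"
class(by_single)  # "data.frame"
identical(by_single, by_subset)

# summary() therefore prints in two different layouts
summary(by_dollar)
summary(by_single)
```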
A series of R packages from RStudio (Posit). Design philosophy: fast, consistent, purposeful functions. The focus of this course.
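As a hedged illustration of the "consistent" design goal, the sketch below computes a summary of `mpg` with `dplyr::summarise()`. It assumes that the packages described here are the tidyverse and that dplyr is installed; the column names in the result are illustrative choices.

```r
# Run only if dplyr is available (assumption: the tidyverse is the
# package collection referred to above)
if (requireNamespace("dplyr", quietly = TRUE)) {
  mpg_summary <- dplyr::summarise(
    mtcars,
    min_mpg    = min(mpg),
    median_mpg = median(mpg),
    mean_mpg   = mean(mpg),
    max_mpg    = max(mpg)
  )
  print(mpg_summary)
} else {
  mpg_summary <- NULL
  message("dplyr is not installed; skipping the example.")
}
```

The input is a data frame and the result is again a data frame, so the output can be passed directly to further functions of the same kind.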
We need to automatically combine text, results, tables and figures:
Image from https://rosannavanhespenresearch.files.wordpress.com/
A markup language for writing formatted text.
An evolution of Markdown that includes executable code.
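To make the idea of text with executable code concrete, the sketch below writes a very small R Markdown file from R. The file name, title and chunk label are illustrative assumptions. The code fence inside the file is built with `strrep()` only so that it can be shown here without confusing the surrounding formatting.

```r
# Three backticks, built programmatically
fence <- strrep("`", 3)

# Contents of a minimal R Markdown document: a YAML header, ordinary
# text with an inline R expression, and one executable code chunk
rmd_lines <- c(
  "---",
  'title: "Fuel consumption in mtcars"',
  "output: html_document",
  "---",
  "",
  "The mean is `r round(mean(mtcars$mpg), 1)` miles per gallon.",
  "",
  paste0(fence, "{r mpg-summary}"),
  "summary(mtcars$mpg)",
  fence
)

# Write the document to a temporary location (illustrative file name)
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")

# If the rmarkdown package and pandoc are installed, the report can be
# generated with:
# rmarkdown::render(rmd_path)
```

When the document is rendered, the inline expression and the chunk are evaluated, so the numbers in the report always match the data.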
An important aspect of making code accessible is making it readable.
In this course we will use the tidyverse style guide by Hadley Wickham.
The styler package has a convenient RStudio add-in that helps you reformat your code according to the style guide.
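Besides the add-in, styler can be called from code. The sketch below passes two badly formatted lines to `styler::style_text()`, assuming that the styler package is installed. The example lines and variable names are illustrative.

```r
# Two lines of working but poorly formatted code (illustrative)
messy <- c("x=c(1,2,3)", "y<-mean( x )")

if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() applies the tidyverse style by default
  styled <- as.character(styler::style_text(messy))
} else {
  styled <- messy
  message("styler is not installed; showing the original code.")
}
cat(styled, sep = "\n")

# Styling changes the layout only, not the result of the code
env_messy <- new.env()
env_styled <- new.env()
eval(parse(text = messy), envir = env_messy)
eval(parse(text = styled), envir = env_styled)
identical(env_messy$y, env_styled$y)  # TRUE
```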
Image from http://phdcomics.com/comics/archive.php?comicid=1531
Not necessary for reproducibility, but a must for large projects over a long period of time.
Version control supports working on code projects in teams.
RStudio also provides .Rproj project files for increased portability.
Everything written in code: R
Portable: .Rproj (RStudio)
Accessible: GitHub
Automated: R Markdown