readr
tibble and read_*
The read_* functions guess column types for you when reading data and always return a tibble.
A tibble is a “nicer” and more honest data.frame
Will this turn out as you imagined?
data.frame(x = list("A" = "a", "B" = "b"),  # length 2
           y = c(1, 2, 3))                  # length 3
tibble(x = list("A" = "a", "B" = "b"),
       y = c(1, 2, 3))
data.frame(x=list("A"="a", "B"="b"),
y=list("C"="c", "D"="d"))
tibble(x=list("A"="a", "B"="b"),
y=list("C"="c", "D"="d"))
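One way to answer the question is to compare the dimensions of the results. The sketch below assumes the tibble package is installed; the names df, tb, recycled and res are only illustrative.

```r
library(tibble)

# data.frame() spreads each list element into its own column
df <- data.frame(x = list(A = "a", B = "b"),
                 y = list(C = "c", D = "d"))
dim(df)    # 1 row, 4 columns
names(df)  # "x.A" "x.B" "y.C" "y.D"

# tibble() keeps x and y as two list-columns of length 2
tb <- tibble(x = list(A = "a", B = "b"),
             y = list(C = "c", D = "d"))
dim(tb)    # 2 rows, 2 columns

# With mismatched lengths, data.frame() recycles silently ...
recycled <- data.frame(x = list(A = "a", B = "b"), y = c(1, 2, 3))
nrow(recycled)  # 3

# ... while tibble() refuses and signals an error
res <- tryCatch(
  tibble(x = list(A = "a", B = "b"), y = c(1, 2, 3)),
  error = function(e) e
)
inherits(res, "error")  # TRUE
```

This is the sense in which a tibble is "more honest": it does not reshape or recycle the input behind your back.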
read_*
read_* corresponds to read.*, used in the same way.
Control column types with col_types (colClasses from read.*) and guess_max
read.csv instead of read_csv
readxl::read_excel
readxl, Excel files are special
Use the readxl::read_excel function to read .xls/.xlsx files.
read_excel: use the sheet argument to choose a worksheet.
dplyr
arrange arranges the rows in the data
mtcars <- tibble::rownames_to_column(mtcars, var = "model")
kable(head(arrange(mtcars, mpg), n = 4))
| model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cadillac Fleetwood | 10.4 | 8 | 472 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Camaro Z28 | 13.3 | 8 | 350 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Duster 360 | 14.3 | 8 | 360 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
arrange arranges the rows in the data
kable(head(arrange(mtcars, mpg, disp), n = 4))
| model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Lincoln Continental | 10.4 | 8 | 460 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Cadillac Fleetwood | 10.4 | 8 | 472 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Camaro Z28 | 13.3 | 8 | 350 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Duster 360 | 14.3 | 8 | 360 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
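The two tables differ only in how the tie at mpg = 10.4 is resolved. A minimal sketch of that comparison, assuming dplyr and tibble are installed; mt is a hypothetical name used here to avoid overwriting the built-in mtcars.

```r
library(dplyr)

# mt is a hypothetical name; the built-in mtcars stays untouched
mt <- tibble::rownames_to_column(mtcars, var = "model")

by_mpg      <- arrange(mt, mpg)        # ties keep their original order
by_mpg_disp <- arrange(mt, mpg, disp)  # ties in mpg are broken by disp

head(by_mpg$model, 2)       # "Cadillac Fleetwood" "Lincoln Continental"
head(by_mpg_disp$model, 2)  # "Lincoln Continental" "Cadillac Fleetwood"
```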
filter selects rows (observations)
# only those with manual transmission
kable(head(filter(mtcars, am == 1), n = 4))
| model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
filter selects rows (observations)
kable(head(filter(mtcars, mpg < 30), n = 4))
| model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
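The two conditions shown above can also be used in a single call. The following is a small sketch, assuming dplyr is installed; mt, manual_below_30 and same_rows are hypothetical names.

```r
library(dplyr)

# mt is a hypothetical name; the built-in mtcars stays untouched
mt <- tibble::rownames_to_column(mtcars, var = "model")

# Conditions separated by commas are combined with AND
manual_below_30 <- filter(mt, am == 1, mpg < 30)

# The same selection written with an explicit logical AND
same_rows <- filter(mt, am == 1 & mpg < 30)

nrow(manual_below_30)
identical(manual_below_30, same_rows)
```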
mutate adds new variables or transforms existing ones
kable(head(mutate(mtcars, lpm = 235 / mpg), n = 4))
| model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | lpm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | 11.19048 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 11.19048 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | 10.30702 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | 10.98131 |
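The constant 235 is not explained in the slides. A plausible reading, stated here as an assumption, is that lpm means litres per 100 km and that mpg is miles per US gallon (as documented for mtcars). Under that assumption the factor can be derived as follows:

```r
# Assumption: lpm = litres per 100 km, mpg = miles per US gallon
litres_per_us_gallon <- 3.785411784
km_per_mile <- 1.609344

# litres per 100 km = 100 * (litres per gallon) / (km per mile) / mpg
conversion <- 100 * litres_per_us_gallon / km_per_mile
conversion  # roughly 235.2, rounded to 235 in the slides

# Reproduces the lpm column of the table for three mpg values
mpg <- c(21.0, 22.8, 21.4)
round(235 / mpg, 5)  # 11.19048 10.30702 10.98131
```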
select selects variables (columns)
kable(head(select(mtcars, model, mpg), n = 4))
| model | mpg |
|---|---|
| Mazda RX4 | 21.0 |
| Mazda RX4 Wag | 21.0 |
| Datsun 710 | 22.8 |
| Hornet 4 Drive | 21.4 |
%>%
Which function takes the argument x=3?
df <- f1(f2(f3(f4(mtcars, x=2), x), x=3))
This is not easy to read! How can we solve it?
%>%
Compute \((h \circ g \circ f)(a) = h(g(f(a)))\)
Three different ways to calculate this in R:
b <- f(a)
c <- g(b)
h(c)
h(g(f(a)))
a %>%
f() %>%
g() %>%
h()
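To confirm that the three forms agree, they can be run on toy functions. This is a minimal sketch assuming the magrittr package is installed; f, g and h are hypothetical functions chosen only so that the order of application matters.

```r
library(magrittr)

# Hypothetical toy functions; the order of application changes the result
f <- function(x) x + 1
g <- function(x) x * 2
h <- function(x) x^2

a <- 3

# 1. Intermediate variables
b <- f(a)
c <- g(b)
step_by_step <- h(c)

# 2. Nested calls, read from the inside out
nested <- h(g(f(a)))

# 3. Pipe, read from left to right
piped <- a %>% f() %>% g() %>% h()

step_by_step  # 64
nested        # 64
piped         # 64
```

The pipe passes the left-hand side as the first argument of the call on the right, so the steps appear in the order in which they are executed.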
%>%
mtcars <- mutate(mtcars, lpm = 235 / mpg)
mtcars <- filter(mtcars, am == 1)
ggplot(mtcars, aes(x = hp, y = lpm)) + geom_point()
ggplot(
filter(
mutate(mtcars, lpm = 235 / mpg)
, am ==1),
aes(x = hp, y = lpm)) + geom_point()
mtcars %>%
mutate(lpm = 235 / mpg) %>%
filter(am == 1) %>%
ggplot(aes(x = hp, y = lpm)) + geom_point()
ggplot2
A statistical plot has components
data
geom: type of geometric objects (points, lines, …)
coord: coordinate system
mapping: binds data to the dimensions/“aesthetics” of the coordinate system (position, color, shape, size, …)
ggplot2
A scatterplot
data: mpg and hp for a number of cars
geom: points
coord: Cartesian
mapping: binds hp to position on the x axis and mpg to the y axis
ggplot2
ggplot(data = mtcars, mapping = aes(x = hp, y = mpg)) + geom_point()
ggplot2 with some color and sizes
ggplot(mtcars,
aes(x = hp, y = mpg, size = wt, color = cyl)) +
geom_point()
What is cyl?
ggplot2: the types in mtcars matter!
ggplot(mtcars,
aes(x = hp, y = mpg, size = wt, color = as.factor(cyl))) +
geom_point()
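The difference between the two plots comes from the type of cyl. A short sketch, assuming ggplot2 is installed and using the built-in mtcars; p_numeric and p_factor are hypothetical names, and the labs() call is an optional addition that is not in the slides.

```r
library(ggplot2)

class(mtcars$cyl)              # "numeric"
levels(as.factor(mtcars$cyl))  # "4" "6" "8"

# Numeric cyl: ggplot2 uses a continuous color gradient
p_numeric <- ggplot(mtcars, aes(x = hp, y = mpg, color = cyl)) +
  geom_point()

# Factor cyl: ggplot2 uses one discrete color per level
p_factor <- ggplot(mtcars, aes(x = hp, y = mpg, color = as.factor(cyl))) +
  geom_point() +
  labs(color = "cyl")  # optional: shorter legend title
```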
ggplot2 arguments outside of aes but in a geom
ggplot(mtcars,
aes(x = hp, y = mpg, size = wt)) +
geom_point(color = cyl)
# ERROR
ggplot(mtcars,
aes(x = hp, y = mpg, size = wt)) +
geom_point(color = "red")
# OK
aes looks up names in mtcars, but arguments given directly to a geom are not looked up there!
ggplot2
Be careful about reusing names in columns, variables, etc.
cyl <- "blue"
ggplot(mtcars,
aes(x = hp, y = mpg, size = wt)) +
geom_point(color = cyl)
Error?
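One way to check is to wrap both variants in tryCatch. The sketch below assumes ggplot2 is installed and that no variable called cyl exists before it runs; no_var and with_var are hypothetical names.

```r
library(ggplot2)

# Without a variable called cyl, the argument cannot be evaluated
no_var <- tryCatch(
  ggplot(mtcars, aes(x = hp, y = mpg, size = wt)) + geom_point(color = cyl),
  error = function(e) conditionMessage(e)
)
no_var  # an error message mentioning that cyl was not found

# With a global cyl, geom_point() picks it up instead of the column
cyl <- "blue"
with_var <- tryCatch(
  ggplot(mtcars, aes(x = hp, y = mpg, size = wt)) + geom_point(color = cyl),
  error = function(e) conditionMessage(e)
)
is.character(with_var)  # FALSE: a plot object, not an error message
```

So there is no error in the second case: every point is drawn in blue, and the cyl column of mtcars is never consulted. That silent substitution is the reason for the warning about reusing names.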