Do this before class

Read R4DS chapter 19.

Solve the course Introduction to Writing Functions in R at Datacamp.

During class

Complete the exercises in R4DS sections 19.2.1, 19.3.1, 19.4.4, 19.5.5.

Introduction

Functional programming is a vast (and quite complicated) subject. It is quite different to the usual concept of object oriented programming. Although you should know what functional programming is, we wont cover much about the concept other than the basics.

Functions

As the name entails, functions are a big part of functional programming. In R, functions are objects and consist of three components. The arguments (or formals if you want to be fancy), a body and the environment. The arguments are exactly what the describe, though they may not need be defined! The body is what the function is supposed to do and its environment is what is used

With the question below we are going to take a look at how functions can help with readability but also adaptability.

Monte Carlo, playing games and functions

Functions with function arguments and ...!

If you take a look at the R help for the mean function you can see the following entry under Arguments

... further arguments passed to or from other methods.

The ... is essentially a “anything and everything”-type of argument. If you are used to Python this is kind of the same as **kwargs. Anything can go in there, which is really powerful but also quite worrying. How do we know if the arguments we pass to ... will ever be used? Take the following (trivial) example

x <- runif(10)
mean(x)
mean(x, this_question_is_useless_i_hate_you_felix = FALSE)

Try to answer the following questions:

  • Do you expect the same output and is it?

  • Do functions need arguments?

  • What happens if you misspell the actual argument trim with say, trm?

We are going to use this feature to play games. We use it to set up how a game is played and with what probabilities. Implement the function below according to its instructions.

simulate_round <- function(probability_fun, ...){
  # Function which specifies a round of a game. It will
  # simulate an outcome from a Bernoulli distribution whose 
  # probability of  sucess is determined by the function 
  # probability_fun. The arguments of the function is not yet
  # determined which must be accounted for.
}

Playing games

We are going to simulate a game where there are two teams who play against each other. The game consists of a number of rounds. If a team wins one round then the team gets one point. The win conditions (e.g. when one team wins against the other) are

  1. First to 16, 19, 22, 25, ...,
  2. Any team needs to win with two points.

To clarify, the first team who gets 16 points wins, given that the other team has less than 15 points. If the score is 15-15 then the teams will continue playing. Given that the score is 15-15, the first team to get 19 points wins, given that the other team has less than 18. If the score is 18-18 then the teams will continue and so the match continues.

Implement the following function which determines if a game is finished or not

win_condition_achieved <- function(scoreA, scoreB) {
   # Function which determines if a win condition is achieved.
   # It takes the score of player A and B and returns a bool
   # TRUE/FALSE if the game has ended, e.g. someone has won.
}

Our interest is the team-specific conditional distribution(s) of scores given a team wins. For us to determine these quantities we need a little background of Monte Carlo integration. If you feel like you know how Monte carlo integration works, feel free to skip it.


A very brief introduction to Monte Carlo integration

If the stochastic variable \(X\) is follows a standard uniform distribution, then \(E(f(X)) =\int_a ^b f(x)dx\). This can be used in order to approximate integrals since \[ E(f(X)) \approx \frac{1}{N} \sum_i^N f(x_i) \] That is, we simulate a large number \(x_1,…,x_N\) and evaluate the function \(f\) accordingly. By the Law of Large Numbers, \(\sum_{i=1}^N f(x_i)/N\) will converge to the expectation and can thus be viewed as a numerical approximation of the integral. Hence, a probability can be evaluated according to
\[ P(X\leq x) = E(\mathbf{1}(X \leq x)) \approx \frac{1}{N} \sum_i^N \mathbf{1}(x_i \leq x). \]


The key takeaway is that to approximate anything involving an integral, we can write it as as an expectation and compute it through simulation using Monte Carlo integration. So even for “very complicated” events we can always approximate their probability by simulation! We are going to use our functions to simulate the game. First thing is to determine how the probability is going to be calculated. Remember that its always best to start simple.

  • Complete the following function with your own idea on how a probability should be determined (or continue to read and see some examples)
probability_fun <- function(scoreA, scoreB, ...) {
  # code
}

For each simulation iteration we want to keep track of the score of the team A, the score of team B and who won. Store these in the matrix results. Use all functions we have described above to simulate a number of repetitions (simulations) of the game. While no one has won, we need to

  1. Simulate one round
  2. Update score
  3. Check if someone has won, if yes stop, save the outcomes, otherwise continue with 1.
nsims <- 100 # change to a larger number if you want though it might take some time
results <- matrix(nrow=nsims, ncol=3)
for (x in 1:nsims) {
  game_won <- FALSE
  scoreA <- 0
  scoreB <- 0
  
  while (!game_won) {
    # code
  }
  # save outcomes
}
colMeans(results)

Try to answer the following questions:

  • What do the numbers of colMeans(results) represent? What are they trying to approximate?
  • Why is this more adaptable than other type of implementations? Do you think its more readable?
  • What happens if you let probability_fun return 0.5 or 0.51 regardless of the score? How does the distribution of number of rounds played look? What happens if probability_fun depends on the score (e.g. it might return (scoreA+0.01) / (scoreA + ScoreB + 0.02))? Could you think of a probbility_fun that leads to a large expected number of rounds played?
  • We actually don’t need to store who won. That can be computed through the score for each team. How could you easily (and vectorized) compute such a variable with the score of the two teams?
  • Monte Carlo integration is slow but can in some cases be the best of the worst. Since each simulation is independent of each other, google for some kind of parallelization package that works with your OS. Try to parallelize the “for” loop. For the interested, search on google (or take a numerics course) and see where ordinary integration fails and where monte carlo integration might work better.