Advanced-R-Solutions/2-23-Measuring_performance.Rmd at master · anouel/Advanced-R-Solutions · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
```{r, include=FALSE}
source("common.R")
```

# Measuring performance

    ```{r, include=FALSE}
    library(magrittr)
    ```

## Profiling

1. __<span style="color:red">Q</span>__: Profile the following function with `torture = TRUE`. What is surprising? Read the source code of `rm()` to figure out what's going on.

    ```{r}
    f <- function(n = 1e5) {
      x <- rep(1, n)
      rm(x)
    }
    ```

   __<span style="color:green">A</span>__:

    ```{r, eval = FALSE, include = FALSE}
    profvis::profvis(f(), torture = TRUE)

    f()
    ```

<!-- from `?profvis()`: `Toruture` triggers garbage collection after every torture memory allocation call. -->

## Microbenchmarking

1. __<span style="color:red">Q</span>__: Instead of using `bench::mark()`, you could use the built-in function `system.time()`. But `system.time()` is much less precise, so you'll need to repeat each operation many times with a loop, and then divide to find the average time of each operation, as in the code below.

    ```{r, eval = FALSE}
    n <- 1e6
    system.time(for (i in 1:n) sqrt(x)) / n
    system.time(for (i in 1:n) x ^ 0.5) / n
    ```

   How do the estimates from `system.time()` compare to those from `bench::mark()`? Why are they different?

   __<span style="color:orange">A</span>__: (TODO: Last part of the quesiton: Why are the results different?)

    ```{r}
    n <- 1e6
    x <- runif(100)

    bench_res <- bench::mark(sqrt(x), x ^ 0.5)
    time_sqrt <- system.time(for (i in 1:n) sqrt(x)) / n
    time_powerhalf <- system.time(for (i in 1:n) x ^ 0.5) / n

    # compare results for sqrt(x)
    time_sqrt[["elapsed"]]
    as.numeric(bench_res$mean)[[1]]

    # compare results for x ^ sqrt(0.5)
    time_powerhalf[["elapsed"]]
    as.numeric(bench_res$mean)[[2]]
    ```

   - both approaches get the order of magnitude right, but the average method is a little faster.
   - this is surprising to me, because this approach also get's the (small) overhead of the `for`-loop
   - `system.time`-approach is only able to return the average execution time, which is not optimal for the skewed distribution of execution times.

<!-- Just in case, that you are not sold on bench::mark yet - remember, there is no fancy default plot for system.time. ;) -->

2. __<span style="color:red">Q</span>__: Here are two other ways to compute the square root of a vector. Which do you think will be fastest? Which will be slowest? Use microbenchmarking to test your answers.

    ```{r, eval = FALSE}
    x ^ (1 / 2)
    exp(log(x) / 2)
    ```

   __<span style="color:green">A</span>__: We'll use the "bench"-package to estimate the relative execution time of these expression, with the fastest expression standardized to 1.

    ```{r, message=FALSE}
    x <- runif(100)

    bench::mark(sqrt(x),          # (1)
                x ^ 0.5,          # (2)
                x ^ (1 / 2),      # (3)
                exp(log(x) / 2),  # (4)
                relative = TRUE) %>%
      dplyr::select(expression, median) %>%
      dplyr::arrange(median)
    ```

   As supposed, `exp(log(x)/2)` needs the longest time to calculate the square root of `x`.

## Old exercises
1. __<span style="color:red">Q</span>__: Instead of using `microbenchmark()`, you could use the built-in function `system.time()`. But `system.time()` is much less precise, so you'll need to repeat each operation many times with a loop, and then divide to find the average time of each operation, as in the code below.

    ```{r, eval = FALSE}
    n <- 1:1e6
    system.time(for (i in n) sqrt(x)) / length(n)
    system.time(for (i in n) x ^ 0.5) / length(n)
    ```

   How do the estimates from `system.time()` compare to those from `microbenchmark()`? Why are they different?

2. __<span style="color:red">Q</span>__: Here are two other ways to compute the square root of a vector. Which do you think will be fastest? Which will be slowest? Use microbenchmarking to test your answers.

    ```{r, eval = FALSE}
    x ^ (1 / 2)
    exp(log(x) / 2)
    ```

   __<span style="color:green">A</span>__: The second one looks more complex, but you never know...unless you test it.

    ```{r}
    x <- runif(100)
    microbenchmark::microbenchmark(
      sqrt(x),
      x ^ 0.5,
      x ^ (1 / 2),
      exp(log(x) / 2)
    )
    ```

3. __<span style="color:red">Q</span>__: Use microbenchmarking to rank the basic arithmetic operators (`+`, `-`, `*`, `/`, and `^`) in terms of their speed. Visualise the results. Compare the speed of arithmetic on integers vs. doubles.

   __<span style="color:green">A</span>__: Since I am on a Windows system, where these short execution times are hard to measure, I just ran the following code on a linux and paste the results here:

    ```{r, eval = FALSE}
    mb_integer <- microbenchmark::microbenchmark(
      1L + 1L, 1L - 1L, 1L * 1L, 1L / 1L, 1L ^ 1L,
      times = 1000000,
      control = list(order = "random",
                     warmup = 20000))

    mb_double <- microbenchmark::microbenchmark(
      1 + 1, 1 - 1, 1 * 1, 1 / 1, 1 ^ 1,
      times = 1000000,
      control = list(order = "random",
                     warmup = 20000))

    mb_integer
    # and got the following output:
    # Unit: nanoseconds
    #     expr min lq      mean median uq     max neval
    #  1L + 1L  50 66  96.45262     69 73 7006051 1e+06
    #  1L - 1L  52 69  88.76438     71 76  587594 1e+06
    #  1L * 1L  51 68  88.51854     70 75  582521 1e+06
    #    1L/1L  50 65  94.40669     68 74 7241972 1e+06
    #    1L^1L  67 77 102.96209     84 92  574519 1e+06

    mb_double
    # Unit: nanoseconds
    #   expr min lq      mean median  uq      max neval
    #  1 + 1  48 66  92.44331     69  75  7217242 1e+06
    #  1 - 1  50 66  88.13654     68  77   625462 1e+06
    #  1 * 1  48 66 135.88379     70  77 42974915 1e+06
    #    1/1  48 65  87.11615     69  77   659032 1e+06
    #    1^1  79 92 127.07686    103 135   641524 1e+06
    ```

   To visualise and compare the results, we make some short spaghetties:

    ```{r}
    mb_median <- data.frame(operator = c("+", "-", "*", "/", "^"),
                            int = c(69, 71, 70, 68, 84),  # same as mb_integer$median
                            dbl = c(69, 68, 70, 69, 103), # same as mb_double$median
                            stringsAsFactors = FALSE)

    mb_median <- tidyr::gather(mb_median, type, time, int, dbl)
    mb_median <- dplyr::mutate(mb_median, type = factor(type, levels = c("int", "dbl")))

    library(ggplot2)
    ggplot(mb_median, aes(x = type, y = time, group = operator, color = operator)) +
      geom_point(show.legend = FALSE) +
      geom_line(show.legend = FALSE, size = 1.5) +
      geom_label(aes(label = operator), show.legend = FALSE) +
      theme_minimal() +
      ylab("time in nanoseconds") +
      theme(axis.title.x = element_blank(),
            axis.title.y = element_text(size = 14),
            axis.text.x = element_text(size = 14),
            axis.text.y = element_text(size = 10)) +
      scale_y_continuous(breaks = seq(0, max(mb_median$time), 10))
    ```

4. __<span style="color:red">Q</span>__: You can change the units in which the microbenchmark results are
    expressed with the `unit` parameter. Use `unit = "eps"` to show
    the number of evaluations needed to take 1 second. Repeat the benchmarks
    above with the eps unit. How does this change your intuition for performance?

<!-- ## Language performance -->

1. __<span style="color:red">Q</span>__: `scan()` has the most arguments (21) of any base function. About how much time does it take to make 21 promises each time scan is called? Given a simple input (e.g., `scan(text = "1 2 3", quiet = T)`) what proportion of the total run time is due to creating those promises?

   __<span style="color:green">A</span>__: According to the textbook every extra argument slows the function down by approximately 20 nanoseconds, which I can't reproduce on my system:

    ```{r}
    f5 <- function(a = 1, b = 2, c = 4, d = 4, e = 5) NULL
    f6 <- function(a = 1, b = 2, c = 4, d = 4, e = 5, f = 6) NULL
    f7 <- function(a = 1, b = 2, c = 4, d = 4, e = 5, f = 6, g = 7) NULL
    f8 <- function(a = 1, b = 2, c = 4, d = 4, e = 5, f = 6, g = 7, h = 8) NULL
    microbenchmark::microbenchmark(f5(), f6(), f7(), f8(), times = 10000)
    ```

   However, for now we just assume that 20 nanoseconds are correct and in kind of doubt, we recommend to benchmark this value individually. With this assumption we calculate `21 * 20 = 420` nanoseconds of extra time for each call of `scan()`.

   For a percentage, we first benchmark a simple call of `scan()`:

    ```{r}
    (mb_prom <- microbenchmark::microbenchmark(
      scan(text = "1 2 3", quiet = T),
      times = 100000,
      unit = "ns",
      control = list(warmup = 1000)
      ))

    mb_prom_median <- summary(mb_prom)$median
    ```

   This lets us calculate, that ~`r round(420 / mb_prom_median, 4) * 100`% of the median run time are caused by the extra arguments.

2. __<span style="color:red">Q</span>__: Read ["Evaluating the Design of the R Language"](http://r.cs.purdue.edu/pub/ecoop12.pdf). What other aspects of the R-language slow it down? Construct microbenchmarks to illustrate.

3. __<span style="color:red">Q</span>__: How does the performance of S3 method dispatch change with the length of the class vector? How does performance of S4 method dispatch change with number of superclasses? How about RC?

4. __<span style="color:red">Q</span>__: What is the cost of multiple inheritance and multiple dispatch on S4 method dispatch?

5. __<span style="color:red">Q</span>__: Why is the cost of name lookup less for functions in the base package?

<!-- ## Implementations performance -->

1. __<span style="color:red">Q</span>__: The performance characteristics of `squish_ife()`, `squish_p()`, and `squish_in_place()` vary considerably with the size of `x`. Explore the differences. Which sizes lead to the biggest and smallest differences?

2. __<span style="color:red">Q</span>__: Compare the performance costs of extracting an element from a list, a column from a matrix, and a column from a data frame. Do the same for rows.