Advanced R: functions



A pure function satisfies two properties: its output depends only on its inputs, and it has no side effects.

Functions can be classified by their inputs and outputs into regular functions, functionals, function factories, and function operators.


A functional is a function that takes a function as an input and returns a vector as output.
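As a minimal sketch of the definition above, a functional can be written by hand in base R. The name simple_map is hypothetical and only illustrates the pattern; purrr::map() does considerably more.

```r
# Hypothetical hand-written functional: takes a vector and a function,
# applies the function to each element, and returns a list (a vector).
simple_map <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}

res <- simple_map(1:3, function(v) v * 2)
str(res)
```

The loop is hidden inside the functional, so the caller only states what to do with each element.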

map family


Map variants

One of the big differences between map2() and the simple function above is that map2() recycles its inputs to make sure that they’re the same length.
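A short sketch of the recycling behaviour, using made-up inputs: a length-1 input is repeated to match the other, while inputs of incompatible lengths raise an error.

```r
library(purrr)

# A length-1 second input is recycled to match the length of the first.
sums <- map2_dbl(1:3, 10, ~ .x + .y)
sums
#> [1] 11 12 13

# Inputs of lengths 3 and 2 cannot be recycled, so purrr raises an error.
mismatch <- tryCatch(
  map2_dbl(1:3, 1:2, ~ .x + .y),
  error = function(e) "error"
)
mismatch
#> [1] "error"
```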

A big difference between pmap() and the other map functions is that pmap() gives you much finer control over argument matching because you can name the components of the list.

library(purrr)

params <- tibble::tribble(
  ~ n, ~ min, ~ max,
   1L,     0,     1,
   2L,    10,   100,
   3L,   100,  1000
)

pmap(params, runif)
#> [[1]]
#> [1] 0.332
#> [[2]]
#> [1] 53.5 47.6
#> [[3]]
#> [1] 231 715 515


                       List     Atomic          Same type   Nothing
One argument           map()    map_lgl(), …    modify()    walk()
Two arguments          map2()   map2_lgl(), …   modify2()   walk2()
One argument + index   imap()   imap_lgl(), …   imodify()   iwalk()
N arguments            pmap()   pmap_lgl(), …   -           pwalk()
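To make the columns of the table concrete, here is a small sketch that applies one function from each column to the same made-up named vector.

```r
library(purrr)

x <- c(a = 1, b = 4, c = 9)

as_list   <- map(x, sqrt)       # list output
as_dbl    <- map_dbl(x, sqrt)   # atomic (double) output
same_type <- modify(x, sqrt)    # output has the same type as the input
labelled  <- imap_chr(x, ~ paste0(.y, "=", .x))  # .x is the value, .y the name

# walk() is called for its side effects and returns its input invisibly.
walk(x, print)
```

Choosing the variant by the desired output type makes the result predictable: map_dbl() fails loudly if an element is not a single number, instead of silently returning a list.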

Reduce family

If you’re using reduce() in a function, you should always supply .init.
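The reason for this advice shows up with empty input. A brief sketch, using addition as an example reducer:

```r
library(purrr)

# Without .init, reducing an empty input is an error.
no_init <- tryCatch(
  reduce(integer(), `+`),
  error = function(e) "error"
)
no_init
#> [1] "error"

# With .init, the empty case returns the identity value,
# and the non-empty case gives the same answer as before.
reduce(integer(), `+`, .init = 0L)
#> [1] 0
reduce(1:4, `+`, .init = 0L)
#> [1] 10
```

Inside a function the input length is usually not known in advance, so supplying .init keeps the function from failing on a zero-length argument.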



l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
str(l)
#> List of 4
#>  $ : int [1:15] 7 1 8 8 3 8 2 4 7 10 ...
#>  $ : int [1:15] 3 1 10 2 5 2 9 8 5 4 ...
#>  $ : int [1:15] 6 10 9 5 6 7 8 6 10 8 ...
#>  $ : int [1:15] 9 8 6 4 4 5 2 9 9 6 ...

reduce(l, intersect)
#> [1] 8 4

accumulate(l, intersect)
#> [[1]]
#>  [1]  7  1  8  8  3  8  2  4  7 10 10  3  7 10 10
#> [[2]]
#> [1]  1  8  3  2  4 10
#> [[3]]
#> [1]  8  4 10
#> [[4]]
#> [1] 8 4

Predicate functionals

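A predicate functional applies a predicate to each element of a vector. purrr provides seven useful functions which come in three groups: some(), every(), and none(), which test whether any, all, or no elements match; detect() and detect_index(), which return the value and the position of the first match; and keep() and discard(), which retain or drop the matching elements.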
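A short sketch of the seven predicate functionals (some(), every(), none(), detect(), detect_index(), keep(), discard()) applied to a small mixed list that is made up for illustration:

```r
library(purrr)

x <- list(1, "a", 3, "b")

some(x, is.character)          # TRUE: at least one element is a string
every(x, is.character)         # FALSE: not all elements are strings
none(x, is.logical)            # TRUE: no element is logical
detect(x, is.character)        # "a": value of the first match
detect_index(x, is.character)  # 2: position of the first match
str(keep(x, is.numeric))       # the numeric elements, 1 and 3
str(discard(x, is.numeric))    # everything else, "a" and "b"
```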

Base functions

Matrices and arrays

a2d <- matrix(1:20, nrow = 5)
apply(a2d, 1, mean)
#> [1]  8.5  9.5 10.5 11.5 12.5
apply(a2d, 2, mean)
#> [1]  3  8 13 18

Mathematical concerns

Functionals are very common in mathematics. The limit, the maximum, the roots (the set of points where f(x) = 0), and the definite integral are all functionals: given a function, they return a single number (or vector of numbers).

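Base R provides a useful set: integrate() finds the area under the curve defined by f(), uniroot() finds where f() hits zero, and optimise() finds the location of the lowest (or highest) value of f().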

The following example shows how functionals might be used with a simple function, sin():

integrate(sin, 0, pi)
str(uniroot(sin, pi * c(1 / 2, 3 / 2)))
str(optimise(sin, c(0, 2 * pi)))
str(optimise(sin, c(0, pi), maximum = TRUE))
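Because sin() has well-known analytic properties, the numerical results can be checked against exact values. The sketch below extracts one component from each result; the tolerances are illustrative choices rather than recommended settings.

```r
area <- integrate(sin, 0, pi)$value
root <- uniroot(sin, pi * c(1 / 2, 3 / 2))$root
low  <- optimise(sin, c(0, 2 * pi))$minimum
high <- optimise(sin, c(0, pi), maximum = TRUE)$maximum

# Compare each numerical result with its analytic value.
all.equal(area, 2)                            # integral of sin on [0, pi]
all.equal(root, pi, tolerance = 1e-3)         # zero of sin in [pi/2, 3pi/2]
all.equal(low, 3 * pi / 2, tolerance = 1e-3)  # minimum of sin on [0, 2pi]
all.equal(high, pi / 2, tolerance = 1e-3)     # maximum of sin on [0, pi]
```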

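Summary: with a loop, the key question is what is being iterated over. In the functional structure f(list(), .f(arg1 = ...1, arg2 = ...2, ...), arg = constant), whatever varies goes before .f: either the data vector that .f processes or a vector of changing parameter values. Whatever stays fixed goes after .f. Both are passed to .f as arguments, so for readability it is best to name explicitly both the parameters of .f and the fixed arguments. These notes do not cover map_if() and modify_if(), because I think those cases can now be handled with across(), although across() only works on data frames.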
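The placement rule in the summary can be illustrated with a small sketch. The data and the argument name `data` are made up for the example; here the varying values are parameters rather than data.

```r
library(purrr)

values <- c(1, 2, 3, 4, 100)
trims  <- c(0, 0.2, 0.4)

# The varying values (trims) come before the function. The fixed
# argument (data = values) comes after it and is the same on every call.
res <- map_dbl(
  trims,
  function(trim, data) mean(data, trim = trim),
  data = values
)
res
#> [1] 22  3  3
```

Naming both the varying parameter (trim) and the fixed one (data) makes it obvious which argument each value is bound to.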

Function factories

The enclosing environment of the manufactured function is an execution environment of the function factory.
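This can be checked directly. In the sketch below, make_multiplier is a hypothetical factory written only for illustration.

```r
# Hypothetical factory used only for illustration.
make_multiplier <- function(k) {
  force(k)  # evaluate k now, so the manufactured function captures its value
  function(x) x * k
}

times2 <- make_multiplier(2)
times3 <- make_multiplier(3)

# Each manufactured function encloses the execution environment of the
# factory call that created it, which is where k is bound.
environment(times2)$k
#> [1] 2
environment(times3)$k
#> [1] 3

# Two factory calls produce two distinct environments.
identical(environment(times2), environment(times3))
#> [1] FALSE

times2(10)
#> [1] 20
```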


Case study

Case in Graphical factories

library(scales)

y <- c(12345, 123456, 1234567)
# The function factory comma_format() returns a function,
# so the second pair of parentheses calls that function.
comma_format()(y)
#> [1] "12,345"    "123,456"   "1,234,567"

number_format(scale = 1e-3, suffix = " K")(y)
#> [1] "12 K"    "123 K"   "1 235 K"

The bin width of a histogram (the binwidth argument of ggplot2's geom_histogram()) can also be a function.
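A minimal sketch of such a function, built by a factory. The name binwidth_bins is illustrative; the manufactured function could be passed as geom_histogram(binwidth = binwidth_bins(20)), but the example below calls it directly so that it runs without ggplot2.

```r
# Factory: returns a function that computes the bin width needed to
# split the range of the data into n bins.
binwidth_bins <- function(n) {
  force(n)
  function(x) (max(x) - min(x)) / n
}

width_for_20 <- binwidth_bins(20)
width_for_20(c(0, 50, 100))
#> [1] 5
```

Because the width is computed from the data it receives, each group or facet can get a different bin width while keeping the same number of bins.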


plot_dev <- function(ext, dpi = 96) {
  force(dpi)  # evaluate dpi now so the returned device function captures it

  # Pick the device function that matches the file extension; empty
  # entries (eps, emf, jpg) fall through to the next non-empty one.
  switch(ext,
    eps = ,
    ps  = function(filename, ...) {
      grDevices::postscript(
        file = filename, ..., onefile = FALSE,
        horizontal = FALSE, paper = "special"
      )
    },
    pdf = function(filename, ...) grDevices::pdf(file = filename, ...),
    svg = function(filename, ...) svglite::svglite(file = filename, ...),
    emf = ,
    wmf = function(...) grDevices::win.metafile(...),
    png = function(...) grDevices::png(..., res = dpi, units = "in"),
    jpg = ,
    jpeg = function(...) grDevices::jpeg(..., res = dpi, units = "in"),
    bmp = function(...) grDevices::bmp(..., res = dpi, units = "in"),
    tiff = function(...) grDevices::tiff(..., res = dpi, units = "in"),
    stop("Unknown graphics extension: ", ext, call. = FALSE)
  )
}

plot_dev("pdf")
#> function(filename, ...) grDevices::pdf(file = filename, ...)
#> <bytecode: 0x7fe857744590>
#> <environment: 0x7fe8575f6638>
plot_dev("png")
#> function(...) grDevices::png(..., res = dpi, units = "in")
#> <bytecode: 0x7fe85947f938>
#> <environment: 0x7fe859169548>

Case in statistical factories

Case in combination of function factories and functionals

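# power1() is the factory whose manufactured function is printed below.
power1 <- function(exp) {
  force(exp)
  function(x) {
    x ^ exp
  }
}

names <- list(
  square = 2, 
  cube = 3, 
  root = 1/2, 
  cuberoot = 1/3, 
  reciprocal = -1
)
funs <- purrr::map(names, power1)

funs$root(64)
#> [1] 8
funs$root
#> function(x) {
#>     x ^ exp
#>   }
#> <bytecode: 0x7fe85512a410>
#> <environment: 0x7fe85b21f190>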

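# Three ways to call the functions in the list directly

# 1. Temporarily, with with()
with(funs, root(100))
#> [1] 10

# 2. For a longer period, with attach() and detach()
attach(funs)
#> The following objects are masked _by_ .GlobalEnv:
#>     cube, square
root(100)
#> [1] 10
detach(funs)

# 3. By copying the functions into the global environment
rlang::env_bind(globalenv(), !!!funs)
root(100)
#> [1] 10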

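Summary: if a function factory can return only one function, the bodies of the functions it returns all have the same form. To make these functions behave differently, we pass different argument values to the factory, which changes the bindings in the environment of the returned function and therefore the behaviour of the manufactured function. This form combines perfectly with functionals to generate many functions at once. If a function factory can return several different functions, the usual form is that the factory's input arguments go through a choice structure (as in plot_dev() above) that returns the appropriate function.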

Function operators

A function operator is a function that takes one (or more) functions as input and returns a function as output. In other words, it is just a function factory that takes a function as input.
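A minimal sketch of the pattern in base R. The operator chatty is a hypothetical example: it wraps any one-argument function so that each call reports its input before delegating to the original.

```r
# Hypothetical function operator: takes a function, returns a function.
chatty <- function(f) {
  force(f)
  function(x, ...) {
    cat("Processing ", x, "\n", sep = "")
    f(x, ...)
  }
}

chatty_sqrt <- chatty(sqrt)
chatty_sqrt(16)
#> Processing 16
#> [1] 4
```

The wrapped function keeps the behaviour of the original; the operator only adds behaviour around the call.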

Case study

Capturing errors with purrr::safely()

safely() is a function operator that transforms a function to turn errors into data.

x <- list(
  c(0.512, 0.165, 0.717),
  c(0.064, 0.781, 0.427),
  c(0.890, 0.785, 0.495),
  "oops"  # assumed fourth element: any character value produces the error shown below
)

out <- map(x, safely(sum))
str(out)
#> List of 4
#>  $ :List of 2
#>   ..$ result: num 1.39
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 1.27
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 2.17
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "invalid 'type' (character) of argument"
#>   .. ..$ call   : language .Primitive("sum")(..., na.rm = na.rm)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

out <- transpose(map(x, safely(sum)))
str(out)
#> List of 2
#>  $ result:List of 4
#>   ..$ : num 1.39
#>   ..$ : num 1.27
#>   ..$ : num 2.17
#>   ..$ : NULL
#>  $ error :List of 4
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ :List of 2
#>   .. ..$ message: chr "invalid 'type' (character) of argument"
#>   .. ..$ call   : language .Primitive("sum")(..., na.rm = na.rm)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
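Once the output is transposed, the failures are easy to locate. A brief sketch with its own made-up input, so that it runs independently of the example above:

```r
library(purrr)

x <- list(c(0.5, 0.25), "a string", c(1, 2, 3))  # made-up data
out <- transpose(map(x, safely(sum)))

# An element succeeded if its error component is NULL.
ok <- map_lgl(out$error, is.null)
ok
#> [1]  TRUE FALSE  TRUE

x[!ok]          # the inputs that failed
out$result[ok]  # the results that succeeded
```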

Caching computations with memoise::memoise()

This is an example of dynamic programming, where a complex problem can be broken down into many overlapping subproblems, and remembering the results of a subproblem considerably improves performance.

fib2 <- memoise::memoise(function(n) {
  if (n < 2) return(1)
  fib2(n - 2) + fib2(n - 1)
})
system.time(fib2(23))  # n = 23 is an assumed input
#>    user  system elapsed 
#>   0.009   0.000   0.008
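To show what memoisation does under the hood, here is a minimal hand-rolled sketch that uses an environment as the cache. The names memo_fib and cache are illustrative, and this is not how memoise itself is implemented.

```r
# An environment serves as a mutable key-value store for computed results.
cache <- new.env()

memo_fib <- function(n) {
  key <- as.character(n)
  if (!is.null(cache[[key]])) return(cache[[key]])  # reuse a stored result

  value <- if (n < 2) 1 else memo_fib(n - 2) + memo_fib(n - 1)
  cache[[key]] <- value  # store the result for later calls
  value
}

memo_fib(30)
#> [1] 1346269
```

Each subproblem is computed once and then looked up, so the number of real computations grows linearly with n rather than exponentially.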

Creating your own function operators

urls <- c(
  "adv-r" = "", 
  "r4ds" = ""
  # and many many more
)
# Build one output path per URL: <tempdir>/<name>.html
path <- file.path(tempdir(), paste0(names(urls), ".html"))

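library(purrr)  # provides walk2() and re-exports %>%

# Wait `amount` seconds before each call to f.
delay_by <- function(f, amount) {
  force(f); force(amount)
  function(...) {
    Sys.sleep(amount)
    f(...)
  }
}

# Print a dot on every n-th call to f.
dot_every <- function(f, n) {
  force(f); force(n)
  i <- 0
  function(...) {
    i <<- i + 1
    if (i %% n == 0) cat(".")
    f(...)
  }
}

walk2(
  urls, path, 
  download.file %>% dot_every(10) %>% delay_by(0.1), 
  quiet = TRUE
)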