Advanced R - Basics

2022-06-22


Names and values

A name points to a value.
Reserved words and other non-syntactic names can still be used as names by wrapping them in backticks.
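A small illustration of backtick-quoted names follows. The variable names are arbitrary examples chosen because each would be rejected without backticks.

```r
`_abc` <- 1     # starts with an underscore: not a syntactic name
`if` <- 10      # a reserved word used as a variable name
`my var` <- 3   # contains a space
`_abc` + `if` + `my var`  # 14
```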

Copy-on-modify:

Modify-in-place

Garbage collector (gc): automatic memory management; there is no need to know much about it.

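Summary: understand the relationship between names and values. In practice it is almost impossible to avoid copying when modifying an object in R. Environments are the exception, because they have reference semantics. Any other R object avoids being copied only when a single name is bound to it (a performance optimization). However, every function call (except calls to primitive functions, which are implemented in C) gives the object a new name. In addition, before R 4.0.0 R only tracked whether an object had 0, 1, or "many" names. Once an object had two names it counted as "many", and removing one name did not bring the count back down (many - 1 = many), so the object was still copied when modified. (R 4.0.0 replaced this NAMED mechanism with reference counting.)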
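The contrast between copy-on-modify and reference semantics can be seen with plain base R. This is a minimal sketch; the variable names are illustrative.

```r
# Copy-on-modify: y starts out sharing x's value, and modifying y triggers a copy
x <- c(1, 2, 3)
y <- x
y[[3]] <- 4
x  # 1 2 3: x is unchanged
y  # 1 2 4

# Environments have reference semantics: e1 and e2 are two names for one object
e1 <- new.env()
e2 <- e1
e2$a <- 10
e1$a  # 10: the change is visible through e1 as well
```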

Vectors

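The "vectors" here are a family of data structures in the broad sense, comprising atomic vectors and lists.
What is the difference between atomic vectors and lists? Essentially none, except that every element of an atomic vector must have the same type, which is why both count as vectors. In a sense, the elements of a list are also all of the same type: each element is a reference to an object, and the references simply point to different objects. How does NULL relate to vectors? NULL represents an empty or absent vector. It is used to remove elements (for example from a list) and as a default function argument.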
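A short sketch of the atomic-vector/list distinction, and of using NULL to delete a list element (names are illustrative):

```r
x <- c(1L, 2L, 3L)        # atomic vector: every element has the same type
l <- list(1L, "a", TRUE)  # list: each element is a reference to any object
typeof(x)                 # "integer"
typeof(l)                 # "list"

# Assigning NULL to a list element removes that element
l[[2]] <- NULL
length(l)                 # 2
```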

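Attributes are extra data (metadata) attached to a vector. They are name-value pairs, which can be retrieved together as a list with attributes(). Most attributes are transient and are lost by most operations. The exceptions are names (yes, names are an attribute too) and dimensions. To preserve other attributes, you need to create your own S3 class via the class attribute.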
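The transience of attributes is easy to check. The following sketch attaches a made-up attribute `x` alongside names, then subsets and summarises the vector.

```r
a <- structure(1:3, x = "abcdef", names = c("a", "b", "c"))
attributes(a)        # both $x and $names
attributes(a[1])     # only $names survives subsetting
attributes(sum(a))   # NULL: all attributes are dropped
```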

The dim attribute turns atomic vectors into matrices and arrays, and lists into list-matrices and list-arrays. The class attribute turns atomic vectors into factors, dates, date-times and difftimes, and lists into data frames and tibbles.
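A minimal base-R sketch of how dim and class change how the same underlying vectors behave (object names are illustrative):

```r
# dim turns an atomic vector into a matrix
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)        # TRUE

# class (plus levels) turns an integer vector into a factor
f <- factor(c("a", "b", "a"))
typeof(f)           # "integer"
class(f)            # "factor"

# A Date is a double vector with class "Date"
d <- as.Date("2022-06-22")
typeof(d)           # "double"

# A data frame is a list of equal-length vectors with class "data.frame"
df <- data.frame(x = 1:2, y = c("a", "b"))
typeof(df)          # "list"
class(df)           # "data.frame"
```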

Atomic

Atomic vector

Atomic vector S3 object

List

List S3 object
Data frames and tibbles:

Null

NULL is special because it has a unique type, is always length zero, and can’t have any attributes. It has two usages:
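NULL has two usages:

- It represents an empty vector of no particular type, which is what c() returns.
- It represents an absent vector, typically as a default argument.

The sketch below shows both. The `greet()` function is a hypothetical example.

```r
typeof(NULL)     # "NULL": a type of its own
length(NULL)     # 0
is.null(c())     # TRUE: c() with no arguments is an empty vector

# An "absent" vector as a default argument
greet <- function(name = NULL) {
  if (is.null(name)) "Hello!" else paste0("Hello, ", name, "!")
}
greet()          # "Hello!"
greet("R")       # "Hello, R!"
```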

Subset

Six ways

Subset operators:
[, [[ and $; @ and slot() for S4 objects.

NB: Factors are not treated specially when subsetting. This means that subsetting will use the underlying integer vector, not the character levels. This is typically unexpected, so you should avoid subsetting with factors:

x <- c(2.1, 4.2, 3.3, 5.4)
(y <- setNames(x, letters[1:4]))
y[factor("b")]
#>   a 
#> 2.1

Factor subsetting has a drop argument. It controls whether or not levels (rather than dimensions) are preserved, and it defaults to FALSE. If you find you’re using drop = TRUE a lot it’s often a sign that you should be using a character vector instead of a factor.
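A quick check of the `drop` argument for factors:

```r
f <- factor(c("a", "b", "c"))
levels(f[1])               # "a" "b" "c": unused levels are kept by default
levels(f[1, drop = TRUE])  # "a": unused levels are dropped
```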

Something weird:
If you use a vector with [[, it will subset recursively, i.e. x[[c(1, 2)]]is equivalent to x[[1]][[2]].
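Recursive indexing with `[[` on a small nested list (the list itself is made up for illustration):

```r
x <- list(list(10, 20), list(30))
x[[c(1, 2)]]                          # 20
identical(x[[c(1, 2)]], x[[1]][[2]])  # TRUE
```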

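Summary:
The key to subsetting is combining the six ways of indexing (positive integers, negative integers, logical vectors, nothing, zero, and character vectors) with the three operators. This leaves aside @ and slot(), which apply only to S4 objects.

- [ selects one or more elements of a vector (in the broad sense) and preserves its structure. The exception is a data frame subset matrix-style to a single column (df[, j]), which drops to an atomic vector.
- [[ and $ extract a single element of a list, so the structure changes. $ also does partial matching of names.
- For deeply nested lists, consider purrr::pluck() or purrr::chuck().

Of the six ways, positive integers and character vectors behave in essentially the same way, and logical vectors can usually be swapped for positive integers. The main applications of subsetting are lookup tables (matching keys to values) and ordering.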

x <- c("m", "f", "u", "f", "f", "m", "m")
lookup <- c(m = "Male", f = "Female", u = NA)
lookup[x]
#>        m        f        u        f        f        m        m 
#>   "Male" "Female"       NA "Female" "Female"   "Male"   "Male"
x <- c("b", "c", "a")
order(x)
#> [1] 3 1 2
x[order(x)]
#> [1] "a" "b" "c"

Logical vectors and positive integers (obtained with which()) can generally be used interchangeably. They differ in two ways:
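The two differences are:

1. NA handling: logical subsetting turns an NA into an NA element, while which() silently drops it.
2. Negation: x[-which(z)] is not the same as x[!z] when z is all FALSE, because -integer(0) is still integer(0).

A small sketch with made-up vectors:

```r
x <- c(10, 20, 30)
y <- c(TRUE, NA, FALSE)

# 1. NA handling
x[y]          # 10 NA
x[which(y)]   # 10

# 2. Negation with an all-FALSE index
z <- c(FALSE, FALSE, FALSE)
x[!z]         # 10 20 30
x[-which(z)]  # numeric(0)
```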

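Also worth noting: although data frames and tibbles are fundamentally different from matrices and arrays, their structure is compatible with them. They can therefore be subset with either one index (list-style, selecting columns) or two indices (matrix-style, selecting rows and columns).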
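A sketch of list-style versus matrix-style subsetting of a data frame, including the single-column drop mentioned in the summary above. The data frame is made up for illustration.

```r
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
df[c(1, 3)]              # list-style: a data frame with columns x and z
df[df$x == 2, ]          # matrix-style: the rows where x == 2, all columns
df[, "x"]                # one column, matrix-style: drops to the vector 1 2 3
df[, "x", drop = FALSE]  # keeps the data frame structure
```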

Control flow

Choices: if, switch(), and vectorised if (ifelse(), dplyr::case_when()).
Loops: for (iterate over the elements of a vector), while (loop while a condition holds), repeat (loop forever until a break).
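A brief sketch of ifelse() and switch(). `centre()` is a hypothetical helper; an unnamed final argument to switch() acts as the fallback.

```r
x <- c(-2, 0, 3)
signs <- ifelse(x > 0, "positive", "non-positive")  # vectorised if
signs  # "non-positive" "non-positive" "positive"

centre <- function(x, type) {
  switch(type,
    mean = mean(x),
    median = median(x),
    stop("Unknown type: ", type)  # final unnamed argument: the fallback
  )
}
centre(c(1, 2, 10), "median")  # 2
```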

if:

for:

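xs <- list(1, "a", TRUE)
# seq_along() is safer than 1:length(xs): it gives an empty sequence when xs has length 0
for (i in seq_along(xs)) {
  print(xs[[i]])
}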

Stop loop
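A loop can be stopped early with next (skip to the next iteration) or break (exit the loop). A minimal sketch:

```r
out <- integer(0)
for (i in 1:10) {
  if (i < 3) next    # skip the rest of this iteration
  if (i >= 5) break  # leave the loop entirely
  out <- c(out, i)
}
out  # 3 4
```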

Functions

Everything that exists is an object.
Everything that happens is a function call.
— John Chambers

Function fundamentals

Functions can be broken down into three components: arguments, body, and environment. There are exceptions to every rule, and in this case there is a small selection of "primitive" base functions that are implemented purely in C. Functions are objects, just as vectors are objects.
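The three components can be inspected with formals(), body() and environment(). Primitive functions have none of them. `f02` is an illustrative name.

```r
f02 <- function(x, y) {
  x + y
}
formals(f02)       # the arguments: x, y
body(f02)          # the body: { x + y }
environment(f02)   # where f02 was defined (the global environment at top level)

# Primitive functions such as sum() are implemented in C
formals(sum)       # NULL
is.primitive(sum)  # TRUE
```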

Function components

Like all objects in R, functions can also possess any number of additional attributes(). One attribute used by base R is srcref, short for source reference.

R functions are objects in their own right, a language property often called "first-class functions". Thus, we can put functions in a list or create anonymous functions.
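A sketch of both ideas, a list of functions and anonymous functions (names are illustrative):

```r
funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)
funs$double(10)               # 20

sapply(1:3, function(x) x^2)  # anonymous function: 1 4 9
(function(x) x + 1)(2)        # define and call immediately: 3
```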

Invoke function

args <- list(1:10, na.rm = TRUE)
do.call(mean, args)

Function composition
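Three ways of composing the same computation are sketched below: nested calls, intermediate objects, and a pipe. The helper functions are made up. The base pipe |> assumes R >= 4.1.0; magrittr's %>% is the older alternative.

```r
square <- function(x) x^2
deviation <- function(x) x - mean(x)
x <- c(1, 5, 9)

# 1. Nested calls: read inside-out
r_nested <- sqrt(mean(square(deviation(x))))

# 2. Intermediate objects
out <- deviation(x)
out <- square(out)
out <- mean(out)
r_steps <- sqrt(out)

# 3. Pipe: read left to right
r_pipe <- x |> deviation() |> square() |> mean() |> sqrt()
```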

Function forms

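In prefix form, arguments can be matched in three ways. They can be matched by position, like help(mean). They can be matched by partial name, like help(top = mean); set options(warnPartialMatchArgs = TRUE) to be warned when this happens. Or they can be matched by full name, like help(topic = mean).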

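# Replacement functions have a name ending in `<-`, take arguments x and value,
# and must return the modified object
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}
x <- 1:10
modify(x, 1) <- 10
x
#>  [1] 10  2  3  4  5  6  7  8  9 10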

Any call can be written in prefix form.

x + y
`+`(x, y)

names(df) <- c("x", "y", "z")
`names<-`(df, c("x", "y", "z"))

for(i in 1:10) print(i)
`for`(i, 1:10, print(i))

How functions are evaluated

Lexical scoping
R uses lexical scoping: it looks up the values of names based on how a function is defined, not how it is called. “Lexical” here is not the English adjective that means relating to words or a vocabulary. It’s a technical CS term that tells us that the scoping rules use a parse-time, rather than a run-time structure.
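A minimal illustration: g() finds x where g was defined, not where it was called. The names are illustrative.

```r
x <- 10
g <- function() x   # x is looked up where g was defined (the global environment)
h <- function() {
  x <- 20           # this x is local to h ...
  g()               # ... but g() still sees the global x
}
h()  # 10
```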

Lazy evaluation
In R, function arguments are lazily evaluated: they’re only evaluated if accessed.
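Two consequences of lazy evaluation, sketched with illustrative functions:

1. An argument that is never used is never evaluated.
2. A default argument is evaluated inside the function, so it can refer to variables created in the body.

```r
h01 <- function(x) {
  10
}
h01(stop("This is an error!"))  # 10: x is never accessed, so stop() never runs

h02 <- function(x = y) {
  y <- 2
  x  # forcing the default evaluates y inside h02
}
h02()  # 2
```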

... (dot-dot-dot)
... (pronounced dot-dot-dot) lets a function take any number of additional arguments. Inside the function, the arguments can be collected with list(...), or with rlang::list2() to support splicing.
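A sketch of the two common uses of `...`: collecting extra arguments, and passing them through to another function. `count_args()` and `my_mean()` are hypothetical helpers.

```r
count_args <- function(...) {
  args <- list(...)  # collect the ... arguments into a list
  length(args)
}
count_args(1, "a", TRUE)  # 3

# Forward extra arguments (here na.rm) to mean()
my_mean <- function(x, ...) mean(x, ...)
my_mean(c(1, NA, 3), na.rm = TRUE)  # 2
```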

Exiting a function
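A sketch of code that runs on exit. on.exit() registers cleanup that runs whether the function returns normally or fails, and invisible() returns a value without auto-printing it. `cleanup_demo()` and `log_steps` are illustrative names.

```r
log_steps <- character(0)
cleanup_demo <- function(fail) {
  # Runs however the function exits; add = TRUE appends to earlier handlers
  on.exit(log_steps <<- c(log_steps, "cleanup"), add = TRUE)
  if (fail) stop("Something went wrong")
  invisible("done")  # returned, but not printed at the console
}
cleanup_demo(fail = FALSE)
try(cleanup_demo(fail = TRUE), silent = TRUE)
log_steps  # "cleanup" "cleanup": ran on the normal exit and on the error
```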

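Summary: a function has three components: arguments, body, and environment. A function's environment is usually the environment in which it was defined, and this is also the environment used for lexical scoping. Functions come in four forms (prefix, infix, replacement, and special), and every form can be rewritten in prefix form; the key is knowing the function's actual name.

Evaluating a function involves two questions. The first is how to find the value bound to a name. Through lexical scoping, R looks first inside the function itself, then in the environment where the function was defined, then in each parent in turn. The second is how to get the value of an argument. R evaluates arguments lazily: an argument is evaluated from its promise only when it is first needed, and the result is then cached in the promise (its value). Note that user-supplied arguments are evaluated in the calling environment, whereas default arguments are evaluated inside the function.

For more on running code when a function exits, see here. The relationship between expressions and environments is an important concept that will come up again; together they make lazy evaluation of arguments, as well as metaprogramming, possible.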

Environment

Generally, an environment is similar to a named list, with four important exceptions:
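The four exceptions are:

1. Every name must be unique.
2. The names are not ordered.
3. An environment has a parent.
4. Environments are not copied when modified.

A base-R sketch of each (object and function names are illustrative):

```r
e1 <- new.env()
e1$a <- 1
e1$b <- 2
e1$a <- 3                  # 1. names are unique: this overwrites a
sort(ls(e1))               # "a" "b"

# 2. names are not ordered, so positional indexing is an error
res <- tryCatch(e1[[1]], error = function(cnd) "error")

# 3. every environment has a parent (here explicitly the empty environment)
e2 <- new.env(parent = emptyenv())
identical(parent.env(e2), emptyenv())  # TRUE

# 4. environments are not copied when modified
set_to_99 <- function(env) env$a <- 99
set_to_99(e1)
e1$a                       # 99
```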

Environment basics

Special environments

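An environment is really just the answer to one question: given a name, how do we find its value? The rlang package makes the various special environments easy to inspect.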
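The special environments can also be reached with base R functions. rlang provides equivalents such as rlang::global_env(), rlang::current_env(), rlang::empty_env() and rlang::env_parents(). The loop below walks from the global environment up to the empty environment.

```r
globalenv()    # the global environment, where top-level code runs
emptyenv()     # the empty environment: the ultimate ancestor, with no parent
baseenv()      # the environment of the base package
environment()  # the current environment

env <- globalenv()
chain <- character(0)
while (!identical(env, emptyenv())) {
  chain <- c(chain, environmentName(env))
  env <- parent.env(env)
}
chain  # "R_GlobalEnv", then attached packages, ending with "base"
```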

Environment as a data structure

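# Environments are not copied when modified, so they can hold state shared across calls
my_env <- new.env(parent = emptyenv())
my_env$a <- 1

get_a <- function() {
  my_env$a
}
# Returns the old value invisibly so the caller can restore it later
set_a <- function(value) {
  old <- my_env$a
  my_env$a <- value
  invisible(old)
}
set_a(2)
get_a()
#> [1] 2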

Condition system

Signalling conditions
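The three basic conditions are signalled with stop(), warning() and message(), in decreasing order of severity. `safe_sqrt()` is a hypothetical function used only for illustration.

```r
safe_sqrt <- function(x) {
  if (!is.numeric(x)) stop("`x` must be numeric")      # error: execution stops
  if (any(x < 0)) warning("Negative values set to NA")  # warning: execution continues
  message("Taking square roots")                        # message: purely informative
  x[x < 0] <- NA
  sqrt(x)
}
out <- suppressMessages(suppressWarnings(safe_sqrt(c(4, -1))))
out  # 2 NA
err <- tryCatch(safe_sqrt("a"), error = function(cnd) conditionMessage(cnd))
err  # "`x` must be numeric"
```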
Ignoring conditions
# the error message will be displayed but execution will continue
f <- function(x) {
  try(log(x))
  10
}
f("a")
#> Error in log(x) : non-numeric argument to mathematical function
#> [1] 10

suppressWarnings({
  warning("Uhoh!")
  warning("Another warning")
  1
})
#> [1] 1

suppressMessages({
  message("Hello there")
  2
})
#> [1] 2
Handling conditions
library(rlang)  # catch_cnd() and cnd_muffle() come from rlang
cnd <- catch_cnd(stop("An error"))
str(cnd)
#> List of 2
#>  $ message: chr "An error"
#>  $ call   : language force(expr)
#>  - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

conditionMessage(cnd)
conditionCall(cnd)
tryCatch(
  error = function(cnd) {
    # code to run when an error is thrown
  },
  code_to_run_while_handlers_are_active,
  finally = {
    # code to run regardless of whether the initial expression succeeds or fails
  }
)

withCallingHandlers(
 warning = function(cnd) {
   # code to run when warning is signalled
 },
 message = function(cnd) {
   # code to run when message is signalled
 },
 code_to_run_while_handlers_are_active
)
tryCatch(
  message = function(cnd) cat("Caught a message!\n"), 
  {
    message("Someone there?")
    message("Why, yes!")
  }
)
#> Caught a message!

withCallingHandlers(
  message = function(cnd) cat("Caught a message!\n"), 
  {
    message("Someone there?")
    message("Why, yes!")
  }
)
#> Caught a message!
#> Someone there?
#> Caught a message!
#> Why, yes!
# Bubbles all the way up to default handler which generates the message
withCallingHandlers(
  message = function(cnd) cat("Level 2\n"),
  withCallingHandlers(
    message = function(cnd) cat("Level 1\n"),
    message("Hello")
  )
)
#> Level 1
#> Level 2
#> Hello

# Bubbles up to tryCatch
tryCatch(
  message = function(cnd) cat("Level 2\n"),
  withCallingHandlers(
    message = function(cnd) cat("Level 1\n"),
    message("Hello")
  )
)
#> Level 1
#> Level 2
# Muffles the default handler which prints the messages
withCallingHandlers(
  message = function(cnd) {
    cat("Level 2\n")
    cnd_muffle(cnd)
  },
  withCallingHandlers(
    message = function(cnd) cat("Level 1\n"),
    message("Hello")
  )
)
#> Level 1
#> Level 2

# Muffles level 2 handler and the default handler
withCallingHandlers(
  message = function(cnd) cat("Level 2\n"),
  withCallingHandlers(
    message = function(cnd) {
      cat("Level 1\n")
      cnd_muffle(cnd)
    },
    message("Hello")
  )
)
#> Level 1
Custom condition object
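A custom condition is a list with `message` and `call` fields and a class vector ending in "error", "condition". The sketch below builds one in base R, following the `structure(class = ..., list(...))` pattern from R's ?conditions help page. The class name "bad_argument" and the extra `arg` field are hypothetical; rlang::abort() with its `class` argument is a more convenient alternative.

```r
# Constructor for a hypothetical condition class "bad_argument"
bad_argument <- function(arg, msg, call = sys.call(-1)) {
  structure(
    class = c("bad_argument", "error", "condition"),
    list(message = msg, call = call, arg = arg)
  )
}

my_log <- function(x) {
  if (!is.numeric(x)) stop(bad_argument("x", "`x` must be numeric"))
  log(x)
}

# Handlers can dispatch on the custom class and read the extra field
res <- tryCatch(
  my_log("a"),
  bad_argument = function(cnd) paste("Bad argument:", cnd$arg)
)
res  # "Bad argument: x"
```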