That time I thought I invented pluck

Foreword

Just before publishing this article, I read the purrr cheatsheet and realised that what I developed here already exists. Its name is pluck and I definitely suggest that you use it.

TL;DR

This is an attempt to write an R helper function for subsetting lists that is as pleasant to use as get-in in the functional programming language Clojure. As said above, this is in fact already available in purrr. Still, some readers may learn something about subsetting, tryCatch, <<- or purrr::reduce here, as I did.

Subsetting in R

In R, you have two notations that you can use to access list elements. Both work with nested elements.

  • Bracket notation
my_list <- list(
  key_a=2, 
  key_b=list(
    key_b1="hello"
    )
  )
my_list[["key_b"]][["key_b1"]]
## [1] "hello"
  • Dollar notation
my_list$key_b$key_b1
## [1] "hello"

Interestingly, both notations support partial key names, as long as the partial names are precise enough to yield a unique match. For this, bracket notation requires that you set the exact argument to FALSE.

# both would work
c(
  my_list[["key_b"]][["k", exact=FALSE]], 
  my_list$key_b$k
)
## [1] "hello" "hello"
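
To illustrate the "unique match" condition, here is a minimal sketch. The list ambiguous and its keys are made up for this example and are not part of the original code. A prefix shared by several names yields NULL rather than an arbitrary match.

```r
# Hypothetical list whose two keys share the prefix "key_"
ambiguous <- list(key_one = 1, key_two = 2)

# "key_o" matches only key_one, so partial matching succeeds
ambiguous$key_o
## [1] 1
ambiguous[["key_o", exact = FALSE]]
## [1] 1

# "key_" matches both names, so both notations return NULL
ambiguous$key_
## NULL
ambiguous[["key_", exact = FALSE]]
## NULL
```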

Querying a list for a key that does not exist returns NULL. And querying NULL for a key also returns NULL. So when you search nested keys, you can count on getting NULL if not found. You can go as deep as you want, since the first mismatch will yield NULL and the rest of the keys will be searched on this NULL.

# The 3 expressions below return NULL
NULL$hello
## NULL
my_list$its$me
## NULL
my_list[["i"]][["was"]][["wondering"]]
## NULL

Well, actually, you can only count on getting NULL if you are searching on a list. When exploring unknown, deeply nested structures, you may end up searching for a key on something else, like a string or a number. Rather than returning NULL, this raises an error, and the error differs depending on whether you used bracket or dollar notation.

# Returns NULL
my_list$key_b$wrong_key_on_list
## NULL
# Raises a "$ operator is invalid" error
my_list$key_b$key_b1$wrong_key_on_string
Error in my_list$key_b$key_b1$wrong_key_on_string : 
  $ operator is invalid for atomic vectors
# Raises a "subscript out of bounds" error
my_list[["key_b"]][["key_b1"]][["wrong_key_on_string"]]
Error in my_list[["key_b"]][["key_b1"]][["wrong_key_on_string"]] : 
  subscript out of bounds

These errors can be very informative, but I wanted to see if I could get a notation that:

  • looked more functional (I find it nicer in pipelines)
  • always returned NULL, or a default value, when the key is not found

The main inspiration was get-in from Clojure. get-in takes the object to search as its first argument, followed by a list of the keys, and finally an optional default value.

An R get-in

In R, your function can support a flexible number of arguments if you use an ellipsis (the three dots ...), which you can later convert to a list. The first version of getin uses a for loop to look up, in order, each key collected from the ellipsis.

getin <- function(l, ...) {
  keys <- list(...) 
  val <- l
  for (key in keys) {
    val <- val[[key]]
  }
  val
}

getin(my_list, "key_b", "key_b1")
## [1] "hello"
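
Because list(...) keeps each argument as it was passed, and [[ accepts both names and positions, the loop version should also handle integer positions mixed with names. A small sketch, repeating the definitions above so that it runs on its own:

```r
my_list <- list(
  key_a = 2,
  key_b = list(
    key_b1 = "hello"
  )
)

# Same loop-based getin as above
getin <- function(l, ...) {
  keys <- list(...)   # collect the keys passed through the ellipsis
  val <- l
  for (key in keys) {
    val <- val[[key]] # each key can be a name or a position
  }
  val
}

# Name first, then position: first element of key_b
getin(my_list, "key_b", 1)
## [1] "hello"
# Positions only: second element of my_list, then its first element
getin(my_list, 2, 1)
## [1] "hello"
```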

This works fine but we get the same NULL/error behavior as before.

# Returns NULL
getin(my_list, "key_b", "wrong_key")
## NULL
# Returns "subscript out of bounds" error
getin(my_list, "key_b", "key_b1", "wrong_key")
Error in val[[key]] : subscript out of bounds

You can use tryCatch to capture errors which would normally stop the script and trigger functions instead.
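
As a standalone illustration of the mechanism, independent of getin, here is a minimal sketch. The error message is invented for the example. Note that the value returned by the error function becomes the value of the whole tryCatch call.

```r
result <- tryCatch({
  stop("something went wrong")  # raises an error, the next line is skipped
  "not reached"
}, error = function(e) {
  # e is the condition object; conditionMessage() extracts its message
  paste("caught:", conditionMessage(e))
})

result
## [1] "caught: something went wrong"
```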

getin <- function(l, ...) {
  keys <- list(...) 
  val <- l
  tryCatch({
    for (key in keys) {
      val <- val[[key]]
    }
  }, error = function(e) { val <- NULL }) 
  val
}

getin(my_list, "key_b", "key_b1", "key_b2")
## [1] "hello"

Hmm… I was hoping for NULL here. Although the "subscript out of bounds" error was caught, the assignment of NULL to val in tryCatch’s error function had no effect on the val that getin returns. We ended up with "hello" instead, the last valid value of val (coming from my_list$key_b$key_b1).

This happens because tryCatch’s error function has its own scope, so val <- NULL only creates a new, local val inside it and leaves getin’s val untouched. If this sounds confusing, consider exploring the relevant parts of the Advanced R book. Short answer: to assign to a variable in an enclosing scope in R, you can use <<-.
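
The difference between the two assignment operators can be seen without tryCatch at all. In this sketch, the function names with_local_assign, with_super_assign and handler are hypothetical and only serve the illustration.

```r
with_local_assign <- function() {
  val <- "outer"
  handler <- function() { val <- NULL }   # creates a new val, local to handler
  handler()
  val                                     # the outer val is unchanged
}

with_super_assign <- function() {
  val <- "outer"
  handler <- function() { val <<- NULL }  # rebinds val in the enclosing function
  handler()
  val                                     # the outer val is now NULL
}

with_local_assign()
## [1] "outer"
with_super_assign()
## NULL
```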

getin <- function(l, ...) {
  keys <- list(...) 
  val <- l
  tryCatch({
    for (key in keys) {
      val <- val[[key]]
    }
  }, error = function(e) { val <<- NULL }) 
  val
}

getin(my_list, "key_b", "key_b1", "key_b2")
## NULL

The scope problem can also be avoided altogether by using the value returned by tryCatch as the return value of getin. This means returning val at the end of tryCatch’s main expression and NULL in the error function.

getin <- function(l, ...) {
  keys <- list(...) 
  val <- l
  tryCatch({
    for (key in keys) {
      val <- val[[key]]
    }
    val
  }, error = function(e) { NULL })
}

getin(my_list, "key_b", "key_b1", "key_b2")
## NULL

Now, in R, there is a rule of thumb: every time you loop over a list, it is likely that your code could be improved with purrr. In tutorials, purrr is most often used for its map functions, which apply a transformation to each element of a list, but the package also includes reduce functions, which combine the elements into a single result. For example, David Ranzolin elegantly uses reduce to join a list of data frames.
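
For readers new to reduce, a minimal sketch on numbers may help (the inputs are arbitrary). reduce feeds the running result and the next element to the function, and accumulate shows the intermediate results of the same fold.

```r
# ((1 + 2) + 3) + 4: acc is the running result, x is the next element
purrr::reduce(1:4, function(acc, x) acc + x)
## [1] 10

# accumulate keeps every intermediate value of the same fold
purrr::accumulate(1:4, function(acc, x) acc + x)
## [1]  1  3  6 10
```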

For getin, we want to use reduce to search one key at a time, starting from the provided list, so we need to create a new list that has:

  • the list to search on as its first item
  • the keys to search as the following items
getin <- function(l, ...) {
  search_list <- c(list(l), list(...))
  tryCatch({
    purrr::reduce(search_list, function(x, y) { x[[y]] }) 
  }, error = function(e) { NULL })
}

getin(my_list, "key_b", "key_b1")
## [1] "hello"
getin(my_list, "key_b", "key_b1", "key_b2")
## NULL
# Or, in more concise form

getin <- function(l, ...) {
  tryCatch({
    purrr::reduce(c(list(l), list(...)), ~.x[[.y]]) 
  }, error = function(e) { NULL })
}

getin(my_list, "key_b", "key_b1")
## [1] "hello"
getin(my_list, "key_b", "key_b1", "key_b2")
## NULL
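
A possible variation, sketched here under the hypothetical name getin_init, is to pass the list as reduce’s .init argument instead of prepending it to the keys. This is only an alternative spelling of the same idea; I have not compared it with how pluck is implemented.

```r
my_list <- list(
  key_a = 2,
  key_b = list(
    key_b1 = "hello"
  )
)

# Hypothetical variant: the list to search is the starting value of the fold
getin_init <- function(l, ...) {
  tryCatch({
    purrr::reduce(list(...), function(x, y) x[[y]], .init = l)
  }, error = function(e) { NULL })
}

getin_init(my_list, "key_b", "key_b1")
## [1] "hello"
getin_init(my_list, "key_b", "key_b1", "key_b2")
## NULL
```

With no keys at all, the fold has nothing to apply and the list itself is returned.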

Lastly, let’s add an optional not_found argument that lets you provide an alternative to NULL when the key is not found.

getin <- function(l, ..., not_found=NULL) {
  val <- tryCatch({
    purrr::reduce(c(list(l), list(...)), ~.x[[.y]]) 
  }, error = function(e) { NULL })
  if (is.null(val)) {
    not_found
  } else {
    val 
  }
}
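
A few calls show how this version behaves. The sketch repeats the definitions so it can run on its own; the list with_null is made up for the example. One consequence of testing with is.null is that a key explicitly storing NULL is treated the same as a missing key.

```r
my_list <- list(
  key_a = 2,
  key_b = list(
    key_b1 = "hello"
  )
)

# Same getin as above, with the not_found argument
getin <- function(l, ..., not_found = NULL) {
  val <- tryCatch({
    purrr::reduce(c(list(l), list(...)), ~.x[[.y]])
  }, error = function(e) { NULL })
  if (is.null(val)) not_found else val
}

# Key found: not_found is ignored
getin(my_list, "key_b", "key_b1", not_found = "bla")
## [1] "hello"
# Key missing: not_found is returned
getin(my_list, "key_c", not_found = "bla")
## [1] "bla"

# Hypothetical list where the key exists but holds NULL
with_null <- list(key_n = NULL)
getin(with_null, "key_n", not_found = "bla")
## [1] "bla"
```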

Conclusion

As said in the foreword, I now know that this already exists in purrr. It is called pluck and, although at the time of this writing I could not find it on the website, it is described in the RStudio purrr cheatsheet and on Rdocumentation.org.
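
For comparison, here is a short sketch of the equivalent pluck calls on the same list, using its .default argument in the role of not_found. Only the list-level cases are shown.

```r
my_list <- list(
  key_a = 2,
  key_b = list(
    key_b1 = "hello"
  )
)

# Nested lookup, keys given one after the other
purrr::pluck(my_list, "key_b", "key_b1")
## [1] "hello"
# Missing key: NULL by default
purrr::pluck(my_list, "key_b", "wrong_key")
## NULL
# Missing key with a fallback value
purrr::pluck(my_list, "key_b", "wrong_key", .default = "fallback")
## [1] "fallback"
```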

Annex

Here is a testthat helper that runs all the tests for getin in one call.

test_getin <- function(getin, my_list) {
  testthat::test_that("getin finds valid keys", {
    testthat::expect_equal(getin(my_list, "key_a"), my_list$key_a)
    testthat::expect_equal(getin(my_list, "key_a", not_found = "a_val"), 
                           my_list$key_a)
    testthat::expect_equal(getin(my_list, "key_b", "key_b1"), 
                           my_list$key_b$key_b1)
    testthat::expect_equal(getin(my_list, "key_b", "key_b1", not_found = 2),
                           my_list$key_b$key_b1)
  })
  testthat::test_that("getin returns NULL or not_found for absent keys on all types", {
    testthat::expect_null(getin(my_list, "key_c"))
    testthat::expect_null(getin(my_list, "key_b", "key_b1", "key_b2"))
    testthat::expect_equal(getin(my_list, "key_c", not_found = "bla"), "bla")
    testthat::expect_equal(getin(my_list, "key_b", "key_b1", "key_b2", 
                                 not_found = list(b=2)), list(b=2))
  })
}

test_getin(getin, my_list)