Quosure based resamplr #11

Open
jrnold opened this issue Jun 26, 2017 · 2 comments

Comments

jrnold commented Jun 26, 2017

Use the new rlang quosures and tidy eval for resampling functions. Quosures are unevaluated expressions, so they don't take up much memory, and they keep the environment in which they should be evaluated, so we don't need to rely on R's internal copy-on-modify mechanics. The latter is nice since I don't want to rely on something that is a little magical and can easily break without me noticing.

I'm still uncertain about what this looks like, so this issue includes (or will include) comments as I puzzle through it.

What any resampling method needs:

  1. Number of elements to resample from or a vector of identifiers. There are two general classes of resampling methods:

    • ungrouped: need either the number of indexes or a vector of identifiers. Note: I
    • grouped: need either an integer vector (values are the number of elements in each group, length is the number of groups) or a list of vectors (values are vectors of identifiers).
  2. An extraction function: a function of arguments, x (the object to extract from) and idx (which gives the elements to extract).
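A minimal sketch of those two ingredients, assuming row-wise extraction from a data frame. The names `extract_rows`, `ids`, `groups` and `group_sizes` are hypothetical and only there for illustration.

```r
# Hypothetical extraction function: x is the object, idx the elements to pull.
extract_rows <- function(x, idx) x[idx, , drop = FALSE]

# Ungrouped: either the number of elements or a vector of identifiers.
n <- nrow(mtcars)
ids <- seq_len(n)

# Grouped: either group sizes or a list of identifier vectors, one per group.
groups <- split(ids, mtcars$cyl)
group_sizes <- lengths(groups)

# One bootstrap draw of identifiers, followed by extraction.
draw <- sample(ids, size = n, replace = TRUE)
resampled <- extract_rows(mtcars, draw)
```

The resampling step only ever touches `ids` (or `n`); the object itself is only needed by the extraction function.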

Resampling object

Yeah, so quosures are awesome, but how would this work?

The function that creates them will look something like

create_samples(expr, ...)

The expr can be a quosure or we just grab it unevaluated and capture the environment.

Provide a single expression

create_samples(~x, ...)

The expression can be evaluated to find its type, and the appropriate functions can then be applied to get the identifiers and extract elements from the object. These could also be optionally provided.

I could just have the user write an expression where .idx stands in for the indexes to be provided.

create_samples(~ x[.idx, , drop = FALSE], list(1:5, 1:8))

The problem with this is then the user needs to provide the identifiers to sample. However, they don't need to provide the extraction method. This is very general, but can be somewhat redundant, since the user has to effectively write the extraction function every time they use it.
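One way the `.idx` placeholder could be filled in is through the data mask of `eval_tidy()`. This is only a sketch: `draw_sample` is a hypothetical helper, and it uses `quo()` rather than a one-sided formula on the assumption that a recent rlang is installed.

```r
library(rlang)

# Hypothetical helper: evaluate the quosure once per index vector,
# binding each vector to the name `.idx` through a data mask.
draw_sample <- function(q, idx_list) {
  lapply(idx_list, function(idx) eval_tidy(q, data = list(.idx = idx)))
}

q <- quo(mtcars[.idx, , drop = FALSE])
samples <- draw_sample(q, list(1:5, 1:8))
```

Everything other than `.idx` is still looked up in the environment captured by the quosure, so the data itself is not copied into the resampling object.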

Two ideas for the object itself

  1. A quosure object or a subclass of quosures. To draw the samples the user needs to call some function, either eval_tidy or a wrapper provided by this package.
  2. A subclass of function(). This has the nice feature that to evaluate it, the user only has to call it.

The identifiers can be extracted via a function since in either case they'll reside in some environment.

Resampling functions

Expose all the lower-level functions which work on only identifiers or number of obs.

The following functions for each resampling algorithm could be written:

bootstrap(x, ...)
bootstrap_idx(idx, ...)
bootstrap_n(n, ...)

Where x is some arbitrary object, idx is a vector of identifiers, and n is the number of elements. The bootstrap_n form is the lower level function since it is all that is needed for the resampling algorithm, and bootstrap_idx will simply apply bootstrap_n to vectors of identifiers.
Then bootstrap() is only responsible for providing a lower level function with the number of elements or a list of identifiers.
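A rough sketch of how the two lower levels could stack, using an ordinary bootstrap (sampling with replacement). The `R` argument for the number of replicates is an assumption for illustration and not a settled part of the interface.

```r
# Lowest level: only needs the number of elements.
# R is a hypothetical argument for the number of replicates.
bootstrap_n <- function(n, R = 1L) {
  replicate(R, sample.int(n, size = n, replace = TRUE), simplify = FALSE)
}

# Identifier level: map the resampled positions back onto the identifiers.
bootstrap_idx <- function(idx, R = 1L) {
  lapply(bootstrap_n(length(idx), R), function(i) idx[i])
}

ids <- c("a", "b", "c", "d")
draws <- bootstrap_idx(ids, R = 3)
```

Because `bootstrap_idx()` goes through positions, it works for any atomic identifier type, including a single integer identifier.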

To handle groups: bootstrap() is a generic function with methods:

  • list: use grouped bootstrapping.
  • default: use non-grouped bootstrapping

There is some ambiguity if for some reason identifiers have to be a list, but that's tough shit. Identifiers should be atomic vectors. If that is really needed, the user needs to deal with it in the extraction function.
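The dispatch could be sketched with plain S3. The method bodies below are illustrative assumptions; in particular, this sketch assumes "grouped" means resampling whole groups with replacement, which may not be what every grouped algorithm wants.

```r
# Generic named as above; the method bodies are assumptions for illustration.
bootstrap <- function(x, ...) UseMethod("bootstrap")

# Default: x is an atomic vector of identifiers, resampled with replacement.
bootstrap.default <- function(x, ...) {
  x[sample.int(length(x), replace = TRUE)]
}

# List: each element holds one group's identifiers, and whole groups are
# resampled with replacement (an assumed meaning of "grouped").
bootstrap.list <- function(x, ...) {
  x[sample.int(length(x), replace = TRUE)]
}

groups <- split(seq_len(nrow(mtcars)), mtcars$cyl)
g <- bootstrap(groups)        # list of groups
u <- bootstrap(seq_len(10))   # atomic vector of identifiers
```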

bootstrap_idx(idx, ...)
bootstrap_n(n, ...)

In base R, there is sample and sample.int, but I can't use that naming convention since . should be reserved for S3 methods, and something like sample_int would suggest that it returns integer values, e.g. map_int in purrr.

One idea would be to have a single generic function with methods, and internal logic that does different things for a scalar integer. However, I cannot treat every vector of length one as the number of obs, since that breaks the edge case of a single integer identifier.
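Base R's `sample()` shows the same ambiguity: a length-one numeric `x >= 1` is read as "sample from `1:x`". A small illustration (the variable names are made up):

```r
id <- 10L   # a single integer identifier

# sample() reads a length-one numeric x >= 1 as "sample from 1:x" ...
as_count <- sample(id, size = 3)

# ... so treating it as an identifier needs explicit positional indexing.
as_id <- id[sample.int(length(id), size = 3, replace = TRUE)]
```

Here `as_count` holds three values drawn from `1:10`, while `as_id` can only ever contain the identifier `10`.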

Notes

I must ensure that resampling a resample object works.

jrnold commented Jun 27, 2017

Some messing around with this:

library("rlang")

create_samples <- function(q, idx) {
  f <- as_function(q)
  print(f)
  function() f(idx)
}
smpl <- create_samples(~ mtcars[., , drop = FALSE], 1:5)
smpl()

Suppose the user provides a function to generate the item, and a function to extract samples:

create_samples2 <- function(data, idx) {
  q <- enquo(data)
  .f <- function(x, i) x[i, , drop = FALSE]
  out <- quo(.f(!!q, idx))
  structure(out, expr = q, idx = idx, class = c("resamplq", class(out)))
}

print_vec <- function(x, n = length(x)) {
  # Show at most n elements, followed by "..." when x is truncated
  shown <- if (length(x) > n) c(x[seq_len(n)], "...") else x
  paste(shown, collapse = ", ")
}

print.resamplq <- function(x, ...) {
  idx <- print_vec(attr(x, "idx"), 10)
  # as_label() (later rlang releases) replaces the deprecated
  # set_attrs() + deparse() combination
  cat("<resample: ", as_label(attr(x, "expr")), "> ", idx, "\n", sep = "")
  invisible(x)
}

q <- create_samples2(mtcars, 1:20)
print(q)
eval_tidy(q)

The required inputs for this are:

  • expression (quosure) to calculate data
  • index values: either explicit values or, as in bootstrap or other algorithms, the number of observations or a list of values.
  • function of two args (data, idx) to extract the values

create_samples3 <- function(data, idx, ...) {
  q <- enquo(data)
  .f <- function(x, i) x[i, , drop = FALSE]
  # I think this would be better with new_function
  # but I was having a hard time setting the body
  out <- function() { eval_tidy(quo(.f(!!q, idx))) }
  structure(out, expr = q, idx = idx)
}
smpl3 <- create_samples3(mtcars, 1:5)
smpl3
smpl3()
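For the `new_function` idea mentioned in the comment, one possible way to set the body is to build the call with `call2()` and inline the extraction function and the index values. This is a sketch under the assumption of a recent rlang; `create_samples4` is a hypothetical name.

```r
library(rlang)

create_samples4 <- function(data, idx) {
  q <- enquo(data)
  .f <- function(x, i) x[i, , drop = FALSE]
  # Build the call .f(<data expression>, <idx>) with the function and the
  # index values inlined, so the body needs nothing from this frame.
  body <- call2(.f, quo_get_expr(q), idx)
  # No formals; evaluate in the environment where `data` was supplied.
  new_function(NULL, body, env = quo_get_env(q))
}

smpl4 <- create_samples4(mtcars, 1:5)
smpl4()
```

The trade-off is that the printed function is less readable, since the extraction function and the indexes appear inline in the body.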

jrnold commented Jun 28, 2017

Question: how to specify the indexes and the extraction function?

  1. Evaluate expression and dispatch method

    • this requires the expression to include objects that exist
    • it's not necessary, but it may require the
  2. Require user to provide number or list of indexes and an extraction function.
