A Simulation Study using Piecemeal

Suppose that we want to simulate a function of two variables, $x$ and $y$, for every combination of $x = 1, 2$ and $y = 1, 3, 9, 27$, in a factorial design.

The function ultimately returns $x\times y$ and a random number between 0 and 1, but to get there, it relies on an external constant a equalling to 8 that is defined in its environment and on the package rlang.

Unfortunately, our initial implementation also has some bugs that cause it to crash if the sum of the product and the hundredths digit of the random number is divisible by 4, with an error message that depends on the remainder of dividing it by 8.

a <- 8
f <- function(x, y) {
  p <- x*y
  u <- runif(1)

  errcond <- p + floor(u * 100) %% 10 + a
  if(errcond %% 4 == 0) stop("condition ", errcond %% 8, call. = FALSE)

  # rlang::dbl
  dbl(p = p, u = u)
}

We begin by setting the directory for run results. We will use R’s temporary directory:

outdir <- file.path(tempdir(), "piecemeal_demo")

sim <- piecemeal::init(outdir) # short for `piecemeal::Piecemeal$new(outdir)`

sim$factorial(x = 2^(0:1), y = 3^(0:3))

sim
#> A Piecemeal simulation
#> Output directory: /tmp/RtmpWFcXKT/piecemeal_demo 
#> 
#> Design: 8 treatment configurations by 3 seeds = 24 runs
#> 
#> Cluster:
#>   configuration: makeCluster(2)
#> 
#> Call for each configuration and seed:
#>   function (x, y) 
#>   {
#>       p <- x * y
#>       u <- runif(1)
#>       errcond <- p + floor(u * 100)%%10 + a
#>       if (errcond%%4 == 0) 
#>           stop("condition ", errcond%%8, call. = FALSE)
#>       dbl(p = p, u = u)
#>   } 
#> 
#> Options:
#>   directory split: 1 level(s) by treatment and 1 level(s) by seed
#>   errored runs: auto 
#> 
#> Ready to execute? Yes.

sim$
  cluster(2)$
  factorial(x = 2^(0:1), y = 3^(0:3))$
  nrep(3)$
  worker(f)

We can obtain a list of configurations (combinations of treatment and seed) to be run. (Here, first two are shown.)

head(sim$todo(), 2)
#> [[1]]
#> [[1]]$seed
#> [1] 1
#> 
#> [[1]]$treatment
#> [[1]]$treatment$x
#> [1] 1
#> 
#> [[1]]$treatment$y
#> [1] 1
#> 
#> attr(,"hash")
#> [1] "17c1942f162d67ae0ffcaf64a3e546ce"
#> 
#> [[1]]$fn
#> [1] "17c1942f162d67ae0ffcaf64a3e546ce.1.rds"
#> 
#> [[1]]$subdirs
#> [1] "17" "1" 
#> 
#> 
#> [[2]]
#> [[2]]$seed
#> [1] 2
#> 
#> [[2]]$treatment
#> [[2]]$treatment$x
#> [1] 1
#> 
#> [[2]]$treatment$y
#> [1] 1
#> 
#> attr(,"hash")
#> [1] "17c1942f162d67ae0ffcaf64a3e546ce"
#> 
#> [[2]]$fn
#> [1] "17c1942f162d67ae0ffcaf64a3e546ce.2.rds"
#> 
#> [[2]]$subdirs
#> [1] "17" "2"

We can now execute this setup. It will print a summary of the results of each run.

sim$run()
#> Starting 24 runs. (0 already done.)
#>                                              Status Runs
#> 1 Error in (function (x, y)  : object 'a' not found   24

We can also view a summary of completed, pending, and ongoing runs any time. If any runs have been completed, an estimated time to finish the rest at the current rate will also be estimated.

sim$status() # or summary(sim)
#> A Piecemeal simulation
#> Output directory: /tmp/RtmpWFcXKT/piecemeal_demo 
#> 
#>                                              Result Freq
#> 1 Error in (function (x, y)  : object 'a' not found   24

Looks like all configurations failed, because the worker nodes can’t find the variable a. Let’s export it to the worker nodes, and try again. This will also automatically clear the failed runs next time we run the simulation.

sim$export_vars("a")
sim$run()
#> Deleting...
#> 24 failed runs deleted.
#> Starting 24 runs. (0 already done.)
#>                                                       Status Runs
#> 1                                        Error : condition 0    4
#> 2                                        Error : condition 4    4
#> 3 Error in dbl(p = p, u = u) : could not find function "dbl"   16

We have new errors. Most of the nodes complain that we haven’t loaded the rlang package. Let’s fix that:

sim$setup({library(rlang)})
sim$run()
#> Deleting...
#> 24 failed runs deleted.
#> Starting 24 runs. (0 already done.)
#>                Status Runs
#> 1 Error : condition 0    4
#> 2 Error : condition 4    4
#> 3                  OK   16

Some runs have succeeded. Here’s what the individual run output files look like:

list.files(outdir, recursive = TRUE)
#>  [1] "17/1/17c1942f162d67ae0ffcaf64a3e546ce.1.rds"
#>  [2] "17/2/17c1942f162d67ae0ffcaf64a3e546ce.2.rds"
#>  [3] "17/3/17c1942f162d67ae0ffcaf64a3e546ce.3.rds"
#>  [4] "59/1/59ea3aa86ca8ae93bfd788bf4f62ff81.1.rds"
#>  [5] "59/2/59ea3aa86ca8ae93bfd788bf4f62ff81.2.rds"
#>  [6] "59/3/59ea3aa86ca8ae93bfd788bf4f62ff81.3.rds"
#>  [7] "dc/1/dc4b1f170cacadc9085973210ed8a325.1.rds"
#>  [8] "dc/2/dc4b1f170cacadc9085973210ed8a325.2.rds"
#>  [9] "dc/3/dc4b1f170cacadc9085973210ed8a325.3.rds"
#> [10] "dd/1/dd8f6510b0c3a993e78f8a91974e61e1.1.rds"
#> [11] "dd/2/dd8f6510b0c3a993e78f8a91974e61e1.2.rds"
#> [12] "dd/3/dd8f6510b0c3a993e78f8a91974e61e1.3.rds"
#> [13] "df/1/df1f10261d22e3e86054c56547ce2237.1.rds"
#> [14] "df/2/df1f10261d22e3e86054c56547ce2237.2.rds"
#> [15] "df/3/df1f10261d22e3e86054c56547ce2237.3.rds"
#> [16] "e0/1/e002816a9d603419b28218e468036290.1.rds"
#> [17] "e0/2/e002816a9d603419b28218e468036290.2.rds"
#> [18] "e0/3/e002816a9d603419b28218e468036290.3.rds"
#> [19] "e4/1/e455750c5dedb37b34a136294d2554a8.1.rds"
#> [20] "e4/2/e455750c5dedb37b34a136294d2554a8.2.rds"
#> [21] "e4/3/e455750c5dedb37b34a136294d2554a8.3.rds"
#> [22] "e5/1/e53c8e71d6ccea99c1126598444cb15c.1.rds"
#> [23] "e5/2/e53c8e71d6ccea99c1126598444cb15c.2.rds"
#> [24] "e5/3/e53c8e71d6ccea99c1126598444cb15c.3.rds"

Notice that they are split up into subdirectories. This improves performance on some file systems. It can be controlled by the $options(split=) setting.

sim$run()
#> Starting 0 runs. (24 already done.)
#>    Status Runs
#> 1 SKIPPED   24

At any time, we can obtain the data frame of successful runs. If your configuration or output data structure is more complex, you may need to use custom functions for $result_df() or use $result_list() instead.

sim$result_df()
#> 8/24 runs had returned an error.
#>    x  y  p         u .seed
#> 1  1  1  1 0.2655087     1
#> 2  1  1  1 0.1848823     2
#> 3  1  1  1 0.1680415     3
#> 4  2  1  2 0.1848823     2
#> 5  1 27 27 0.2655087     1
#> 6  1 27 27 0.1848823     2
#> 7  1 27 27 0.1680415     3
#> 8  2 27 54 0.1848823     2
#> 9  1  3  3 0.2655087     1
#> 10 1  3  3 0.1848823     2
#> 11 1  3  3 0.1680415     3
#> 12 2  9 18 0.1848823     2
#> 13 2  3  6 0.1848823     2
#> 14 1  9  9 0.2655087     1
#> 15 1  9  9 0.1848823     2
#> 16 1  9  9 0.1680415     3

The remaining errors are more subtle. We can see which treatments and seeds led to unsuccessful runs:

head(sim$erred(), 2)
#> [[1]]
#> [[1]]$seed
#> [1] 1
#> 
#> [[1]]$treatment
#> [[1]]$treatment$x
#> [1] 2
#> 
#> [[1]]$treatment$y
#> [1] 1
#> 
#> attr(,"hash")
#> [1] "59ea3aa86ca8ae93bfd788bf4f62ff81"
#> 
#> [[1]]$output
#> [1] "Error : condition 0\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <simpleError: condition 0>
#> 
#> [[1]]$config
#> [[1]]$config$fn
#> [1] "59ea3aa86ca8ae93bfd788bf4f62ff81.1.rds"
#> 
#> [[1]]$config$subdirs
#> [1] "59" "1" 
#> 
#> 
#> [[1]]$OK
#> [1] FALSE
#> 
#> 
#> [[2]]
#> [[2]]$seed
#> [1] 3
#> 
#> [[2]]$treatment
#> [[2]]$treatment$x
#> [1] 2
#> 
#> [[2]]$treatment$y
#> [1] 1
#> 
#> attr(,"hash")
#> [1] "59ea3aa86ca8ae93bfd788bf4f62ff81"
#> 
#> [[2]]$output
#> [1] "Error : condition 0\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <simpleError: condition 0>
#> 
#> [[2]]$config
#> [[2]]$config$fn
#> [1] "59ea3aa86ca8ae93bfd788bf4f62ff81.3.rds"
#> 
#> [[2]]$config$subdirs
#> [1] "59" "3" 
#> 
#> 
#> [[2]]$OK
#> [1] FALSE

errcfg <- sim$erred()[[1]]
# debugonce(f) # to step through it
set.seed(errcfg$seed)
do.call(f, errcfg$treatment)
#> Error: condition 0

Let’s say that based on the above, we were able to fix the bug. Our new functions is as follows:

f <- function(x, y) {
  p <- x*y
  u <- runif(1)
  dbl(p = p, u = u)
}

sim$worker(f)
sim
#> A Piecemeal simulation
#> Output directory: /tmp/RtmpWFcXKT/piecemeal_demo 
#> 
#> Design: 8 treatment configurations by 3 seeds = 24 runs
#> 
#> Cluster:
#>   configuration: makeCluster(2)
#> 
#> Setup:
#>   initialise each node with:
#>     {
#>         library(rlang)
#>     } 
#>   variables to copy from each environment:
#>     R_GlobalEnv: 'a'
#> 
#> Call for each configuration and seed:
#>   function (x, y) 
#>   {
#>       p <- x * y
#>       u <- runif(1)
#>       dbl(p = p, u = u)
#>   } 
#> 
#> Options:
#>   directory split: 1 level(s) by treatment and 1 level(s) by seed
#>   errored runs: auto 
#> 
#> Ready to execute? Yes.

sim$run()
#> Deleting...
#> 8 failed runs deleted.
#> Starting 8 runs. (16 already done.)
#>    Status Runs
#> 1      OK    8
#> 2 SKIPPED   16

sim$result_df()
#>    x  y  p         u .seed
#> 1  1  1  1 0.2655087     1
#> 2  1  1  1 0.1848823     2
#> 3  1  1  1 0.1680415     3
#> 4  2  1  2 0.2655087     1
#> 5  2  1  2 0.1848823     2
#> 6  2  1  2 0.1680415     3
#> 7  1 27 27 0.2655087     1
#> 8  1 27 27 0.1848823     2
#> 9  1 27 27 0.1680415     3
#> 10 2 27 54 0.2655087     1
#> 11 2 27 54 0.1848823     2
#> 12 2 27 54 0.1680415     3
#> 13 1  3  3 0.2655087     1
#> 14 1  3  3 0.1848823     2
#> 15 1  3  3 0.1680415     3
#> 16 2  9 18 0.2655087     1
#> 17 2  9 18 0.1848823     2
#> 18 2  9 18 0.1680415     3
#> 19 2  3  6 0.2655087     1
#> 20 2  3  6 0.1848823     2
#> 21 2  3  6 0.1680415     3
#> 22 1  9  9 0.2655087     1
#> 23 1  9  9 0.1848823     2
#> 24 1  9  9 0.1680415     3

Lastly, suppose that we want to run additional 2 replications of each treatment combination. We can add more replications, and the simulation will pick up where it left off.

Incidentally, we can estimate how much longer the simulation will take based on the past runs. (This can also be run while the simulation is running.)

sim$eta()
#> A Piecemeal simulation ETA calculation
#> Output directory: /tmp/RtmpWFcXKT/piecemeal_demo 
#> Based on 23 completions in 2 secs 
#> 
#> Time per completion: 0.09 secs 
#> Completion rate: 11 per sec 
#> Estimated time left: 1 secs 
#> Estimated completion time: 2025-09-16 19:42:41 
#> 
#> In progress (approximate): 0

sim$status() will also calculate and print the estimate, but it is typically much slower.

sim$run()
#> Starting 16 runs. (24 already done.)
#>    Status Runs
#> 1      OK   16
#> 2 SKIPPED   24
sim$result_df()
#>    x  y  p         u .seed
#> 1  1  1  1 0.2655087     1
#> 2  1  1  1 0.1848823     2
#> 3  1  1  1 0.1680415     3
#> 4  1  1  1 0.5858003     4
#> 5  1  1  1 0.2002145     5
#> 6  2  1  2 0.2655087     1
#> 7  2  1  2 0.1848823     2
#> 8  2  1  2 0.1680415     3
#> 9  2  1  2 0.5858003     4
#> 10 2  1  2 0.2002145     5
#> 11 1 27 27 0.2655087     1
#> 12 1 27 27 0.1848823     2
#> 13 1 27 27 0.1680415     3
#> 14 1 27 27 0.5858003     4
#> 15 1 27 27 0.2002145     5
#> 16 2 27 54 0.2655087     1
#> 17 2 27 54 0.1848823     2
#> 18 2 27 54 0.1680415     3
#> 19 2 27 54 0.5858003     4
#> 20 2 27 54 0.2002145     5
#> 21 1  3  3 0.2655087     1
#> 22 1  3  3 0.1848823     2
#> 23 1  3  3 0.1680415     3
#> 24 1  3  3 0.5858003     4
#> 25 1  3  3 0.2002145     5
#> 26 2  9 18 0.2655087     1
#> 27 2  9 18 0.1848823     2
#> 28 2  9 18 0.1680415     3
#> 29 2  9 18 0.5858003     4
#> 30 2  9 18 0.2002145     5
#> 31 2  3  6 0.2655087     1
#> 32 2  3  6 0.1848823     2
#> 33 2  3  6 0.1680415     3
#> 34 2  3  6 0.5858003     4
#> 35 2  3  6 0.2002145     5
#> 36 1  9  9 0.2655087     1
#> 37 1  9  9 0.1848823     2
#> 38 1  9  9 0.1680415     3
#> 39 1  9  9 0.5858003     4
#> 40 1  9  9 0.2002145     5

A Simulation Study using Piecemeal

Pavel N. Krivitsky