This repository was archived by the owner on Mar 20, 2024. It is now read-only.

Parallel version of [[ #103

@reinholdsson

Although the parallel solution below isn't faster than the plain read on my personal computer, it's at least an example of how to read columns in parallel:

library(foreach)
library(doParallel)
library(Coldbir)
library(data.table)

# Prepare data
N <- 10000000L

x <- data.table(
  a = as.Date(sample(1000 + 1:100, N, replace = TRUE), origin = "1960-01-01"),
  b = sample(1:100, N, replace = TRUE),
  c = sample(1:10000, N, replace = TRUE),
  d = sample(LETTERS, N, replace = TRUE)
)

a <- cdb()
a[] <- x

cols <- names(x)

# Read non-parallel
system.time({res <- a[]})

# Read parallel (by column)
cl <- makeCluster(4)
registerDoParallel(cl)

system.time({
  res <- as.data.table(foreach(i = cols, .packages = "Coldbir") %dopar% {
    # Each worker reads a single column from the database
    a[i][[1]]
  })
  setnames(res, cols)
})

stopCluster(cl)

Perhaps we can figure out a better use case.
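
One way to explore where parallel column reads might pay off, without depending on Coldbir itself, is to time the same pattern against plain files. The sketch below is only a stand-in under stated assumptions: it writes each column to its own `.rds` file with `saveRDS()` and reads the files back with `readRDS()`, once serially and once through `parallel::parLapply()`. The helper `read_col` and the one-file-per-column layout are hypothetical and are not Coldbir's API.

```r
library(parallel)

# Stand-in data: three columns, each written to its own file.
# ASSUMPTION: saveRDS/readRDS replace the Coldbir storage layer here.
set.seed(1)
n <- 100000L
cols <- list(
  b = sample(1:100, n, replace = TRUE),
  c = sample(1:10000, n, replace = TRUE),
  d = sample(LETTERS, n, replace = TRUE)
)

dir <- tempfile("cols_")
dir.create(dir)
paths <- file.path(dir, paste0(names(cols), ".rds"))
names(paths) <- names(cols)
for (nm in names(cols)) saveRDS(cols[[nm]], paths[[nm]])

# Hypothetical helper: read one column from its file
read_col <- function(path) readRDS(path)

# Serial read, one column after another
t_serial <- system.time(res_serial <- lapply(paths, read_col))

# Parallel read, one column per task on a two-worker socket cluster
cl <- makeCluster(2)
t_parallel <- system.time(res_parallel <- parLapply(cl, paths, read_col))
stopCluster(cl)
names(res_parallel) <- names(paths)

# Both approaches should return the same columns
same <- identical(res_serial, res_parallel)

cat("serial elapsed:  ", t_serial[["elapsed"]], "\n")
cat("parallel elapsed:", t_parallel[["elapsed"]], "\n")

unlink(dir, recursive = TRUE)
```

With a socket cluster each column is serialized and sent back to the master process, which is a plausible reason the parallel read fails to win when the columns are cheap to read. Increasing `n`, or doing more work per column on the worker before returning, may shift the balance, but that would need to be measured.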
