I saw your JuliaCon talk today. Impressive work. README says:
Using hybrid parallelism, where threads are used to communicate within each machine and MPI or Julia's native parallelism between machines is much faster. For MPI, the scaling is almost ideal with the number of machines, but for DArray the results are more erratic.
Would you mind making an example to show how this is done?