With the recent discussions surrounding OpenMatrix, I'd like to propose that OMX support the storage of sparse matrices.
As noted by Pedro in osPlanning/omx-python#12 (comment), we've been testing a version of omx-python internally that supports storing sparse matrices in the CSR format with AequilibraE. This has allowed us to store incredibly large OD matrices (500k × 500k) in a way that doesn't require terabytes of disk space or memory. We believe this functionality could be useful to the wider community.
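To put the "terabytes" figure in perspective, here is a rough size estimate for a 500k × 500k matrix stored densely versus as the three CSR arrays. This is a sketch under stated assumptions, not measured numbers: it assumes float64 values and int64 indices, ignores HDF5 compression and overhead, and uses a hypothetical average of 100 non-zero destinations per origin.

```python
# Rough storage estimate: dense vs CSR for an n x n OD matrix.
# Assumptions (hypothetical, not from the proposal): float64 values (8 bytes),
# int64 indices (8 bytes), no compression.


def dense_bytes(n_rows: int, n_cols: int, value_bytes: int = 8) -> int:
    """Bytes needed to store every cell of a dense matrix."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows: int, nnz: int, value_bytes: int = 8, index_bytes: int = 8) -> int:
    """Bytes for the three CSR arrays: data (nnz values),
    indices (nnz column indices) and indptr (n_rows + 1 row offsets)."""
    return nnz * value_bytes + nnz * index_bytes + (n_rows + 1) * index_bytes


n = 500_000
nnz = 100 * n  # hypothetical density: ~100 non-zero destinations per origin

print(f"dense: {dense_bytes(n, n) / 1e12:.1f} TB")  # dense: 2.0 TB
print(f"CSR:   {csr_bytes(n, nnz) / 1e9:.1f} GB")   # CSR:   0.8 GB
```

The dense cost grows with n², while the CSR cost grows with the number of non-zeros plus one offset per row, which is why very sparse OD matrices shrink so dramatically.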
The code is not public at the moment, but the implementation itself is not particularly interesting. Each matrix is stored as a single group holding three arrays (two integer, one floating point) that together encode the CSR format, and these groups are placed under a top-level group named "sparse". While this makes the files non-conformant with the OMX specification, it is forward compatible: other OMX software never looks at the "sparse" group, so sparse matrices are simply invisible to it.
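Since the internal code isn't public, the following is only a minimal sketch of the layout described above, not that implementation. It assumes plain nested dicts as a stand-in for HDF5 groups (a real file would create HDF5 groups and datasets, e.g. via h5py), and the array names "data", "indices" and "indptr" are hypothetical, borrowed from the usual CSR naming.

```python
# Sketch of the proposed layout: one group per matrix, three CSR arrays each,
# all under a top-level "sparse" group. Nested dicts stand in for HDF5 groups;
# the array names are hypothetical.
from typing import Dict, List, Tuple


def dense_to_csr(matrix: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Encode a dense row-major matrix as the three CSR arrays."""
    data: List[float] = []   # non-zero values (the floating-point array)
    indices: List[int] = []  # column index of each non-zero (integer array)
    indptr: List[int] = [0]  # where each row starts in data/indices (integer array)
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                data.append(float(value))
                indices.append(col)
        indptr.append(len(data))  # row i spans data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


def add_sparse_matrix(omx_root: Dict, name: str, matrix: List[List[float]]) -> None:
    """Store one matrix as a group of three arrays under the top-level 'sparse' group."""
    data, indices, indptr = dense_to_csr(matrix)
    sparse_group = omx_root.setdefault("sparse", {})
    sparse_group[name] = {"data": data, "indices": indices, "indptr": indptr}


# The standard OMX groups ("data", "lookup") are left untouched, which is why
# readers that only know the specification never see the sparse matrices.
root: Dict = {"data": {}, "lookup": {}}
add_sparse_matrix(root, "trips", [[0, 5, 0], [0, 0, 0], [2, 0, 7]])
print(root["sparse"]["trips"])
# {'data': [5.0, 2.0, 7.0], 'indices': [1, 0, 2], 'indptr': [0, 1, 1, 3]}
```

Note that the empty middle row costs only one extra entry in indptr; that per-row overhead is the main fixed cost of CSR.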
I believe this can be achieved in a handful of ways. With the current HDF5 back end, an additional sparse group could be added to hold the sparse matrices; these could be encoded as CSR, or as COO with some additional metadata to specify the intended dimensions.
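The extra metadata matters for COO because the triplets alone cannot reveal trailing empty rows or columns. A minimal sketch, assuming the dimensions are stored next to the triplets (in HDF5 this could be an attribute on the matrix group; the "shape" key and array names here are hypothetical), again using dicts in place of real groups:

```python
# COO alternative: row/col/value triplets plus the intended dimensions as metadata.
# Dicts stand in for an HDF5 group and its attributes; key names are hypothetical.
from typing import Dict, List


def dense_to_coo(matrix: List[List[float]]) -> Dict:
    """Encode a dense matrix as COO triplets with its shape kept as metadata."""
    rows: List[int] = []
    cols: List[int] = []
    vals: List[float] = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                vals.append(float(value))
    n_cols = len(matrix[0]) if matrix else 0
    return {"row": rows, "col": cols, "data": vals,
            "attrs": {"shape": (len(matrix), n_cols)}}


def coo_to_dense(group: Dict) -> List[List[float]]:
    """Rebuild the dense matrix; the shape attribute restores empty rows/columns."""
    n_rows, n_cols = group["attrs"]["shape"]
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in zip(group["row"], group["col"], group["data"]):
        dense[i][j] += v  # duplicate entries, if any, are summed (common COO convention)
    return dense


g = dense_to_coo([[0, 3, 0], [0, 0, 0], [0, 0, 0]])
print(g["row"], g["col"], g["data"])  # [0] [1] [3.0]
```

From the triplets alone, a reader could only infer a 1 × 2 matrix; the shape attribute is what recovers the intended 3 × 3 zone system.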
Since columnar formats have been mentioned in other issues, I'll note that adopting one would essentially switch the default storage format from dense to sparse (COO). Dense matrices would then see an increase in explicitly stored information, because every cell would carry its own row and column identifiers; I don't believe these formats are intended to support the thousands of columns that a wide, one-column-per-zone layout would need instead.
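A back-of-the-envelope comparison of that overhead, for a fully dense matrix stored as a long COO-style table with one row per cell. This is a hypothetical estimate that assumes int32 zone ids and float64 values and ignores the compression a columnar format would apply:

```python
# Overhead of storing a fully dense n x n matrix as a long (COO-style) table.
# Assumed dtypes (hypothetical): int32 row/column ids, float64 values; no compression.


def dense_table_bytes(n: int, value_bytes: int = 8) -> int:
    """Dense layout: one value per cell, positions are implicit."""
    return n * n * value_bytes


def coo_table_bytes(nnz: int, index_bytes: int = 4, value_bytes: int = 8) -> int:
    """COO layout: each stored entry carries its own row id, column id and value."""
    return nnz * (2 * index_bytes + value_bytes)


n = 5_000          # hypothetical zone count
full = n * n       # a fully dense matrix has no zeros to drop
ratio = coo_table_bytes(full) / dense_table_bytes(n)
print(ratio)       # 2.0

# Density below which COO becomes smaller than dense under these dtypes.
break_even = 8 / (2 * 4 + 8)
print(break_even)  # 0.5
```

Under these assumptions a fully dense matrix doubles in size as COO, and COO only wins once fewer than half of the cells are non-zero.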