Random thoughts around decentralized and permissionless data lakes.
- An easy target is blockchain data.
- Everything should be content-addressed and immutable! That's easy to get with chain data. I should be able to query any CID without caring where it is stored.
- Publish the CID of something like a Delta catalog JSON file on Ethereum. You can publish your own fork or write contracts on top of it, and use any compute engine to run queries against it.
- Collaborate on data TrueBlocks-style, where more people using the service means better data reliability and speed. If a section is missing, I can send something like a PR to fill in that data.
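
A rough sketch of how the ideas in this list could fit together, under loud assumptions: a plain SHA-256 hex digest stands in for a real CID, an in-memory dict stands in for a network like IPFS, and the catalog is a made-up JSON manifest rather than the actual Delta format. The names `content_id`, `ContentStore`, `publish_catalog`, and `fork_catalog` are hypothetical. The catalog CID returned at the end is the only value that would need to be published on-chain.

```python
from __future__ import annotations

import hashlib
import json


def content_id(data: bytes) -> str:
    """Hypothetical stand-in for a CID: the hex SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """In-memory stand-in for a content-addressed network (assumption)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blobs[cid] = data  # identical bytes always land on the same key
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]
        # Verify on read, so it does not matter who served the bytes.
        if content_id(data) != cid:
            raise ValueError("content does not match its identifier")
        return data


def publish_catalog(store: ContentStore, sections: dict[str, str]) -> str:
    """Store a manifest (section name -> data CID) and return the manifest's CID."""
    manifest = json.dumps({"sections": sections}, sort_keys=True).encode()
    return store.put(manifest)


def fork_catalog(store: ContentStore, catalog_cid: str, name: str, data: bytes) -> str:
    """Return the CID of a new catalog that adds or replaces one section.

    The original catalog is left untouched, which is what makes this a fork.
    """
    sections = dict(json.loads(store.get(catalog_cid))["sections"])
    sections[name] = store.put(data)
    return publish_catalog(store, sections)


store = ContentStore()
base = publish_catalog(store, {"blocks_0_999": store.put(b"block data 0..999")})
# Someone notices a missing section and contributes it, PR-style.
fork = fork_catalog(store, base, "blocks_1000_1999", b"block data 1000..1999")
```

Because both catalogs are immutable, `base` keeps resolving to the old manifest while `fork` points at the extended one. Accepting the "PR" would just mean pointing the published reference at the new CID.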
Also from datonic/datadex#22 (comment).
Reading "The Database I Wish I Had" and thinking about something like that for OLAP workloads. Feels like OLAP might be the "killer database" use case for IPFS/Hypercore/Dat. For analysis, you want data to be immutable, you don't care that much about latency, and you have to store large amounts of data.