# LOTDB

LOTDB (Light Object Tree DB) is a persistent tree for organizing computation results, metadata, and large data payloads with very little boilerplate.
LOTDB is built around two node types:

- `BaseNode` for generic hierarchy and metadata
- `DataNode` for hierarchy plus data-oriented read/write behavior
Each node can:
- contain child nodes
- store arbitrary attributes
- be persisted with ZODB
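As a mental model, the three capabilities above can be sketched with a toy class. This is illustrative only; `ToyNode` and its methods are hypothetical stand-ins, not lotdb code:

```python
class ToyNode:
    """Toy stand-in for LOTDB's node model: children plus arbitrary attributes."""

    def __init__(self, key):
        self.key = key
        self.children = {}    # child key -> ToyNode
        self.attributes = {}  # arbitrary metadata

    def get_node_path(self, keys):
        # Walk (and create on demand) children along a list of keys.
        node = self
        for key in keys:
            node = node.children.setdefault(key, ToyNode(key))
        return node

    def set_attribute(self, name, value):
        self.attributes[name] = value

    def get_attribute(self, name):
        return self.attributes[name]
```

The real classes add persistence and data behavior on top of this shape.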
Large payloads can be stored through pluggable backends such as:

- `blob`
- `zarr`
This makes LOTDB useful for pipelines where you want to:
- branch variants of computations
- cache intermediate results on disk
- keep metadata close to the computation tree
- avoid manually managing lots of folder/path boilerplate
Install from PyPI:

```bash
pip install lotdb
```

With optional extras:

```bash
pip install "lotdb[io,measurements]"
```
- `BaseNode`: generic tree node
- `DataNode`: specialized node for data payloads
- `LOTDB`: persistent container for the root tree
The recommended API is node-centered.
```python
from lotdb import BaseNode

root = BaseNode(key="dataset")
node = root.get_node_path(["speaker_01", "session_a", "clip_001"])
node.set_attribute("label", "hello")
print(node.get_attribute("label"))
```

Persist the tree through a `LOTDB` container:

```python
from lotdb import LOTDB

db = LOTDB(path="./data", name="lotdb.fs", new=True)
root = db.open_connection()
root.get_node_path(["speaker_01", "clip_001"]).set_attribute("duration", 1.23)
db.commit()
db.close_connection()
db.close()
```

Use `get_data_node(...)` when the final node should own data behavior.
```python
import numpy as np
from lotdb import LOTDB

db = LOTDB(path="./data", name="lotdb.fs", new=True)
root = db.open_connection()

capture_1 = root.get_data_node(
    ["sensor_01", "capture_0001"],
    samplerate_hz=1000,
    backend="zarr",  # or "blob"
    data_attribute="imu",
)
capture_2 = root.get_data_node(
    ["sensor_01", "capture_0002"],
    samplerate_hz=1000,
    backend="zarr",
    data_attribute="imu",
)

data = np.random.randn(2000, 6).astype("float32")
capture_1.write_data(data, database=db)
capture_2.write_data(data * 0.5, database=db)

# same node, second payload
capture_1.write_data(
    np.random.randn(2000, 2).astype("float32"),
    database=db,
    data_attribute="control",
)

window = capture_1.read_seconds(1.0, 2.0)
for block in capture_1.iter_data_blocks(0.5, block_unit="seconds"):
    process(block)  # process(...) stands in for your own code

# first iteration layer under the root
for node in root.iterate_tree_level(1):
    print(node.key)

# first iteration layer under sensor_01
sensor_node = root.get_node_path(["sensor_01"])
for node in sensor_node.iterate_tree_level(1):
    print(node.key)

# all leaves below sensor_01
for leaf in sensor_node.iterate_tree_leaves():
    print("leaf", leaf.key)

# buffered iteration over leaves
for batch in sensor_node.iterate_tree_crone_buffered(buffer_size=2):
    print([node.key for node in batch])

db.commit()
db.close_connection()
db.close()
```

This is the main API LOTDB is optimized for.
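The buffered leaf iteration shown above boils down to chunking an iterator into fixed-size batches. A minimal stand-alone sketch of that batching behavior, assuming nothing about lotdb itself:

```python
from itertools import islice

def iter_batches(iterable, buffer_size):
    """Yield successive lists of up to buffer_size items.

    Sketch of the batching pattern behind iterate_tree_crone_buffered;
    not lotdb's implementation.
    """
    it = iter(iterable)
    while True:
        batch = list(islice(it, buffer_size))
        if not batch:
            return
        yield batch
```

Batching like this lets callers commit or flush per batch instead of per leaf, which keeps transactions small without iterating the tree twice.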
Use `BaseNode` when you only need:

- hierarchy
- metadata
- relationships

Use `DataNode` when you want the node itself to own:

- `write_data(...)`
- `replace_data(...)`
- `append_data(...)`
- `has_data()`
- `delete_data()`
- `read_data(...)`
- `read_seconds(...)`
- `iter_data_blocks(...)`
`get_node(...)` remains the generic retriever.
`get_data_node(...)` ensures the final node is a `DataNode`.
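A minimal sketch of what "ensures the final node is a `DataNode`" can mean: intermediate path nodes stay generic, only the last one is data-typed. `Node`, `DataLeaf`, and this `get_data_node` are hypothetical toys, not lotdb's implementation (which may, for example, also upgrade an existing generic node):

```python
class Node:
    """Toy generic tree node."""
    def __init__(self, key):
        self.key = key
        self.children = {}

class DataLeaf(Node):
    """Toy data-owning node."""

def get_data_node(root, keys):
    # Create generic nodes for every intermediate path segment...
    node = root
    for key in keys[:-1]:
        node = node.children.setdefault(key, Node(key))
    # ...and make sure the final segment is the data-typed class.
    last = keys[-1]
    if last not in node.children:
        node.children[last] = DataLeaf(last)
    return node.children[last]
```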
The lower-level API still exists when you want direct control:

```python
import numpy as np
from lotdb import DataReader, DataWriter

DataWriter.write_array(
    node,
    np.random.randn(1000, 4).astype("float32"),
    samplerate_hz=500,
    backend="blob",
    data_attribute="signal",
)
signal = DataReader.read_interval(node, "signal", 100, 200)
```

`write_data(...)` supports explicit payload policies:
- `if_exists="replace"` (default)
- `if_exists="error"`
- `if_exists="append"`
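The three policies can be sketched as decision logic over a plain dict of payloads. This is a simplified model of the semantics, not lotdb's storage code:

```python
def write_payload(store, name, data, if_exists="replace"):
    """Apply replace/error/append semantics to a dict of named payloads."""
    if if_exists not in ("replace", "error", "append"):
        raise ValueError(f"unknown if_exists policy: {if_exists!r}")
    if name in store:
        if if_exists == "error":
            raise ValueError(f"payload {name!r} already exists")
        if if_exists == "append":
            # For arrays this would concatenate along the sample axis.
            store[name] = store[name] + data
            return
    # "replace", or first write under any policy.
    store[name] = data
```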
This is especially useful when one `DataNode` stores multiple payloads such as `audio` and `control`.
Example:

```python
capture.write_data(audio, database=db, data_attribute="audio", if_exists="replace")
capture.write_data(control, database=db, data_attribute="control", if_exists="replace")
capture.append_data(more_audio, database=db, data_attribute="audio")

if capture.has_data("control"):
    capture.delete_data("control")
```

External files are now treated as ingestion inputs.
`DataWriter.attach_file(...)` or `DataNode.attach_file(...)` loads the file, converts it into the configured backend representation, and stores source metadata on the node attributes.
After that, reads happen only through the backend:

```python
data_node.attach_file("./capture.npy", database=db, data_attribute="imu")
payload = data_node.read_data()
```

Typical formats currently supported:

- `wav`
- `npy`
- `csv`
- `txt`
- `png`/`jpg`/`jpeg`
- raw bytes for unknown formats
Typical source attributes written onto the node:

- `_source_filepath`
- `_source_format`
- `_source_filename`
- `_source_samplerate_hz` for wav files
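For intuition, the path-derived attributes in that list could be produced along these lines. `source_attributes` is a hypothetical helper sketched here for illustration, not part of lotdb (and `_source_samplerate_hz` would come from parsing the wav header, which is omitted):

```python
from pathlib import Path

def source_attributes(filepath):
    """Derive _source_* metadata from an ingestion input path (sketch)."""
    p = Path(filepath)
    return {
        "_source_filepath": str(p),
        "_source_filename": p.name,
        # Extension-less inputs fall back to the raw-bytes format.
        "_source_format": p.suffix.lstrip(".").lower() or "raw",
    }
```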
The older direct file-writing helper methods were removed to keep the public API focused on backend-managed data and ingestion.
LOTDB is not just a container for arrays.
It is useful when you want:
- a persistent computation tree
- easy branching of variants
- metadata attached directly to pipeline nodes
- cached intermediate states across runs
- backend flexibility for how payloads are stored
- source package: `src/lotdb`
- tests: `tests/`
- API notes: `docs/API.md`
Development install:

```bash
pip install -e .
```

With extras (quoted so the brackets survive shells like zsh):

```bash
pip install -e ".[io,measurements]"
```
PyPI publishing is configured through GitHub Actions trusted publishing.
Typical release flow:

- bump the version in `pyproject.toml`
- push to `main`
- create a GitHub release
- GitHub Actions publishes to PyPI