Merged
19 changes: 15 additions & 4 deletions .github/workflows/python-package.yml
@@ -18,20 +18,26 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
max-parallel: 6
matrix:
os: [ ubuntu-latest, windows-latest, macos-latest, ubuntu-24.04-arm ]
python-version: [ "3.10", "3.11", "3.12", "3.13", "3.14" ]

steps:
- uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
env:
POETRY_VIRTUALENVS_PREFER_UV: "true"
run: |
-python -m pip install --upgrade pip
-python -m pip install poetry
+uv pip install --system poetry
cd python
poetry lock
poetry install --with dev
Comment on lines 32 to 43
@coderabbitai bot commented on Jan 20, 2026
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

actions/setup-python GitHub Action latest version 2025

💡 Result:

The latest release is actions/setup-python v6.1.0 — published Nov 25, 2025. [1][2]

Sources:

  • actions/setup-python repository. [1]
  • Releases page (v6.1.0). [2]

🌐 Web query:

actions/setup-python v4 deprecated v5 v6 release history

💡 Result:

Summary timeline for actions/setup-python (major points)

  • v4 — introduced python-version-file, PyPy fixes and other v4 features. (newreleases.io)
  • v5 — series of v5.x releases that updated dependencies and moved the action to newer Node runtimes (addressing Node 16 deprecation); some v5 users reported caching failures tied to an old actions/cache pin (see issue). (github.com)
  • v6 — v6.0.0 (breaking change) upgrades the action to Node.js 24 (runner requirement mentioned in the release notes); v6.1.0 is a later patch with enhancements and dependency updates. (github.com)

Sources:

  1. actions/setup-python releases (v6, v5, v4 changelogs). [GitHub releases]. (github.com)
  2. Issue reporting actions/cache deprecation failures when using setup-python@v5. (github.com)

If you want exact release dates and full changelogs for each minor release (v4.x, v5.x, v6.x), I can list them with direct links.

Upgrade actions/setup-python to v6 — v4 works fine, but v6 is where the action is heading.

v4 isn't actually deprecated; it's just old. The latest is v6.1.0 (released November 2025), which requires Node.js 24 and includes meaningful improvements. If you're modernizing the workflow, v6 is the move. (v5 exists as a midpoint with some caching quirks, so skip it.)

One secondary thought: running `poetry lock` on every CI run regenerates the lock file whenever it has drifted from pyproject.toml. If someone forgets to commit an updated lock file, CI silently fixes it in the build environment and masks the mistake. Consider `poetry check --lock` instead, which verifies that the committed lock file is consistent with pyproject.toml without regenerating it.

Proposed fix
      - name: Set up Python ${{ matrix.python-version }}
-       uses: actions/setup-python@v4
+       uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}
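As a side note on the lock-file concern above, a non-mutating variant of the install step could look like the sketch below. This is an illustration only, not part of the PR: it assumes a recent Poetry version where `poetry check --lock` is available, and keeps the `POETRY_VIRTUALENVS_PREFER_UV` setting from the workflow as-is.

```yaml
- name: Install dependencies
  env:
    POETRY_VIRTUALENVS_PREFER_UV: "true"
  run: |
    uv pip install --system poetry
    cd python
    # Fail fast if poetry.lock is out of sync with pyproject.toml,
    # instead of silently regenerating it in CI
    poetry check --lock
    poetry install --with dev
```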
🧰 Tools
🪛 actionlint (1.7.10)

33-33: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


Contributor Author
An upgrade to v6 could be done in the future, in a dedicated PR.

About the poetry lock file: I'll ask the real maintainer (@asuiu) how to approach it.


@amaximciuc Fair enough. Ship when ready, iterate later. Makes sense to get @asuiu's input on the lock file strategy—better to have consistency across the repo than to optimize one workflow in isolation.

@@ -51,10 +57,15 @@ jobs:
image: python:3.12-bullseye
steps:
- uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true
- name: Install dependencies in container
env:
POETRY_VIRTUALENVS_PREFER_UV: "true"
run: |
-python -m pip install --upgrade pip
-python -m pip install poetry
+uv pip install --system poetry
cd python
poetry lock
poetry install --with dev
1 change: 0 additions & 1 deletion python/bucketbase/ibucket.py
@@ -477,7 +477,6 @@ def put_object_stream(self, name: PurePosixPath | str, stream: BinaryIO) -> None
self._lock_object(name)
try:
if self._base_bucket.exists(name):
-self._unlock_object(name)
raise FileExistsError(f"Object {name} already exists in AppendOnlySynchronizedBucket")
# we assume that the put_object_stream operation is atomic
self._base_bucket.put_object_stream(name, stream)
56 changes: 53 additions & 3 deletions python/bucketbase/memory_bucket.py
@@ -1,8 +1,11 @@
import io
import multiprocessing
import weakref
from contextlib import contextmanager
from multiprocessing.managers import DictProxy, SyncManager
from pathlib import PurePosixPath
from threading import RLock
-from typing import BinaryIO, Generator, Iterable, Union
+from typing import Any, BinaryIO, Generator, Iterable, Optional, Union

from streamerate import slist, sset
from streamerate import stream as sstream
@@ -26,8 +29,8 @@ class MemoryBucket(IBucket):
"""

def __init__(self) -> None:
-self._objects: dict[str, bytes] = {}
-self._lock = RLock()
+self._objects: dict[str, bytes] | DictProxy[str, bytes] = {}
+self._lock: Any = RLock()

def put_object(self, name: PurePosixPath | str, content: Union[str, bytes, bytearray]) -> None:
_name = self._validate_name(name)
@@ -138,3 +141,50 @@ def open_write_sync(self, name: PurePosixPath | str) -> Generator[BinaryIO, None
if not exception_occurred:
with self._lock:
self._objects[_name] = content


class SharedMemoryBucket(MemoryBucket):
"""
MemoryBucket shared across processes via a multiprocessing Manager.

CRITICAL: Data persists ONLY as long as the Manager process is running.
Accessing or unpickling the bucket after Manager shutdown will raise errors.

If 'manager' is None, a new one is created and shut down automatically via
weakref finalizer when the bucket is garbage collected. Otherwise, the caller
is responsible for the manager's lifecycle.
"""

def __init__(self, manager: Optional[SyncManager] = None) -> None:
super().__init__()
if manager is None:
ctx = multiprocessing.get_context("spawn")
manager = ctx.Manager()
owns_manager = True
weakref.finalize(self, self._safe_manager_shutdown, manager)
else:
owns_manager = False

self._manager: SyncManager | None = manager
self._owns_manager: bool = owns_manager

# override parent's structures with managed ones. This is a small hack, but it's simple & clear enough
self._objects: dict[str, bytes] | DictProxy[str, bytes] = manager.dict()
self._lock: Any = manager.RLock()

def __getstate__(self) -> dict[str, Any]:
"""Customize pickling to avoid serializing the manager itself."""
state = self.__dict__.copy()
state["_manager"] = None
state["_owns_manager"] = False
return state

@staticmethod
def _safe_manager_shutdown(manager: Any) -> None:
"""Helper to safely shut down a SyncManager during GC."""
try:
shutdown = getattr(manager, "shutdown", None)
if callable(shutdown):
shutdown()
except Exception: # pylint: disable=broad-exception-caught
pass
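The Manager-backed pattern that SharedMemoryBucket relies on can be illustrated in isolation. The sketch below is not part of bucketbase and uses hypothetical helper names; it only demonstrates how a Manager's `dict` and `RLock` proxies make state written in a child process visible to the parent, and why the data dies when the manager shuts down.

```python
import multiprocessing


def _writer(objects, lock, name, payload):
    # Runs in a child process; mutations travel through the Manager proxies
    # to the Manager's server process, so the parent sees them too.
    with lock:
        objects[name] = payload


def shared_dict_demo() -> bytes:
    # The Manager hosts the dict and lock in a separate server process;
    # the proxies handed back here are picklable and cross process boundaries.
    manager = multiprocessing.Manager()
    try:
        objects = manager.dict()
        lock = manager.RLock()
        child = multiprocessing.Process(
            target=_writer, args=(objects, lock, "a/b.bin", b"hello")
        )
        child.start()
        child.join()
        # The child's write is visible in the parent via the same proxy.
        return objects["a/b.bin"]
    finally:
        # Mirrors SharedMemoryBucket's finalizer: once the Manager is shut
        # down, the shared state is gone and the proxies become unusable.
        manager.shutdown()


if __name__ == "__main__":
    print(shared_dict_demo())
```

Note that SharedMemoryBucket specifically uses the `spawn` context for the Manager it owns; the default context is used here only to keep the sketch short.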
2 changes: 1 addition & 1 deletion python/pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "bucketbase"
-version = "1.5.3" # do not edit manually. kept in sync with `tool.commitizen` config via automation
+version = "1.5.4" # do not edit manually. kept in sync with `tool.commitizen` config via automation
description = "bucketbase"
authors = ["Andrei Suiu <andrei.suiu@gmail.com>"]
repository = "https://github.com/asuiu/bucketbase"
9 changes: 4 additions & 5 deletions python/tests/bucket_tester.py
@@ -5,18 +5,17 @@
import tempfile
import threading
import time
+import uuid
from io import BytesIO
from pathlib import Path, PurePosixPath
from queue import Queue
from typing import BinaryIO
from unittest import TestCase
-import uuid

import pyarrow as pa
import pyarrow.parquet as pq
from streamerate import slist
from streamerate import stream as sstream
-from tsx import iTSms

from bucketbase.ibucket import AsyncObjectWriter, IBucket

@@ -66,11 +65,11 @@ def write(self, data: bytes) -> int:
self.bytes_processed += len(data)
return len(data)

-def close(self):
+def close(self) -> None:
self._is_closed = True

@property
-def closed(self):
+def closed(self) -> bool:
return self._is_closed

def readable(self) -> bool:
@@ -638,7 +637,7 @@ def test_open_write_with_parquet(self) -> None: # pylint: disable=too-many-loca

# Write parquet file using open_write
# Use higher timeout for MinIO due to network latency and data buffering
-timeout_for_minio = 6 if "MinioBucket" in str(type(self.storage)) else 3
+timeout_for_minio = 30 if "MinioBucket" in str(type(self.storage)) else 3
tested_object: AsyncObjectWriter = self.storage.open_write(parquet_path, timeout_sec=timeout_for_minio) # type: ignore[assignment]
with tested_object as sink:
with pa.output_stream(sink) as arrow_sink:
36 changes: 23 additions & 13 deletions python/tests/test_append_only_fs_bucket.py
@@ -127,23 +127,33 @@ def test_get_after_put_object(self):
def test_lock_object_with_threads(self):
bucket_in_test = AppendOnlyFSBucket(self.base_bucket, self.locks_path)
object_name = "shared_object"
-lock_acquired = [False, False] # To track lock acquisition in threads
+# Use Events for reliable thread synchronization (handshake), avoiding sleep-based timing issues
+lock_held_event = threading.Event()
+second_thread_ready_event = threading.Event()
+second_thread_completed_event = threading.Event()

def lock_and_release_first():
bucket_in_test._lock_object(object_name)
-lock_acquired[0] = True
-time.sleep(0.1) # Simulate work by sleeping
+lock_held_event.set()
+self.assertTrue(second_thread_ready_event.wait(timeout=5.0), "Second thread failed to signal readiness")
+time.sleep(0.2)
bucket_in_test._unlock_object(object_name)
-lock_acquired[0] = False

def wait_and_lock_second():
-time.sleep(0.001) # Ensure this runs after the first thread has acquired the lock
+self.assertTrue(lock_held_event.wait(timeout=5.0), "First thread failed to acquire lock in time")
+
+second_thread_ready_event.set()

t1 = time.time()
bucket_in_test._lock_object(object_name)
t2 = time.time()
-print(f"Time taken to acquire lock: {t2 - t1}")
-self.assertTrue(t2 - t1 > 0.1, "The second thread should have waited for the first thread to release the lock")
-lock_acquired[1] = True # Should only reach here after the first thread releases the lock
+
+# Since first thread sleeps for 0.2s *after* we are ready, we should be blocked for a significant time
+duration = t2 - t1
+print(f"Time taken to acquire lock: {duration}")
+self.assertGreater(duration, 0.1, f"Second thread acquired lock too fast ({duration}s), should have been blocked")
+
+second_thread_completed_event.set()
bucket_in_test._unlock_object(object_name)

# Create threads
@@ -154,12 +164,12 @@ def wait_and_lock_second():
thread2.start()

# Wait for both threads to complete
-thread1.join()
-thread2.join()
+thread1.join(timeout=10.0)
+thread2.join(timeout=10.0)
+self.assertFalse(thread1.is_alive(), "The first thread did not finish")
+self.assertFalse(thread2.is_alive(), "The second thread did not finish")

-# Verify that both threads were able to acquire the lock
-self.assertFalse(lock_acquired[0], "The first thread should have released the lock")
-self.assertTrue(lock_acquired[1], "The second thread should have acquired the lock after the first thread released it")
+self.assertTrue(second_thread_completed_event.is_set(), "The second thread did not complete correctly")
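The Event handshake used in the rewritten test generalizes beyond this bucket. Below is a minimal standalone sketch (hypothetical names, with a plain `threading.Lock` standing in for the bucket's file lock): the first thread refuses to release until the second has confirmed it is about to contend, so the measured blocking time cannot be shortened by a lucky scheduling race.

```python
import threading
import time


def handshake_demo() -> float:
    """Return how long the second thread was blocked waiting for the lock."""
    lock = threading.Lock()
    lock_held = threading.Event()      # set once the first thread holds the lock
    second_ready = threading.Event()   # set once the second thread is about to contend
    blocked_for = {}

    def first():
        with lock:
            lock_held.set()
            # Hold the lock until the contender confirms it is waiting,
            # then keep holding it a little longer to force measurable blocking.
            assert second_ready.wait(timeout=5.0)
            time.sleep(0.3)

    def second():
        assert lock_held.wait(timeout=5.0)
        second_ready.set()
        start = time.monotonic()
        with lock:  # blocks until first() releases
            blocked_for["t"] = time.monotonic() - start

    t1 = threading.Thread(target=first)
    t2 = threading.Thread(target=second)
    t1.start()
    t2.start()
    t1.join(timeout=10.0)
    t2.join(timeout=10.0)
    return blocked_for["t"]
```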

def test_get_size(self):
bucket = AppendOnlyFSBucket(self.base_bucket, self.locks_path)
7 changes: 3 additions & 4 deletions python/tests/test_integrated_cached_immutable_bucket.py
@@ -261,11 +261,10 @@ def get_object_from_thread():
thread.start()

for thread in threads:
-thread.join(timeout=2.0)
-self.assertFalse(thread.is_alive(), "Thread did not complete within timeout")
+thread.join(timeout=10.0)
+self.assertFalse(thread.is_alive(), f"Thread {thread.name} did not complete within timeout")

# Verify results
-self.assertEqual(len(results), num_threads, f"Expected {num_threads} results, but got {len(results)}")
-self.assertEqual(len(get_object_calls), 1, "Main bucket's get_object should be called exactly once")
+self.assertEqual(len(results), num_threads, "All threads should have retrieved the content")
for result in results:
self.assertEqual(result, content, "All threads should get the same content")