Memory-mapped file handling and streaming API #39
jgoedeke wants to merge 13 commits into RecordEvolution:master
- Add pytest configuration for test management
- Implement test suite for CLI functionality and Python module
- Update README with testing instructions and badge
- Fix Dockerfile
- Create .dockerignore to exclude unnecessary files from Docker builds
- Add GitHub Actions workflows for testing
- Clean up makefile to include test commands
…example and tests
86df1f2 to
d416f25
…es for chunked NumPy export
- Modified `component_group` and `channel` constructors to accept raw buffer pointers instead of vectors.
- Enhanced `load_all_data` and `init_metadata` methods for better data initialization and loading.
- Implemented `read_chunk` method in `channel` to facilitate chunked data reading with support for raw and scaled modes.
- Updated `convert_data_to_type` and `convert_chunk_to_double` functions to handle raw data more efficiently.
- Removed redundant `imc_result.hpp` file to streamline the codebase.
- Adjusted Python bindings in `imctermite.pyx` to manage C++ instance memory correctly.
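To illustrate the difference between the raw and scaled chunk modes mentioned above, here is a minimal Python sketch. It is not the PR's C++ `read_chunk`: the function signature, the little-endian int16 sample layout, and the `factor`/`offset` parameters are all assumptions made for the example.

```python
import struct
from typing import List, Union


def read_chunk(buffer: memoryview, start: int, count: int,
               factor: float = 1.0, offset: float = 0.0,
               raw: bool = False) -> List[Union[int, float]]:
    """Decode up to `count` samples starting at sample index `start`.

    Hypothetical helper: assumes little-endian int16 samples and a linear
    scaling (value * factor + offset); the real channel layout may differ.
    """
    sample_size = 2  # bytes per int16 sample (assumption)
    total = len(buffer) // sample_size
    # Clamp the requested window to the samples that actually exist.
    start = max(0, min(start, total))
    count = max(0, min(count, total - start))
    # Slicing a memoryview does not copy the underlying bytes.
    window = buffer[start * sample_size:(start + count) * sample_size]
    samples = [value for (value,) in struct.iter_unpack("<h", window)]
    if raw:
        return samples  # raw mode: unscaled integer samples
    return [value * factor + offset for value in samples]  # scaled mode


# Six int16 samples standing in for a channel's data buffer.
data = struct.pack("<6h", 0, 1, 2, 3, 4, 5)
view = memoryview(data)

raw_chunk = read_chunk(view, start=2, count=3, raw=True)
scaled_chunk = read_chunk(view, start=2, count=3, factor=0.5, offset=10.0)
tail_chunk = read_chunk(view, start=4, count=10, raw=True)  # clamped at the end
```

Working on a `memoryview` rather than a copied list mirrors the idea of passing raw buffer pointers: only the requested window is decoded, and a request that runs past the end is shortened instead of failing.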
- Update GitHub Actions workflow to support testing on multiple OS
- Refactor memory mapping in imc_buffer.hpp for Windows compatibility
- Improve makefile to handle .pyd files for Python builds
- Add comprehensive tests for streaming and chunking functionality in test_streaming.py
…ficiency (RecordEvolution#9)
Reduces memory usage by 90% for large datasets while maintaining comparable processing speed.
d416f25 to
27d8215
…ples and core functionality
ae7a7b5 to
645cba6
I just realized the new …
Changes
Packaging
CI/CD
Documentation
Build System
Testing
All existing tests pass. CI workflows validated on Ubuntu and Windows with Python 3.10-3.13.
- Migrate from setup.cfg to pyproject.toml with PEP 517/621 compliance
- Update to Python build tools (replace setup.py commands with python -m build)
- Upgrade all GitHub Actions to latest versions (@v4, ubuntu-latest)
- Remove outdated cibuildwheel version pinning
- Add numpy as explicit build and runtime dependency
- Bump package version to 3.0.0
- Improve test documentation with development install guidance
- Add Python version badge to README
- Standardize python3 usage across makefiles
645cba6 to
c2c9109
Hi @mario-fink, I guess the changes I made are a bit overwhelming. To get an artifact via the CI, I've merged the branches (#37, #38 and #39) together in my repository (https://github.com/jgoedeke/IMCtermite/tree/build); this is the corresponding action run: https://github.com/jgoedeke/IMCtermite/actions/runs/21626413617. Is there something I can do to support the review process?
* restructure package significantly:
  - move native cython extension to private `_imctermite` module
  - introduce pure-python `__init__.py` wrapper for public api
* rename main class `imctermite` -> `ImcTermite` to follow PEP 8 class naming standards
* add deprecated alias for `imctermite` (with warning) for backwards compatibility
* add PEP 484 type hints and PEP 561 `py.typed` marker for proper IDE/mypy support
* support PEP 519 file system path protocol (allowing `pathlib.Path` objects)
* update usage examples and tests to reflect new API
* update build configuration (setup.py, pyproject.toml, MANIFEST.in) to support new structure
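As a rough illustration of two items in this commit (the deprecated `imctermite` alias and PEP 519 path support), the following sketch shows one common way to do both in a pure-Python wrapper. Only the names `ImcTermite` and `imctermite` come from the commit message; the constructor signature, the `path` attribute, and the warning text are assumptions, and the actual wrapper may be implemented differently.

```python
import os
import warnings
from pathlib import Path
from typing import Union

# Accept plain strings as well as any object implementing __fspath__ (PEP 519).
PathLike = Union[str, os.PathLike]


class ImcTermite:
    """Stand-in for the public wrapper class (constructor is hypothetical)."""

    def __init__(self, path: PathLike) -> None:
        # os.fspath() turns pathlib.Path and other path-like objects into str,
        # so the native layer only ever needs to handle plain strings.
        self.path = os.fspath(path)


def imctermite(path: PathLike) -> ImcTermite:
    """Deprecated lowercase alias kept for backwards compatibility."""
    warnings.warn(
        "'imctermite' is deprecated; use 'ImcTermite' instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return ImcTermite(path)


# Old-style call with a pathlib.Path: still works, but emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = imctermite(Path("samples") / "example.raw")
```

Routing the old name through a thin function keeps existing scripts running while steering users toward the PEP 8 class name.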
Hi @jgoedeke, first of all, thank you for your amazing contributions to the project! I'm having a hard time finding a time slot to review your work, but I'll try to get it done as soon as possible.
This PR introduces memory-mapped file handling and a streaming API to enable efficient processing of large IMC files without loading the entire dataset into memory.
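The general idea of streaming from a memory-mapped file can be sketched with Python's standard library alone. This is not the PR's implementation (which lives in C++ behind the Python bindings); the `iter_chunks` helper, the flat float64 file layout, and the demo file are assumptions made purely for illustration.

```python
import mmap
import os
import struct
import tempfile
from typing import Iterator, Tuple


def iter_chunks(path: str, chunk_samples: int) -> Iterator[Tuple[float, ...]]:
    """Yield tuples of float64 samples from `path`, one chunk at a time.

    Hypothetical helper: assumes the file is a flat array of little-endian
    float64 values with no header.
    """
    sample_size = struct.calcsize("<d")
    with open(path, "rb") as handle:
        # Map the file read-only; pages are loaded lazily by the OS, so the
        # whole file is never read into process memory at once.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            total = len(mapped) // sample_size
            for first in range(0, total, chunk_samples):
                n = min(chunk_samples, total - first)
                yield struct.unpack_from(f"<{n}d", mapped, first * sample_size)


# Demo: write ten samples to a temporary file, then stream them in chunks of 4.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(struct.pack("<10d", *range(10)))
    demo_path = tmp.name

chunk_sizes = []
running_total = 0.0
for chunk in iter_chunks(demo_path, chunk_samples=4):
    chunk_sizes.append(len(chunk))
    running_total += sum(chunk)

os.remove(demo_path)  # the mapping is closed once the generator is exhausted
```

Only one chunk is decoded at a time, so peak memory depends on `chunk_samples` rather than on the file size.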
Key Changes:
Performance Improvements:
Benchmarks comparing the new memory-mapped implementation against the baseline show significant improvements in both memory efficiency and load times:
NOTE: This PR is based on #37
Closes #33
Closes #9