BinTools.Read_s exception: Buffer position calculations in pystreambuf.h produce incorrect results #192

@bernhard-42

Issue

Currently the BinTools.Write_s and BinTools.Read_s combination regularly fails.

So I asked Claude Code to stress-test the current implementation and find a reproducible example.

Environment:

micromamba create -n cp79 python=3.13
micromamba activate cp79
micromamba install -c conda-forge -c cadquery OCP=7.9.3.0
micromamba install ipython

Reproducing example

  • test.py:

    from OCP.BinTools import BinTools
    from OCP.BRepPrimAPI import BRepPrimAPI_MakeBox, BRepPrimAPI_MakeSphere
    from OCP.BRepAlgoAPI import BRepAlgoAPI_Fuse
    from OCP.TopoDS import TopoDS_Shape
    import io
    
    box = BRepPrimAPI_MakeBox(
        8.189314010683498, 46.85954928498498, 14.39317846913498
    ).Shape()
    
    sphere = BRepPrimAPI_MakeSphere(18.12601828498498).Shape()
    shape =  BRepAlgoAPI_Fuse(box, sphere).Shape()
    
    BinTools.Write_s(shape, "shape.brep")
    
    with open("shape.brep", "rb") as f:
        data = f.read()
    
    buf = io.BytesIO(data)
    result = TopoDS_Shape()
    
    BinTools.Read_s(result, buf)
  • Error

    $ python test.py
    Traceback (most recent call last):
      File "/home/bernhard/test.py", line 22, in <module>
        BinTools.Read_s(result, buf)
        ~~~~~~~~~~~~~~~^^^^^^^^^^^^^
    OCP.Standard.Standard_Failure: EXCEPTION in BinTools_ShapeSet::ReadGeometry(S,OS)
    0x558d8ee274a0 : Standard_Failure: BinTools_SurfaceSet::ReadGeometry: UnExpected BRep_PointRepresentation = -1

Note: It fails on both my Mac M1 and my Linux box.

Analysis

Claude Code helped with this analysis, so take it with a pinch of salt, but it makes sense to me.

This shape

  • serializes to exactly 5797 bytes
  • causes OCCT to request a backward seek during parsing (seek from position 4096 back to 3064)

The backward seek triggers a bug in pystreambuf's position tracking:

Read sequence:
read(1024) × 4 → position 4096
seek(3064) → OCCT wants to re-read earlier data (NORMAL)
read(1024) × 3 → positions 3064 → 4088 → 5112 → 5797 (EOF)
seek(6140) → BUG! pystreambuf calculates wrong position (343 bytes past EOF)
read(1024) → returns 0 bytes → OCCT fails
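
The arithmetic in this trace can be replayed on a plain io.BytesIO without OCP. This is only a sketch: it uses a zero-filled buffer of the reported size as a stand-in for shape.brep, and it takes the 1024-byte chunk size and the seek targets from the trace above rather than from pystreambuf itself.

```python
import io

FILE_SIZE = 5797   # serialized size reported above
CHUNK = 1024       # read size shown in the trace

buf = io.BytesIO(bytes(FILE_SIZE))  # zero-filled stand-in for shape.brep

# Four sequential reads bring the stream to position 4096.
for _ in range(4):
    buf.read(CHUNK)
pos_before_seek = buf.tell()

# The backward seek requested by OCCT, followed by three more reads.
buf.seek(3064)
positions = []
for _ in range(3):
    buf.read(CHUNK)
    positions.append(buf.tell())

# The position that pystreambuf is reported to compute next.
bad_target = 6140
buf.seek(bad_target)
tail = buf.read(CHUNK)

print(pos_before_seek)         # 4096
print(positions)               # [4088, 5112, 5797]
print(bad_target - FILE_SIZE)  # 343 bytes past EOF
print(len(tail))               # 0, nothing left to read
```

The third read after the backward seek is short (685 bytes instead of 1024) because it hits EOF at 5797. Any seek target beyond that point can only yield an empty read.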

Why this specific shape?

  • The shape's binary representation requires OCCT to seek backward during parsing
  • Simple shapes (like a plain box) read sequentially without seeks → work fine
  • Complex shapes that need to reference earlier data trigger backward seeks
  • The combination of file size (5797) and seek positions causes pystreambuf's buffer arithmetic to produce a position past the end of the data

The bug appears to be in pystreambuf.h, around lines 399-453 (seekoff_without_calling_python), where the buffer position calculations after a backward seek produce incorrect results.
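
One way to check this analysis independently is to record what the stream bridge asks of the Python object. The following is a minimal sketch. It assumes that pystreambuf drives the stream through the object's read and seek methods and that it accepts a BytesIO subclass, neither of which is verified here. TracingBytesIO is a hypothetical helper name, not part of OCP.

```python
import io


class TracingBytesIO(io.BytesIO):
    """BytesIO that records every read and seek call made on it."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.calls = []

    def read(self, size=-1):
        start = super().tell()
        chunk = super().read(size)
        # (operation, requested size, position before, bytes returned)
        self.calls.append(("read", size, start, len(chunk)))
        return chunk

    def seek(self, offset, whence=io.SEEK_SET):
        new_pos = super().seek(offset, whence)
        # (operation, requested offset, whence, resulting position)
        self.calls.append(("seek", offset, whence, new_pos))
        return new_pos


# Stand-alone demonstration on dummy data of the reported size.
buf = TracingBytesIO(bytes(5797))
buf.read(1024)
buf.seek(3064)
buf.read(1024)
for call in buf.calls:
    print(call)
```

Passing TracingBytesIO(data) instead of io.BytesIO(data) to BinTools.Read_s in test.py and printing buf.calls afterwards should show which reads and seeks actually reach the Python side.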

Note: BinTools.Write_s is reliable

| Direction | Reliable? | Reason |
|---|---|---|
| Write (C++ → Python) | Yes | Sequential, C++ owns buffer, copies to Python |
| Read (Python → C++) | No | Pointer aliasing, backward seeks break position tracking |

Alternative implementation

A much simpler solution we use in https://github.com/jdegenstein/ocp-addons is

py::bytes serialize_shape(const TopoDS_Shape &shape) {
    std::ostringstream buf;
    BinTools::Write(shape, buf);
    return py::bytes(std::move(buf.str()));
}

TopoDS_Shape deserialize_shape(const py::bytes &buf) {
    std::istringstream stream(buf);
    TopoDS_Shape shape;
    BinTools::Read(shape, stream);
    return shape;
}

py::bytes serialize_location(const TopLoc_Location &location) {
    std::ostringstream buf;
    BinTools_OStream occtStream(buf);
    BinTools_ShapeWriter().WriteLocation(occtStream, location);
    return py::bytes(std::move(buf.str()));
}

TopLoc_Location deserialize_location(const py::bytes &buf) {
    std::istringstream stream(buf);
    BinTools_IStream occtStream(stream);
    // This is not a memory leak and can only be copied due to weird occt impl
    return *BinTools_ShapeReader().ReadLocation(occtStream);
}
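
For callers who cannot add a compiled extension, a file-based round trip from Python follows the same idea of keeping the Python stream object out of the parsing step. This is a hedged sketch: read_via_temp_file is a hypothetical helper, and it assumes the installed OCP build offers a filename overload of BinTools.Read_s, as the counterpart of the BinTools.Write_s(shape, "shape.brep") call in test.py.

```python
import os
import tempfile


def read_via_temp_file(data: bytes, reader):
    """Write data to a temporary file, call reader(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".brep")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # The file is closed before the reader opens it by name.
        return reader(path)
    finally:
        os.remove(path)


# Intended use with OCP (assumption: Read_s accepts a file name):
#     result = TopoDS_Shape()
#     read_via_temp_file(data, lambda path: BinTools.Read_s(result, path))
```

This trades one extra disk write for not depending on pystreambuf's seek handling. It does not fix the underlying bug.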

Side-by-Side Comparison

Comparing the two implementations, Claude Code came up with:

| Aspect | pystreambuf | ocp-addons |
|---|---|---|
| Buffer ownership | Shared/complex (Python bytes + C++ pointers) | Clear (C++ owns during operation) |
| Python calls during I/O | Many (~N/1024 for N bytes) | Zero (only at start/end) |
| GIL safety | Risky (pointers into Python memory) | Safe (pure C++ during I/O) |
| Memory pattern | Incremental chunks | Single allocation |
| Seek support | Complex, error-prone | Native C++ stringstream (reliable) |
| Error handling | Complex (Python exceptions mid-stream) | Simple (fails at boundaries) |
| Code complexity | ~500 lines | ~15 lines |
| Streaming large files | Theoretically better (incremental) | Requires full memory |

  • Advantages of the ocp-addons approach:

    1. Clean boundary: Python <-> C++ conversion happens exactly once at the start (for read) or end (for write), not continuously during I/O
    2. No pointer aliasing: std::ostringstream owns its buffer entirely. No raw pointers into Python object internals.
    3. GIL-safe: The entire BinTools operation runs in pure C++ with no Python dependencies mid-stream
    4. Simple type conversion:
      • py::bytes -> std::string (pybind11 handles this cleanly, copies the data)
      • std::string -> py::bytes (single copy at the end via std::move)
    5. Native stream semantics: std::stringstream has well-defined, tested seek/tell behavior
  • The Trade-off

    The pystreambuf approach was designed for streaming large data without loading everything into memory. But in practice:

    • Most CAD shapes are manageable in memory
    • The reliability cost of the complex bridging might outweigh the memory benefit
