Skip to content

fix: Specify the end index in Wkb::buf()#87

Closed
yutannihilation wants to merge 2 commits intogeorust:mainfrom
yutannihilation:fix/buf-end-index
Closed

fix: Specify the end index in Wkb::buf()#87
yutannihilation wants to merge 2 commits intogeorust:mainfrom
yutannihilation:fix/buf-end-index

Conversation

@yutannihilation
Copy link
Copy Markdown

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.
  • I ran cargo fmt

Fix #86

I was hoping to specify the end index in Wkb::try_new() or in GeometryCollection::try_new(), but I found it difficult because we can call .size() only after we construct the struct. So, this pull request chooses to do it in buf(), in the assumption that calling .size() is cheap and .buf() is probably called less frequently than try_new().

Copy link
Copy Markdown
Member

@kylebarron kylebarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This is a little tricky because, while computing the size should be pretty fast, I'm not sure if we want to require computing the size every time we want to access the buffer.

Possible options are:

  • Document that buf will overshoot this geometry's buffer in the case it's part of a geometry collection
  • Call size() every time inside of buf and subset the buffer
  • Cache size() as part of the struct

Any thoughts @Kontinuation as you were the one to expose this API in #85?

@yutannihilation
Copy link
Copy Markdown
Author

Thanks for the comment. Yeah, it's not obvious which option is the best... I'd vote for calling size() every time because it's probably rare that we need to call .buf() repeatedly, but I might be wrong.

For more context, I found this problem when I tried to use Wkb::buf() to implement ST_Dump in SedonaDB (apache/sedona-db#269). At the moment, I switched to use wkb::writer::write_point() etc instead of copying Wkb::buf(). I was hoping I could add .buf() to wkb::reader::Point etc, but I found it would be not so simple (wkb::reader::Point etc holds only the coordinates part of the WKB), and I felt wkb::writer::write_point() is better anyway because it should ensure the endianness.

@yutannihilation
Copy link
Copy Markdown
Author

How about adding unsafe fn buf_unchecked()? Marking it unsafe probably helps users to guess the risk of overshoot and that the safe version does some computation.

@Kontinuation
Copy link
Copy Markdown
Contributor

The proposed patch changes the semantics of Wkb::buf. We know that Wkb::try_new accepts buffers containing trailing data, and Wkb::buf returns the buffer passed to Wkb::try_new as-is without slicing it. I am totally OK with the new semantics that Wkb::buf returns slices truncated to perfectly align with the boundary of the WKB.

The ideal approach is to remove buf field from Wkb and implement buf method for each geometry type. The approach taken by this PR also looks good to me, as it makes minimal changes, and the performance overhead won't be large for simple geometries.

@Kontinuation
Copy link
Copy Markdown
Contributor

This is my attempt to reimplement Wkb::buf by delegating to the buf methods of each geometry type: 40937d4. I still need to take some time adding comprehensive tests for it. Please let me know if this approach is acceptable, and I will submit a follow-up PR with the tests.

@yutannihilation
Copy link
Copy Markdown
Author

Oh, thanks, it looks way better than my approach. I actually wanted the buf method on each geometry type!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wkb::new() should slice the buffer by the actual length

3 participants