
Feature/performance#448

Merged
essweine merged 17 commits into main from feature/performance
Apr 17, 2026

Conversation

@danfunk
Collaborator

@danfunk danfunk commented Mar 2, 2026

This adds a "copy-on-write" strategy that seems to improve performance and reduce the size of serialized data. This was done with Claude doing most of the work and making recommendations, and me just vetting code, asking questions, and keeping a very tight rein on changes, so that we are making as minimal a change to existing code as possible.
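The core copy-on-write idea can be sketched in a few lines. This is an illustrative mapping, not the actual SpiffWorkflow code: reads fall through to a shared base dict, and the first write creates a private overlay, so unchanged data is never duplicated and only the overlay needs serializing.

```python
class CopyOnWriteDict:
    """Minimal copy-on-write mapping sketch (hypothetical, not the
    PR's actual class): reads fall through to a shared base dict;
    the first write lazily creates a private overlay."""

    def __init__(self, base):
        self._base = base       # shared, treated as read-only
        self._overlay = None    # created lazily on first write

    def __getitem__(self, key):
        if self._overlay is not None and key in self._overlay:
            return self._overlay[key]
        return self._base[key]

    def __setitem__(self, key, value):
        if self._overlay is None:
            self._overlay = {}  # the "copy" happens only on write
        self._overlay[key] = value

    def changed(self):
        """Only the overlay would need to be serialized."""
        return dict(self._overlay or {})

shared = {"a": 1, "b": 2}
cow = CopyOnWriteDict(shared)
cow["b"] = 99
# shared stays untouched; only {'b': 99} needs serializing
```

The win during serialization comes from `changed()`: tasks that never wrote anything contribute nothing to the serialized payload.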

I am not expecting you to merge this, so much as have a hard look at it so we can talk about it tomorrow post scrum.

We still hit recursion limit on 1000 loops, so that particular problem isn't addressed yet.

Performance Tests Prior to Changes:

| Loop Count (Items) | Execution Time (s) | Serialize Time (s) | Deserialize Time (s) | Total Time (s) |
|--------------------|--------------------|--------------------|----------------------|----------------|
| 20                 | 0.045              | 0.061              | 0.045                | 0.152          |
| 100                | 0.662              | 1.402              | 1.109                | 3.173          |
| 200                | 2.451              | 6.203              | 4.439                | 13.093         |
| 300                | 5.558              | 16.018             | 10.378               | 31.955         |

Performance Tests after Changes:

| Loop Count (Items) | Execution Time (s) | Serialize Time (s) | Deserialize Time (s) | Total Time (s) |
|--------------------|--------------------|--------------------|----------------------|----------------|
| 20                 | 0.019654           | 0.021859           | 0.015446             | 0.056960       |
| 100                | 0.238948           | 0.435454           | 0.336278             | 1.010681       |
| 200                | 0.837055           | 1.894795           | 1.382747             | 4.114597       |
| 300                | 1.887173           | 4.150990           | 2.952501             | 8.990664       |

@essweine
Contributor

essweine commented Mar 3, 2026

I removed the custom class and replaced it with two one-line static methods for comparing two dictionaries, which the serializer now uses to maintain partial serializations.
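In the spirit of those one-liners, dict-comparison helpers for partial serialization might look like this. The names `DictDiff`, `diff`, and `removed` are hypothetical, for illustration only:

```python
class DictDiff:
    """Hypothetical one-line static helpers: compute which entries
    changed between two snapshots of a data dict, so the serializer
    can re-serialize only those."""

    @staticmethod
    def diff(new, old):
        # entries that are new or whose values changed
        return {k: v for k, v in new.items() if k not in old or old[k] != v}

    @staticmethod
    def removed(new, old):
        # keys present before but gone now
        return {k for k in old if k not in new}

previous = {"x": 1, "y": 2, "z": 3}
current = {"x": 1, "y": 5}
# DictDiff.diff(current, previous) -> {'y': 5}
# DictDiff.removed(current, previous) -> {'z'}
```

Keeping these as static methods over plain dicts, rather than a wrapper class, means the task data itself stays an ordinary dict with no custom type to serialize.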

Claude's "optimization" basically just replaced the deepcopy with a shallow copy, which has caused problems in the past with shared data references between tasks, and was the reason we were using deepcopy anyway (it certainly wasn't my personal preference!). I'm amazed none of the existing tests failed. I replaced it with DeepMerge which is not quite as aggressive about copying things, but nobody should be surprised if weird data problems crop up in six months.

I changed do_engine_steps (another method I just love) to a non-recursive version, which fixes the recursion issue.
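The shape of that fix is a standard one: replace recursion-per-completed-task with an explicit loop that drains a ready list. This is an illustrative sketch over a toy task structure, not the actual SpiffWorkflow method:

```python
def do_engine_steps(tasks):
    """Iterative engine-step sketch (hypothetical data model): keep
    completing READY tasks and waking their successors until nothing
    is ready, so loop count never grows the Python call stack."""
    while True:
        ready = [t for t in tasks if t["state"] == "READY"]
        if not ready:
            break
        for task in ready:
            task["state"] = "COMPLETED"
            # completing a task may make its successors ready
            for succ in task["next"]:
                if tasks[succ]["state"] == "WAITING":
                    tasks[succ]["state"] = "READY"

# a chain of 1000 tasks, which would blow the default recursion
# limit if each completion recursed into the next
tasks = [{"state": "WAITING", "next": [i + 1]} for i in range(1000)]
tasks[0]["state"] = "READY"
tasks[-1]["next"] = []
do_engine_steps(tasks)
```

With the recursion gone, the 1000-loop case mentioned earlier is bounded by heap memory rather than `sys.getrecursionlimit()`.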

I ran the 300 item test with the following results:

    Execution:      0.780830 seconds
    Serialization:  0.882112 seconds
    Deserialization: 1.717875 seconds
    Total:          3.380817 seconds

So either my code is a legit improvement or I have a faster computer.

danfunk and others added 9 commits March 4, 2026 09:29
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of two passes (Python dict-building + json.dumps), serialize_json
now uses a SpiffEncoder that lets the C-level JSON encoder handle plain
data traversal natively, only calling back to Python for registered types.

Periodic serialization test (300 items, 180 serializations):

| Metric                         | Before   | After   | Improvement   |
|--------------------------------|----------|---------|---------------|
| Total time                     | 162.5s   | 45.0s   | 3.6x faster   |
| Total serialization            | 160.7s   | 43.3s   | 3.7x faster   |
| Avg per serialization          | 0.893s   | 0.241s  | 3.7x faster   |
| Final checkpoint (308 tasks)   | 2.07s    | 0.56s   | 3.7x faster   |

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
refactor serialize_json to use custom JSONEncoder for ~3.7x speedup
@essweine essweine merged commit b8a9aaf into main Apr 17, 2026
5 checks passed
@essweine essweine deleted the feature/performance branch April 17, 2026 15:28