- Fix erlang_python git repo URL (erlang-python not erlang_python)
- Replace py:bind/py:unbind with py:context/py:contexts_started
- Replace py:ctx_call with py:call
- Replace py:with_context with direct py:call
- Update py:call signatures to use an options map for timeout
- Create hornbeam_request.erl for pre-parsing HTTP requests in Erlang
- Add to_wsgi_header_key/1 for header format conversion
- Add build_wsgi_tuple/2 and build_asgi_scope/2 functions
- Add BytesIO pool to WSGI runner to reduce allocation overhead
- Add environ template for O(1) environ creation
- Add run_wsgi_fast/3 and create_environ_from_tuple/1 for fast path
- Add response object pool with reset() method
- Add _get_response() and _return_response() pool functions
- Pool size of 100 responses for high-throughput scenarios
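A minimal sketch of the pool mechanics; _get_response/_return_response/reset are the names from the commit, while the response fields are illustrative:

```python
_RESPONSE_POOL = []
_POOL_SIZE = 100  # matches the commit's pool size

class _Response:
    __slots__ = ("status", "headers", "chunks")

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear per-request fields so a pooled object starts clean.
        self.status = None
        self.headers = None
        self.chunks = []

def _get_response():
    # Reuse a pooled object when available, else allocate a fresh one.
    if _RESPONSE_POOL:
        return _RESPONSE_POOL.pop()
    return _Response()

def _return_response(resp):
    # Cap the pool so traffic bursts don't pin memory indefinitely.
    if len(_RESPONSE_POOL) < _POOL_SIZE:
        resp.reset()
        _RESPONSE_POOL.append(resp)
```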
- Use _ENVIRON_TEMPLATE.copy() instead of inline dict creation
- Use pooled BytesIO for wsgi.input
- Return BytesIO to pool after request completion
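A sketch of how the template copy and pooled BytesIO fit together; _ENVIRON_TEMPLATE is the name from the commit, the static keys shown are typical WSGI fields, and the helper names are illustrative:

```python
import io
import sys

# Static fields computed once; per-request environs start as a shallow copy.
_ENVIRON_TEMPLATE = {
    "wsgi.version": (1, 0),
    "wsgi.url_scheme": "http",
    "wsgi.errors": sys.stderr,
    "wsgi.multithread": False,
    "wsgi.multiprocess": True,
    "wsgi.run_once": False,
    "SCRIPT_NAME": "",
    "SERVER_PROTOCOL": "HTTP/1.1",
}

_BYTESIO_POOL = []

def _checkout_bytesio():
    return _BYTESIO_POOL.pop() if _BYTESIO_POOL else io.BytesIO()

def _checkin_bytesio(buf):
    buf.seek(0)
    buf.truncate()  # drop the previous body before pooling
    _BYTESIO_POOL.append(buf)

def make_environ(method, path, body):
    environ = _ENVIRON_TEMPLATE.copy()  # one dict copy vs. rebuilding every key
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    buf = _checkout_bytesio()
    buf.write(body)
    buf.seek(0)
    environ["wsgi.input"] = buf
    return environ
```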
Workers receive requests via channels and loop continuously, reducing Python startup overhead. Features heartbeat monitoring, scheduler affinity routing, and automatic restart on failure. Enable via mount config: pool_enabled => true
Use cowboy stream_reply/stream_body instead of collecting chunks before sending. Reduces memory usage and latency for large streaming responses.
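A minimal sketch of the streamed path using Cowboy's stream_reply/3 and stream_body/3, assuming the chunks are already in hand:

```erlang
-module(stream_sketch).
-export([stream_response/4]).

%% Send headers immediately, then push each chunk as it becomes
%% available, instead of accumulating the full body before replying.
stream_response(Status, Headers, Chunks, Req0) ->
    Req = cowboy_req:stream_reply(Status, Headers, Req0),
    lists:foreach(
        fun(Chunk) -> ok = cowboy_req:stream_body(Chunk, nofin, Req) end,
        Chunks),
    ok = cowboy_req:stream_body(<<>>, fin, Req),
    Req.
```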
Remove per-mount heartbeat_interval and heartbeat_timeout options. Use module constants (5s interval, 15s timeout) for all workers.
Store channels as a tuple instead of individual entries: one ETS lookup instead of two (fetch the tuple, then pick a channel with element()).
Store channels as {{pool, MountId}, Ch1, Ch2, ...} in ETS.
Handler gets channel via single lookup_element call.
Remove persistent_term usage entirely.
Store as {{MountId, Idx}, Channel} for direct lookup.
No tuple manipulation needed.
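A sketch of the final layout; the table name and the scheduler-hash index choice are illustrative:

```erlang
-module(channel_lookup_sketch).
-export([get_channel/2]).

%% Rows are inserted at pool startup as {{MountId, Idx}, Channel},
%% with Idx in 0..NumChannels-1. A request then picks its channel
%% with one keyed read; no tuple rebuilding or element/2 indexing.
get_channel(MountId, NumChannels) ->
    Idx = erlang:phash2(self(), NumChannels),
    ets:lookup_element(hornbeam_channels, {MountId, Idx}, 2).
```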
Call py_context:extend_erlang_module_in_context/1 before importing hornbeam_wsgi_worker and hornbeam_asgi_worker. This ensures the erlang module is fully extended with send/call/schedule_inline before the workers check HAS_ERLANG at import time. Also adds a noop_asgi.py benchmark app for ASGI testing.
Performance results:
- WSGI single context: 76K req/sec, 13us latency
- WSGI 14 workers: 62-64K req/sec
- ASGI single worker: 60K req/sec
- ASGI 14 workers: 113K req/sec, 9us latency
- Replace pooled worker architecture with context_call + schedule_inline
- WSGI now uses py_nif:context_call() with schedule_inline for yielding
- ASGI uses py_event_loop for async execution
- Remove hornbeam_worker_arbiter and hornbeam_worker_pool (obsolete)
- Simplify hornbeam_handler to a single codepath
- Update Python workers for the schedule_inline continuation pattern
- Add max_concurrent config for erlang_python
- Small bodies (< 64KB): buffered path (read fully before the Python call)
- Large bodies (>= 64KB): stream via py_channel in 64KB chunks
- Add handle_wsgi_streaming entry point in the Python worker
- StreamingBodyReader reads body chunks from the channel
- Add hornbeam_context_pool:add_paths/1 to set pythonpath in all contexts
- Fix Channel.receive() to use the timeout_ms parameter
- Add wsgi_body_chunk_size and wsgi_streaming_threshold config options
- Replace handle_wsgi_buffered/handle_wsgi_streaming with single handle_wsgi
- Add ChannelBuffer (inherits io.BufferedIOBase) for wsgi.input
- Body delivered via channel for all sizes (small: {body, Data}, large: chunks)
- Add hop-by-hop header filtering for HTTP compliance (sketch after this list)
- Remove BytesIO pool and StreamingBodyReader class
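The hop-by-hop set comes from RFC 7230 §6.1; a sketch of the filter (function name illustrative):

```python
# Hop-by-hop headers must not be forwarded by the server; an app
# setting these would corrupt the HTTP framing on the Erlang side.
_HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailer",
    "transfer-encoding", "upgrade",
})

def filter_headers(headers):
    """Drop hop-by-hop headers from an app's (name, value) response list."""
    return [(name, value) for name, value in headers
            if name.lower() not in _HOP_BY_HOP]
```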
- Use py_buffer API for request body (zero-copy shared memory)
- Use erlang.send instead of erlang.reply for responses
- Skip buffer creation for bodyless GET/HEAD/DELETE/OPTIONS
- Preload the WSGI app at startup in all contexts
- Single message path for simple [body] list responses
- Use worker mode instead of subinterpreter for contexts

Benchmark: 56,720 req/sec (5.8x faster than Gunicorn)
- Rename workers config option to num_contexts for clarity
- Default num_contexts to erlang:system_info(schedulers)
- Restart context pool when num_contexts changes
- Fix _get_app safety check for multi-app scenarios
- Remove unused Python runtime functions
- Replace py_event_loop:create_task with spawn_task (fire-and-forget)
- Use py_buffer for request body streaming (consistent with WSGI)
- Handle async_result messages in receive loops
- Add asgi_noop_app.py benchmark app
- Simplify ASGI worker to use erlang.run() pattern

ASGI performance: ~39k req/sec (WSGI: ~62.5k req/sec)
- Add pre-computed ASGI_SCOPE_TEMPLATE macro for static scope fields
- Cache the _erlang_send function reference to avoid an attribute lookup per call
- Store the cached send in an _ASGISend slot (via __slots__) for instance-level access

Performance improvement:
- Before: ~39k req/sec
- After: ~63k req/sec (+62%)
- ASGI now matches WSGI performance
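A minimal sketch of the reference-caching pattern; the erlang.send signature and send target are assumptions, and the real _ASGISend carries more state:

```python
import erlang  # provided by erlang-python inside a worker context

# One module-level attribute lookup at import time...
_erlang_send = erlang.send

class _ASGISend:
    # ...stored in a slot, so every response message does a fast slot
    # read instead of a global plus attribute lookup.
    __slots__ = ("_send", "_target")

    def __init__(self, target):
        self._send = _erlang_send
        self._target = target

    async def __call__(self, message):
        self._send(self._target, message)
```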
Check the body size threshold before checking the more_body flag. This ensures large single-chunk responses are streamed instead of buffered, preventing memory issues with large responses.
- Reorder threshold check to happen first
- Stream if total_size >= BUFFER_THRESHOLD (64KB)
- Add test app for large response validation
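The decision logic, reduced to its essentials (the surrounding send machinery is omitted):

```python
BUFFER_THRESHOLD = 64 * 1024  # 64KB

def should_stream(total_size, more_body):
    # Size is checked first: a single chunk at or over the threshold
    # streams even though more_body is False.
    if total_size >= BUFFER_THRESHOLD:
        return True
    return more_body
```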
Fetch lifespan_state once when configuring cowboy routes instead of calling hornbeam_lifespan:get_state() on every request.
- Add lifespan_state to HandlerState in start_listener
- Add lifespan_state to multi-app HandlerState
- Use cached state in build_scope instead of ETS lookup
- Fix quadratic buffering in ASGI send with O(1) size tracking
- Use create_task instead of spawn_task to avoid process overhead
- Move pythonpath setup to mount registration (not per-request)
- Implement ASGI request body streaming with more_body support
- Wire up WSGI tuple fast path for O(1) environ creation

ASGI now at 86% of WSGI throughput (67.5K vs 78.3K req/s).
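A sketch of the O(1) size tracking that replaces the quadratic re-summing (class name illustrative):

```python
class _BodyBuffer:
    """Keeps a running byte total instead of re-summing on every chunk."""

    __slots__ = ("_parts", "_size")

    def __init__(self):
        self._parts = []
        self._size = 0

    def append(self, chunk):
        self._parts.append(chunk)
        self._size += len(chunk)  # constant time, vs sum(map(len, parts))

    @property
    def size(self):
        return self._size

    def take(self):
        body = b"".join(self._parts)
        self._parts.clear()
        self._size = 0
        return body
```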
Use the Python-side lifespan state dict from hornbeam_lifespan_runner instead of the Erlang-provided copy. This ensures state modifications made by request handlers persist across requests, per the ASGI spec.
- Add context_mode option (worker | owngil) to hornbeam and context pool
- owngil mode uses a per-interpreter GIL for true parallelism (Python 3.12+)
- Update benchmark to support WSGI owngil testing via PYTHON_CONFIG env var
- Rebuild erlang_python when PYTHON_CONFIG is set for the correct Python version
Use hornbeam_context_pool instead of py:context() to ensure priv/ is in sys.path when calling Python lifespan functions. Also use py_nif:context_call with an empty options map to avoid passing the timeout through as Python kwargs.
- Add _MutableStateProxy in hornbeam_asgi_worker.py that syncs scope['state'] mutations to Erlang ETS via erlang.send() (sketch after this list)
- Add update_state/2 and update_state/3 to hornbeam_lifespan.erl
- Add handle_info for {<<"update_state">>, Key, Value} messages
- Read fresh lifespan state from ETS per request (not cached)
- Update lifespan_test_app.py to prefer scope state over module state
- Requires erlang-python with erlang.whereis() support
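A sketch of the proxy idea; the erlang.send signature and how the Python tuple maps onto {<<"update_state">>, Key, Value} are erlang-python specifics assumed here:

```python
from collections.abc import MutableMapping

import erlang  # erlang-python module, available inside a worker context

class _MutableStateProxy(MutableMapping):
    """Dict-like view of lifespan state that mirrors writes to Erlang."""

    __slots__ = ("_state", "_lifespan")

    def __init__(self, state, lifespan_pid):
        self._state = dict(state)
        self._lifespan = lifespan_pid  # e.g. from erlang.whereis(...)

    def __getitem__(self, key):
        return self._state[key]

    def __iter__(self):
        return iter(self._state)

    def __len__(self):
        return len(self._state)

    def __setitem__(self, key, value):
        self._state[key] = value
        # Mirror the mutation so it lands in ETS and outlives this request.
        erlang.send(self._lifespan, ("update_state", key, value))

    def __delitem__(self, key):
        del self._state[key]
        erlang.send(self._lifespan, ("update_state", key, None))
```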
- Remove unused buffering logic in _ASGISend
- Stream all responses directly through ByteChannel
- Fix default status code from 400 to 200 on http.response.start
- Remove unused fast path response handler
- Remove debug logging statements
- Add hop-by-hop header filtering to streaming path
- Close request channel when response starts
- Raise RuntimeError if http.response.start is sent twice
- Raise RuntimeError if http.response.body is sent before start
- Raise RuntimeError if send is called after the response completed
- Raise OSError on client disconnect per ASGI spec 2.4
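The guard conditions as a sketch; only the state machine is shown (class name illustrative), and message forwarding is elided:

```python
class _SendStateMachine:
    """Tracks response progress and raises per the ASGI HTTP spec."""

    def __init__(self):
        self.started = False
        self.completed = False
        self.disconnected = False

    def check(self, message_type, more_body):
        if self.disconnected:
            raise OSError("client disconnected")  # ASGI spec 2.4
        if message_type == "http.response.start":
            if self.started:
                raise RuntimeError("http.response.start sent twice")
            self.started = True
        elif message_type == "http.response.body":
            if not self.started:
                raise RuntimeError(
                    "http.response.body sent before http.response.start")
            if self.completed:
                raise RuntimeError("send called after response completed")
            if not more_body:
                self.completed = True
```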
Switch from py_event_loop to py_event_loop_pool for better load distribution across multiple event loops. Process affinity ensures ordered execution for requests from the same handler.
Benchmark shows improved scaling at higher concurrency:
- 200 connections: 25.4k req/s
- 400 connections: 27.5k req/s
WSGI worker:
- Remove unnecessary decode() calls (erlang_python handles this in C)
- Add documentation for binary-to-string conversion

Lifespan runner:
- Add per-mount lifespan support for multi-app mode
- Each mount gets an isolated state dict
- Add startup_mount/shutdown_mount functions

hornbeam.erl:
- Pass mount_id to lifespan startup for state isolation
- Build mount-specific options for the lifespan protocol
- Add chunk coalescing in drain_response_channel (4KB threshold, 1ms timeout) to batch small chunks and reduce per-request syscall overhead (sketch below)
- Unify scope builders: use hornbeam_request:build_asgi_scope everywhere, remove duplicate build_scope from handler
- Optimize hooks: store individual hooks in persistent_term with direct keys for zero-overhead check when no hooks configured
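The coalescing loop, sketched with the commit's thresholds; the {chunk, Data} message shape is an assumption:

```erlang
-module(coalesce_sketch).
-export([drain/0]).

-define(COALESCE_BYTES, 4096).  %% flush once 4KB is buffered
-define(COALESCE_WAIT_MS, 1).   %% or after 1ms of silence

drain() ->
    drain([], 0).

drain(Acc, Size) when Size >= ?COALESCE_BYTES ->
    iolist_to_binary(lists:reverse(Acc));
drain(Acc, Size) ->
    receive
        {chunk, Data} ->
            drain([Data | Acc], Size + byte_size(Data))
    after ?COALESCE_WAIT_MS ->
        iolist_to_binary(lists:reverse(Acc))
    end.
```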
Skip channel/pump for small bodies (<64KB): pass directly to Python. Large bodies still use channel streaming. Reduces process spawns and memory pressure for typical requests.
Prototype using Cowboy's async body reading with a push/pull pattern:
- cowboy_req:cast for async body chunks
- ASGIProtocol class mirroring the asyncio.Protocol interface
- Buffer + asyncio.Event for ASGI receive()

Use worker_class => asgi_loop to test. Not yet optimized for production.
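A sketch of the push/pull pairing; data_received mirrors asyncio.Protocol naming, but the more_body flag and message shapes here are assumptions:

```python
import asyncio

class ASGIProtocol:
    """Erlang pushes body chunks in; the ASGI app pulls via receive()."""

    def __init__(self):
        self._chunks = []
        self._more_body = True
        self._event = asyncio.Event()

    def data_received(self, chunk, more_body=True):
        # Called from the push side for each cowboy body chunk.
        self._chunks.append(chunk)
        self._more_body = more_body
        self._event.set()

    async def receive(self):
        # Park until at least one chunk (or end-of-body) has been pushed.
        while not self._chunks and self._more_body:
            self._event.clear()
            await self._event.wait()
        body = b"".join(self._chunks)
        self._chunks.clear()
        return {"type": "http.request", "body": body,
                "more_body": self._more_body}
```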
- Remove response channel, use erlang.send() directly for body
- Remove drain loop with timer polling
- Remove Python buffer/event/reader task pattern
- Read from request channel directly in receive()

Result: -130 lines, +21% GET, +10% POST throughput
- Rename hornbeam_asgi_loop to hornbeam_asgi
- Remove old ASGI handler code from hornbeam_handler.erl
- Use direct reply for responses with Content-Length
- Use chunked encoding for responses without Content-Length
- Fix empty body handling: check Transfer-Encoding too
- Simplify Python worker: no buffering, read channel directly

Removes ~900 lines; all 38 ASGI tests pass.
- Cache pid/streamid in state for direct send (avoids map lookups)
- Use direct Pid ! message instead of cowboy_req:cast for body reading
- Replace 5+ message types with a simplified protocol:
  - start_response: headers + first chunk
  - chunk: subsequent body chunks
  - fin: end of response
- Remove headers_sent/buffered_headers state fields
Module is already imported by Erlang via ensure_all_imported, so use sys.modules lookup with caching instead of importlib.
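A sketch of the lookup-with-cache (cache and helper names are illustrative):

```python
import sys

_MODULE_CACHE = {}

def _get_module(name):
    # Erlang already imported the module via ensure_all_imported, so
    # sys.modules is guaranteed to have it; skip importlib machinery.
    mod = _MODULE_CACHE.get(name)
    if mod is None:
        mod = sys.modules[name]
        _MODULE_CACHE[name] = mod
    return mod
```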
Replace py_event_loop_pool with py_event_loop:get_loop() to remove pool routing overhead.
- Register lifespan_state_get/set callbacks at startup in hornbeam_lifespan
- Replace _MutableStateProxy with _LazyStateProxy in Python
- Remove state from ASGI scope building; Python fetches it lazily

Benefits:
- No erlang.whereis() on every request
- State only fetched when actually accessed
- Direct ETS access via callbacks, no message passing
- hornbeam_context_pool now uses py_context_router for context lifecycle
- Cache NIF refs in persistent_term for O(1) access without message passing
- Cache state proxies per mount_id to avoid allocation per request
- Use py_import:add_path and py_import:ensure_imported for mount setup
- hornbeam_context_pool now caches NIF refs from the default pool
- Wait for py_context_router to be ready before caching
- Remove workers and pool_enabled from the mount type (use shared pool)
- Simplify mount type to just routing config
- Add [Unreleased] changelog section with performance metrics
- Remove per-mount workers option from docs (now uses shared pool)
- Add notes explaining the shared py_context_router pool architecture
- Delete hornbeam_pool.erl (replaced by hornbeam_context_pool)
- Remove unused get_context_rr/0 and stats/0 from hornbeam_context_pool
- Remove dead handle_request and _process_environ from WSGI worker
- Remove unused streaming code from ASGI runner
- Fix comment in hornbeam_sup.erl