Rp request cache tombstones tests#1271
Draft
rparke wants to merge 48 commits into
Draft
Conversation
this script allows an atomic compare-and-swap between two msgpack-encoded payloads that contain float `timestamp` fields. why msgpack? because it has zero overhead when encoding arbitrary strings, unlike e.g. json which may need to add a lot of escaping characters. though this method & script is agnostic to any contents of the msgpack value other than the `timestamp` field, it is assumed that callers to this method will want to further encode the payload of these stored values so that the in-redis lua script (the "critical section" in this scheme) doesn't have to parse large & complex structures just to find the timestamp. why not make the script responsible for the whole timestamp- wrapping/encoding and decoding? to move as much of the encoding and decoding work out of the critical section as possible - the critical section effectively has exclusive access to this key (and probably the whole redis instance) for its whole runtime, so we really want to minimize the work it has to do. with this scheme, the only extra work the critical section needs to do is decoding two (hopefully simple) msgpack values - and in cases where there is no existing value, even that work will be omitted. it's much more affordable performance-wise to have the encoding/decoding work done by the clients because they are distributed & horizontally scalable. this tries to follow the interface of the RedisClient.set method as much as possible, but omits the fancy options like px, nx, xx, as it's not clear whether any of these would make any sense in this context.
even with a delete-before, delete-after pattern, use of the `delete` decorator contains a potential race condition. this is because simply performing a delete when a cache is supposedly invalidated doesn't take into account the fact that there might already be another request in-flight that contains now-stale data. when this other request "lands" it will happily re-populate the cache with the stale (or at least partially stale) data. instead of performing a simple delete, overwrite the cache value with a "tombstone" marker that contains a timestamp of when the cache was supposedly invalidated (a `force_delete` option remains for simply doing pure deletions, mostly to allow for easy migration from previous utils versions, where we will still want new app instances to be invalidating the cache for old key schemes) the `set` decorator is modified to treat these tombstones as an empty value when encountered, and internally wraps its stored values in a msgpack-encoded dict containing a "pessimistic" timestamp for the generation of that value. when setting a cache value it uses the new `RedisClient.set_if_timestamp_newer`, which will skip storing the new value if redis is found to contain another entry with a newer timestamp than this "pessimistic" timestamp, because this means that the new value may contain data from before the deletion/invalidation was issued. this also switches to using msgpack serialization instead of json serialization for efficiency reasons detailed in the addition of `RedisClient.set_if_timestamp_newer`.
more-or-less the same as delete_by_pattern albeit with a slightly more complex implementation due to the use of mset. sadly there is no succinct mechanism to set the TTL of these keys in bulk.
…letion similar to the `delete` method, don't delete matching keys but overwrite them with a timestamped tombstone marker. sadly there isn't an easy way to bulk-set TTL for a number of keys, so these tombstones will live as long as the values they were replacing. a `force_delete` option exists to cause a straightforward deletion - this is intended to assist in rolling out this new utils version, where we will be wanting both old and new versions of the app to be able to invalidate both old and new keys during deployment.
will need some formatting work etc, but fighting code-blocks-in-indented-bullet-lists is beyond me right now.
this is to allow the expected format of the wrapped-method's return value to evolve safely. when this format changes, the schema_version kwarg's value should be incremented to avoid mismatch problems that may occur during or shortly after deployment of the new version. this versioning scheme is "internal" in that the version is contained *in* the cached value as opposed to a scheme where the version might be included in the cache key instead. this has the advantage that it forces two different app versions to compete for the same cache key, rather than old and new app versions having a different "view" of whether a value is cached or not and the age of that cached value. this becomes particularly important when considering how to handle updated values and complete invalidation of stale cached values for all possible app versions that could be active at any time. when a cached value with a mismatched schema version is encountered, it is simply ignored and overwritten with the inner function's return value. though it might seem that the better thing to do would be to only defer to the higher schema version (therefore always be "upgrading"), during a slow or stuck deployment that could result in all of the old versions of the app being prevented from using a cached value at all. in a case where we had more old instances than new instances this could cause a significant performance problem. though an app always overwriting a value with its own schema version could result in a back-and-forth of continual overwriting between two app versions, the probability of the cache holding a value compatible with any particular app version is approximately proportional to the proportional population of app instances running it.
this script allows an atomic compare-and-swap between two msgpack-encoded payloads that contain float `timestamp` fields. why msgpack? because it has zero overhead when encoding arbitrary strings, unlike e.g. json which may need to add a lot of escaping characters. though this method & script is agnostic to any contents of the msgpack value other than the `timestamp` field, it is assumed that callers to this method will want to further encode the payload of these stored values so that the in-redis lua script (the "critical section" in this scheme) doesn't have to parse large & complex structures just to find the timestamp. why not make the script responsible for the whole timestamp- wrapping/encoding and decoding? to move as much of the encoding and decoding work out of the critical section as possible - the critical section effectively has exclusive access to this key (and probably the whole redis instance) for its whole runtime, so we really want to minimize the work it has to do. with this scheme, the only extra work the critical section needs to do is decoding two (hopefully simple) msgpack values - and in cases where there is no existing value, even that work will be omitted. it's much more affordable performance-wise to have the encoding/decoding work done by the clients because they are distributed & horizontally scalable. this tries to follow the interface of the RedisClient.set method as much as possible, but omits the fancy options like px, nx, xx, as it's not clear whether any of these would make any sense in this context.
even with a delete-before, delete-after pattern, use of the `delete` decorator contains a potential race condition. this is because simply performing a delete when a cache is supposedly invalidated doesn't take into account the fact that there might already be another request in-flight that contains now-stale data. when this other request "lands" it will happily re-populate the cache with the stale (or at least partially stale) data. instead of performing a simple delete, overwrite the cache value with a "tombstone" marker that contains a timestamp of when the cache was supposedly invalidated (a `force_delete` option remains for simply doing pure deletions, mostly to allow for easy migration from previous utils versions, where we will still want new app instances to be invalidating the cache for old key schemes) the `set` decorator is modified to treat these tombstones as an empty value when encountered, and internally wraps its stored values in a msgpack-encoded dict containing a "pessimistic" timestamp for the generation of that value. when setting a cache value it uses the new `RedisClient.set_if_timestamp_newer`, which will skip storing the new value if redis is found to contain another entry with a newer timestamp than this "pessimistic" timestamp, because this means that the new value may contain data from before the deletion/invalidation was issued. this also switches to using msgpack serialization instead of json serialization for efficiency reasons detailed in the addition of `RedisClient.set_if_timestamp_newer`.
more-or-less the same as delete_by_pattern albeit with a slightly more complex implementation due to the use of mset. sadly there is no succinct mechanism to set the TTL of these keys in bulk.
…letion similar to the `delete` method, don't delete matching keys but overwrite them with a timestamped tombstone marker. sadly there isn't an easy way to bulk-set TTL for a number of keys, so these tombstones will live as long as the values they were replacing. a `force_delete` option exists to cause a straightforward deletion - this is intended to assist in rolling out this new utils version, where we will be wanting both old and new versions of the app to be able to invalidate both old and new keys during deployment.
will need some formatting work etc, but fighting code-blocks-in-indented-bullet-lists is beyond me right now.
this is to allow the expected format of the wrapped-method's return value to evolve safely. when this format changes, the schema_version kwarg's value should be incremented to avoid mismatch problems that may occur during or shortly after deployment of the new version. this versioning scheme is "internal" in that the version is contained *in* the cached value as opposed to a scheme where the version might be included in the cache key instead. this has the advantage that it forces two different app versions to compete for the same cache key, rather than old and new app versions having a different "view" of whether a value is cached or not and the age of that cached value. this becomes particularly important when considering how to handle updated values and complete invalidation of stale cached values for all possible app versions that could be active at any time. when a cached value with a mismatched schema version is encountered, it is simply ignored and overwritten with the inner function's return value. though it might seem that the better thing to do would be to only defer to the higher schema version (therefore always be "upgrading"), during a slow or stuck deployment that could result in all of the old versions of the app being prevented from using a cached value at all. in a case where we had more old instances than new instances this could cause a significant performance problem. though an app always overwriting a value with its own schema version could result in a back-and-forth of continual overwriting between two app versions, the probability of the cache holding a value compatible with any particular app version is approximately proportional to the proportional population of app instances running it.
…is cache now calls the overwrite script
… the delete method now
…in the correct mpacked format
…style RequestCache to allow mostly-seamless rollout, an existing cache value that can't be decoded by msgpack is assumed to be a json encoding of an old-style RequestCache value and treated as having a schema_version of 0. similarly, if our local schema_version is specified as 0 (through the schema_version arg), the payload will be stored back in redis with a simple un-wrapped json-encoding. as such, when the local schema_version is set to 0 it should behave identically to old-style RequestCache and it should be possible to introduce this wrapping-aware RequestCache seamlesslessly as schema_version 0, before re-deploying it as schema_version=1 to enable the new tombstone and schema-versioning functionality. interruption should be similar to that caused by a regular schema_version bump being rolled out. the RequestCache.delete and RequestCache.delete_by_pattern methods already have force_delete arguments that can be used to serve a similar purpose.
…to rp-request-cache-tombstones-tests
rparke
commented
Oct 13, 2025
| redis_lock_file = root_tmp_dir / "redis_lock" | ||
| app.config["REDIS_ENABLED"] = True | ||
| app.config["REDIS_URL"] = "redis://localhost:6999/0" | ||
| app.config["REDIS_URL"] = "redis://localhost:7000/0" |
Contributor
Author
There was a problem hiding this comment.
DO NOT LET THIS CHANGE THROUGH.
This was just a workaround to get my local redis version working
…s these are the aliases for packb and unpackb methods
…ce_delete default args this is intended for use when migrating from a wrapper-unaware utils version to a wrapper-aware one. initially all apps sharing a redis key should be deployed with a `RequestCache` with DEFAULT_SCHEMA_VERSION = 0 and DEFAULT_FORCE_DELETE=True, after which apps can be deployed with DEFAULT_SCHEMA_VERSION = 1 and DEFAULT_FORCE_DELETE=False to start making full use of the wrapper
…st-cache-tombstones-tests
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.