Skip to content

Rp request cache tombstones tests#1271

Draft
rparke wants to merge 48 commits into
mainfrom
rp-request-cache-tombstones-tests
Draft

Rp request cache tombstones tests#1271
rparke wants to merge 48 commits into
mainfrom
rp-request-cache-tombstones-tests

Conversation

@rparke

@rparke rparke commented Oct 13, 2025

Copy link
Copy Markdown
Contributor

No description provided.

risicle and others added 24 commits June 27, 2025 11:20
this script allows an atomic compare-and-swap between two
msgpack-encoded payloads that contain float `timestamp`
fields.

why msgpack? because it has zero overhead when encoding
arbitrary strings, unlike e.g. json which may need to add
a lot of escaping characters. though this method & script
is agnostic to any contents of the msgpack value other than
the `timestamp` field, it is assumed that callers to this
method will want to further encode the payload of these
stored values so that the in-redis lua script (the
"critical section" in this scheme) doesn't have to parse
large & complex structures just to find the timestamp.

why not make the script responsible for the whole timestamp-
wrapping/encoding and decoding? to move as much of the
encoding and decoding work out of the critical section as
possible - the critical section effectively has exclusive
access to this key (and probably the whole redis instance)
for its whole runtime, so we really want to minimize the
work it has to do. with this scheme, the only extra work
the critical section needs to do is decoding two (hopefully
simple) msgpack values - and in cases where there is no
existing value, even that work will be omitted.

it's much more affordable performance-wise to have the
encoding/decoding work done by the clients because they
are distributed & horizontally scalable.

this tries to follow the interface of the RedisClient.set
method as much as possible, but omits the fancy options
like px, nx, xx, as it's not clear whether any of these
would make any sense in this context.
even with a delete-before, delete-after pattern, use of the
`delete` decorator contains a potential race condition. this is
because simply performing a delete when a cache is supposedly
invalidated doesn't take into account the fact that there
might already be another request in-flight that contains
now-stale data. when this other request "lands" it will
happily re-populate the cache with the stale (or at least
partially stale) data.

instead of performing a simple delete, overwrite the cache
value with a "tombstone" marker that contains a timestamp
of when the cache was supposedly invalidated (a `force_delete`
option remains for simply doing pure deletions, mostly to
allow for easy migration from previous utils versions, where
we will still want new app instances to be invalidating
the cache for old key schemes)

the `set` decorator is modified to treat these tombstones as
an empty value when encountered, and internally wraps its
stored values in a msgpack-encoded dict containing a
"pessimistic" timestamp for the generation of that value.
when setting a cache value it uses the new
`RedisClient.set_if_timestamp_newer`, which will skip storing
the new value if redis is found to contain another entry
with a newer timestamp than this "pessimistic" timestamp,
because this means that the new value may contain data from
before the deletion/invalidation was issued.

this also switches to using msgpack serialization instead of
json serialization for efficiency reasons detailed in the
addition of `RedisClient.set_if_timestamp_newer`.
more-or-less the same as delete_by_pattern albeit with a
slightly more complex implementation due to the use of
mset. sadly there is no succinct mechanism to set the
TTL of these keys in bulk.
…letion

similar to the `delete` method, don't delete matching keys but
overwrite them with a timestamped tombstone marker.

sadly there isn't an easy way to bulk-set TTL for a number of
keys, so these tombstones will live as long as the values they
were replacing.

a `force_delete` option exists to cause a straightforward
deletion - this is intended to assist in rolling out this
new utils version, where we will be wanting both old and
new versions of the app to be able to invalidate both old
and new keys during deployment.
will need some formatting work etc, but fighting
code-blocks-in-indented-bullet-lists is beyond me right now.
this is to allow the expected format of the wrapped-method's
return value to evolve safely. when this format changes, the
schema_version kwarg's value should be incremented to avoid
mismatch problems that may occur during or shortly after
deployment of the new version.

this versioning scheme is "internal" in that the version is
contained *in* the cached value as opposed to a scheme
where the version might be included in the cache key instead.
this has the advantage that it forces two different app
versions to compete for the same cache key, rather than old
and new app versions having a different "view" of whether
a value is cached or not and the age of that cached value.
this becomes particularly important when considering how
to handle updated values and complete invalidation of stale
cached values for all possible app versions that could be
active at any time.

when a cached value with a mismatched schema version is
encountered, it is simply ignored and overwritten with the
inner function's return value. though it might seem that
the better thing to do would be to only defer to the higher
schema version (therefore always be "upgrading"), during
a slow or stuck deployment that could result in all of the
old versions of the app being prevented from using a cached
value at all. in a case where we had more old instances
than new instances this could cause a significant
performance problem. though an app always overwriting a
value with its own schema version could result in a
back-and-forth of continual overwriting between two app
versions, the probability of the cache holding a value
compatible with any particular app version is approximately
proportional to the proportional population of app
instances running it.
this script allows an atomic compare-and-swap between two
msgpack-encoded payloads that contain float `timestamp`
fields.

why msgpack? because it has zero overhead when encoding
arbitrary strings, unlike e.g. json which may need to add
a lot of escaping characters. though this method & script
is agnostic to any contents of the msgpack value other than
the `timestamp` field, it is assumed that callers to this
method will want to further encode the payload of these
stored values so that the in-redis lua script (the
"critical section" in this scheme) doesn't have to parse
large & complex structures just to find the timestamp.

why not make the script responsible for the whole timestamp-
wrapping/encoding and decoding? to move as much of the
encoding and decoding work out of the critical section as
possible - the critical section effectively has exclusive
access to this key (and probably the whole redis instance)
for its whole runtime, so we really want to minimize the
work it has to do. with this scheme, the only extra work
the critical section needs to do is decoding two (hopefully
simple) msgpack values - and in cases where there is no
existing value, even that work will be omitted.

it's much more affordable performance-wise to have the
encoding/decoding work done by the clients because they
are distributed & horizontally scalable.

this tries to follow the interface of the RedisClient.set
method as much as possible, but omits the fancy options
like px, nx, xx, as it's not clear whether any of these
would make any sense in this context.
even with a delete-before, delete-after pattern, use of the
`delete` decorator contains a potential race condition. this is
because simply performing a delete when a cache is supposedly
invalidated doesn't take into account the fact that there
might already be another request in-flight that contains
now-stale data. when this other request "lands" it will
happily re-populate the cache with the stale (or at least
partially stale) data.

instead of performing a simple delete, overwrite the cache
value with a "tombstone" marker that contains a timestamp
of when the cache was supposedly invalidated (a `force_delete`
option remains for simply doing pure deletions, mostly to
allow for easy migration from previous utils versions, where
we will still want new app instances to be invalidating
the cache for old key schemes)

the `set` decorator is modified to treat these tombstones as
an empty value when encountered, and internally wraps its
stored values in a msgpack-encoded dict containing a
"pessimistic" timestamp for the generation of that value.
when setting a cache value it uses the new
`RedisClient.set_if_timestamp_newer`, which will skip storing
the new value if redis is found to contain another entry
with a newer timestamp than this "pessimistic" timestamp,
because this means that the new value may contain data from
before the deletion/invalidation was issued.

this also switches to using msgpack serialization instead of
json serialization for efficiency reasons detailed in the
addition of `RedisClient.set_if_timestamp_newer`.
more-or-less the same as delete_by_pattern albeit with a
slightly more complex implementation due to the use of
mset. sadly there is no succinct mechanism to set the
TTL of these keys in bulk.
…letion

similar to the `delete` method, don't delete matching keys but
overwrite them with a timestamped tombstone marker.

sadly there isn't an easy way to bulk-set TTL for a number of
keys, so these tombstones will live as long as the values they
were replacing.

a `force_delete` option exists to cause a straightforward
deletion - this is intended to assist in rolling out this
new utils version, where we will be wanting both old and
new versions of the app to be able to invalidate both old
and new keys during deployment.
will need some formatting work etc, but fighting
code-blocks-in-indented-bullet-lists is beyond me right now.
this is to allow the expected format of the wrapped-method's
return value to evolve safely. when this format changes, the
schema_version kwarg's value should be incremented to avoid
mismatch problems that may occur during or shortly after
deployment of the new version.

this versioning scheme is "internal" in that the version is
contained *in* the cached value as opposed to a scheme
where the version might be included in the cache key instead.
this has the advantage that it forces two different app
versions to compete for the same cache key, rather than old
and new app versions having a different "view" of whether
a value is cached or not and the age of that cached value.
this becomes particularly important when considering how
to handle updated values and complete invalidation of stale
cached values for all possible app versions that could be
active at any time.

when a cached value with a mismatched schema version is
encountered, it is simply ignored and overwritten with the
inner function's return value. though it might seem that
the better thing to do would be to only defer to the higher
schema version (therefore always be "upgrading"), during
a slow or stuck deployment that could result in all of the
old versions of the app being prevented from using a cached
value at all. in a case where we had more old instances
than new instances this could cause a significant
performance problem. though an app always overwriting a
value with its own schema version could result in a
back-and-forth of continual overwriting between two app
versions, the probability of the cache holding a value
compatible with any particular app version is approximately
proportional to the proportional population of app
instances running it.
…style RequestCache

to allow mostly-seamless rollout, an existing cache value that
can't be decoded by msgpack is assumed to be a json encoding
of an old-style RequestCache value and treated as having a
schema_version of 0.

similarly, if our local schema_version is specified as 0 (through
the schema_version arg), the payload will be stored back in redis
with a simple un-wrapped json-encoding. as such, when the local
schema_version is set to 0 it should behave identically to
old-style RequestCache and it should be possible to introduce this
wrapping-aware RequestCache seamlesslessly as schema_version 0,
before re-deploying it as schema_version=1 to enable the new
tombstone and schema-versioning functionality. interruption should
be similar to that caused by a regular schema_version bump being
rolled out.

the RequestCache.delete and RequestCache.delete_by_pattern methods
already have force_delete arguments that can be used to serve a
similar purpose.
redis_lock_file = root_tmp_dir / "redis_lock"
app.config["REDIS_ENABLED"] = True
app.config["REDIS_URL"] = "redis://localhost:6999/0"
app.config["REDIS_URL"] = "redis://localhost:7000/0"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DO NOT LET THIS CHANGE THROUGH.

This was just a workaround to get my local redis version working

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants