Summary
Tighten TOCTOU (time-of-check-to-time-of-use) windows in the Asset write paths in blueflow/views/asset.py. None of these races corrupt data — the DB-level unique=True and unique_together constraints catch every integrity violation that matters — but several let concurrent writes silently lose updates or surface as 500s instead of clean 4xx responses.
This is correctness work, not a security bug. The audit was triggered during PR #144 review; no exploitation in the wild is known.
Background
Each finding follows the same structural pattern: a Django ORM read (get/filter/first) followed by a Python-side mutation followed by a write (save/create/update), with no row-level locking between the read and the write. Concurrent transactions can change the row in that window. The DB constraints are the only thing keeping the result valid; the application logic is racing.
The TOCTOU mitigations Django offers are well-trodden:
select_for_update() — Postgres SELECT ... FOR UPDATE on the read side, blocks concurrent transactions from modifying the row until commit.
update_or_create() — Django's atomic upsert primitive; uses SELECT ... FOR UPDATE internally.
transaction.atomic() — wraps the read-then-write in a single transaction so locks are held coherently.
- DB-level array operators (e.g.
open_ports_tcp || ARRAY[...]) — atomic at the SQL layer, no app-level read needed.
Findings
M1 (Medium) — upsert() lost updates on concurrent same-MAC writes
File: blueflow/views/asset.py:956-963
asset = Asset.objects.get(mac_address=mac_address) # read (no lock)
# ... mutate fields in Python ...
asset.save() # write
Two scanners hitting the same MAC race: each gets the same snapshot, mutates locally, calls save(). Last write wins; the intermediate update is silently lost.
The upsert endpoint is explicitly designed for passive scanners (Tapirx, etc.) per its docstring — multiple scanners racing on the same MAC is the common case during a discovery sweep, not an exotic scenario.
Fix: wrap in transaction.atomic() + select_for_update(), or replace the entire get/mutate/save block with Asset.objects.update_or_create(mac_address=..., defaults={...}).
M2 (Medium) — bulk_update() silently swallows concurrent deletes
File: blueflow/views/asset.py:873-910
Phase 1 reads all the assets outside any atomic block. Phase 2 writes them inside transaction.atomic(). Between the two, a concurrent transaction can DELETE one of the assets; the corresponding serializer.save() issues an UPDATE that affects 0 rows (Postgres returns success on UPDATE-of-deleted-row). The endpoint responds 200 with phase-1 data, claiming success. The asset is gone.
Fix: move the phase-1 reads inside the existing atomic block and use select_for_update() so locks are held until phase 2 commits.
M3 (Medium) — upsert() open-ports merge uses stale read
File: blueflow/views/asset.py:957-960
if new_ports:
merged = sorted(set(asset.open_ports_tcp) | set(new_ports))
...
The asset.open_ports_tcp snapshot is from the M1 read above. A concurrent transaction adding a port observation between the read and the eventual save() is silently overwritten by the merge.
Port discovery is exactly the kind of incremental, multi-source data this endpoint exists to collect. Two scanners observing different open ports on the same device should both contribute; this race drops one.
Fix: same shape as M1 (atomic + lock), OR replace with a single SQL update that uses Postgres array union semantics directly, eliminating the read entirely.
L1 (Low) — Hostname conflict check leaks IntegrityError as 500
File: blueflow/views/asset.py:935-970 (introduced by PR #144)
The application-level conflict check at line 936-940 can return "no conflict" while a third transaction inserts a colliding hostname before line 966 runs Asset.objects.create(...). The create raises IntegrityError, which propagates as 500.
Severity rationale: Data integrity preserved (the DB unique index catches it — this is by design, see PR #144 commentary). Only the response shape is wrong: race winner sees 400 from the application check; race loser sees 500 instead of 400.
Fix: wrap the create in try/except IntegrityError and return a 400 with the same shape as the application-level conflict response. Symmetric with the existing application-layer check.
L2 (Low) — tags() action fallback fetch may DoesNotExist on concurrent delete
File: blueflow/views/asset.py:719-735
try:
asset_tag.save()
except IntegrityError:
asset_tag = AssetTag.objects.get(asset=asset, tag=tag) # may DoesNotExist
If a concurrent transaction deletes the AssetTag in the microsecond window between the IntegrityError catch and the fallback fetch, the .get() raises DoesNotExist and propagates as 500.
Severity: Very low — purely theoretical. Defer until this races in practice.
Patterns done correctly (kept for reference)
| Pattern |
Where |
Why it's safe |
| Try-create-then-catch |
tags() POST handler (asset.py:717-728) |
Relies on the unique_together constraint on AssetTag to atomically reject duplicates at the DB layer. |
| All-or-nothing batch writes |
bulk_update() phase 2 (asset.py:904) |
The atomic block ensures partial writes don't commit. Read-side is the gap (M2 above). |
| DB-level identity guarantees |
unique=True on mac_address and hostname; unique_together on join tables |
These are the actual contract. Every TOCTOU finding above relies on them as the backstop. |
Recommended fix order
- M1 first — most user-visible, most common in scanner-heavy traffic, single-method change. ~10 lines.
- M3 alongside M1 — same handler, same fix shape. Doing them together costs ~zero extra.
- M2 — second priority. ~5 lines (move reads inside the existing atomic block + add
select_for_update()).
- L1 — small follow-up; convert
IntegrityError to 400 for symmetry with the application-level check. ~5 lines.
- L2 — defer indefinitely. File a separate issue if a real race ever surfaces in production logs.
Tasks
Acceptance criteria
- All M-severity races have either a
select_for_update() lock, an update_or_create() call, or an SQL-atomic update operator covering them.
- L1 returns 400 (not 500) when the IntegrityError race fires.
- The existing test suite is at the same pass/fail count as
develop baseline (no regressions).
- New regression tests exist for at least M1 and M2 to prevent the locks from being removed without surfacing the race.
Out of scope
- TOCTOU in non-Asset views (groups, networks, custom fields, etc.) — separate audit if anyone wants it.
- Permission-check-then-act races — Django's
check_object_permissions runs per-request and isn't a real TOCTOU surface unless permissions are swapped mid-request, which doesn't happen.
- Switching the deployment to
ATOMIC_REQUESTS=True in settings — that's a different conversation about default request semantics, not a TOCTOU fix.
Related
Summary
Tighten TOCTOU (time-of-check-to-time-of-use) windows in the Asset write paths in
blueflow/views/asset.py. None of these races corrupt data — the DB-levelunique=Trueandunique_togetherconstraints catch every integrity violation that matters — but several let concurrent writes silently lose updates or surface as 500s instead of clean 4xx responses.This is correctness work, not a security bug. The audit was triggered during PR #144 review; no exploitation in the wild is known.
Background
Each finding follows the same structural pattern: a Django ORM read (
get/filter/first) followed by a Python-side mutation followed by a write (save/create/update), with no row-level locking between the read and the write. Concurrent transactions can change the row in that window. The DB constraints are the only thing keeping the result valid; the application logic is racing.The TOCTOU mitigations Django offers are well-trodden:
select_for_update()— PostgresSELECT ... FOR UPDATEon the read side, blocks concurrent transactions from modifying the row until commit.update_or_create()— Django's atomic upsert primitive; usesSELECT ... FOR UPDATEinternally.transaction.atomic()— wraps the read-then-write in a single transaction so locks are held coherently.open_ports_tcp || ARRAY[...]) — atomic at the SQL layer, no app-level read needed.Findings
M1 (Medium) —
upsert()lost updates on concurrent same-MAC writesFile:
blueflow/views/asset.py:956-963Two scanners hitting the same MAC race: each
gets the same snapshot, mutates locally, callssave(). Last write wins; the intermediate update is silently lost.The
upsertendpoint is explicitly designed for passive scanners (Tapirx, etc.) per its docstring — multiple scanners racing on the same MAC is the common case during a discovery sweep, not an exotic scenario.Fix: wrap in
transaction.atomic()+select_for_update(), or replace the entire get/mutate/save block withAsset.objects.update_or_create(mac_address=..., defaults={...}).M2 (Medium) —
bulk_update()silently swallows concurrent deletesFile:
blueflow/views/asset.py:873-910Phase 1 reads all the assets outside any atomic block. Phase 2 writes them inside
transaction.atomic(). Between the two, a concurrent transaction can DELETE one of the assets; the correspondingserializer.save()issues an UPDATE that affects 0 rows (Postgres returns success on UPDATE-of-deleted-row). The endpoint responds 200 with phase-1 data, claiming success. The asset is gone.Fix: move the phase-1 reads inside the existing atomic block and use
select_for_update()so locks are held until phase 2 commits.M3 (Medium) —
upsert()open-ports merge uses stale readFile:
blueflow/views/asset.py:957-960The
asset.open_ports_tcpsnapshot is from the M1 read above. A concurrent transaction adding a port observation between the read and the eventualsave()is silently overwritten by the merge.Port discovery is exactly the kind of incremental, multi-source data this endpoint exists to collect. Two scanners observing different open ports on the same device should both contribute; this race drops one.
Fix: same shape as M1 (atomic + lock), OR replace with a single SQL update that uses Postgres array union semantics directly, eliminating the read entirely.
L1 (Low) — Hostname conflict check leaks IntegrityError as 500
File:
blueflow/views/asset.py:935-970(introduced by PR #144)The application-level conflict check at line 936-940 can return "no conflict" while a third transaction inserts a colliding hostname before line 966 runs
Asset.objects.create(...). The create raisesIntegrityError, which propagates as 500.Severity rationale: Data integrity preserved (the DB unique index catches it — this is by design, see PR #144 commentary). Only the response shape is wrong: race winner sees 400 from the application check; race loser sees 500 instead of 400.
Fix: wrap the create in
try/except IntegrityErrorand return a 400 with the same shape as the application-level conflict response. Symmetric with the existing application-layer check.L2 (Low) —
tags()action fallback fetch may DoesNotExist on concurrent deleteFile:
blueflow/views/asset.py:719-735If a concurrent transaction deletes the AssetTag in the microsecond window between the
IntegrityErrorcatch and the fallback fetch, the.get()raisesDoesNotExistand propagates as 500.Severity: Very low — purely theoretical. Defer until this races in practice.
Patterns done correctly (kept for reference)
tags()POST handler (asset.py:717-728)unique_togetherconstraint onAssetTagto atomically reject duplicates at the DB layer.bulk_update()phase 2 (asset.py:904)unique=Trueonmac_addressandhostname;unique_togetheron join tablesRecommended fix order
select_for_update()).IntegrityErrorto 400 for symmetry with the application-level check. ~5 lines.Tasks
select_for_update()(orupdate_or_create()) for the upsert read/save inviews/asset.py:956-963.views/asset.py:957-960(covered by the same fix as M1 ifselect_for_update()is chosen).bulk_update()phase-1 reads inside the existing atomic block atviews/asset.py:904; addselect_for_update().try/except IntegrityErroraround the upsert create (views/asset.py:966) returning a 400 in the same shape as the application-level conflict.transaction.on_commitcallbacks or threading; at minimum, document the TOCTOU window in a comment so the lock isn't accidentally removed by a future refactor.Acceptance criteria
select_for_update()lock, anupdate_or_create()call, or an SQL-atomic update operator covering them.developbaseline (no regressions).Out of scope
check_object_permissionsruns per-request and isn't a real TOCTOU surface unless permissions are swapped mid-request, which doesn't happen.ATOMIC_REQUESTS=Truein settings — that's a different conversation about default request semantics, not a TOCTOU fix.Related
unique_togethermodernization — independent but adjacent)