chore: tighten TOCTOU windows in Asset write paths

## Summary

Tighten TOCTOU (time-of-check-to-time-of-use) windows in the Asset write paths in `blueflow/views/asset.py`. None of these races corrupt data — the DB-level `unique=True` and `unique_together` constraints catch every integrity violation that matters — but several let concurrent writes silently lose updates or surface as 500s instead of clean 4xx responses.

This is correctness work, not a security bug. The audit was triggered during PR #144 review; no exploitation in the wild is known.

## Background

Each finding follows the same structural pattern: a Django ORM read (`get`/`filter`/`first`) followed by a Python-side mutation followed by a write (`save`/`create`/`update`), with **no row-level locking** between the read and the write. Concurrent transactions can change the row in that window. The DB constraints are the only thing keeping the result valid; the application logic is racing.

The TOCTOU mitigations Django offers are well-trodden:

- `select_for_update()` — Postgres `SELECT ... FOR UPDATE` on the read side, blocks concurrent transactions from modifying the row until commit.
- `update_or_create()` — Django's atomic upsert primitive; uses `SELECT ... FOR UPDATE` internally.
- `transaction.atomic()` — wraps the read-then-write in a single transaction so locks are held coherently.
- DB-level array operators (e.g. `open_ports_tcp || ARRAY[...]`) — atomic at the SQL layer, no app-level read needed.

## Findings

### M1 (Medium) — `upsert()` lost updates on concurrent same-MAC writes

**File:** `blueflow/views/asset.py:956-963`

```python
asset = Asset.objects.get(mac_address=mac_address)   # read (no lock)
# ... mutate fields in Python ...
asset.save()                                         # write
```

Two scanners hitting the same MAC race: each `get`s the same snapshot, mutates locally, calls `save()`. Last write wins; the intermediate update is silently lost.

The `upsert` endpoint is explicitly designed for passive scanners (Tapirx, etc.) per its docstring — multiple scanners racing on the same MAC is the common case during a discovery sweep, not an exotic scenario.

**Fix:** wrap in `transaction.atomic()` + `select_for_update()`, or replace the entire get/mutate/save block with `Asset.objects.update_or_create(mac_address=..., defaults={...})`.

---

### M2 (Medium) — `bulk_update()` silently swallows concurrent deletes

**File:** `blueflow/views/asset.py:873-910`

Phase 1 reads all the assets *outside* any atomic block. Phase 2 writes them inside `transaction.atomic()`. Between the two, a concurrent transaction can DELETE one of the assets; the corresponding `serializer.save()` issues an UPDATE that affects 0 rows (Postgres returns success on UPDATE-of-deleted-row). The endpoint responds 200 with phase-1 data, claiming success. The asset is gone.

**Fix:** move the phase-1 reads inside the existing atomic block and use `select_for_update()` so locks are held until phase 2 commits.

---

### M3 (Medium) — `upsert()` open-ports merge uses stale read

**File:** `blueflow/views/asset.py:957-960`

```python
if new_ports:
    merged = sorted(set(asset.open_ports_tcp) | set(new_ports))
    ...
```

The `asset.open_ports_tcp` snapshot is from the M1 read above. A concurrent transaction adding a port observation between the read and the eventual `save()` is silently overwritten by the merge.

Port discovery is exactly the kind of incremental, multi-source data this endpoint exists to collect. Two scanners observing different open ports on the same device should both contribute; this race drops one.

**Fix:** same shape as M1 (atomic + lock), OR replace with a single SQL update that uses Postgres array union semantics directly, eliminating the read entirely.

---

### L1 (Low) — Hostname conflict check leaks IntegrityError as 500

**File:** `blueflow/views/asset.py:935-970` (introduced by PR #144)

The application-level conflict check at line 936-940 can return "no conflict" while a third transaction inserts a colliding hostname before line 966 runs `Asset.objects.create(...)`. The create raises `IntegrityError`, which propagates as 500.

**Severity rationale:** Data integrity preserved (the DB unique index catches it — this is by design, see PR #144 commentary). Only the response shape is wrong: race winner sees 400 from the application check; race loser sees 500 instead of 400.

**Fix:** wrap the create in `try/except IntegrityError` and return a 400 with the same shape as the application-level conflict response. Symmetric with the existing application-layer check.

---

### L2 (Low) — `tags()` action fallback fetch may DoesNotExist on concurrent delete

**File:** `blueflow/views/asset.py:719-735`

```python
try:
    asset_tag.save()
except IntegrityError:
    asset_tag = AssetTag.objects.get(asset=asset, tag=tag)   # may DoesNotExist
```

If a concurrent transaction deletes the AssetTag in the microsecond window between the `IntegrityError` catch and the fallback fetch, the `.get()` raises `DoesNotExist` and propagates as 500.

**Severity:** Very low — purely theoretical. Defer until this races in practice.

---

## Patterns done correctly (kept for reference)

| Pattern | Where | Why it's safe |
|---|---|---|
| Try-create-then-catch | `tags()` POST handler (`asset.py:717-728`) | Relies on the `unique_together` constraint on `AssetTag` to atomically reject duplicates at the DB layer. |
| All-or-nothing batch writes | `bulk_update()` phase 2 (`asset.py:904`) | The atomic block ensures partial writes don't commit. Read-side is the gap (M2 above). |
| DB-level identity guarantees | `unique=True` on `mac_address` and `hostname`; `unique_together` on join tables | These are the actual contract. Every TOCTOU finding above relies on them as the backstop. |

## Recommended fix order

1. **M1 first** — most user-visible, most common in scanner-heavy traffic, single-method change. ~10 lines.
2. **M3 alongside M1** — same handler, same fix shape. Doing them together costs ~zero extra.
3. **M2** — second priority. ~5 lines (move reads inside the existing atomic block + add `select_for_update()`).
4. **L1** — small follow-up; convert `IntegrityError` to 400 for symmetry with the application-level check. ~5 lines.
5. **L2** — defer indefinitely. File a separate issue if a real race ever surfaces in production logs.

## Tasks

- [ ] M1 — Use `select_for_update()` (or `update_or_create()`) for the upsert read/save in `views/asset.py:956-963`.
- [ ] M3 — Eliminate the stale-read open-ports merge in `views/asset.py:957-960` (covered by the same fix as M1 if `select_for_update()` is chosen).
- [ ] M2 — Move `bulk_update()` phase-1 reads inside the existing atomic block at `views/asset.py:904`; add `select_for_update()`.
- [ ] L1 — Add `try/except IntegrityError` around the upsert create (`views/asset.py:966`) returning a 400 in the same shape as the application-level conflict.
- [ ] Tests — for each fix, add a test that simulates the race using `transaction.on_commit` callbacks or threading; at minimum, document the TOCTOU window in a comment so the lock isn't accidentally removed by a future refactor.

## Acceptance criteria

- All M-severity races have either a `select_for_update()` lock, an `update_or_create()` call, or an SQL-atomic update operator covering them.
- L1 returns 400 (not 500) when the IntegrityError race fires.
- The existing test suite is at the same pass/fail count as `develop` baseline (no regressions).
- New regression tests exist for at least M1 and M2 to prevent the locks from being removed without surfacing the race.

## Out of scope

- TOCTOU in non-Asset views (groups, networks, custom fields, etc.) — separate audit if anyone wants it.
- Permission-check-then-act races — Django's `check_object_permissions` runs per-request and isn't a real TOCTOU surface unless permissions are swapped mid-request, which doesn't happen.
- Switching the deployment to `ATOMIC_REQUESTS=True` in settings — that's a different conversation about default request semantics, not a TOCTOU fix.

## Related

- PR #144 (hostname identity, the audit trigger; introduced finding L1)
- PR #58 (bulk_update introduction; M2 has been latent since this merged on 2026-04-03)
- Issue #145 (legacy `unique_together` modernization — independent but adjacent)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: tighten TOCTOU windows in Asset write paths #146

Summary

Background

Findings

M1 (Medium) — `upsert()` lost updates on concurrent same-MAC writes

M2 (Medium) — `bulk_update()` silently swallows concurrent deletes

M3 (Medium) — `upsert()` open-ports merge uses stale read

L1 (Low) — Hostname conflict check leaks IntegrityError as 500

L2 (Low) — `tags()` action fallback fetch may DoesNotExist on concurrent delete

Patterns done correctly (kept for reference)

Recommended fix order

Tasks

Acceptance criteria

Out of scope

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Pattern	Where	Why it's safe
Try-create-then-catch	`tags()` POST handler (`asset.py:717-728`)	Relies on the `unique_together` constraint on `AssetTag` to atomically reject duplicates at the DB layer.
All-or-nothing batch writes	`bulk_update()` phase 2 (`asset.py:904`)	The atomic block ensures partial writes don't commit. Read-side is the gap (M2 above).
DB-level identity guarantees	`unique=True` on `mac_address` and `hostname`; `unique_together` on join tables	These are the actual contract. Every TOCTOU finding above relies on them as the backstop.

chore: tighten TOCTOU windows in Asset write paths #146

Description

Summary

Background

Findings

M1 (Medium) — upsert() lost updates on concurrent same-MAC writes

M2 (Medium) — bulk_update() silently swallows concurrent deletes

M3 (Medium) — upsert() open-ports merge uses stale read

L1 (Low) — Hostname conflict check leaks IntegrityError as 500

L2 (Low) — tags() action fallback fetch may DoesNotExist on concurrent delete

Patterns done correctly (kept for reference)

Recommended fix order

Tasks

Acceptance criteria

Out of scope

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

M1 (Medium) — `upsert()` lost updates on concurrent same-MAC writes

M2 (Medium) — `bulk_update()` silently swallows concurrent deletes

M3 (Medium) — `upsert()` open-ports merge uses stale read

L2 (Low) — `tags()` action fallback fetch may DoesNotExist on concurrent delete