Fix config persistence: missing commands, delete handling, multipart upload#237
Conversation
Multiple related bugs in the Save-to-Flash path:
1. Multipart upload was missing the required filename argument,
causing the backend parser to fail. Added 'config.txt' to the
form.append call. This was likely the primary reason Save-to-Flash
was unreliable.
2. Web UI was not pausing its polling interval during flash write,
causing CPU contention and intermittent crashes. Added isSaving
lock and clearInterval before sendConfig.
3. conf_cmds whitelist was incomplete. Added: syslog, passwd, pvid,
ingress, port name, lag, laghash, isolate, stp, igmp, mtu, bw,
vlan N mgmt, vlan N d.
4. VLAN regex blocked named VLANs. New pattern allows optional name
(starts with letter, matching CLI parser semantics).
5. vlan N d (delete) was not persisted. parseConf now removes the
matching vlan N ... entry from configuration[] when seeing a
delete command, without storing the delete itself. Result: the
saved config describes the end state.
6. configuration[] was not cleared between flashSave invocations,
leading to stale entries from prior interactions.
7. conf_overwrite boundary fix: 'pvid 1' no longer matches 'pvid 10'
etc. Added trailing space in startsWith check.
8. All conf_cmds patterns now anchored with ^...$ for full-line
match. parseConf normalizes whitespace before testing.
9. Port range widened to \d{1,2} so ports 11+ are accepted.
Structural fixes (1, 2, 6, 7, 8) ported from mcaptur's closed PR logicog#219;
remaining fixes (3, 4, 5, 9) and overall regex strategy are new.
The pattern /^vlan\s+\d{1,4}\b/ matched both VLAN membership entries
(vlan N <ports>) and management entries (vlan N mgmt), causing them
to overwrite each other in configuration[]. When saving, whichever
form came last in the cmd_log would dedupe the other out of the
final config — resulting in either lost membership or a stale mgmt
setting.
Split into two patterns using negative lookahead:
- /^vlan\s+\d{1,4}\s+mgmt$/ matches only mgmt entries
- /^vlan\s+\d{1,4}(?!\s+mgmt\b)/ matches everything else
Both dedup independently. Discovered via hardware test on 6XH-X
where 'vlan 44 mgmt' silently dropped 'vlan 44 management 2t 5u'
from saved config, locking out web UI on next boot.
|
I started testing and so far did not find issues, but I want to do more tests later this evening. Also did not really go through the code, yet. |
This correctly describes the symptoms, but I believe the reason is different: The issue is that the tiny web-server in the switch can only handle one request at a time. So while the flashing request is ongoing, the web-server cannot answer to other requests, such as the polling. It is of course still true that flashing busy waits for the flash to be complete, but it does not block the CPU. |
|
Sorry for the delay, finally found time for testing (I hate testing, I love trying to figure out how some piece of hardware works). I am not able to find an issue, and code looks good. So will merge. |
|
Thanks so much for taking care of the issues! |
Base / Compare
logicog/RTLPlayground:mainErdnusschokolade/RTLPlayground:fix/config-persistenceDescription
This addresses multiple issues in the "Save to Flash" path that
together cause configuration to be lost or silently mis-saved across
reboots. Closes #218, closes #134.
Root causes
Multipart upload missing filename argument. The previous
form.append("configuration", new Blob(...))call did not provide afilename, which the backend multipart parser requires. This was
likely the primary reason saves were unreliable to begin with — the
parser would fail before any data reached the flash.
UI polling not paused during flash write. Flash writes block the
SoC CPU. Polling requests landing during the write caused intermittent
crashes. The fix adds an
isSavinglock and clears the poll intervalin
sendConfig, restoring polling in thefinallyblock.Incomplete
conf_cmdswhitelist. Many persistable CLI commandswere silently dropped during
parseConf:syslog,passwd,pvid,ingress,port name,lag,laghash,isolate,stp,igmp,mtu,bw,vlan N mgmt,vlan N d. The new whitelist covers allof these with anchored, precise regex patterns.
VLAN regex blocked named VLANs. The old pattern required digit
port tokens immediately after the VLAN ID, so any name (e.g.
vlan 10 iot 1u 2t) failed to match. The new pattern allows an optional namethat must start with a letter, matching the CLI parser's
isletter()behavior.
vlan N dwas not persisted. Delete commands were filtered outentirely, so deleted VLANs reappeared after reboot.
parseConfnowintercepts
vlan N dbefore the whitelist test, removes the matchingvlan N ...entry fromconfiguration[], and does not store thedelete command itself. The saved config describes the end state, not
the command sequence.
configuration[]not cleared between saves. Stale entries fromprevious interactions in the same session could leak into the saved
config. Cleared at the start of
flashSave.conf_overwriteboundary bug.pvid 1matched againstpvid 10,pvid 11, etc. Boundary fix using\bin patterns andadding the trailing space in the
startsWithcheck.conf_overwriteover-broadvlanpattern. The pattern/^vlan\s+\d{1,4}\b/matched both membership and managemententries, so
vlan 44 mgmtwould silently removevlan 44 management 2t 5uand vice versa — locking out the web UIon next boot. Split into two patterns using negative lookahead so
membership and management entries dedup independently.
Patterns now anchored. All
conf_cmdspatterns use^...$forfull-line match.
parseConfnormalizes whitespace before testing.Port range widened. Port tokens now accept two digits, so ports
11 and above are persisted correctly (side fix while reworking the
regex).
Relationship to other PRs
This fix is technically independent of #233 (VLAN overview UI) — it
builds cleanly on current
mainwithout any of the overview code.That said, #233 makes this bug significantly more visible. With the
overview UI, users actively name and manage VLANs through the
interface, so losing those settings on reboot becomes apparent
immediately. While I understand a UI rewrite is in progress (per the
discussion in #219), the underlying persistence bugs affect every
non-trivial configuration today and seem worth fixing in the current
UI rather than waiting for the rewrite.
Credit
A substantial part of the structural rework in
config.jsandsystem.jsis ported from @mcaptur's closed PR #219, which @logicoghad already reviewed and positively tested. Specifically taken from
that PR:
"config.txt")isSavinglock andclearInterval/restore aroundsendConfigconfiguration = []at the start offlashSaveawait-based merge inflashSave(replacing nested.then())parseConf^-anchoring strategy forconf_cmdsconf_overwrite(trailing-spacestartsWith)breakafter firstconf_overwritematchcmd_log_clearafter successful saveNew in this PR:
conf_cmdswhitelist (precise patterns instead of.*wildcards) covering syslog, passwd, pvid, ingress, port name, lag,
laghash, isolate, stp, igmp, mtu, bw, vlan mgmt, vlan delete
isletter()semanticsvlan N dend-state handling (saved config describes end staterather than command sequence)
conf_overwriteto keep VLANmembership and VLAN management entries independent
\b-based prefix matching inconf_overwrite(replaces theearlier trailing
\s+approach which broke the boundary check)Testing
Tested on KP-9000-6XH-X. Verified positive cases (command set,
"Save settings to flash", reboot, confirmed active and present in
saved config):
vlan 10 iot 1u 2t)vlan N d(delete) — saved config describes end state, nodelete command stored, no leftover entry
vlan N mgmttogether withvlan N <members>— both persistand remain active after reboot (regression test for the original
membership/management overwrite scenario)
pvid,ingress,port name,passwd,lag,laghash,isolate,stp off,igmp on/off,mtu,bwentries, results consistent
Note:
stp onwas not exercised in detail. During a quick test,enabling STP made the switch's management interface unreachable
(traffic between normal ports continued to forward, but the CPU
port appeared not to recover). This looks like a separate STP
implementation issue and is not in scope here — the whitelist entry
stp on/offpersists correctly either way.Verified negative cases (invalid or incomplete commands should not
end up in saved config):
vlanalone,pvidalone,xyz 123,stp maybe— allcorrectly dropped by the whitelist
(Side observation during testing:
stp maybeis silentlyinterpreted as "disable" by the CLI parser. Whitelist filtering
prevents this from being persisted, so reboot does not propagate
the unintended disable. Separate issue, not in scope here.)