Summary
Running real LLM-generated Python workloads through Pyex surfaced several conformance gaps where valid Python 3 is rejected or expected stdlib is missing. Each one turns correct model-generated code into an error, costing the agent turns. They are grouped here; related to #80 (raw-f-string lexer).
All repros run via the sandbox; the control (print(sum(range(10))) → 45) works.
Parser gaps (valid Python 3 rejected)
1. PEP 3132 starred unpacking in for targets
for a, b, *c in [(1, 2, 3, 4)]:
    print(c)
- Expected:
[3, 4]
- Actual:
expected variable name in for loop at line 1
Valid since Python 3.0 (PEP 3132). Models reach for this when iterating over variable-length tuples.
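Until the parser accepts this form, one possible workaround is to bind the whole tuple in the for target and split it inside the loop body. This is a minimal sketch; it assumes Pyex handles indexing and slicing of tuples, which was not verified here, and the names rows and collected are illustrative only.

```python
# Workaround sketch: avoid the starred for target by slicing in the body.
# Assumption: tuple indexing and slicing work in the sandbox.
rows = [(1, 2, 3, 4)]
collected = []

for row in rows:
    a, b = row[0], row[1]
    # Starred unpacking always produces a list, so convert the slice to match.
    c = list(row[2:])
    collected.append(c)
    print(c)
```

On CPython this prints the same value as the starred form.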
2. try: with an inline statement
try: x = 1
except Exception: pass
print("ok")
- Expected:
ok
- Actual:
expected ':' after try at line 1
A try: followed by a simple statement on the same line is valid CPython grammar (a compound statement with an inline suite), the same as if x: y = 1.
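One way to confirm the equivalence on CPython is to compare the syntax trees of the inline and indented forms. The sketch below uses only the standard ast module; the variable names are illustrative, and it says nothing about how Pyex represents these statements internally.

```python
import ast

# The repro from this issue, with the suites written inline.
inline_form = 'try: x = 1\nexcept Exception: pass\nprint("ok")\n'

# The same program with conventional indented suites.
block_form = (
    "try:\n"
    "    x = 1\n"
    "except Exception:\n"
    "    pass\n"
    'print("ok")\n'
)

# ast.dump omits line and column attributes by default, so two programs
# that differ only in layout dump to the same string.
inline_tree = ast.dump(ast.parse(inline_form))
block_tree = ast.dump(ast.parse(block_form))
same_tree = inline_tree == block_tree
print(same_tree)
```

If the two dumps match, a conformant parser should treat both layouts identically.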
3. Raw-f-string prefix rf"..." / fr"..." — see #80
Tracked separately in #80; listed here for completeness because it belongs to the same theme.
Stdlib gaps
4. ssl module absent
- Actual:
ImportError: no module named 'ssl'. Available modules: os, abc, base64, bis...
Code falling back from a higher-level client to stdlib HTTPS hits a dead end.
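For reference, the kind of fallback code that hits this error usually starts by building a default TLS context. The following is a sketch of that first step on CPython, using only documented ssl calls; it opens no network connection, and whether Pyex should expose the full module or a restricted shim is left open.

```python
import ssl

# Typical first step of a stdlib HTTPS fallback: a context with
# certificate verification and hostname checking enabled.
ctx = ssl.create_default_context()

print(ctx.verify_mode == ssl.CERT_REQUIRED)
print(ctx.check_hostname)
```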
5. urllib.request.Request missing
import urllib.request as u
print(type(u.Request))
- Actual:
AttributeError: module 'urllib.request' has no attribute 'Request'
The urllib.request shim lacks Request, the standard way to build a request with custom headers and a very common idiom.
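The idiom in question looks roughly like the sketch below, which runs on CPython without sending anything. The URL is a placeholder and the header is only an example; a shim would need to support at least this constructor and these accessors for such code to work.

```python
import urllib.request

# Build a request object with a custom header; nothing is sent here.
req = urllib.request.Request(
    "https://example.com/api",  # placeholder URL, not from the issue
    headers={"Accept": "application/json"},
    method="GET",
)

print(req.full_url)
print(req.get_method())
print(req.get_header("Accept"))
```

Model-generated code typically passes such an object to urllib.request.urlopen, so the missing attribute fails before any request is attempted.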
Why it matters
These are model-agnostic: any model emitting standard Python (starred unpacking, inline try, urllib.request.Request, raw-f-strings) hits a spurious error and burns turns recovering. A conformant parser and more complete stdlib coverage (or, where a construct is intentionally unsupported, a precise error naming it) would remove this class of friction.