Conversation
|
can we merge this PR please? |
|
Python 3.13 is supported actually :) #44 This PR is a just a better patch ^^ |
|
The last 2 PyPi releases support Python3.13 via PR #44, which uses the cgi backport package. The most recent release pins BeautifulSoup to avoid some upcoming breaking changes (see #47). I haven't had time to review what this PR does differently from the cgi/legacy-cgi packages (if anything), and test it against a wider corpus of materials. Before changing anything like this, I typically make sure any changes pass an internal test suite that parses a 10k+ international documents to ensure there are no regressions. The current tests are typically recreations of edge cases encountered when parsing those docs. I should get to this within the next 2 weeks. |
jvanasco
left a comment
There was a problem hiding this comment.
Even if you correct the quoting, I'm going to have to think about this PR quite a bit.
Headers are weird and many in the wild are malformed / not-rfc-compliant. I have some concerns about the potential impact of percent-encoded ASCII spaces in here as well.
| return None | ||
| for param in content_type.replace(" ", "").split(";"): | ||
| if "charset=" in param: | ||
| return param.split("=")[-1] |
There was a problem hiding this comment.
this should be (or similar):
return param.split("=")[-1].strip("'\"")
content-type can often look like either: the rfc allows for quoted values, not just "tokens".
content-type: charset=utf8
content-type: charset="utf8"
reference: https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2
There was a problem hiding this comment.
But in this case
return param.split("=")[-1].strip("'\"")
If we are text/html;charset=utf8, value return with script is *
[
"text/html;charset",
"utf8"
]
no ?
| raise NotParsable( | ||
| "NotParseable document detected! " | ||
| "content-type:'[%s]" % content_type, | ||
| "NotParseable document detected! content-type:'[%s]" % content_type, |
There was a problem hiding this comment.
Haha, yeah... Linter made this 😅
(It run on every save. To avoid having a commit that does linter+ modification, I preferred to make two commits. But yes, in this string use f-string will be a better choice)
Actually Python 3.13 isn't supported because cgi module is missing. in PR #44 (issue #43 ) I fix it by adding package
legacy-cgiIn this PR I entirely remove cgi package by rewriting function. All tests pass :)
The second commit I just running a linter on the modified file.