
HTML-escape table cell text to prevent < being parsed as tags #997

Open
Br1an67 wants to merge 1 commit into datalab-to:master from Br1an67:fix/issue-881-escape-table-html


Br1an67 commented Mar 1, 2026

Summary

Fix < characters in table cells being parsed as HTML tags by escaping cell text content before embedding it in the HTML representation.

In TableCell.assemble_html(), cell text lines were joined directly into HTML without escaping. When a cell contained < (e.g., <LOQ), it was interpreted as a broken HTML tag by the downstream HTML parser (BeautifulSoup/markdownify), corrupting the table output.

Closes #881

Changes

  • Added html.escape() to each text line in TableCell.assemble_html() before joining them into the HTML <td>/<th> element
  • This ensures <, >, and & in cell text are properly escaped as &lt;, &gt;, and &amp;
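The change described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the standalone function `assemble_cell_html`, its `is_header` parameter, and the `<br>` line joiner are assumptions made for the example. Only `html.escape()` is taken from the PR description. Note that `html.escape()` also escapes quote characters by default.

```python
import html


def assemble_cell_html(text_lines, is_header=False):
    """Hypothetical stand-in for TableCell.assemble_html().

    Escapes each text line before embedding it in the cell element,
    so characters such as '<' survive as literal text.
    """
    tag = "th" if is_header else "td"
    # Escape every line individually; '<', '>' and '&' become entities.
    escaped_lines = [html.escape(line) for line in text_lines]
    # Joining with <br> is an assumption about how lines are combined.
    cell_body = "<br>".join(escaped_lines)
    return f"<{tag}>{cell_body}</{tag}>"


print(assemble_cell_html(["<LOQ"]))  # <td>&lt;LOQ</td>
```

Escaping happens per line and before the join, so any markup the function itself inserts between lines is left untouched.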

Example

Before: <td><LOQ</td> → parsed as broken HTML tag
After: <td>&lt;LOQ</td> → correctly preserved as literal text
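To check that the escaped form round-trips to the original text, one option is to parse it and collect the text content. The sketch below uses Python's standard-library `html.parser.HTMLParser` as a stand-in for the downstream parser; the PR mentions BeautifulSoup/markdownify, which may behave differently in detail. The `TextCollector` class and `cell_text` helper are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text content of a fragment, ignoring tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Character references such as &lt; arrive here already decoded.
        self.chunks.append(data)


def cell_text(fragment):
    """Return the concatenated text found in an HTML fragment."""
    parser = TextCollector()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.chunks)


print(cell_text("<td>&lt;LOQ</td>"))  # <LOQ
```

The escaped cell yields the original `<LOQ` text after parsing, which is the behavior the fix relies on.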

Development

Successfully merging this pull request may close these issues.

[BUG: Output] < character in table cell is parsed as HTML tag
