⚡ Bolt: optimize _escape_latex regex compilation
#448
@@ -0,0 +1,3 @@
+## 2024-05-14 - Regex and Dictionary Hoisting in Escape Functions
+**Learning:** In utility functions like `_escape_latex` that are called recursively or in loops over large data structures (e.g., ASTs or JSON), rebuilding a large static dictionary and its regex pattern on every call adds a fixed setup cost to each call. Across $N$ calls the total work stays linear in $N$, but the constant factor grows with the size of the dictionary, causing redundant object allocations and wasted CPU cycles.
+**Action:** Hoist static dictionaries and regex compilations (`re.compile`) to module-level constants. They are then initialized once, when the module is imported, which removes the per-call setup cost; this matters most in recursive operations like `apply_escaping_recursive`.
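A minimal sketch of the hoisting pattern described in this note, using a shortened, illustrative mapping rather than the project's full table. The names `escape_per_call`, `escape_hoisted`, `SPECIAL`, and `PATTERN` are hypothetical. Python's `re` module keeps an internal cache of compiled patterns, so much of the per-call cost in the first version likely comes from rebuilding the dictionary and the joined pattern string rather than from recompilation itself; no timings are claimed here, so measure on your own data.

```python
import re
import timeit


def escape_per_call(text):
    # "Before" shape: the mapping and the pattern are rebuilt on every call.
    special = {"\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "#": r"\#"}
    pattern = re.compile("|".join(re.escape(key) for key in special))
    return pattern.sub(lambda m: special[m.group(0)], text)


# "After" shape: both objects are built once, when the module is imported.
SPECIAL = {"\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "#": r"\#"}
PATTERN = re.compile("|".join(re.escape(key) for key in SPECIAL))


def escape_hoisted(text):
    return PATTERN.sub(lambda m: SPECIAL[m.group(0)], text)


if __name__ == "__main__":
    sample = "R&D grew 50% (#1)"
    for fn in (escape_per_call, escape_hoisted):
        seconds = timeit.timeit(lambda: fn(sample), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Both functions return the same string for the same input; only the setup work per call differs.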
@@ -473,6 +473,24 @@ def is_clean_handle(handle):
     return "LinkedIn Profile"
 
 
+LATEX_SPECIAL_CHARS = {
+    "\\": r"\textbackslash{}",
+    "&": r"\&",
+    "%": r"\%",
+    "$": r"\$",
+    "#": r"\#",
+    # "_": r"\_", # Don't escape: used for markdown italic/bold (_text_ and __text__)
+    "{": r"\{",
+    "}": r"\}",
+    # "~": r"\textasciitilde{}", # Don't escape: used for markdown strikethrough (~~text~~)
+    "^": r"\textasciicircum{}",
+    "<": r"\textless{}",
+    ">": r"\textgreater{}",
+    "|": r"\textbar{}",
+    "-": r"{-}",
+}
+LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS.keys()))
Comment on lines +490 to +492
Contributor
To further optimize performance, hoist the replacement callback to a module-level function. This avoids the overhead of creating a new lambda on every call.
Suggested change
    "-": r"{-}",  # Protect hyphens that might be misinterpreted as math operators
}
LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS))
def _latex_char_replacer(match):
    return LATEX_SPECIAL_CHARS[match.group(0)]
+
 def _escape_latex(text):
     r"""Escapes special LaTeX characters in a string to prevent compilation errors.
 
@@ -489,25 +507,9 @@ def _escape_latex(text):
     if not isinstance(text, str):
         return text
 
-    latex_special_chars = {
-        "\\": r"\textbackslash{}",
-        "&": r"\&",
-        "%": r"\%",
-        "$": r"\$",
-        "#": r"\#",
-        # "_": r"\_", # Don't escape: used for markdown italic/bold (_text_ and __text__)
-        "{": r"\{",
-        "}": r"\}",
-        # "~": r"\textasciitilde{}", # Don't escape: used for markdown strikethrough (~~text~~)
-        "^": r"\textasciicircum{}",
-        "<": r"\textless{}",
-        ">": r"\textgreater{}",
-        "|": r"\textbar{}",
-        "-": r"{-}",
-    }
-
-    pattern = re.compile("|".join(re.escape(key) for key in latex_special_chars.keys()))
-    escaped_text = pattern.sub(lambda match: latex_special_chars[match.group(0)], text)
+    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
+    # to avoid O(N) redundant processing on every function call.
+    escaped_text = LATEX_ESCAPE_PATTERN.sub(lambda match: LATEX_SPECIAL_CHARS[match.group(0)], text)
Comment on lines +510 to +512
Contributor
Use the hoisted
Suggested change
 
     return escaped_text
 
@@ -146,6 +146,36 @@ def calculate_columns(num_items, max_columns=4, min_items_per_column=2):
     return max_columns  # Default to max columns if all checks pass
 
 
+# Order matters for some replacements (e.g., '\' before '&')
+# NOTE: We intentionally DO NOT escape certain characters used in markdown syntax:
+# - ~ (tilde) is used for strikethrough: ~~text~~
+# - * (asterisk) is used for bold/italic: **text** or *text*
+# - _ (underscore) is used for bold/italic: __text__ or _text_
+# - + (plus) is used for underline: ++text++
+# These will be converted to LaTeX commands by the markdown filters.
+# Users should avoid literal underscores/tildes in text, or use asterisks for bold/italic instead.
+LATEX_SPECIAL_CHARS = {
+    "\\": r"\textbackslash{}",  # Backslash must be escaped first
+    "&": r"\&",
+    "%": r"\%",
+    "$": r"\$",
+    "#": r"\#",
+    # "_": r"\_", # NOT escaped - used for markdown bold/italic (__text__ and _text_)
+    "{": r"\{",
+    "}": r"\}",
+    # "~": r"\textasciitilde{}", # NOT escaped - used for markdown strikethrough (~~text~~)
+    "^": r"\textasciicircum{}",
+    "<": r"\textless{}",
+    ">": r"\textgreater{}",
+    "|": r"\textbar{}",
+    # Hyphen/dash handling: default hyphen is good, but for en/em dashes use text-specific commands
+    "-": r"{-}",  # Protect hyphens that might be misinterpreted as math operators
+}
+
+# Use a regular expression to find and replace all special characters
+# This approach ensures each character is handled once
+LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS.keys()))
Comment on lines +175 to +177
Contributor
Hoist the replacement callback to a module-level function to avoid repeated lambda allocations, and remove the redundant `.keys()` call.
Suggested change
# Use a regular expression to find and replace all special characters
# This approach ensures each character is handled once
LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS))
def _latex_char_replacer(match):
    return LATEX_SPECIAL_CHARS[match.group(0)]
+
 def _escape_latex(text):
     """
     Escapes special LaTeX characters in a string to prevent compilation errors.
@@ -162,37 +192,9 @@ def _escape_latex(text):
         # as they don't need LaTeX escaping.
         return text
 
-    # Define a mapping for LaTeX special characters
-    # Order matters for some replacements (e.g., '\' before '&')
-    # NOTE: We intentionally DO NOT escape certain characters used in markdown syntax:
-    # - ~ (tilde) is used for strikethrough: ~~text~~
-    # - * (asterisk) is used for bold/italic: **text** or *text*
-    # - _ (underscore) is used for bold/italic: __text__ or _text_
-    # - + (plus) is used for underline: ++text++
-    # These will be converted to LaTeX commands by the markdown filters.
-    # Users should avoid literal underscores/tildes in text, or use asterisks for bold/italic instead.
-    latex_special_chars = {
-        "\\": r"\textbackslash{}",  # Backslash must be escaped first
-        "&": r"\&",
-        "%": r"\%",
-        "$": r"\$",
-        "#": r"\#",
-        # "_": r"\_", # NOT escaped - used for markdown bold/italic (__text__ and _text_)
-        "{": r"\{",
-        "}": r"\}",
-        # "~": r"\textasciitilde{}", # NOT escaped - used for markdown strikethrough (~~text~~)
-        "^": r"\textasciicircum{}",
-        "<": r"\textless{}",
-        ">": r"\textgreater{}",
-        "|": r"\textbar{}",
-        # Hyphen/dash handling: default hyphen is good, but for en/em dashes use text-specific commands
-        "-": r"{-}",  # Protect hyphens that might be misinterpreted as math operators
-    }
-
-    # Use a regular expression to find and replace all special characters
-    # This approach ensures each character is handled once
-    pattern = re.compile("|".join(re.escape(key) for key in latex_special_chars.keys()))
-    escaped_text = pattern.sub(lambda match: latex_special_chars[match.group(0)], text)
+    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
+    # to avoid O(N) redundant processing on every function call.
+    escaped_text = LATEX_ESCAPE_PATTERN.sub(lambda match: LATEX_SPECIAL_CHARS[match.group(0)], text)
Comment on lines +195 to +197
Contributor
Use the hoisted
Suggested change
 
     return escaped_text
 
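The code comments in this diff say both that "order matters" and that the regex approach "ensures each character is handled once". The sketch below, with a deliberately small mapping, illustrates the second point: with single-character keys and a single `re.sub` pass, replacement text is never rescanned, so dictionary order should not affect the result. `escape_chained` is a hypothetical alternative shown only for contrast; it is not code from this PR.

```python
import re

# Illustrative subset of the mapping; all keys are single characters.
MAPPING = {"\\": r"\textbackslash{}", "{": r"\{", "}": r"\}"}
PATTERN = re.compile("|".join(re.escape(key) for key in MAPPING))


def escape_single_pass(text):
    # Each input character is matched at most once; output is never rescanned.
    return PATTERN.sub(lambda m: MAPPING[m.group(0)], text)


def escape_chained(text):
    # Hypothetical contrast: sequential str.replace calls rescan earlier output,
    # so the braces produced for the backslash get escaped a second time.
    for char, replacement in MAPPING.items():
        text = text.replace(char, replacement)
    return text


print(escape_single_pass("a\\b"))  # → a\textbackslash{}b
print(escape_chained("a\\b"))      # → a\textbackslash\{\}b
```

Ordering would only start to matter if the approach changed to chained replacements or if multi-character keys sharing a prefix were added to the alternation.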
This dictionary and the associated LaTeX escaping logic are duplicated in `resume_generator_latex.py`. Consider consolidating these into a shared utility module to improve maintainability and ensure consistency across the application. Centralizing configuration mappings helps prevent logic from becoming out of sync.
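One way the consolidation might look, as a sketch under stated assumptions: the module name `latex_escaping.py` and the public name `escape_latex` are hypothetical, and `apply_escaping_recursive` is named in the PR's notes but its real signature is not shown here, so the version below is an assumed shape that walks dicts and lists.

```python
# latex_escaping.py (hypothetical shared module; the file name is an assumption)
import re

LATEX_SPECIAL_CHARS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "{": r"\{",
    "}": r"\}",
    "^": r"\textasciicircum{}",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
    "|": r"\textbar{}",
    "-": r"{-}",
}
LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS))


def _latex_char_replacer(match):
    return LATEX_SPECIAL_CHARS[match.group(0)]


def escape_latex(text):
    """Escape LaTeX special characters; non-strings are returned unchanged."""
    if not isinstance(text, str):
        return text
    return LATEX_ESCAPE_PATTERN.sub(_latex_char_replacer, text)


def apply_escaping_recursive(data):
    """Assumed shape: walk dicts and lists, escaping every string value."""
    if isinstance(data, dict):
        return {key: apply_escaping_recursive(value) for key, value in data.items()}
    if isinstance(data, list):
        return [apply_escaping_recursive(item) for item in data]
    return escape_latex(data)
```

Each generator module could then import `escape_latex` from the shared module instead of keeping its own copy of the table, so a change to the mapping happens in one place.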