3 changes: 3 additions & 0 deletions .jules/bolt.md
@@ -0,0 +1,3 @@
## 2024-05-14 - Regex and Dictionary Hoisting in Escape Functions
**Learning:** Utility functions like `_escape_latex` are called recursively or in loops over large data structures (e.g., ASTs or JSON). Rebuilding a large static dictionary and recompiling a regex pattern on every call adds a fixed setup cost to each invocation: redundant object allocations and CPU cycles. Note that `re.compile` consults an internal pattern cache, so the repeated cost is mostly the dictionary construction, the string join, and the cache lookup rather than a full recompile. The total work stays $O(N)$ in the number of items escaped, but with a much larger constant factor than necessary; it does not become $O(N^2)$.
**Action:** Hoist static dictionaries and regex compilations (`re.compile`) to module-level constants so they are built once, when the module is first imported, rather than on every call. This matters most in recursive callers such as `apply_escaping_recursive`.
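
A minimal sketch of the pattern described above, using a deliberately small mapping. The names `_SPECIAL`, `_PATTERN`, `escape_per_call`, and `escape_hoisted` are hypothetical and chosen for illustration; they are not the project's code. Both functions return the same output, but the second builds the dictionary and pattern only once, at import time.

```python
import re

# Hypothetical, shortened mapping for illustration only.
_SPECIAL = {"&": r"\&", "%": r"\%", "#": r"\#"}
_PATTERN = re.compile("|".join(re.escape(k) for k in _SPECIAL))


def escape_per_call(text):
    """Rebuilds the mapping and the pattern source on every call."""
    special = {"&": r"\&", "%": r"\%", "#": r"\#"}
    pattern = re.compile("|".join(re.escape(k) for k in special))
    return pattern.sub(lambda m: special[m.group(0)], text)


def escape_hoisted(text):
    """Reuses the module-level mapping and compiled pattern."""
    return _PATTERN.sub(lambda m: _SPECIAL[m.group(0)], text)
```

To measure the difference on real inputs, `timeit.timeit(lambda: escape_hoisted(s), number=100_000)` can be compared against the per-call variant; the size of the gain depends on input length and call count.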
40 changes: 21 additions & 19 deletions app.py
@@ -473,6 +473,24 @@ def is_clean_handle(handle):
return "LinkedIn Profile"


+LATEX_SPECIAL_CHARS = {
Contributor

medium

This dictionary and the associated LaTeX escaping logic are duplicated in `resume_generator_latex.py`. Consider consolidating these into a shared utility module to improve maintainability and ensure consistency across the application. Centralizing configuration mappings helps prevent logic from becoming out of sync.

References
  1. Avoid duplicating configuration data or mappings across different files to prevent them from becoming out of sync and to improve maintainability.
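
One possible shape for this consolidation, offered as a sketch only. The module name `latex_escape.py` and the public function `escape_latex` are assumptions made for illustration; neither exists in this PR. The mapping is copied from the diff, and both `app.py` and `resume_generator_latex.py` would import from the shared module instead of defining their own copies.

```python
# latex_escape.py (hypothetical shared module, not part of this PR)
import re

LATEX_SPECIAL_CHARS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "{": r"\{",
    "}": r"\}",
    "^": r"\textasciicircum{}",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
    "|": r"\textbar{}",
    "-": r"{-}",
}
LATEX_ESCAPE_PATTERN = re.compile(
    "|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS)
)


def _latex_char_replacer(match):
    """Look up the replacement for the single matched character."""
    return LATEX_SPECIAL_CHARS[match.group(0)]


def escape_latex(text):
    """Escape LaTeX special characters; non-strings are returned unchanged."""
    if not isinstance(text, str):
        return text
    return LATEX_ESCAPE_PATTERN.sub(_latex_char_replacer, text)


# Assumed import form in app.py and resume_generator_latex.py:
# from latex_escape import escape_latex as _escape_latex
```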

+    "\\": r"\textbackslash{}",
+    "&": r"\&",
+    "%": r"\%",
+    "$": r"\$",
+    "#": r"\#",
+    # "_": r"\_", # Don't escape: used for markdown italic/bold (_text_ and __text__)
+    "{": r"\{",
+    "}": r"\}",
+    # "~": r"\textasciitilde{}", # Don't escape: used for markdown strikethrough (~~text~~)
+    "^": r"\textasciicircum{}",
+    "<": r"\textless{}",
+    ">": r"\textgreater{}",
+    "|": r"\textbar{}",
+    "-": r"{-}",
+}
+LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS.keys()))
Comment on lines +490 to +492
Contributor

medium

To further optimize performance, hoist the replacement callback to a module-level function. This avoids the overhead of creating a new lambda object on every call to `_escape_latex`. Additionally, `.keys()` is redundant when iterating over a dictionary, and adding a descriptive comment for the hyphen escape improves clarity.

    "-": r"{-}",  # Protect hyphens that might be misinterpreted as math operators
}
LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS))

def _latex_char_replacer(match):
    return LATEX_SPECIAL_CHARS[match.group(0)]


 def _escape_latex(text):
     r"""Escapes special LaTeX characters in a string to prevent compilation errors.

@@ -489,25 +507,9 @@ def _escape_latex(text):
     if not isinstance(text, str):
         return text

-    latex_special_chars = {
-        "\\": r"\textbackslash{}",
-        "&": r"\&",
-        "%": r"\%",
-        "$": r"\$",
-        "#": r"\#",
-        # "_": r"\_", # Don't escape: used for markdown italic/bold (_text_ and __text__)
-        "{": r"\{",
-        "}": r"\}",
-        # "~": r"\textasciitilde{}", # Don't escape: used for markdown strikethrough (~~text~~)
-        "^": r"\textasciicircum{}",
-        "<": r"\textless{}",
-        ">": r"\textgreater{}",
-        "|": r"\textbar{}",
-        "-": r"{-}",
-    }
-
-    pattern = re.compile("|".join(re.escape(key) for key in latex_special_chars.keys()))
-    escaped_text = pattern.sub(lambda match: latex_special_chars[match.group(0)], text)
+    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
+    # to avoid O(N) redundant processing on every function call.
+    escaped_text = LATEX_ESCAPE_PATTERN.sub(lambda match: LATEX_SPECIAL_CHARS[match.group(0)], text)
Comment on lines +510 to +512
Contributor

medium

Use the hoisted `_latex_char_replacer` function to avoid repeated lambda allocations.

Suggested change
-    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
-    # to avoid O(N) redundant processing on every function call.
-    escaped_text = LATEX_ESCAPE_PATTERN.sub(lambda match: LATEX_SPECIAL_CHARS[match.group(0)], text)
+    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
+    # to avoid redundant processing on every function call.
+    escaped_text = LATEX_ESCAPE_PATTERN.sub(_latex_char_replacer, text)

     return escaped_text


64 changes: 33 additions & 31 deletions resume_generator_latex.py
@@ -146,6 +146,36 @@ def calculate_columns(num_items, max_columns=4, min_items_per_column=2):
return max_columns # Default to max columns if all checks pass


+# Order matters for some replacements (e.g., '\' before '&')
+# NOTE: We intentionally DO NOT escape certain characters used in markdown syntax:
+# - ~ (tilde) is used for strikethrough: ~~text~~
+# - * (asterisk) is used for bold/italic: **text** or *text*
+# - _ (underscore) is used for bold/italic: __text__ or _text_
+# - + (plus) is used for underline: ++text++
+# These will be converted to LaTeX commands by the markdown filters.
+# Users should avoid literal underscores/tildes in text, or use asterisks for bold/italic instead.
+LATEX_SPECIAL_CHARS = {
+    "\\": r"\textbackslash{}", # Backslash must be escaped first
+    "&": r"\&",
+    "%": r"\%",
+    "$": r"\$",
+    "#": r"\#",
+    # "_": r"\_", # NOT escaped - used for markdown bold/italic (__text__ and _text_)
+    "{": r"\{",
+    "}": r"\}",
+    # "~": r"\textasciitilde{}", # NOT escaped - used for markdown strikethrough (~~text~~)
+    "^": r"\textasciicircum{}",
+    "<": r"\textless{}",
+    ">": r"\textgreater{}",
+    "|": r"\textbar{}",
+    # Hyphen/dash handling: default hyphen is good, but for en/em dashes use text-specific commands
+    "-": r"{-}", # Protect hyphens that might be misinterpreted as math operators
+}
+
+# Use a regular expression to find and replace all special characters
+# This approach ensures each character is handled once
+LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS.keys()))
Comment on lines +175 to +177
Contributor

medium

Hoist the replacement callback to a module-level function to avoid repeated lambda allocations, and remove the redundant `.keys()` call.

# Use a regular expression to find and replace all special characters
# This approach ensures each character is handled once
LATEX_ESCAPE_PATTERN = re.compile("|".join(re.escape(key) for key in LATEX_SPECIAL_CHARS))

def _latex_char_replacer(match):
    return LATEX_SPECIAL_CHARS[match.group(0)]
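
The "handled once" comment above can be illustrated with a small sketch. The helper names `escape_single_pass` and `escape_chained` and the two-entry `MAPPING` are hypothetical, introduced only for this example. The point is that chained `str.replace` calls can re-escape text produced by an earlier replacement, while a single regex pass never rescans its own output, so dictionary order does not affect correctness in the single-pass form.

```python
import re

# Hypothetical two-entry mapping, enough to show the interaction.
MAPPING = {"\\": r"\textbackslash{}", "{": r"\{"}
PATTERN = re.compile("|".join(re.escape(key) for key in MAPPING))


def escape_single_pass(text):
    """Each input character is matched at most once; output is not rescanned."""
    return PATTERN.sub(lambda m: MAPPING[m.group(0)], text)


def escape_chained(text):
    """Sequential replaces: later steps also see earlier replacement text."""
    for char, replacement in MAPPING.items():
        text = text.replace(char, replacement)
    return text


print(escape_single_pass("\\"))  # \textbackslash{}
print(escape_chained("\\"))      # \textbackslash\{}
```

In the chained version, the `{` emitted by the backslash replacement is itself escaped by the next step, which corrupts the command.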


 def _escape_latex(text):
     """
     Escapes special LaTeX characters in a string to prevent compilation errors.
@@ -162,37 +192,9 @@ def _escape_latex(text):
         # as they don't need LaTeX escaping.
         return text

-    # Define a mapping for LaTeX special characters
-    # Order matters for some replacements (e.g., '\' before '&')
-    # NOTE: We intentionally DO NOT escape certain characters used in markdown syntax:
-    # - ~ (tilde) is used for strikethrough: ~~text~~
-    # - * (asterisk) is used for bold/italic: **text** or *text*
-    # - _ (underscore) is used for bold/italic: __text__ or _text_
-    # - + (plus) is used for underline: ++text++
-    # These will be converted to LaTeX commands by the markdown filters.
-    # Users should avoid literal underscores/tildes in text, or use asterisks for bold/italic instead.
-    latex_special_chars = {
-        "\\": r"\textbackslash{}", # Backslash must be escaped first
-        "&": r"\&",
-        "%": r"\%",
-        "$": r"\$",
-        "#": r"\#",
-        # "_": r"\_", # NOT escaped - used for markdown bold/italic (__text__ and _text_)
-        "{": r"\{",
-        "}": r"\}",
-        # "~": r"\textasciitilde{}", # NOT escaped - used for markdown strikethrough (~~text~~)
-        "^": r"\textasciicircum{}",
-        "<": r"\textless{}",
-        ">": r"\textgreater{}",
-        "|": r"\textbar{}",
-        # Hyphen/dash handling: default hyphen is good, but for en/em dashes use text-specific commands
-        "-": r"{-}", # Protect hyphens that might be misinterpreted as math operators
-    }
-
-    # Use a regular expression to find and replace all special characters
-    # This approach ensures each character is handled once
-    pattern = re.compile("|".join(re.escape(key) for key in latex_special_chars.keys()))
-    escaped_text = pattern.sub(lambda match: latex_special_chars[match.group(0)], text)
+    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
+    # to avoid O(N) redundant processing on every function call.
+    escaped_text = LATEX_ESCAPE_PATTERN.sub(lambda match: LATEX_SPECIAL_CHARS[match.group(0)], text)
Comment on lines +195 to +197
Contributor

medium

Use the hoisted `_latex_char_replacer` function here.

Suggested change
-    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
-    # to avoid O(N) redundant processing on every function call.
-    escaped_text = LATEX_ESCAPE_PATTERN.sub(lambda match: LATEX_SPECIAL_CHARS[match.group(0)], text)
+    # Bolt Optimization: Regex compilation and mapping dictionary are hoisted to module-level constants
+    # to avoid redundant processing on every function call.
+    escaped_text = LATEX_ESCAPE_PATTERN.sub(_latex_char_replacer, text)


     return escaped_text
