Skip to content

Fix version regex to prevent false matches on keys containing 'version'#60

Closed
ArieLevs wants to merge 1 commit into
masterfrom
fix/issue-58-version-regex
Closed

Fix version regex to prevent false matches on keys containing 'version'#60
ArieLevs wants to merge 1 commit into
masterfrom
fix/issue-58-version-regex

Conversation

@ArieLevs

@ArieLevs ArieLevs commented Jan 2, 2026

Copy link
Copy Markdown
Owner

Summary

  • Fixed the version regex pattern to require version to appear at the start of a line (with optional leading whitespace)
  • This prevents false matches on keys like target-version or this-is-a-version-containing-key
  • Updated tests to use realistic multiline content matching actual file formats

Test plan

  • All 22 existing tests pass
  • Verified fix works with pyproject.toml containing target-version = "py312"
  • Verified fix works with pyproject.toml containing this-is-a-version-containing-key = "1.1.7"
  • Version bumping correctly updates only the actual version key

Fixes #58

🤖 Generated with Claude Code

The regex pattern now requires 'version' to appear at the start of a line
(with optional leading whitespace) to prevent false matches on keys like
'target-version' or 'this-is-a-version-containing-key'.

Fixes #58

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@codecov

codecov Bot commented Jan 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.42%. Comparing base (36ba60e) to head (1559d84).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #60   +/-   ##
=======================================
  Coverage   95.42%   95.42%           
=======================================
  Files           5        5           
  Lines         459      459           
=======================================
  Hits          438      438           
  Misses         21       21           
Flag Coverage Δ
unittests 95.42% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@claude

claude Bot commented Jan 2, 2026

Copy link
Copy Markdown

Code Review for PR #60

Summary

This PR fixes issue #58 by updating the version regex pattern to prevent false matches on keys containing the word "version" (like target-version or this-is-a-version-containing-key). The fix adds anchoring to require version to appear at the start of a line with optional leading whitespace.


✅ Strengths

  1. Correct Problem Identification: The fix properly addresses the root cause - the regex was too permissive and matched any occurrence of "version" in a line.

  2. Minimal, Focused Change: The core fix is a single-line change to the regex pattern, which is appropriate for this type of bug fix.

  3. Good Test Updates: Tests were updated to use realistic multiline content that better represents actual file formats (e.g., setup.py function calls with newlines and indentation).

  4. Comprehensive Test Coverage: All 22 existing tests pass, and the fix was verified against the specific problematic cases mentioned in issue bump crash when version string exists more then once #58.


🔍 Code Quality Analysis

The Regex Change (src/pybump.py:13)

-regex_version_pattern = re.compile(r"((?:__)?version(?:__)? ?= ?[\"'])(.+?)([\"'])")
+regex_version_pattern = re.compile(r"(^[ \t]*(?:__)?version(?:__)? ?= ?[\"'])(.+?)([\"'])", re.MULTILINE)

Positives:

  • Uses ^[ \t]* to anchor to start of line with optional spaces/tabs
  • Correctly adds re.MULTILINE flag so ^ matches line starts (not just string start)
  • Maintains backward compatibility with existing version formats (__version__, version)

Potential Issues:

  1. Edge Case - Mixed Spaces and Tabs: Using [ \t]* is correct for matching indentation, but consider documenting that the regex supports both.

  2. Capture Group Semantics: The regex still uses 3 capture groups:

    • Group 1: Everything before version value (including quote)
    • Group 2: Version value
    • Group 3: Closing quote

    This is correct and maintains compatibility with set_version_in_file() which uses \g<1>{} \g<3> substitution.


🧪 Test Coverage Assessment

Updated Tests Look Good

The test updates in test_pybump.py improve realism:

Before:

content_pre = 'some text before version="3.17.5", and some text after'

After:

content_pre = 'some text before\nversion="3.17.5"\nand some text after'

This better represents actual file content where version appears at the start of a line.

Missing Test Cases

While the PR description mentions verification, I recommend adding explicit test cases for the reported issue:

def test_version_regex_ignores_non_version_keys(self):
    """Test that keys containing 'version' but not being the version key are ignored"""
    # Should NOT match target-version
    content_with_target_version = '''
    [project]
    name = "test"
    version = "1.0.0"
    target-version = "py312"
    '''
    self.assertEqual(get_version_from_file(content_with_target_version), '1.0.0')
    
    # Should NOT match custom-version-key
    content_with_custom_key = '''
    name = "test"
    this-is-a-version-containing-key = "1.1.7"
    version = "2.0.0"
    '''
    self.assertEqual(get_version_from_file(content_with_custom_key), '2.0.0')

🐛 Potential Issues

1. Inline Version Assignments Could Be Problematic

The regex now requires version at the start of a line. Consider this edge case:

# This would NOT match anymore:
if True: version = "1.0.0"

# But this WOULD match (with indentation):
    version = "1.0.0"

Question: Is this the intended behavior? Based on the typical file formats PyBump supports (setup.py, pyproject.toml, Chart.yaml), this seems correct, but it's worth documenting.

2. Comment Characters Before Version

What about this case?

# version = "1.0.0"  # This is commented out
version = "2.0.0"     # This is the real version

The regex would match both lines (since both start with optional whitespace + version). However, this is likely acceptable since PyBump expects uncommented version declarations.

3. No Test for the Specific Error Message

The original issue showed:

RuntimeError: More than one 'version' found: [('version = "', '2.0.0', '"'), ('version = "', 'py312', '"')]

Consider adding a test that specifically verifies this error is no longer raised:

def test_no_error_with_target_version_key(self):
    """Regression test for issue #58"""
    content = '''
    [tool.black]
    target-version = "py312"
    
    [project]
    version = "2.0.0"
    '''
    # Should NOT raise "More than one 'version' found"
    self.assertEqual(get_version_from_file(content), '2.0.0')

⚡ Performance Considerations

The change has negligible performance impact:

  • re.MULTILINE flag is compile-time, not runtime
  • The ^[ \t]* anchor may actually make matching faster by failing early on non-matching lines

🔒 Security Considerations

No security concerns identified. The regex change:

  • Does not introduce ReDoS (Regular Expression Denial of Service) vulnerabilities
  • Does not change how version strings are validated (still uses the same capture groups)
  • Does not affect file I/O operations

📋 Recommendations

High Priority

  1. Add explicit test cases for the specific scenarios mentioned in issue bump crash when version string exists more then once #58 (target-version, custom-version-key)
  2. Document the regex behavior - add a comment explaining that version must appear at line start (with optional indentation)

Medium Priority

  1. Consider edge cases - Verify behavior with tab-indented files, mixed indentation
  2. Update error messages - The error "More than one 'version' found" could be more helpful by showing which lines matched

Low Priority

  1. Regex readability - Consider using a verbose regex with comments for future maintainability:
regex_version_pattern = re.compile(r'''
    ^              # Start of line
    [ \t]*         # Optional whitespace (spaces or tabs)
    (?:__)?        # Optional double underscore prefix
    version        # The literal string 'version'
    (?:__)?        # Optional double underscore suffix
    [ ]*=[ ]*      # Equals sign with optional spaces
    [\"']        # Opening quote
    (.+?)          # Version value (captured)
    [\"']        # Closing quote
''', re.MULTILINE | re.VERBOSE)

✅ Conclusion

Overall Assessment: LGTM with minor suggestions

This is a solid fix that correctly addresses the reported issue. The core change is minimal and appropriate. The main improvement would be adding explicit test cases for the reported scenarios to prevent regression.

Suggested Actions:

Great work on identifying and fixing this issue! 🎉

@ArieLevs ArieLevs closed this Jan 2, 2026
@ArieLevs ArieLevs deleted the fix/issue-58-version-regex branch January 2, 2026 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bump crash when version string exists more then once

1 participant