swallow out-of-range numeric timestamps in JSON parser#4
Conversation
…mtimestamp raises ValueError (year out of range, e.g. 1e12 = year 33658, 1e15 seconds) and OverflowError (1e308 — larger than platform time_t); a negative timestamp can raise OSError on some platforms. _extract_timestamp is internal and its contract is 'returns Optional[datetime]', so a busted numeric field is just 'no timestamp', not 'the whole JSON line is unparseable'. The old code let the exception escape parse(), dropping the entire ParsedEntry (level, message, source, metadata) and surfacing a traceback to the CLI caller for what should be a per-line parse failure. Wrapping the call in try/except (ValueError, OSError, OverflowError) keeps the rest of the entry, sets timestamp to None, and lets the CLI analyze the other 10,000 lines in the file as usual
Reviewer's GuideThis PR hardens the JSON log parser so that out-of-range numeric Unix timestamps no longer cause parse() to fail the entire log entry, instead treating them as missing timestamps, and adds tests to cover the new behavior for various overflow cases. Sequence diagram for handling out-of-range numeric timestamps in JSON parsersequenceDiagram
actor CLIUser
participant LogAnalyzerCLI
participant JSONLogParser
participant datetime
CLIUser->>LogAnalyzerCLI: analyze_logs
LogAnalyzerCLI->>JSONLogParser: parse(line)
JSONLogParser->>JSONLogParser: _extract_timestamp(data)
JSONLogParser->>datetime: fromtimestamp(value)
datetime-->>JSONLogParser: [raises ValueError/OSError/OverflowError]
JSONLogParser-->>JSONLogParser: _extract_timestamp returns None
JSONLogParser-->>LogAnalyzerCLI: ParsedEntry(timestamp=None)
LogAnalyzerCLI-->>CLIUser: results for remaining lines
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughThis PR hardens JSON log timestamp extraction to gracefully handle numeric Unix timestamps that overflow ChangesJSON timestamp overflow resilience
Estimated code review effort🎯 2 (Simple) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The 1e12 threshold used to distinguish seconds from milliseconds is a magic number; consider extracting it into a named constant (with a brief comment) so the intent and units are clear and easier to adjust later.
- Swallowing ValueError/OSError/OverflowError and returning None makes malformed timestamps silent; consider emitting a debug-level log or other lightweight signal when this occurs to aid diagnosing bad log producers without breaking parsing.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The 1e12 threshold used to distinguish seconds from milliseconds is a magic number; consider extracting it into a named constant (with a brief comment) so the intent and units are clear and easier to adjust later.
- Swallowing ValueError/OSError/OverflowError and returning None makes malformed timestamps silent; consider emitting a debug-level log or other lightweight signal when this occurs to aid diagnosing bad log producers without breaking parsing.
## Individual Comments
### Comment 1
<location path="src/log_analyzer_cli/parsers/json_log.py" line_range="89-96" />
<code_context>
for field in self.TIMESTAMP_FIELDS:
if field in data:
value = data[field]
if isinstance(value, (int, float)):
- if value > 1e12:
- return datetime.fromtimestamp(value / 1000)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Exclude booleans from the numeric timestamp path to avoid misinterpreting them as timestamps.
Since `bool` is a subclass of `int`, JSON `true`/`false` would pass this `isinstance(value, (int, float))` check and be treated as timestamps. To avoid converting boolean flags, tighten the condition, e.g. `isinstance(value, (int, float)) and not isinstance(value, bool)`.
```suggestion
value = data[field]
if isinstance(value, (int, float)) and not isinstance(value, bool):
try:
if value > 1e12:
return datetime.fromtimestamp(value / 1000)
return datetime.fromtimestamp(value)
except (ValueError, OSError, OverflowError):
return None
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| value = data[field] | ||
| if isinstance(value, (int, float)): | ||
| if value > 1e12: | ||
| return datetime.fromtimestamp(value / 1000) | ||
| return datetime.fromtimestamp(value) | ||
| try: | ||
| if value > 1e12: | ||
| return datetime.fromtimestamp(value / 1000) | ||
| return datetime.fromtimestamp(value) | ||
| except (ValueError, OSError, OverflowError): | ||
| return None |
There was a problem hiding this comment.
suggestion (bug_risk): Exclude booleans from the numeric timestamp path to avoid misinterpreting them as timestamps.
Since bool is a subclass of int, JSON true/false would pass this isinstance(value, (int, float)) check and be treated as timestamps. To avoid converting boolean flags, tighten the condition, e.g. isinstance(value, (int, float)) and not isinstance(value, bool).
| value = data[field] | |
| if isinstance(value, (int, float)): | |
| if value > 1e12: | |
| return datetime.fromtimestamp(value / 1000) | |
| return datetime.fromtimestamp(value) | |
| try: | |
| if value > 1e12: | |
| return datetime.fromtimestamp(value / 1000) | |
| return datetime.fromtimestamp(value) | |
| except (ValueError, OSError, OverflowError): | |
| return None | |
| value = data[field] | |
| if isinstance(value, (int, float)) and not isinstance(value, bool): | |
| try: | |
| if value > 1e12: | |
| return datetime.fromtimestamp(value / 1000) | |
| return datetime.fromtimestamp(value) | |
| except (ValueError, OSError, OverflowError): | |
| return None |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/log_analyzer_cli/parsers/json_log.py (2)
11-11:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRemove unused import.
The
detect_log_levelimport is flagged by flake8 as unused. It should be removed to fix the CI pipeline failure.🔧 Proposed fix
-from log_analyzer_cli.utils import detect_log_level🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/log_analyzer_cli/parsers/json_log.py` at line 11, Remove the unused import detect_log_level from the top of the json_log parser module (the import statement "from log_analyzer_cli.utils import detect_log_level"); simply delete that import so flake8 no longer flags detect_log_level as unused while leaving any other imports in json_log.py intact.Source: Linters/SAST tools
27-27:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix formatting issues flagged by flake8.
Multiple formatting issues are causing CI failures:
- Lines 27, 32, 62: Trailing whitespace (W291)
- Line 63: Continuation line under-indented for visual indent (E128)
🔧 Proposed fix
Remove trailing whitespace on lines 27, 32, and 62, and fix the indentation on line 63:
TIMESTAMP_FIELDS = [ - "timestamp", "time", "ts", "`@timestamp`", "datetime", + "timestamp", "time", "ts", "`@timestamp`", "datetime", "date", "created_at", "created", "logged_at" ] LEVEL_FIELDS = [ - "level", "severity", "loglevel", "log_level", "lvl", + "level", "severity", "loglevel", "log_level", "lvl", "priority", "importance" ]- metadata = {k: v for k, v in data.items() - if k not in self.TIMESTAMP_FIELDS + self.LEVEL_FIELDS + self.MESSAGE_FIELDS} + metadata = {k: v for k, v in data.items() + if k not in self.TIMESTAMP_FIELDS + self.LEVEL_FIELDS + self.MESSAGE_FIELDS}Also applies to: 32-32, 62-63
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/log_analyzer_cli/parsers/json_log.py` at line 27, Remove trailing whitespace and fix the continuation indentation in the JSON log parser list of timestamp keys: strip trailing spaces from the string list entries (the line containing "timestamp", "time", "ts", "`@timestamp`", "datetime", and the later multiline list entries) and align the continuation line so it is indented to match the opening bracket (visual indent) — update the timestamp keys list (e.g., TIMESTAMP_KEYS or timestamp_fields in json_log.py) accordingly to eliminate W291 and E128 errors.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/log_analyzer_cli/parsers/json_log.py`:
- Line 11: Remove the unused import detect_log_level from the top of the
json_log parser module (the import statement "from log_analyzer_cli.utils import
detect_log_level"); simply delete that import so flake8 no longer flags
detect_log_level as unused while leaving any other imports in json_log.py
intact.
- Line 27: Remove trailing whitespace and fix the continuation indentation in
the JSON log parser list of timestamp keys: strip trailing spaces from the
string list entries (the line containing "timestamp", "time", "ts",
"`@timestamp`", "datetime", and the later multiline list entries) and align the
continuation line so it is indented to match the opening bracket (visual indent)
— update the timestamp keys list (e.g., TIMESTAMP_KEYS or timestamp_fields in
json_log.py) accordingly to eliminate W291 and E128 errors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cd12ff06-d30e-4ce0-84c3-52ec2adbbbee
📒 Files selected for processing (2)
src/log_analyzer_cli/parsers/json_log.pytests/test_parsers.py
datetime.fromtimestamp raises ValueError (year out of range, e.g. 1e12 seconds = year 33658) and OverflowError (1e308 — larger than platform time_t) on numeric Unix timestamps in JSON log lines. _extract_timestamp's contract is 'returns Optional[datetime]', so a busted numeric field should be 'no timestamp', not 'the whole JSON line is unparseable'. The previous code let the exception escape parse(), which dropped the entire ParsedEntry (level, message, source, metadata) and surfaced a traceback to the CLI caller for what should be a per-line parse failure. Wrapping the call in try/except (ValueError, OSError, OverflowError) keeps the rest of the entry, sets timestamp to None, and lets the CLI analyze the other 10,000 lines in the file as usual.
Summary by Sourcery
Handle out-of-range numeric Unix timestamps in JSON logs without failing entry parsing.
Bug Fixes:
Tests:
Summary by CodeRabbit
Bug Fixes
Tests