Skip to content

swallow out-of-range numeric timestamps in JSON parser#4

Open
HrachShah wants to merge 1 commit into
mainfrom
fix/json-parser-out-of-range-timestamp-2
Open

swallow out-of-range numeric timestamps in JSON parser#4
HrachShah wants to merge 1 commit into
mainfrom
fix/json-parser-out-of-range-timestamp-2

Conversation

@HrachShah

@HrachShah HrachShah commented Jun 9, 2026

Copy link
Copy Markdown
Owner

datetime.fromtimestamp raises ValueError (year out of range, e.g. 1e12 seconds = year 33658) and OverflowError (1e308 — larger than platform time_t) on numeric Unix timestamps in JSON log lines. _extract_timestamp's contract is 'returns Optional[datetime]', so a busted numeric field should be 'no timestamp', not 'the whole JSON line is unparseable'. The previous code let the exception escape parse(), which dropped the entire ParsedEntry (level, message, source, metadata) and surfaced a traceback to the CLI caller for what should be a per-line parse failure. Wrapping the call in try/except (ValueError, OSError, OverflowError) keeps the rest of the entry, sets timestamp to None, and lets the CLI analyze the other 10,000 lines in the file as usual.

Open in Capy

Summary by Sourcery

Handle out-of-range numeric Unix timestamps in JSON logs without failing entry parsing.

Bug Fixes:

  • Prevent datetime.fromtimestamp errors from bubbling up when JSON timestamp fields contain extremely large or otherwise invalid numeric values by treating them as missing timestamps.

Tests:

  • Add regression tests verifying that out-of-range numeric JSON timestamps yield entries with timestamp=None while preserving other fields.

Summary by CodeRabbit

  • Bug Fixes

    • JSON log parser now gracefully handles out-of-range or invalid numeric timestamps instead of crashing, allowing log processing to continue with the timestamp marked as unavailable.
  • Tests

    • Added test coverage for out-of-range numeric timestamps in JSON log parsing.

…mtimestamp raises ValueError (year out of range, e.g. 1e12 = year 33658, 1e15 seconds) and OverflowError (1e308 — larger than platform time_t); a negative timestamp can raise OSError on some platforms. _extract_timestamp is internal and its contract is 'returns Optional[datetime]', so a busted numeric field is just 'no timestamp', not 'the whole JSON line is unparseable'. The old code let the exception escape parse(), dropping the entire ParsedEntry (level, message, source, metadata) and surfacing a traceback to the CLI caller for what should be a per-line parse failure. Wrapping the call in try/except (ValueError, OSError, OverflowError) keeps the rest of the entry, sets timestamp to None, and lets the CLI analyze the other 10,000 lines in the file as usual
@sourcery-ai

sourcery-ai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Reviewer's Guide

This PR hardens the JSON log parser so that out-of-range numeric Unix timestamps no longer cause parse() to fail the entire log entry, instead treating them as missing timestamps, and adds tests to cover the new behavior for various overflow cases.

Sequence diagram for handling out-of-range numeric timestamps in JSON parser

sequenceDiagram
    actor CLIUser
    participant LogAnalyzerCLI
    participant JSONLogParser
    participant datetime

    CLIUser->>LogAnalyzerCLI: analyze_logs
    LogAnalyzerCLI->>JSONLogParser: parse(line)
    JSONLogParser->>JSONLogParser: _extract_timestamp(data)
    JSONLogParser->>datetime: fromtimestamp(value)
    datetime-->>JSONLogParser: [raises ValueError/OSError/OverflowError]
    JSONLogParser-->>JSONLogParser: _extract_timestamp returns None
    JSONLogParser-->>LogAnalyzerCLI: ParsedEntry(timestamp=None)
    LogAnalyzerCLI-->>CLIUser: results for remaining lines
Loading

File-Level Changes

Change Details Files
Handle out-of-range numeric Unix timestamps gracefully in JSONLogParser._extract_timestamp
  • Expanded _extract_timestamp docstring to clarify behavior and edge cases for numeric Unix timestamps in JSON logs.
  • Wrapped datetime.fromtimestamp calls for numeric timestamps in a try/except block catching ValueError, OSError, and OverflowError.
  • Maintained existing seconds vs. milliseconds heuristic (values > 1e12 treated as milliseconds) while returning None instead of propagating exceptions when conversion fails.
src/log_analyzer_cli/parsers/json_log.py
Add regression tests for JSON numeric timestamps that overflow datetime.fromtimestamp
  • Introduced a new test that feeds JSONLogParser numeric timestamps that are too large for datetime.fromtimestamp, asserting parse() returns an entry with timestamp set to None rather than raising.
  • Covered both seconds and milliseconds representations that map beyond supported datetime ranges, as well as extremely large floats that overflow platform time_t.
  • Verified that other fields (level, message) are still parsed and normalized correctly even when timestamp is invalid.
tests/test_parsers.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR hardens JSON log timestamp extraction to gracefully handle numeric Unix timestamps that overflow datetime.fromtimestamp(). The implementation catches conversion errors and returns None; a test verifies the parser still yields a complete entry with a None timestamp instead of failing.

Changes

JSON timestamp overflow resilience

Layer / File(s) Summary
Timestamp error handling with safe conversion
src/log_analyzer_cli/parsers/json_log.py
_extract_timestamp wraps numeric timestamp conversions in try/except to catch ValueError, OSError, and OverflowError; on failure returns None instead of propagating exceptions. Docstring expanded to document out-of-range behavior.
Out-of-range numeric timestamp test
tests/test_parsers.py
New test method verifies that parse() gracefully handles extremely large numeric timestamp values, returning a non-None entry with timestamp set to None, while preserving level normalization and message content.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~20 minutes

Poem

A timestamp too vast for the clock to bear,
Returns None with graceful care,
No exception raised in the JSON night,
The parser hops on, forever bright! 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: hardening JSON timestamp extraction to gracefully handle out-of-range numeric values instead of raising exceptions.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/json-parser-out-of-range-timestamp-2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The 1e12 threshold used to distinguish seconds from milliseconds is a magic number; consider extracting it into a named constant (with a brief comment) so the intent and units are clear and easier to adjust later.
  • Swallowing ValueError/OSError/OverflowError and returning None makes malformed timestamps silent; consider emitting a debug-level log or other lightweight signal when this occurs to aid diagnosing bad log producers without breaking parsing.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The 1e12 threshold used to distinguish seconds from milliseconds is a magic number; consider extracting it into a named constant (with a brief comment) so the intent and units are clear and easier to adjust later.
- Swallowing ValueError/OSError/OverflowError and returning None makes malformed timestamps silent; consider emitting a debug-level log or other lightweight signal when this occurs to aid diagnosing bad log producers without breaking parsing.

## Individual Comments

### Comment 1
<location path="src/log_analyzer_cli/parsers/json_log.py" line_range="89-96" />
<code_context>
         for field in self.TIMESTAMP_FIELDS:
             if field in data:
                 value = data[field]
                 if isinstance(value, (int, float)):
-                    if value > 1e12:
-                        return datetime.fromtimestamp(value / 1000)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Exclude booleans from the numeric timestamp path to avoid misinterpreting them as timestamps.

Since `bool` is a subclass of `int`, JSON `true`/`false` would pass this `isinstance(value, (int, float))` check and be treated as timestamps. To avoid converting boolean flags, tighten the condition, e.g. `isinstance(value, (int, float)) and not isinstance(value, bool)`.

```suggestion
                value = data[field]
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    try:
                        if value > 1e12:
                            return datetime.fromtimestamp(value / 1000)
                        return datetime.fromtimestamp(value)
                    except (ValueError, OSError, OverflowError):
                        return None
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 89 to +96
value = data[field]
if isinstance(value, (int, float)):
if value > 1e12:
return datetime.fromtimestamp(value / 1000)
return datetime.fromtimestamp(value)
try:
if value > 1e12:
return datetime.fromtimestamp(value / 1000)
return datetime.fromtimestamp(value)
except (ValueError, OSError, OverflowError):
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Exclude booleans from the numeric timestamp path to avoid misinterpreting them as timestamps.

Since bool is a subclass of int, JSON true/false would pass this isinstance(value, (int, float)) check and be treated as timestamps. To avoid converting boolean flags, tighten the condition, e.g. isinstance(value, (int, float)) and not isinstance(value, bool).

Suggested change
value = data[field]
if isinstance(value, (int, float)):
if value > 1e12:
return datetime.fromtimestamp(value / 1000)
return datetime.fromtimestamp(value)
try:
if value > 1e12:
return datetime.fromtimestamp(value / 1000)
return datetime.fromtimestamp(value)
except (ValueError, OSError, OverflowError):
return None
value = data[field]
if isinstance(value, (int, float)) and not isinstance(value, bool):
try:
if value > 1e12:
return datetime.fromtimestamp(value / 1000)
return datetime.fromtimestamp(value)
except (ValueError, OSError, OverflowError):
return None

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/log_analyzer_cli/parsers/json_log.py (2)

11-11: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove unused import.

The detect_log_level import is flagged by flake8 as unused. It should be removed to fix the CI pipeline failure.

🔧 Proposed fix
-from log_analyzer_cli.utils import detect_log_level
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/log_analyzer_cli/parsers/json_log.py` at line 11, Remove the unused
import detect_log_level from the top of the json_log parser module (the import
statement "from log_analyzer_cli.utils import detect_log_level"); simply delete
that import so flake8 no longer flags detect_log_level as unused while leaving
any other imports in json_log.py intact.

Source: Linters/SAST tools


27-27: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix formatting issues flagged by flake8.

Multiple formatting issues are causing CI failures:

  • Lines 27, 32, 62: Trailing whitespace (W291)
  • Line 63: Continuation line under-indented for visual indent (E128)
🔧 Proposed fix

Remove trailing whitespace on lines 27, 32, and 62, and fix the indentation on line 63:

     TIMESTAMP_FIELDS = [
-        "timestamp", "time", "ts", "`@timestamp`", "datetime", 
+        "timestamp", "time", "ts", "`@timestamp`", "datetime",
         "date", "created_at", "created", "logged_at"
     ]
     
     LEVEL_FIELDS = [
-        "level", "severity", "loglevel", "log_level", "lvl", 
+        "level", "severity", "loglevel", "log_level", "lvl",
         "priority", "importance"
     ]
-        metadata = {k: v for k, v in data.items() 
-                   if k not in self.TIMESTAMP_FIELDS + self.LEVEL_FIELDS + self.MESSAGE_FIELDS}
+        metadata = {k: v for k, v in data.items()
+                    if k not in self.TIMESTAMP_FIELDS + self.LEVEL_FIELDS + self.MESSAGE_FIELDS}

Also applies to: 32-32, 62-63

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/log_analyzer_cli/parsers/json_log.py` at line 27, Remove trailing
whitespace and fix the continuation indentation in the JSON log parser list of
timestamp keys: strip trailing spaces from the string list entries (the line
containing "timestamp", "time", "ts", "`@timestamp`", "datetime", and the later
multiline list entries) and align the continuation line so it is indented to
match the opening bracket (visual indent) — update the timestamp keys list
(e.g., TIMESTAMP_KEYS or timestamp_fields in json_log.py) accordingly to
eliminate W291 and E128 errors.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/log_analyzer_cli/parsers/json_log.py`:
- Line 11: Remove the unused import detect_log_level from the top of the
json_log parser module (the import statement "from log_analyzer_cli.utils import
detect_log_level"); simply delete that import so flake8 no longer flags
detect_log_level as unused while leaving any other imports in json_log.py
intact.
- Line 27: Remove trailing whitespace and fix the continuation indentation in
the JSON log parser list of timestamp keys: strip trailing spaces from the
string list entries (the line containing "timestamp", "time", "ts",
"`@timestamp`", "datetime", and the later multiline list entries) and align the
continuation line so it is indented to match the opening bracket (visual indent)
— update the timestamp keys list (e.g., TIMESTAMP_KEYS or timestamp_fields in
json_log.py) accordingly to eliminate W291 and E128 errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cd12ff06-d30e-4ce0-84c3-52ec2adbbbee

📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and 3213155.

📒 Files selected for processing (2)
  • src/log_analyzer_cli/parsers/json_log.py
  • tests/test_parsers.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant