
Conversation

@taylorobyen

Summary

NIFI-15305 Fixes PutDatabaseRecord's inconsistent timestamp parsing when handling epoch timestamps.

Epoch timestamps are supposed to be in milliseconds, but when fractional milliseconds are included, the value is incorrectly interpreted as seconds, resulting in dates very far in the future.

Behavior prior to my patch:
Correct handling of a timestamp with only whole milliseconds:

{"ts": 1765056655230}

Timestamp loaded into PostgreSQL: 2025-12-06 16:30:55.230 -0500

Incorrect handling of a timestamp that includes fractional milliseconds:

{"ts": 1765056655230.746}

Timestamp loaded into PostgreSQL: 57902-06-03 07:20:30746. -0400

This bug was introduced in #8332, which aimed to allow microsecond precision for epoch timestamps. Before #8332, fractional milliseconds were truncated because the timestamp was always converted from String to long.

My implementation allows for microsecond precision while handling all epoch timestamps as milliseconds.
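
In practice, that means treating the whole number as epoch milliseconds and the fraction as sub-millisecond precision. A minimal sketch of this interpretation (illustrative Java only, not the actual patch):

import java.math.BigDecimal;
import java.sql.Timestamp;
import java.time.Instant;

// Whole part = epoch milliseconds, fractional part = sub-millisecond precision
final BigDecimal epochMillis = new BigDecimal("1765056655230.746");
final long millis = epochMillis.longValue();               // 1765056655230
final int nanosWithinMilli = epochMillis.remainder(BigDecimal.ONE)
        .movePointRight(6).intValue();                     // 0.746 ms -> 746000 ns
final Timestamp ts = Timestamp.from(
        Instant.ofEpochMilli(millis).plusNanos(nanosWithinMilli));
// 2025-12-06 21:30:55.230746 UTC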

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Contributor

@exceptionfactory left a comment


Thanks for proposing this improvement @taylorobyen.

On initial read, this appears to break expected behavior. As indicated in the changes for NIFI-12710, the parsing logic expects that a floating point number contains seconds before the period, and nanoseconds after the period.

The proposed changes appear to alter the behavior, expecting milliseconds instead of seconds before the period.

Can you provide more detail on the expected use case where the input field contains milliseconds.nanoseconds?

There may be options to support a change based on maximum expected numbers of seconds versus milliseconds, but current behavior should be preserved.
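
As a sketch of that expectation (illustrative Java, not the actual NiFi implementation), a value such as 1764673335.503607 is read as epoch seconds before the period and fractional seconds after it:

import java.math.BigDecimal;
import java.time.Instant;

final BigDecimal epochSeconds = new BigDecimal("1764673335.503607");
final long seconds = epochSeconds.longValue();             // 1764673335
final long nanos = epochSeconds.remainder(BigDecimal.ONE)
        .movePointRight(9).longValue();                    // 503607000
final Instant instant = Instant.ofEpochSecond(seconds, nanos);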

// Less precise timestamp than other tests as double is less precise than BigDecimal
final double timestamp = 1764673335503.607;

final BigDecimal bd = new BigDecimal(Double.toString(timestamp));
Contributor


Is there a particular reason for converting the timestamp double to a String, as opposed to just passing it to the BigDecimal constructor?

Author


Passing a double directly to the BigDecimal constructor preserves the binary floating-point approximation rather than the intended decimal value. I’ve updated this to use BigDecimal.valueOf() instead.
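
A quick illustration of the difference (standard Java behavior):

import java.math.BigDecimal;

System.out.println(new BigDecimal(0.1));
// 0.1000000000000000055511151231257827021181583404541015625
System.out.println(BigDecimal.valueOf(0.1));  // goes through Double.toString
// 0.1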

@taylorobyen
Author

taylorobyen commented Dec 25, 2025

If it is intended behavior to parse the timestamp differently based on the input, I think the reader controller services should have their descriptions updated. NIFI-12710 modified the parsing under the hood, but no description was updated to indicate to the end user that the behavior had changed.

When I upgraded my production instances from NiFi 1.20.0 to NiFi 2.5.0 and noticed that my timestamps were being parsed incorrectly, the first place I checked was the JsonTreeReader timestamp field to see whether parsing had changed. It wasn't until I checked out the project and read the source code that I found I needed to provide my milliseconds as integers for my timestamps to parse correctly. The current description from JsonTreeReader only mentions milliseconds (other readers, such as CSV, have the same description):

[Screenshot: JsonTreeReader Timestamp Format property description]

My use case for these timestamps is that I load JSON payloads into my PostgreSQL database whose timestamp fields are generated in Python, e.g.:

ms = time.time() * 1000

which will generate something like 1766663148057.1487. To work around the current behavior, I'm rounding the value down to an integer so that the timestamps are parsed correctly.

Instead of inferring the epoch unit from the numeric value, it may be better to let the user explicitly specify the unit, defaulting to milliseconds to preserve existing behavior. For example, the timestamp format could accept epoch_seconds, epoch_millis, or epoch_micros. This avoids ambiguity and is less error-prone than inference.
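
A rough sketch of that idea (the unit names and method below are hypothetical, not an existing NiFi option):

import java.math.BigDecimal;
import java.time.Instant;

static Instant parseEpoch(final BigDecimal value, final String unit) {
    return switch (unit) {
        case "epoch_seconds" -> Instant.ofEpochSecond(value.longValue(),
                value.remainder(BigDecimal.ONE).movePointRight(9).longValue());
        case "epoch_millis" -> Instant.ofEpochMilli(value.longValue())
                .plusNanos(value.remainder(BigDecimal.ONE).movePointRight(6).longValue());
        case "epoch_micros" -> Instant.ofEpochSecond(0, value.movePointRight(3).longValue());
        default -> throw new IllegalArgumentException("Unsupported unit: " + unit);
    };
}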

@exceptionfactory
Contributor

Thanks for the reply @taylorobyen.

After evaluating the details, the core of the current problem is that the conversion process does not handle floating point numbers, represented as strings, with the same level of sanity checking as long numbers.

It is important to preserve current parsing behavior in a way that also addresses the current issue, if at all possible. With that goal in mind, I put together an alternative pull request that addresses the core issue in #10697. If you are able to evaluate that approach and test it in your environment, that would be helpful for comparison.

@exceptionfactory
Contributor

Thanks again for raising this issue and proposing an initial approach @taylorobyen. I'm closing this in favor of the alternative in #10697, which preserves existing behavior but handles this scenario. If you observe additional issues, feel free to raise a new Jira issue for evaluation.
