Skip to content

Better forwarding to improve read performance a little#484

Merged
liuzicheng1987 merged 3 commits into
getml:mainfrom
GregTheMadMonk:speedups
Aug 24, 2025
Merged

Better forwarding to improve read performance a little#484
liuzicheng1987 merged 3 commits into
getml:mainfrom
GregTheMadMonk:speedups

Conversation

@GregTheMadMonk
Copy link
Copy Markdown
Contributor

@GregTheMadMonk GregTheMadMonk commented Aug 19, 2025

Hello!

This isn't really a significant change, but it helps speed up rfl::json::read<rfl::Generic> almost twice in the release build (at least in the benchmarked case).

Sorry for not using your benchmark suite here, I had problems getting the vcpkg dependencies properly compiled with GCC15 that ships with my system.

I also tried to look into improving the debug build performance of rfl::to_generic, but couldn't find anything to improve there, at least not as simple.

Benchmark results (on my PC, Ryzen 3600)

In release mode

Before:

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
read_struct               95.5 ns         94.4 ns      7332806
read_generic               645 ns          637 ns      1117257
write_struct               104 ns          103 ns      6884743
write_generic              144 ns          142 ns      4990100
to_generic                 115 ns          114 ns      6160175
to_generic_via_json        774 ns          765 ns       932490

After:

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
read_struct               94.9 ns         93.7 ns      7444236
read_generic               367 ns          363 ns      1945679
write_struct               104 ns          103 ns      6845176
write_generic              132 ns          130 ns      5413376
to_generic                 119 ns          118 ns      5871778
to_generic_via_json        497 ns          491 ns      1442071

In debug mode

Before:

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
read_struct               1461 ns         1445 ns       493132
read_generic              9450 ns         9337 ns        74690
write_struct               657 ns          649 ns      1092616
write_generic             1083 ns         1072 ns       646018
to_generic                3980 ns         3937 ns       175912
to_generic_via_json      10213 ns        10119 ns        69557

After:

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
read_struct               1485 ns         1465 ns       483753
read_generic              7859 ns         7753 ns        91012
write_struct               665 ns          659 ns      1067324
write_generic             1073 ns         1063 ns       660308
to_generic                3925 ns         3899 ns       177444
to_generic_via_json       8641 ns         8539 ns        81955

Benchmark code

#include <benchmark/benchmark.h>

#include <string>

#include <rfl.hpp>
#include <rfl/json.hpp>

constexpr auto jsd = R"json({ "str": "hello", "i": 42 })json";

struct S { std::string str; int i; };

const S s_etalon{ .str = "hello", .i = 42 };
const rfl::Generic::Object g_etalon = [] () {
    rfl::Generic::Object ret;
    ret["str"] = "hello";
    ret["i"]   = 42;
    return ret;
} ();

using rfl::json::read;
using rfl::json::write;

void read_struct(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize( read<S>(jsd).value() );
    }
} // <-- void read_struct(state)

void read_generic(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize( read<rfl::Generic::Object>(jsd).value() );
    }
} // <-- void read_generic(state)

void write_struct(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize( write(s_etalon) );
    }
} // <-- void write_struct(state)

void write_generic(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize( write(g_etalon) );
    }
} // <-- void write_generic(state)

void to_generic(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize( rfl::to_generic(s_etalon) );
    }
} // <-- void to_generic(state)

void to_generic_via_json(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize( rfl::json::read<rfl::Generic::Object>(rfl::json::write(s_etalon)).value() );
    }
} // <-- void to_generic(state)

BENCHMARK(read_struct);
BENCHMARK(read_generic);
BENCHMARK(write_struct);
BENCHMARK(write_generic);
BENCHMARK(to_generic);
BENCHMARK(to_generic_via_json);

BENCHMARK_MAIN();

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @GregTheMadMonk, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces targeted optimizations to the rfl::json library, primarily focusing on improving read performance by leveraging C++11 move semantics more effectively. The changes aim to reduce unnecessary data copying, particularly within the rfl::json::readrfl::Generic path, leading to notable speedups in release builds. Additionally, the PR enhances error handling efficiency and optimizes memory allocation in parsing routines.

Highlights

  • Performance Enhancement through Move Semantics: Optimized data flow by consistently applying std::move and std::forward in parsing and error handling logic, significantly reducing object copying and improving read operation performance, as demonstrated by benchmarks.
  • Improved Error Object Handling: The rfl::Error class now supports move construction and assignment, allowing for more efficient propagation and storage of error messages without redundant string copies.
  • Optimized Variant Parsing Memory: Pre-allocation of memory for error vectors in rfl::Variant and std::variant parsing paths using errors.reserve helps mitigate reallocations and improve performance when multiple parsing errors occur.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several excellent performance improvements by leveraging move semantics and perfect forwarding more effectively across the parsing logic. The changes are well-implemented and the benchmark results clearly demonstrate significant speedups, especially for rfl::json::read<rfl::Generic>. I have found one potential issue regarding exception safety in a noexcept function that could lead to program termination under certain conditions. Please see the detailed comment for a suggested fix.

@liuzicheng1987
Copy link
Copy Markdown
Contributor

@GregTheMadMonk that sounds great, but some of the tests appear to be failing. Could you take a look?

@GregTheMadMonk
Copy link
Copy Markdown
Contributor Author

@liuzicheng1987 if you're talking about the compilation errors (-Werror=maybe-uninitialized), then I'm honestly kind of out of ideas for what may cause it, and I also observed it with main too. I had to add -w to my CMAKE_CXX_FLAGS in order to compile tests locally.

The "offending" piece of code in NamedTupleParser.hpp:

  static std::optional<Error> read_object_or_array_with_default(
      const R& _r, const InputObjectOrArrayType& _obj_or_arr,
      NamedTupleType* _view) noexcept {
    std::vector<Error> errors;
    const auto reader = ViewReaderWithDefaultType(&_r, _view, &errors);
    std::optional<Error> err; // <-- declaration here
    if constexpr (_no_field_names) {
      err = _r.read_array(reader, _obj_or_arr);
    } else {
      err = _r.read_object(reader, _obj_or_arr);
    }
    if (err) {
      return err; // <-- -Werror=maybe-uninitialized - why?!
    }
    if (errors.size() != 0) {
      return to_single_error_message(errors);
    }
    return std::nullopt;
  }
};

(and a similar function above it) appears safe and straightforward to me... I tried to reproduce the error in a similar scenario and didn't succeed: https://godbolt.org/z/4scKPvnnM (this also didn't produce the same -Werror for me locally)

My only two ideas would be either a) GCC gets confused and it's a misfire of this warning or b) there is something in the particular implementations of .read_object/.read_array used that perform a maybe-uninitialized access to the data which GCC sees but doesn't report as the operation that set the optional into that potentially incorrect state

@liuzicheng1987
Copy link
Copy Markdown
Contributor

Ok. Could you try to just set err = std::nullopt when you first declare it?

@GregTheMadMonk
Copy link
Copy Markdown
Contributor Author

It didn't help, sadly, for me at least. Same with replacing if (err) with if (err.has_value()):
image

@liuzicheng1987
Copy link
Copy Markdown
Contributor

@GregTheMadMonk , this should fix the issue. We'll wait for the tests to run through and then this is ready to be merged.

Thank you for the contribution!

@liuzicheng1987 liuzicheng1987 merged commit dcc4c1d into getml:main Aug 24, 2025
133 checks passed
@GregTheMadMonk
Copy link
Copy Markdown
Contributor Author

Thank you!

How did you come to this fix? Or was that the warning system in GCC has actually misfired and you had to do it by trial and error?

@liuzicheng1987
Copy link
Copy Markdown
Contributor

@GregTheMadMonk I had figured that the problem only occurs where we have a mutable optional variable, moreover I recall having seen something like this before. So that's how I figured it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants