Small, header-only library for serializing arbitrary objects to and from raw bytes. It uses C++26 static reflection so that serialization just works, without any boilerplate code.
Important
This project makes use of C++26 static reflection. C++26 is still in the future, and regular compilers do not support static reflection yet. The library was developed using a fork of Clang that has experimental support for reflection. Once C++26 is finalized and compilers start to support it, this little library will be more useful in daily use. Until then, it is just a little toy project for showcasing the awesomeness of static reflection.
Consider a typical class with private members of different types, sub-objects and maybe one or two base classes that bring their own state:
class Vehicle : public Entity, TaskComponent
{
public:
// ...
private:
int m_typeID;
std::vector<std::string> m_labels;
std::unique_ptr<Engine> m_engine;
};
Everything is well scoped and nicely encapsulated. And everything is just fine, until you have to serialize this complex class, for example to send it across the network, or to take a snapshot and save it on disk. Now you have to somehow access and collect all that encapsulated state, not only of that one class, but of all its base classes and sub-components.
This usually ends up with a lot of tedious, boilerplate code or weird accessors that enumerate all the different members.
With C++26 static reflection, this becomes much easier. You don't have to change anything about your class, you don't have to write any additional code, and the following snippet just works:
Vehicle vehicle = /*...*/;
theg::serialization_vector_sink sink;
sink.write(vehicle);
That's it. Using static reflection, the library traversed and accessed all members of the class, all of the base classes and their members, looped through every container and condensed all that state into a series of raw bytes. You can now do whatever you want with those bytes and eventually re-construct the original object.
std::vector<std::byte> bytes = sink.release();
// Send the bytes across the network, save it to disk, store it in memory,
// and then eventually:
theg::serialization_data_source source(bytes.data(), bytes.size());
auto vehicle = source.read<Vehicle>();
- Serializes and de-serializes arbitrary objects, without any boilerplate code
- Built-in support for all standard containers & Co (std::string, std::map, std::variant, etc.)
- Can be extended to support custom "primitives" (e.g., custom string types or containers)
- Can serialize and de-serialize members like handles or raw pointers by providing additional logic for resolving them
- Basic versioning to read older versions of classes / structures and to automatically convert them to the expected version
- Optimizes serialization by e.g. writing vectors of trivially serializable types in one single go (automatically detects trivial serializability of classes and structures)
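The bulk-write optimization from the last bullet can be pictured with plain standard C++. The following is only a sketch of the idea, not the library's implementation: `byte_buffer`, `write_element` and `write_vector` are hypothetical names, and `std::is_trivially_copyable_v` is used as a stand-in for the library's own detection of trivial serializability (which presumably is stricter, since raw pointers are trivially copyable but not serializable).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch
{
    // Hypothetical byte sink, only for this illustration (not the library's class)
    struct byte_buffer
    {
        std::vector<std::byte> bytes;
        std::size_t write_calls = 0; // Counts raw writes to make the effect visible

        void write_bytes(const void* data, std::size_t size)
        {
            const auto* first = static_cast<const std::byte*>(data);
            bytes.insert(bytes.end(), first, first + size);
            ++write_calls;
        }
    };

    // Element-wise path for a non-trivial type: size first, then the characters
    inline void write_element(byte_buffer& out, const std::string& text)
    {
        const std::size_t size = text.size();
        out.write_bytes(&size, sizeof(size));
        out.write_bytes(text.data(), size);
    }

    template <class T>
    void write_vector(byte_buffer& out, const std::vector<T>& values)
    {
        const std::size_t count = values.size();
        out.write_bytes(&count, sizeof(count));

        if constexpr (std::is_trivially_copyable_v<T>)
        {
            // Contiguous, bitwise-copyable elements: one write covers all of them
            out.write_bytes(values.data(), count * sizeof(T));
        }
        else
        {
            // Everything else has to be visited element by element
            for (const T& value : values)
                write_element(out, value);
        }
    }

    // Usage: the int vector needs two raw writes (count + payload), no matter how long it is
    inline void example()
    {
        byte_buffer ints;
        write_vector(ints, std::vector<int>{1, 2, 3});
        assert(ints.write_calls == 2);
    }
}
```

The branch is chosen at compile time, so the element-wise loop costs nothing for types that take the bulk path.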
For serializing objects, you need a serialization_sink and for de-serialization you need a serialization_source. The library provides two very basic sink/source implementations that use either a flat, in-memory storage of bytes (serialization_vector_sink / serialization_data_source) or read/write directly to a binary file (serialization_file_sink / serialization_file_source). You can implement your own sinks and sources by deriving from the base classes, for example to write directly to a network stream or a pipe.
Every serialization_sink offers a write method to which you can simply pass any object. If the provided type cannot be serialized (see Limitations), you will get a compile-time error.
#include <theg/serialization.hpp>
#include <theg/serialization/file_sink.hpp>
theg::serialization_file_sink sink("snapshot.bin");
int someInteger;
MyData someStruct;
std::unordered_map<int, std::unique_ptr<Vehicle>> someContainer;
sink.write(someInteger);
sink.write(someStruct);
sink.write(someContainer);
Every serialization_source offers several variations of read methods with which you restore objects from the underlying byte stream.
#include <theg/serialization.hpp>
#include <theg/serialization/file_source.hpp>
theg::serialization_file_source source("snapshot.bin");
auto someInteger = source.read<int>();
auto someStruct = source.read_unique<MyData>(); // Same read, but constructed on the heap
std::unordered_map<int, std::unique_ptr<Vehicle>> someContainer;
source.read(someContainer); // Restore state into an existing object (in-place)
If you call source.read<MyClass>(), the library needs to be able to construct your class. So if your class is not default-constructible already, this call will cause a compilation error. In this case, you can add a constructor to your class that will be used only by the de-serialization:
#include <theg/serialization/construct.hpp>
class MyClass
{
public:
explicit MyClass(theg::serialization_constructor) {} // No need to actually do anything, state will be initialized by the de-serialization
};
This is often the only adaptation you need to make to support de-serialization.
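How a dedicated constructor like this can be picked up at compile time is sketched below in plain C++17. This is an assumption about the general technique, not the library's actual code: `construct_tag` and `construct` are hypothetical stand-ins for `theg::serialization_constructor` and whatever the library does internally.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
    // Hypothetical tag type, standing in for a dedicated de-serialization tag
    struct construct_tag {};

    // Creates an instance via the default constructor if there is one,
    // otherwise via the tag constructor; anything else fails to compile
    template <class T>
    T construct()
    {
        if constexpr (std::is_default_constructible_v<T>)
        {
            return T();
        }
        else
        {
            static_assert(std::is_constructible_v<T, construct_tag>,
                          "T needs a default constructor or a tag constructor");
            return T(construct_tag{});
        }
    }

    struct Plain
    {
        int value = 7;
    };

    class Guarded
    {
    public:
        explicit Guarded(int value) : m_value(value) {} // Regular constructor
        explicit Guarded(construct_tag) {}              // Only used when restoring an object
        int value() const { return m_value; }

    private:
        int m_value = 0; // Would be overwritten by the de-serialization afterwards
    };

    // Usage: both types can be created, although Guarded has no default constructor
    inline void example()
    {
        static_assert(!std::is_default_constructible_v<Guarded>);
        assert(construct<Plain>().value == 7);
        assert(construct<Guarded>().value() == 0);
    }
}
```

Because the tag is its own type, the extra constructor cannot be called by accident from regular code paths.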
The serialization library essentially supports some primitive, basic types (int, std::string, std::vector, etc.) and any compound of those primitives (std::vector<std::string>, structs, classes), which is already surprisingly versatile. But there are some limitations to serializing objects:
- In general, it is not possible to serialize raw pointers T* or shared pointers std::shared_ptr<T>, because they can reference objects that might also be referenced or even owned by other objects outside the serialized compound. So when re-constructing the object, it is not possible to restore a coherent state that aligns with the rest of your application.
  - std::unique_ptr is supported, because it represents a fully owned sub-component (like a regular member, just at a separate memory location)
  - You can add your own custom support for resolving raw pointers, shared pointers or handles
- You have to make sure to restore objects from a serialization_source with the correct types and in the correct order. The library cannot dynamically restore arbitrary types and does not check for type correctness.
  - A small exception from that is type versioning
- If you use your own, custom primitive types (like custom string types or containers), you have to register them with the library to support serialization
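Why the read order matters can be seen with any untyped byte stream. The sketch below uses a hypothetical `byte_stream` class (not one of the library's sources) that stores values back to back without type tags, which is an assumption about the stream layout based on the limitation described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

namespace sketch
{
    // Hypothetical untyped stream: values are stored back to back, without type information
    class byte_stream
    {
    public:
        template <class T>
        void write(const T& value)
        {
            static_assert(std::is_trivially_copyable_v<T>);
            const auto* first = reinterpret_cast<const std::byte*>(&value);
            m_bytes.insert(m_bytes.end(), first, first + sizeof(T));
        }

        template <class T>
        T read()
        {
            static_assert(std::is_trivially_copyable_v<T>);
            assert(m_cursor + sizeof(T) <= m_bytes.size());
            T value;
            std::memcpy(&value, m_bytes.data() + m_cursor, sizeof(T));
            m_cursor += sizeof(T);
            return value;
        }

        void rewind() { m_cursor = 0; }

    private:
        std::vector<std::byte> m_bytes;
        std::size_t m_cursor = 0;
    };

    // Usage: reading in write order restores the values, reading in swapped
    // order "succeeds" as well, but silently produces different values
    inline void example()
    {
        byte_stream stream;
        stream.write(std::int32_t{42});
        stream.write(1.5f);

        assert(stream.read<std::int32_t>() == 42);
        assert(stream.read<float>() == 1.5f);

        stream.rewind();
        const float wrongSpeed = stream.read<float>();
        const std::int32_t wrongId = stream.read<std::int32_t>();
        assert(wrongSpeed != 1.5f);
        assert(wrongId != 42);
    }
}
```

Nothing in the stream tells the reader that the first four bytes were an integer, so the mismatch only shows up as wrong data later on.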
The library requires specific support for "primitive" types like std::string, std::variant or std::map. They are composed of raw pointers or require special construction and traversal. The library comes with support for most std types, but you can add support for your own primitives by specializing theg::serialization_traits. A trait specialization has to provide the following methods:
- A write method for serializing the type
- At least one read method for de-serializing the type (either constructing or in-place, see example below)
- Optionally, a read_unique method that constructs the type on the heap (useful for types that are regularly constructed on the heap or that require heap construction)
#include <theg/serialization/traits.hpp>
template <>
struct theg::serialization_traits<MyCustomString>
{
template <class Serialization>
void write(const MyCustomString& string, Serialization serialization)
{
// Either dispatch to lower-level primitives or write raw bytes
serialization.write(string.size()); // First, write a `size_t`
serialization.write_bytes(string.data(), string.size()); // Then write the raw bytes
}
// In-place read into existing object
template <class Deserialization>
void read(MyCustomString& string, Deserialization deserialization)
{
// Either dispatch to lower-level primitives or read raw bytes
auto size = deserialization.read(theg::type<size_t>); // Use type tags to read a specific type
string.resize(size);
deserialization.read_bytes(string.data(), size); // Read raw bytes
}
// Constructing read into new object
template <class Deserialization>
MyCustomString read(theg::type_t<MyCustomString>, Deserialization deserialization) // Take matching type tag as first parameter
{
// Either dispatch to lower-level primitives or read raw bytes
auto size = deserialization.read(theg::type<size_t>); // Use type tags to read a specific type
MyCustomString string(size); // Construct new string
deserialization.read_bytes(string.data(), size); // Read raw bytes
return string;
}
// Constructing read into new, heap-allocated object (useful for some types, not so much for a string)
template <class Deserialization>
std::unique_ptr<MyCustomString> read_unique(theg::type_t<MyCustomString>, Deserialization deserialization) // Take matching type tag as first parameter
{
// Either dispatch to lower-level primitives or read raw bytes
auto size = deserialization.read(theg::type<size_t>); // Use type tags to read a specific type
auto string = std::make_unique<MyCustomString>(size); // Construct new string
deserialization.read_bytes(string->data(), size); // Read raw bytes
return string;
}
};
By default, the library cannot serialize / de-serialize raw pointers, shared pointers, or similar types like handles. They require additional knowledge of how they are used and how to re-construct them.
This additional knowledge can be provided to the library via additional, "stateful" traits that take care of resolving these pointers:
- When writing to a sink or reading from a source, provide an additional object that can use internal state
- That object implements read and write methods just like regular traits (see above)
  - It writes by resolving pointers to some serializable type (e.g. to a unique ID)
  - It reads by resolving the serialized type back to the raw pointer (e.g. by looking up that ID)
- The object can be a compound of multiple traits / capable of resolving multiple types
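The resolution idea from the list above (pointer to ID when writing, ID back to pointer when reading) can be sketched independently of the library. All names below (`Object`, `registry_resolver`, `write_pointers`, `read_pointers`) are hypothetical, and the sketch does not use the library's trait interface; it only shows the kind of state such a resolver would carry.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

namespace sketch
{
    struct Object
    {
        std::uint32_t id; // Stable ID that survives a round trip through the stream
    };

    // Hypothetical stateful resolver: knows all live objects and maps IDs to them
    class registry_resolver
    {
    public:
        static constexpr std::uint32_t null_id = 0xFFFFFFFFu; // Reserved for nullptr

        void add(Object& object) { m_objects[object.id] = &object; }

        // Write direction: a raw pointer becomes a serializable ID
        std::uint32_t to_id(const Object* pointer) const
        {
            return pointer ? pointer->id : null_id;
        }

        // Read direction: the ID is looked up again (unknown IDs resolve to nullptr)
        Object* to_pointer(std::uint32_t id) const
        {
            const auto it = m_objects.find(id);
            return it != m_objects.end() ? it->second : nullptr;
        }

    private:
        std::unordered_map<std::uint32_t, Object*> m_objects;
    };

    inline std::vector<std::uint32_t> write_pointers(const std::vector<Object*>& pointers,
                                                     const registry_resolver& resolver)
    {
        std::vector<std::uint32_t> ids;
        for (const Object* pointer : pointers)
            ids.push_back(resolver.to_id(pointer));
        return ids;
    }

    inline std::vector<Object*> read_pointers(const std::vector<std::uint32_t>& ids,
                                              const registry_resolver& resolver)
    {
        std::vector<Object*> pointers;
        for (const std::uint32_t id : ids)
            pointers.push_back(resolver.to_pointer(id));
        return pointers;
    }

    // Usage: the pointers survive the round trip because the registry still knows the objects
    inline void example()
    {
        Object a{1}, b{2};
        registry_resolver resolver;
        resolver.add(a);
        resolver.add(b);

        const std::vector<Object*> pointers{&a, nullptr, &b};
        const auto ids = write_pointers(pointers, resolver);
        assert(read_pointers(ids, resolver) == pointers);
    }
}
```

The objects themselves are not part of the stream here; the resolver only works if the same registry (or an equivalent one) exists on the reading side.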
You can find a full example here, but the usage will look something like this:
std::vector<MyObject*> vectorOfRawPointers;
MyObjectResolver resolver(&someState); // Construct a resolver, usually by providing some internal state that is required to perform the resolution
sink.write(vectorOfRawPointers, &resolver); // The resolver will be used whenever the serialization encounters a type that can be handled by it
// ...
source.read(vectorOfRawPointers, &resolver); // Same applies to the de-serialization
The library has very basic support for versioning types, so that you can still read byte streams from an older version of your code into some new object / struct layout.
The idea is to maintain the old layout in your code, and to use the THEG_SERIALIZATION_VERSION macro to establish a versioning between new and old layout:
struct MyData_v0
{
int value;
char str[32]; // This will surely be enough
THEG_SERIALIZATION_VERSION(0, void); // Mark this version 0, with no previous type
};
struct MyData
{
int value;
std::string str; // -.-
THEG_SERIALIZATION_VERSION(1, MyData_v0); // Mark this version 1, with a reference to the previous type
explicit MyData(const MyData_v0& old) : value(old.value), str(old.str) {} // Provide a conversion from previous to this version
};
// Now you can read `MyData` from a byte stream just as usual, and if that stream contains the old layout, it will automatically
// be converted to the new layout.
auto myData = source.read<MyData>();
You can find a full example here.
The versioning supports multiple versions of the same type, and can automatically chain the necessary conversions through multiple versions. But it does come with some limitations:
- Your type needs to be versioned right from the beginning. If you add THEG_SERIALIZATION_VERSION only after you changed to a new layout, the library will not be able to read old streams (because the old stream is missing the version information).
- The automatic conversion only works from old type to new type (you can't read an old layout from a stream that contains a newer layout).
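Both limitations follow naturally from how a version chain can be walked. The sketch below is a simplified model in plain C++17 and not the library's mechanism: it assumes the stream stores a version number in front of the fields, and the `version` / `previous` members, `int_stream` and `read_versioned` are hypothetical names that play the role of what THEG_SERIALIZATION_VERSION establishes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

namespace sketch
{
    // Hypothetical stream of ints: [version, field, field, ...]
    struct int_stream
    {
        std::vector<int> values;
        std::size_t cursor = 0;
        int next() { return values.at(cursor++); }
    };

    struct Point_v0 // Oldest layout
    {
        static constexpr int version = 0;
        using previous = void;
        int x = 0;
        static Point_v0 read_fields(int_stream& in) { Point_v0 p; p.x = in.next(); return p; }
    };

    struct Point_v1 // Added y
    {
        static constexpr int version = 1;
        using previous = Point_v0;
        int x = 0, y = 0;
        Point_v1() = default;
        explicit Point_v1(const Point_v0& old) : x(old.x) {}
        static Point_v1 read_fields(int_stream& in)
        {
            Point_v1 p;
            p.x = in.next();
            p.y = in.next();
            return p;
        }
    };

    struct Point // Current layout, added z
    {
        static constexpr int version = 2;
        using previous = Point_v1;
        int x = 0, y = 0, z = 0;
        Point() = default;
        explicit Point(const Point_v1& old) : x(old.x), y(old.y) {}
        static Point read_fields(int_stream& in)
        {
            Point p;
            p.x = in.next();
            p.y = in.next();
            p.z = in.next();
            return p;
        }
    };

    // Walks down the chain until the stored version is found, then converts back up
    template <class T>
    T read_as(int storedVersion, int_stream& in)
    {
        if (storedVersion == T::version)
            return T::read_fields(in);
        if constexpr (!std::is_void_v<typename T::previous>)
            return T(read_as<typename T::previous>(storedVersion, in));
        else
            throw std::runtime_error("stream contains an unknown or newer version");
    }

    template <class T>
    T read_versioned(int_stream& in)
    {
        const int storedVersion = in.next(); // Only works if a version was written at all
        return read_as<T>(storedVersion, in);
    }
}
```

A stream written without a leading version number would be misread from the first field on, and a version newer than the requested type never matches anything in the chain, so the read fails.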
Some types do not encode actual state that can be serialized in a meaningful way. They can even cause issues when serialization is attempted. Instead, they should simply be skipped and ignored for serialization purposes. Typical examples are mutexes or runtime caches.
Such types can be registered as "transient types":
#include <theg/serialization/traits.hpp>
template <>
struct theg::is_transient_serializable<MyMutex> : std::true_type {};
- Add automatic serialization / de-serialization to text formats based on member names (e.g. JSON)
- Add more checks to detect type mismatches and similar errors (will add more overhead to the stream)
- Add CMake installation
I used the Bloomberg P2996 fork of LLVM for developing this library. I had no issues with the compiler and use it for personal projects, but it is considered experimental and not production-ready.
On Linux, you can perform the Bootstrapping build of LLVM and use the entire toolchain.
On Windows, you need to build clang-cl. I use the clang-cl compiler with the VS / MSVC standard library, with an adapted version of the P2996 header. This works best for me, and if you want to do the same, you have to define THEG_CLANG_P2996_MSVC_META_WORKAROUND=1 in your build environment. This will cause the library headers to include a built-in header and import its definitions into the theg::detail::meta namespace.