FastCast

fast_cast is a modern C++ header-only library providing a high-performance polymorphic cast alternative to dynamic_cast. It accelerates runtime type conversions in polymorphic hierarchies while maintaining safety and correctness.

The code is heavily inspired by FastDynamicCast and is effectively a modern C++ reimplementation of it. The basic idea fits in a single image (ptroffset.png in the repository).

Improvements from FastDynamicCast

On top of the original approach, the library adds two optimizations:

- If a cast can be resolved at compile time as an identity cast or a static_cast, it incurs no runtime overhead.
- Failed casts are cached, so repeated misses skip the dynamic lookup.
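The compile-time selection described above can be sketched with `if constexpr`. This is a minimal illustration, not the code in fastcast.hpp: the names `CastPath`, `classify`, and `sketch_cast` are hypothetical, and a plain dynamic_cast stands in for the cached dynamic path.

```cpp
#include <cassert>
#include <type_traits>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Unrelated { virtual ~Unrelated() = default; };

// Hypothetical classification of how a From* -> To* cast can be resolved.
enum class CastPath { Identity, Static, Dynamic };

template <class To, class From>
constexpr CastPath classify() {
    if constexpr (std::is_same_v<To, From>) {
        return CastPath::Identity;           // same type: nothing to do
    } else if constexpr (std::is_convertible_v<From*, To*>) {
        return CastPath::Static;             // implicit upcast: static_cast is safe
    } else {
        return CastPath::Dynamic;            // needs a runtime check
    }
}

// Hypothetical cast that picks the cheapest correct path at compile time.
template <class To, class From>
To* sketch_cast(From* from) {
    constexpr CastPath path = classify<To, From>();
    if constexpr (path == CastPath::Identity) {
        return from;
    } else if constexpr (path == CastPath::Static) {
        return static_cast<To*>(from);
    } else {
        return dynamic_cast<To*>(from);      // stand-in for the cached dynamic path
    }
}

// The path is known at compile time, so it can be checked with static_assert.
static_assert(classify<Derived, Derived>() == CastPath::Identity);
static_assert(classify<Base, Derived>() == CastPath::Static);
static_assert(classify<Derived, Base>() == CastPath::Dynamic);

inline void example_usage() {
    Derived d;
    Base* bp = sketch_cast<Base>(&d);        // static path
    assert(sketch_cast<Derived>(bp) == &d);  // dynamic path
}
```

Because the branch is chosen by `if constexpr`, only the selected path is instantiated, so the identity and upcast cases compile down to the same code as a plain pointer copy or static_cast.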

Features

- Pointer, reference, and std::shared_ptr casting.
- Fast paths for:
  - Exact types (identity cast)
  - static_cast-safe conversions
- Dynamic path with per-thread vtable offset caching.
- Failed cast caching for repeated misses.
- Header-only, lightweight, and requires only C++17+ (C++23 constexpr enhancements included).
- Works with complex inheritance, including multiple and virtual inheritance.

Installation

Simply include the header in your project:

#include "fastcast.hpp"

No linking is required.

Usage

Pointer Casting

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

Derived d;
Base* bp = &d;

Derived* dp1 = fast_cast<Derived*>(bp); // fast_dynamic_cast equivalent

Reference Casting

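#include <iostream>  // std::cerr
#include <typeinfo>  // std::bad_cast

Derived d;
Base& br = d;

try {
    Derived& dr = fast_cast<Derived&>(br); // throws std::bad_cast on failure
    (void)dr; // silence the unused-variable warning in this snippet
} catch (const std::bad_cast&) {
    std::cerr << "Invalid cast\n";
}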
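A reference cast that throws on failure is commonly layered on a pointer cast. The sketch below shows that pattern under assumptions: `sketch_ref_cast` is a hypothetical name, and dynamic_cast stands in for the library's pointer fast path. It is not taken from fastcast.hpp.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};

// Hypothetical reference-cast wrapper; To must be an lvalue reference such as Derived&.
template <class To, class From>
To sketch_ref_cast(From& from) {
    static_assert(std::is_lvalue_reference_v<To>, "To must be an lvalue reference type");
    using Target = std::remove_reference_t<To>;
    Target* p = dynamic_cast<Target*>(&from);  // stand-in for the pointer cast
    if (!p) throw std::bad_cast{};             // same failure mode as dynamic_cast<T&>
    return *p;
}

// Returns true if casting an Other object to Derived& throws std::bad_cast.
inline bool throws_on_mismatch() {
    Other o;
    Base& br = o;
    try {
        (void)sketch_ref_cast<Derived&>(br);
    } catch (const std::bad_cast&) {
        return true;
    }
    return false;
}
```

Keeping the null check in one wrapper means the pointer path stays exception-free, and only the reference form pays for the throw.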

Shared Pointer Casting

std::shared_ptr<Base> sp_base = std::make_shared<Derived>(); // owned through a Base pointer
auto sp_derived = fast_dynamic_pointer_cast<Derived>(sp_base);
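One common way to build a shared_ptr cast on top of a raw-pointer cast is the aliasing constructor of std::shared_ptr, which is also how std::dynamic_pointer_cast is specified. The sketch below assumes that approach; `sketch_pointer_cast` is a hypothetical name, and whether fastcast.hpp does exactly this is an assumption.

```cpp
#include <cassert>
#include <memory>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};

// Hypothetical shared_ptr cast built from a raw-pointer cast.
template <class To, class From>
std::shared_ptr<To> sketch_pointer_cast(const std::shared_ptr<From>& from) {
    To* raw = dynamic_cast<To*>(from.get());  // stand-in for the raw-pointer fast path
    if (!raw) return std::shared_ptr<To>{};   // failed cast: empty pointer, no ownership
    // Aliasing constructor: shares ownership with `from` but points at `raw`.
    return std::shared_ptr<To>(from, raw);
}

inline void example_usage() {
    std::shared_ptr<Base> sp = std::make_shared<Derived>();
    std::shared_ptr<Derived> sd = sketch_pointer_cast<Derived>(sp);
    assert(sd != nullptr);
    assert(sp.use_count() == 2);  // both pointers share one control block
}
```

The aliasing constructor avoids creating a second control block, so the object is still destroyed exactly once when the last owner goes away.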

Identity Casting

Derived* dp = &d;
auto same = fast_cast<Derived*>(dp); // trivial, no runtime overhead

Benchmarks

Run on a 16-core 4.768 GHz CPU (mean of 5 repetitions, -DCMAKE_BUILD_TYPE=Release). Reproduce with ./measure and regenerate the plots with benchmark/plot_results.py (see the benchmark directory). CPU frequency scaling was enabled, so sub-nanosecond figures are throughput-limited and should be read as "effectively free", not as precise latencies.

Where the speedup comes from: the cache

fast_cast caches one (vtable → offset) entry per (From, To) type pair. The first cast of a given dynamic type ("cold") falls through to a real dynamic_cast plus the caching bookkeeping; every subsequent cast of that same type ("hot") is just a load and a pointer adjustment. The honest comparison therefore has two regimes:

| `ComplexA* → ComplexB*` | fast_cast | dynamic_cast |
| --- | --- | --- |
| Cold (cache miss, varying types) | 28.4 ns | 28.0 ns |
| Hot (cache hit, repeated type) | 0.56 ns | 20.1 ns |

On a cold call, fast_cast is about as fast as dynamic_cast (marginally slower in this run, 28.4 ns vs 28.0 ns), because it performs the same dynamic_cast and then records the result. The win is entirely in the hot path, where it is ~35–50× faster. If your workload casts wildly varying dynamic types and never repeats them, fast_cast is not faster than dynamic_cast.
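The cold/hot split can be illustrated with a one-entry cache keyed on the object's vtable pointer. This is a simplified sketch under stated assumptions, not the library's implementation: `vtable_of` and `cached_cast` are hypothetical names, the vtable pointer is assumed to be the first pointer-sized word of a polymorphic object (true for the Itanium and MSVC ABIs, but not guaranteed by the C++ standard), and const-qualified sources are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Other { virtual ~Other() = default; int o = 2; };
struct Base { virtual ~Base() = default; int b = 1; };
struct Derived : Other, Base { int d = 3; };  // Base subobject sits at a non-zero offset

// Hypothetical helper. ASSUMPTION: the vtable pointer is the first word of the object.
inline const void* vtable_of(const void* obj) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, obj, sizeof vptr);
    return vptr;
}

// Hypothetical one-entry, per-thread cache for a single (From, To) pair.
// `was_hit` is only there so the example can observe which path ran.
template <class To, class From>
To* cached_cast(From* from, bool* was_hit = nullptr) {
    thread_local const void* cached_vtable = nullptr;
    thread_local std::ptrdiff_t cached_offset = 0;
    thread_local bool cached_success = false;

    if (was_hit) *was_hit = false;
    if (!from) return nullptr;

    const void* vt = vtable_of(from);
    if (vt == cached_vtable) {                 // hot: a load and a pointer adjustment
        if (was_hit) *was_hit = true;
        if (!cached_success) return nullptr;   // cached failure
        return reinterpret_cast<To*>(reinterpret_cast<char*>(from) + cached_offset);
    }

    To* to = dynamic_cast<To*>(from);          // cold: real dynamic_cast, then record it
    cached_vtable = vt;
    cached_success = (to != nullptr);
    cached_offset = to ? reinterpret_cast<char*>(to) - reinterpret_cast<char*>(from) : 0;
    return to;
}

inline void example_usage() {
    Derived d;
    Base* bp = &d;
    Derived* first = cached_cast<Derived>(bp);   // cold
    Derived* second = cached_cast<Derived>(bp);  // hot
    assert(first == &d && second == &d);
}
```

With a single entry, alternating between two dynamic types evicts the entry on every call, which is the "varying types" regime in the table above.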

Hot-path latency (repeated casts of the same object)

Most real call sites cast the same handful of objects/types repeatedly, which keeps the cache hot:

| Per call (hot) | fast_cast | dynamic_cast |
| --- | --- | --- |
| Pointer success | 0.56 ns | 4.11 ns |
| Pointer failure | 0.56 ns | 8.49 ns |
| Reused reference | 0.56 ns | 4.14 ns |

Failure caching makes repeated misses just as cheap as hits.
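For readers who want a rough feel for a hot-loop measurement without the repository's benchmark setup, a per-call average can be taken with std::chrono. This is only a minimal sketch: `ns_per_call` and `time_hot_dynamic_cast` are hypothetical names, it is not the harness that produced the numbers above, and it times dynamic_cast alone.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Hypothetical helper: average nanoseconds per call of `op` over `iterations` runs.
template <class Op>
double ns_per_call(Op op, std::size_t iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) op();
    const std::chrono::duration<double, std::nano> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Times repeated dynamic_cast calls on the same object (the "hot" pattern).
inline double time_hot_dynamic_cast(std::size_t iterations) {
    Derived d;
    Base* volatile bp = &d;            // volatile read: the cast cannot be hoisted out of the loop
    Derived* volatile sink = nullptr;  // volatile write: the result cannot be discarded
    const double ns = ns_per_call([&] { sink = dynamic_cast<Derived*>(bp); }, iterations);
    assert(sink == &d);
    return ns;
}
```

A loop like this includes loop and volatile-access overhead, so very small per-call figures are better read as upper bounds than as precise latencies.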

Throughput (2,000,000 reference casts, including object construction)

| 2,000,000 casts | fast_cast | dynamic_cast |
| --- | --- | --- |
| Simple hierarchy | 2.28 ms | 9.21 ms |
| Complex hierarchy | 2.43 ms | 52.1 ms |

The complex (virtual + multiple inheritance) hierarchy is where dynamic_cast is most expensive and the cache pays off most.

Static-safe path

Derived* → Base* is resolved at compile time, so fast_cast, static_cast, and dynamic_cast all measure ~0.12 ns — identical and effectively free.

Observations

- Static-safe and identity casts are free, matching static_cast.
- A first-time (cold) polymorphic cast is on par with dynamic_cast: fast_cast adds only a small constant of bookkeeping on top of it.
- Repeated (hot) casts hit the cache and are roughly 7× to 36× faster in the tables above, depending on the hierarchy and on whether the cast succeeds.
- Failure caching makes repeated invalid casts as cheap as successful hot casts.

Requirements

- C++17 minimum (C++23 optional for constexpr enhancements)
- Header-only, no linking required
- Optional: Catch2 v2 (for tests; vendored as a single header at tests/catch.hpp, no fetch required), Google Benchmark (for benchmarks)

