Memory Safe Cpp Subset

Wu edited this page Mar 30, 2024 · 3 revisions

Intro

A static analyzer can find bugs in C++ code, but it cannot analyze arbitrary C++ code. For code that cannot be proven correct, the analyzer can either:

  • ignore it to avoid false positives
  • reject it to avoid false negatives

According to Bjarne Stroustrup (BS), the second approach is preferred if C++ is to be made truly safe.

Related

New languages tend to build such analyzer rules directly into the compiler. As a concrete example, Jakt is a memory-safe systems programming language which transpiles to C++.

The rules are:

  • Automatic reference counting
  • Strong typing
  • Bounds checking
  • No raw pointers in safe mode

For pointers and references, Jakt defines the following rules:

  • Classes in Jakt have reference semantics
  • Strings are builtin and ref-counted
  • StringViews are allowed, but can only bind to static strings
  • Arrays are builtin and ref-counted
  • Slices of an Array keep the underlying data alive via automatic reference counting.
  • Reference types
    • Allowed in function parameters
    • Allowed in local scope
    • No references in structs
    • No references in return types

Jakt is safe, but arguably too strict: compared to Rust, it imposes more limitations to achieve safety. We can take the same approach for C++:

  • Define a simple subset of C++ which can be proven correct by the analyzer
  • Gradually improve the analyzer to allow more code patterns

Memory-safe C++

For simplicity, we will use cppsafe to refer to both the memory-safe C++ subset and its analyzer.

The definition is based on the C++ lifetime profile but adds more restrictions.

Disallowed features

  • No bare new/delete
  • No owning pointers
  • No casts
  • No pointer arithmetic

Nullness and use-after-move are not considered yet.
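
As a rough illustration of these restrictions, the sketch below keeps the rejected patterns in comments and shows equivalents that stay inside the subset. The function names `sum_first` and `boxed_value` are hypothetical, and the comments describe how cppsafe is expected to react according to the list above, not verified analyzer output.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Patterns the subset is meant to reject (comments only):
//   int* raw = new int(42);     // bare new; `raw` would be an owning pointer
//   delete raw;                 // bare delete
//   long n = (long)raw;         // cast
//   int* second = buf + 1;      // pointer arithmetic

// Inside the subset: elements are reached by index, not by pointer arithmetic.
int sum_first(const std::vector<int>& values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count && i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Inside the subset: ownership lives in an Owner type instead of new/delete.
int boxed_value() {
    auto box = std::make_unique<int>(42);
    return *box;  // the int is released when `box` goes out of scope
}
```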

Types

We divide all types into the following categories as building blocks of cppsafe's type system.

  • Owner: owns some T; T must not be a Pointer or an Aggregate with pointers inside.
  • Pointer: points to some T; T must not be a Pointer or an Aggregate with pointers inside.
  • Aggregate: as defined by C++20
  • Value: everything else. Its class fields must not contain Pointers, either directly or indirectly (as a field of a nested aggregate).
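
To make the categories more concrete, here is a minimal sketch that classifies a few types by hand. The types `Point`, `IntRange` and `Counter` are hypothetical, and the `static_assert`s only check the standard aggregate trait (C++17 or later); the Owner and Pointer categories are cppsafe concepts that the standard library cannot check.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

struct Point {       // Aggregate without pointers inside
    int x;
    int y;
};

struct IntRange {    // Aggregate with a direct pointer field
    int* data;
    int size;
};

class Counter {      // Value: has a constructor and holds no pointers
public:
    explicit Counter(int start) : n_(start) {}
    int next() { return n_++; }
private:
    int n_;
};

static_assert(std::is_aggregate_v<Point>, "Point is an aggregate");
static_assert(std::is_aggregate_v<IntRange>, "IntRange is an aggregate");
static_assert(!std::is_aggregate_v<Counter>, "Counter has a constructor");
// std::string and std::vector<int> would be Owners; neither is an aggregate.
static_assert(!std::is_aggregate_v<std::string>, "std::string is not an aggregate");
static_assert(!std::is_aggregate_v<std::vector<int>>, "std::vector is not an aggregate");
```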

Local scopes

All of these types are allowed as local variables. Cppsafe tracks points-to sets (psets) to achieve memory safety. In particular, aggregates are exploded into the scope.

See C++ lifetime profile for details.
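
The pset tracking described here can be sketched with comments on ordinary code. The pset annotations below are hand-written following the lifetime profile's notation; they are assumptions about what the analyzer would compute, and the `Pair` type and function names are hypothetical.

```cpp
#include <cassert>

struct Pair {
    int* first;
    int* second;
};

int local_scope_demo() {
    int a = 1;
    int* p = &a;      // pset(p) = {a}
    {
        int b = 2;
        p = &b;       // pset(p) = {b}
        *p += 1;      // fine: b is alive, b becomes 3
        a += *p;      // a becomes 4
    }                 // b dies: pset(p) = {invalid}
    // *p = 0;        // would be rejected: p dangles here
    p = &a;           // pset(p) = {a} again
    return *p;
}

int exploded_aggregate_demo() {
    int a = 1;
    int b = 2;
    Pair pr{&a, &b};        // exploded: pset(pr.first) = {a}, pset(pr.second) = {b}
    pr.first = pr.second;   // pset(pr.first) = {b}
    return *pr.first + *pr.second;
}
```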

Non-member functions

For Ret foo(A, B&, C*, Pointer*)

All aggregates will be expanded. We only consider pointer-correctness here.

Output

Outputs include return values and pointer-to-pointer parameters. The rules are:

  • If Ret is an Owner or Value, ALLOW
  • If Ret is a Pointer, ALLOW; its pset is derived from the function arguments
  • If Ret is a Pointer to Pointer, BAN
  • If Ret is a Pointer to X, and there is a Pointer to X in the parameters, ALLOW

For example:

int foo(int*);  // ok
int* foo(int*);  // ok
std::string foo();  // ok

struct Aggr {
    int x;
    int* y;
};

Aggr foo(int*);  // ok, expands to (int, int*) foo(int*)
Aggr* foo(int*); // ban
Aggr* foo(Aggr*);  // ok
Aggr* foo(Aggr&, Aggr&);  // ok

std::vector<int*> foo();  // ban
std::vector<Aggr> foo();  // ban

struct Point {
    int x;
    int y;
};
std::vector<Point> foo();  // ok
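
The following sketch gives bodies to a few of the allowed signatures, to show where the output pset comes from. The function names `larger`, `greet` and `wrap` are hypothetical, and the pset comments are hand-derived from the rules above rather than produced by the analyzer.

```cpp
#include <cassert>
#include <string>

// Pointer return: pset(ret) = {*a, *b}, derived from the arguments.
int* larger(int* a, int* b) {
    return (*a >= *b) ? a : b;
}

// Owner return: always allowed, the result owns its data.
std::string greet(const std::string& name) {
    return "hello " + name;
}

struct Aggr {
    int x;
    int* y;
};

// Aggregate return by value: expands to (int, int*); pset(ret.y) = {*p}.
Aggr wrap(int* p) {
    return Aggr{*p, p};
}

// Not expressible in the subset (comment only): the returned pointer
// cannot be derived from any argument.
//   int* dangling() { int local = 0; return &local; }
```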

Input

Input pointers are identified to derive the pset of output pointers.

For Ret foo(A), the rules are:

  • if A is Owner/Value/Pointer, ALLOW
  • if A is Owner&/Value&, ALLOW
  • if A is Pointer&, ALLOW
  • if A is Aggr, and has no inner pointers, ALLOW
  • if A is Aggr, and has direct pointers, ALLOW
  • if A is Aggr, and has indirect pointers, PARTIALLY ALLOWED: the caller must ensure the indirect pointers are valid and not used by the output
  • if A is Aggr&, and has no inner pointers, ALLOW
  • if A is Aggr&, and has direct pointers, ALLOW
  • if A is Aggr&, and has indirect pointers, PARTIALLY ALLOWED: the caller must ensure the indirect pointers are valid and not used by the output

For example:

void foo(int*);  // ok
void foo(std::vector<int>*); // ok

void foo(int**);  // ok
void foo(int*&);  // ok
void foo(const int*&);  // ok

void foo(string_view);  // ok
void foo(string_view*); // ok

struct Aggr1
{
    int x;
    int y;
};
void foo(Aggr1);  // ok
void foo(Aggr1*); // ok, expands to void foo(Aggr1* aggr1, int* aggr1.x, int* aggr1.y);
void foo(Aggr1**);  // ok, pointer to pointer

struct Aggr2
{
    int* x;
    int y;
};
void foo(Aggr2);  // ok, expands to void foo(int* aggr2.x);
void foo(Aggr2*); // ok, expands to void foo(int** aggr2.x, int* aggr2.y);
void foo(Aggr2**);  // BAN, pointers to pointers to aggregate with inner pointers.

struct Aggr3
{
    int* x;
    Aggr1 y;
};
void foo(Aggr3);  // ok, expands to void foo(int* aggr3.x);
void foo(Aggr3*); // ok, expands to void foo(Aggr3* aggr3, int** aggr3.x, int* aggr3.y.x, int* aggr3.y.y);
void foo(Aggr3**); // BAN

struct Aggr4
{
    Aggr3* x;
};
void foo(Aggr4);  // ok, expands to void foo(Aggr3* aggr4.x, int** aggr4.aggr3.x, int* aggr4.aggr3.y.x, int* aggr4.aggr3.y.y)
void foo(Aggr4*); // partially ok: callers must ensure all pointers are valid, callees should not use any inner pointers as output
void foo(Aggr4**);  // BAN
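
To show what "expands to" means in practice, the sketch below pairs an aggregate parameter with a hand-written expanded equivalent. All function names are hypothetical. The expanded form is written out only for illustration; the analyzer is assumed to perform this expansion internally, and the non-pointer field is passed along unchanged here so that both versions compute the same result.

```cpp
#include <cassert>

struct Aggr1 {
    int x;
    int y;
};

struct Aggr2 {
    int* x;
    int y;
};

// Aggregate by value with a direct pointer: allowed.
int sum(Aggr2 a) {
    return *a.x + a.y;
}

// What the analyzer is assumed to reason about: only a.x is a pointer input.
int sum_expanded(int* a_x, int a_y) {
    return *a_x + a_y;
}

// Pointer to an aggregate without inner pointers: allowed.
// Conceptually expands to (Aggr1* a, int* a.x, int* a.y).
void reset(Aggr1* a) {
    a->x = 0;
    a->y = 0;
}
```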

Constructors

For Values, no inner pointers are allowed.

For Owners, don't analyze.

For Aggregates, no constructors are allowed.

For Pointers, *this will be an implicit output.
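
For the Pointer case, a minimal sketch of a constructor with *this as the implicit output might look like the following. The class `IntRef` is hypothetical; with Clang it would carry the `[[gsl::Pointer(int)]]` attribute, which is left in a comment here so the sketch compiles without attribute warnings on other compilers.

```cpp
#include <cassert>

// Hypothetical Pointer type; under Clang: class [[gsl::Pointer(int)]] IntRef
class IntRef {
public:
    // *this is the implicit output: pset(*this) = {target}
    explicit IntRef(int& target) : ptr_(&target) {}

    int get() const { return *ptr_; }
    void set(int value) { *ptr_ = value; }

private:
    int* ptr_;
};
```

An `IntRef` must therefore not outlive the `int` it was constructed from, in the same way a raw `int*` initialized from `&target` must not.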

Member functions

For Values, no inner pointers are allowed.

For Owners, don't analyze.

For Aggregates, this is an implicit inout parameter, but its pset won't change unless specified by an annotation.

struct Aggr
{
    int* x;
    string_view y;

    void Update(int* a, string_view b);  // by default, nothing changes

    [[clang::annotate("lifetime_post", "this->x", "a")]]
    [[clang::annotate("lifetime_post", "this->y", "b")]]
    void Update2(int* a, string_view b);
};

For Pointers, annotations are required to state how a member function adds, erases, or resets pset members.

struct [[gsl::Pointer]] ScopedExecutor
{
    template <typename Fn>
    [[clang::annotate("gsl::lifetime_extend")]]
    void Submit(Fn&& fn);  // pset(*this) += pset(*fn);

    [[clang::annotate("gsl::lifetime_reset")]]
    void Wait();  // pset(*this) = {global}
};

struct [[gsl::Pointer(int)]] Ptr
{
    void Foo(int* a, int& b);  // by default, replace, pset(*this) = {a, b}
};

Annotations

We can use annotations to state preconditions and postconditions.

TBD

  • Owners of Pointers
  • Containers
  • Use after move
  • Nullness
