Memory Safe Cpp Subset
A static analyzer can find bugs in C++ code, but it cannot analyze arbitrary C++ code. For code that cannot be proven correct, the analyzer can either:
- ignore it, to avoid false positives
- reject it, to avoid false negatives
In BS's opinion, the second option is preferred if C++ is to be made really safe.
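To make the trade-off concrete, here is a minimal sketch; the function names are hypothetical and not taken from any analyzer. The first function is correct at runtime because `reserve` guarantees that the following `push_back` does not reallocate, but proving this requires reasoning about capacity, so a strict analyzer would likely reject it. The second function is a rewrite that needs no such reasoning.

```cpp
#include <vector>

// Precondition for both functions: v is not empty.

// Correct at runtime, but hard to prove: `first` stays valid only because
// reserve() guarantees that the next push_back() does not reallocate.
inline int sum_first_and_new_unprovable(std::vector<int>& v, int added) {
    v.reserve(v.size() + 1);
    const int& first = v.front();  // refers into v's storage
    v.push_back(added);            // in general, this invalidates `first`
    return first + v.back();
}

// Easy to prove: the value is copied before v is modified, so no
// reference into v is alive across the push_back().
inline int sum_first_and_new_provable(std::vector<int>& v, int added) {
    const int first = v.front();
    v.push_back(added);
    return first + v.back();
}
```

An analyzer that ignores the first function risks missing a real bug in similar code; one that rejects it forces the author toward the second form.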
New languages build such analyzer rules directly into the compiler. As a concrete example, Jakt is a memory-safe systems programming language that transpiles to C++.
The rules are:
- Automatic reference counting
- Strong typing
- Bounds checking
- No raw pointers in safe mode
For pointers and references, Jakt defines the following rules:
- Classes in Jakt have reference semantics
- Strings are builtin and ref-counted
- StringViews are allowed, but can only bind to static strings
- Arrays are builtin and ref-counted
- Slices of an Array keep the underlying data alive via automatic reference counting
- Reference types
  - Allowed in function parameters
  - Allowed in local scope
  - No references in structs
  - No references in return types
Jakt is safe, but somewhat too strict: compared to Rust, it adds more limitations to achieve safety. For C++, we can take the same approach:
- Define a simple subset of C++ that an analyzer can prove correct
- Gradually improve the analyzer to allow more code patterns
For simplicity, we use cppsafe to refer to both the memory-safe C++ subset and its analyzer.
The definition is based on the C++ lifetime profile, with additional restrictions:
- No bare new/delete
- No owning pointers
- No casts
- No pointer arithmetic
Currently, null and move are not considered.
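As a rough illustration of code that stays within these restrictions, the sketch below replaces bare new/delete with an owning standard type and replaces pointer arithmetic with bounds-checked indexing. `Node`, `make_node`, `sum`, and `element_or` are hypothetical names chosen for this example, and treating `std::unique_ptr` as an acceptable Owner is an assumption based on the lifetime profile, not something cppsafe states.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int value = 0;
};

// Instead of `new Node` and a matching `delete`: ownership lives in an
// Owner type, so there is no raw owning pointer to leak or double-free.
inline std::unique_ptr<Node> make_node(int value) {
    auto node = std::make_unique<Node>();
    node->value = value;
    return node;
}

// Instead of walking a buffer with `p + i`: iterate over the container.
inline int sum(const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) {
        total += x;
    }
    return total;
}

// Instead of unchecked `*(p + i)`: check the index against the size.
inline int element_or(const std::vector<int>& xs, std::size_t i, int fallback) {
    return i < xs.size() ? xs[i] : fallback;
}
```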
We divide all types into the following categories, which are the building blocks of cppsafe's type system:
- Owner: owns some T; T should not be a Pointer or an Aggregate with pointers inside.
- Pointer: points to some T; T should not be a Pointer or an Aggregate with pointers inside.
- Aggregate: as defined in C++20.
- Value: everything else. Its class fields must not be Pointers, either directly or indirectly (as a field of an aggregate).
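The categories above can be approximated with standard type traits. The following is only a sketch: `Category`, `is_owner`, `is_pointer_like`, and `classify` are hypothetical names, the list of Owner types is a small hand-picked sample, and counting references and `std::string_view` as Pointers is an assumption borrowed from the lifetime profile. It does not check the restrictions on T or on the fields of a Value.

```cpp
#include <memory>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

enum class Category { Owner, Pointer, Aggregate, Value };

// Owners: a hand-picked sample of standard owning types (assumption).
template <typename T>
struct is_owner : std::false_type {};
template <typename T, typename A>
struct is_owner<std::vector<T, A>> : std::true_type {};
template <typename C, typename Tr, typename A>
struct is_owner<std::basic_string<C, Tr, A>> : std::true_type {};
template <typename T, typename D>
struct is_owner<std::unique_ptr<T, D>> : std::true_type {};

// Pointers: raw pointers, references, and view types (assumption).
template <typename T>
struct is_pointer_like
    : std::bool_constant<std::is_pointer_v<T> || std::is_reference_v<T>> {};
template <typename C, typename Tr>
struct is_pointer_like<std::basic_string_view<C, Tr>> : std::true_type {};

// Checks run in order: Owner, Pointer, Aggregate, then Value as fallback.
template <typename T>
constexpr Category classify() {
    if constexpr (is_owner<T>::value) {
        return Category::Owner;
    } else if constexpr (is_pointer_like<T>::value) {
        return Category::Pointer;
    } else if constexpr (std::is_aggregate_v<T>) {
        return Category::Aggregate;
    } else {
        return Category::Value;
    }
}

// Example aggregate used to exercise the classifier.
struct Point {
    int x;
    int y;
};
```

A real analyzer would more likely rely on `[[gsl::Owner]]` and `[[gsl::Pointer]]` annotations than on a fixed list of specializations.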
All of these types are allowed as local variables. Cppsafe tracks points-to sets (psets) to achieve memory safety. In particular, aggregates are exploded into their individual fields within the scope.
See the C++ lifetime profile for details.
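The comments in the sketch below show how the psets of local variables might evolve, including the fields of an exploded aggregate. `Pair` and `local_tracking_demo` are hypothetical names, and the pset notation in the comments is an informal reading of the lifetime profile, not output from cppsafe.

```cpp
struct Pair {
    int* first;
    int* second;
};

inline int local_tracking_demo() {
    int a = 1;
    int b = 2;

    int* p = &a;  // pset(p) = {a}
    p = &b;       // pset(p) = {b}

    // The aggregate is exploded: each pointer field is tracked on its own.
    Pair pr{&a, p};  // pset(pr.first) = {a}, pset(pr.second) = {b}
    int result = *pr.first + *pr.second;  // both targets are alive: accepted

    int* q = &a;  // pset(q) = {a}
    {
        int tmp = 40;
        q = &tmp;      // pset(q) = {tmp}
        result += *q;  // tmp is alive: accepted
    }                  // tmp is destroyed: pset(q) = {invalid}
    // Dereferencing q here would be rejected, since its pset contains invalid.

    q = &b;        // pset(q) = {b}: q is usable again
    result += *q;
    return result;
}
```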
For a function Ret foo(A, B&, C*, Pointer*), all aggregates are expanded. We only consider pointer correctness here.
Outputs are return values and pointer-to-pointer parameters. The rules are:
- If Ret is an Owner or Value, ALLOW
- If Ret is a Pointer, ALLOW; its pset will be derived from the function arguments
- If Ret is a Pointer to Pointer, BAN
- If Ret is a Pointer to X, and there is a Pointer to X in the parameters, ALLOW
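The declarations that follow show signatures only. As a complement, here is a small sketch with function bodies; `larger` and `positive_indices` are hypothetical names. It shows a Pointer return whose pset is derived from the arguments, and one possible way to avoid returning a container of pointers, which is banned.

```cpp
#include <cstddef>
#include <vector>

// Ret is a Pointer: allowed. The result can only point to what a or b
// point to, so its pset is derived from the arguments.
inline const int* larger(const int* a, const int* b) {
    return (*a < *b) ? b : a;
}

// Returning std::vector<int*> would be banned (an Owner of Pointers).
// One possible workaround is to return indices, which are plain Values.
inline std::vector<std::size_t> positive_indices(const std::vector<int>& xs) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (xs[i] > 0) {
            out.push_back(i);
        }
    }
    return out;
}
```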
For example:
int foo(int*); // ok
int* foo(int*); // ok
std::string foo(); // ok
struct Aggr {
    int x;
    int* y;
};
Aggr foo(int*); // ok, expands to (int, int*) foo(int*)
Aggr* foo(int*); // ban
Aggr* foo(Aggr*); // ok
Aggr* foo(Aggr&, Aggr&); // ok
std::vector<int*> foo(); // ban
std::vector<Aggr> foo(); // ban
struct Point {
    int x;
    int y;
};
std::vector<Point> foo(); // ok
Input pointers are identified to derive the pset of output pointers.
For Ret foo(A), the rules are:
- if A is Owner/Value/Pointer, ALLOW
- if A is Owner&/Value&, ALLOW
- if A is Pointer&, ALLOW
- if A is Aggr, and has no inner pointers, ALLOW
- if A is Aggr, and has direct pointers, ALLOW
- if A is Aggr, and has indirect pointers, PARTIALLY ALLOWED: the caller must ensure indirect pointers are valid and not used by the output
- if A is Aggr&, and has no inner pointers, ALLOW
- if A is Aggr&, and has direct pointers, ALLOW
- if A is Aggr&, and has indirect pointers, PARTIALLY ALLOWED: the caller must ensure indirect pointers are valid and not used by the output
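The aggregate rules above can be sketched with function bodies as follows. `Aggr2` mirrors the struct of the same name in the declarations further down; `Outer`, `read`, `read_expanded`, and `read_indirect` are hypothetical. The expanded form is an informal reading of how the analyzer might view the parameter, not its actual internal representation.

```cpp
// One direct pointer field.
struct Aggr2 {
    int* x;
    int y;
};

// A by-value aggregate with a direct pointer is allowed...
inline int read(Aggr2 a) {
    return *a.x + a.y;
}

// ...and is analyzed roughly as if it were written in expanded form.
// (a_y is kept only so both functions compute the same result; only the
// pointer matters for the analysis.)
inline int read_expanded(int* a_x, int a_y) {
    return *a_x + a_y;
}

// Hypothetical aggregate with an indirect pointer, reached through inner.
struct Outer {
    Aggr2* inner;
};

// Partially allowed: the caller must keep inner->x valid, and the callee
// does not let inner->x escape as an output; it returns a Value.
inline int read_indirect(Outer o) {
    return *o.inner->x + o.inner->y;
}
```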
For example:
void foo(int*); // ok
void foo(std::vector<int>*); // ok
void foo(int**); // ok
void foo(int*&); // ok
void foo(const int*&); // ok
void foo(string_view); // ok
void foo(string_view*); // ok
struct Aggr1
{
    int x;
    int y;
};
void foo(Aggr1); // ok
void foo(Aggr1*); // ok, expands to void foo(Aggr1* aggr1, int* aggr1.x, int* aggr1.y);
void foo(Aggr1**); // ok, pointer to pointer
struct Aggr2
{
    int* x;
    int y;
};
void foo(Aggr2); // ok, expands to void foo(int* aggr2.x);
void foo(Aggr2*); // ok, expands to void foo(int** aggr2.x, int* aggr2.y);
void foo(Aggr2**); // BAN, pointers to pointers to aggregate with inner pointers.
struct Aggr3
{
    int* x;
    Aggr1 y;
};
void foo(Aggr3); // ok, expands to void foo(int* aggr3.x);
void foo(Aggr3*); // ok, expands to void foo(Aggr3* aggr3, int** aggr3.x, int* aggr3.y.x, int* aggr3.y.y);
void foo(Aggr3**); // BAN
struct Aggr4
{
    Aggr3* x;
};
void foo(Aggr4); // ok, expands to void foo(Aggr3* aggr4.x, int** aggr4.aggr3.x, int* aggr4.aggr3.y.x, int* aggr4.aggr3.y.y)
void foo(Aggr4*); // partial ok, callers must ensure all pointers are valid, callees should not use any inner pointers as output
void foo(Aggr4**); // BAN
For Values, no inner pointers are allowed.
For Owners, don't analyze.
For Aggregates, no constructors are allowed.
For Pointers, *this will be an implicit output.
For Values, no inner pointers are allowed.
For Owners, don't analyze.
For Aggregates, this is an implicit inout parameter, but its pset won't change unless specified by an annotation.
struct Aggr
{
    int* x;
    string_view y;
    void Update(int* a, string_view b); // by default, nothing changes
    [[clang::annotate("lifetime_post", "this->x", "a")]]
    [[clang::annotate("lifetime_post", "this->y", "b")]]
    void Update2(int* a, string_view b);
};
For Pointers, annotations are required to state: add/erase/reset pset members.
struct [[gsl::Pointer]] ScopedExecutor
{
    template <typename Fn>
    [[clang::annotate("gsl::lifetime_extend")]]
    void Submit(Fn&& fn); // pset(*this) += pset(*fn);
    [[clang::annotate("gsl::lifetime_reset")]]
    void Wait(); // pset(*this) = {global}
};
struct [[gsl::Pointer(int)]] Ptr
{
    void Foo(int* a, int& b); // by default, replace, pset(*this) = {a, b}
};
We can use annotations to state preconditions and postconditions.
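The three pset updates for Pointer member functions (extend, reset, and the default replace) can be modeled at runtime as operations on a set of names. This is only a sketch for building intuition: `PSet` and its member functions are hypothetical, the initial pset of {global} is an assumption, and an actual analyzer tracks this information statically rather than in a container.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical runtime model of a points-to set: each member is the name
// of something the Pointer may point to.
class PSet {
public:
    // Default for Pointer member functions: pset(*this) = {args...}
    void replace(std::set<std::string> targets) {
        members_ = std::move(targets);
    }

    // Models "gsl::lifetime_extend": pset(*this) += pset(arg)
    void extend(const std::set<std::string>& targets) {
        members_.insert(targets.begin(), targets.end());
    }

    // Models "gsl::lifetime_reset": pset(*this) = {global}
    void reset() {
        members_.clear();
        members_.insert("global");
    }

    bool contains(const std::string& name) const {
        return members_.count(name) != 0;
    }

    std::size_t size() const {
        return members_.size();
    }

private:
    std::set<std::string> members_{"global"};  // assumed initial state
};
```

In this model, `Ptr::Foo(a, b)` corresponds to `replace({"a", "b"})`, `ScopedExecutor::Submit(fn)` to `extend({"fn"})`, and `ScopedExecutor::Wait()` to `reset()`.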
- Owners of Pointers
- Containers
- Use after move
- Nullness