Specifying SAFECode Type System in Maude
SAFECode is a system for
guaranteeing memory safety to programs written in imperative languages
like C, C++. This project aims to formalize the SAFECode system as a type extension to the C
type system (actually an extension to the LLVM type system similar
to the C type system). The language along with the typing rules are
specified in a rewriting logic based algebraic specification framework called Maude .
Maude specifications are executable giving us a type checker for our type system.
SAFECode Idea
The most novel aspect of SAFECOde is that it guarantees memory safety in the presence of dangling pointers to freed memory.
The main idea underlying SAFECode is that it is possible to divide the heap in to seperate regions (or pools) in the memory.
(This division is done statically using a context sensitive and flow insensitive pointer analysis called Data Structure Analysis).
This division is such that each pool contains only objects of the same C type, i.e. each pool is type homogenous.
The key principle that SAFECode uses to guarantee memory safety is the type homogeneity principle :
If we allocate an object on top of a previously freed memory object with the same alignment, then dereferencing dangling pointer to freed memory can not cause memory safety violation. This principle allows us to reuse memory, tolerate dangling pointer errors without any overhead. Since the pools identified by our pointer analysis are already type homogenous, we can directly apply the principle and get memory safety. More details about SAFECode are included in the journal publication "Memory Safety Without Garbage Collection for Embedded Applications".
Example
We explain the SAFECode transformations with the help of a C example. The program given below has usage of a dangling pointer to freed memory (This compiled with a normal C compiler would give a core dump).
main() {
int *x, y, z;
x = malloc(4);
*x = 20;
y = x ; //x & y point to the same element
*y = 30;
....
z = malloc(4);
*z = 50;
...
free(x); // free the element x
...
x = malloc(4) ; // allocate another element
*y = 40 ; // dangling pointer usage
....
free(z);
free(x);
}
The program after the SAFECode transformation. is given below. The pointer analysis algorithm used by our technique will identify that
x and z are not aliased and put them in different memory pools PD1 and PD2. Here each of the two pools is type homogenous and contains objects of type int. So we can use the type homogeneity principle to guarantee memory safety (but get admittedly unexpected result for programs with memory errors).
f() {
int *x, y, z;
PD1 = poolinit(sizeof(int)); // poolinit initializes a pool PD1
PD2 = poolinit(sizeof(int)); // this initializes another pool PD2
x = poolalloc(PD1) ; // poolalloc allocates an element out of the pool PD1
*x = 20;
y = x ;
*y = 30;
....
z = poolalloc(PD2); // allocates an element out of pool PD2
*z = 50;
...
poolfree(PD1, x); // free the element x
...
x = poolalloc(PD1) ; // allocate another element PD1
*y = 40 ; // dangling pointer usage
....
poolfree(PD2, z);
poolfree(PD1, x);
pooldestroy(PD1);
pooldestroy(PD2);
}
SAFECode Type System
This project aims to address the question, can we formally prove that SAFECode guarantees memory safety? Towards this end, we design a new language (syntax, typing rules and semantics)
to express and analyze programs after SAFECode transformations. With the help of this type system, we aim to formally prove the following results (stated informally) :
"If an expression in the program has a type t, it will continue to have type T after every step in the semantics" (preservation theorem) and "The program never gets stuck" (progress theorem).
These are called the progress and preservation theorems in the literature. The design and specification of the type system are the goals of this project. The proof of the theorems are still being worked out.
With this formal specification and the accompanied proofs of the theorems we get two important benefits.
-
A formal guarantee that SAFECode system is indeed memory safe.
-
We don't have to rely on the correctness of the complex pointer analysis for the safety of the program. Before running a program we can simply type-check it in the new type system and if it types, then the program is going to be "memory safe" at run-time.
We now explain the SAFECode type system with the help of the same example given before. The main difference between C and the SAFECode type system is that every pointer in the SAFECode language has a "region variable" attribute that indicates the region the pointer points to.
For example, (int@rho1 z) means that z is a pointer to int in region named rho1. This is adapted from the type system of
Cyclone . (Cyclone is also a region based type system, but with a major difference, it does not allow for reuse of memory within a region).
Example modified to match the new syntax
main() {
poolinit(rho1, int) ph1 { //rho1 is the name of the pool, ph1 is the region handle
poolinit(rho2, int) ph2 { //rho2 is the name of the pool, ph2 is the region handle
int@rho1 x, y; //x and y point to region rho1
int@rho2 z; // z points to region rho2
x = poolalloc(ph1, 1); //alocate one element of rho1
store x,20;
...
y = x ;
*y = 30;
....
z = poolalloc(ph2); // allocates an element out of pool rho2
*z = 50;
...
poolfree(ph1, x); // free the element x
...
x = poolalloc(ph1) ; // allocate another element out of rho1
*y = 40 ; // dangling pointer usage
....
poolfree(Ph2, z);
poolfree(Ph1, x);
}
}
}
Note that if we had a statement in the above program
y = z;
This wouldn't type check because the type of y is int@rho1 and type of z is int@rho2 and they are different! This will help us catch any bugs in the implementation of the pointer analysis.
Specification in Maude
We first specified the core LLVM syntax and semantics in LLVM.
You can find the specification here . Note it is not complete (see TODO List)
Specification of SAFECode extensions to LLVM. The Syntax and typing rule specfications are done, but the semantics is not yet done.
You can find the specification here
ToDo List
- For core LLVM semantics,
- Add function call support
- Add struct type support
- Add pointer arithmetic support
- Add stack allocation support
- For SAFECode system
- Semantics: Complete the semantics specification.
- Typing rules: Add typing rules for global variables, stack allocations, function calls, structs.
-
Functions and Structs have to be made polymorphic in region type (or region vars).
Example: A function f, defined as int f (int@rho p) has the universal type
((for-all rho) int@rho -> int). This can be instantiated at multiple places with different rho.
-
Once the SAFECode semantics are specified, prove the progress and preservation theorems for the SAFECode type system.
-
Instead of using the pointer analysis for inferring the region types,
investigate the use of type inference algorithms to determine the region attributes.