TCOM 401/CIS 551 on 01/17/2006
Notes:
The CERT/CC (Computer Emergency Response Team Coordination Center) has described the buffer overflow as the most critical software flaw behind security vulnerabilities. A successful overflow can let an attacker illegally execute system calls or delete data. Such flaws are possible because C and C++ do not enforce array bounds checking.
A string in C is a contiguous sequence of bytes (characters) terminated by the null character (‘\0’). The memory following the null character is not part of the string and may be uninitialized.
From the diagram on the PPT slides, the C memory model has three parts. The RAM (Random Access Memory) can be visualized as a column of memory addresses (0 to 2^32 - 1 on a 32-bit machine), with data stored at every address.
The lowest region of the diagram (at the highest memory addresses) is occupied by the “Stack”, which grows downward: each new entry is placed at a lower address. It holds local variables and control information, such as the return address of each function call.
Just above the Stack is the “Heap”, which grows upward toward higher addresses as memory is allocated. It contains dynamically allocated data structures.
Lastly, above the Heap (at the lowest addresses) is the “Code” region, which contains the program’s machine instructions.
An example in which a function ‘f()’ calls a function ‘g(parameters)’ illustrates the flow of control and the allocation of memory in RAM.
Initially, the ESP (stack pointer) register points to the top of the ‘f()’ stack frame. Remember that the Stack starts at the highest address and grows toward lower addresses as more data is pushed onto it.
When ‘f()’ calls ‘g(parameters)’, the input parameters are pushed immediately above the ‘f()’ stack frame (at lower addresses). The return address, the saved base pointer (frame pointer) and g’s local variables then occupy successively lower addresses as the Stack is built.
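The saved base pointer records the frame pointer of ‘f()’, so that when ‘g()’ returns, the stack pointer and base pointer can be reset to the frame of ‘f()’. The return address stores the address of the instruction in ‘f()’ to be executed next, once ‘g()’ returns.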
The next PPT slide explains how a buffer overflow attack may be carried out.
A function ‘g(char *text)’ may be called from function ‘f()’.
The called function declares a local variable ‘buffer[128]’, a 128-byte character array.
A string copy operation (such as ‘strcpy()’) copies the character array ‘text’ into ‘buffer’.
If ‘text’ contains more than 128 bytes (counting the terminating ‘\0’), a buffer overflow occurs.
The next PPT slide visualizes this attack.
First, the ‘f()’ stack frame is allocated in RAM, and above it the ‘text’ data is allocated, say 132 bytes, which is clearly more than 128 bytes. The Address region shown on the PPT slide lies within these 132 bytes of ‘text’; it holds the attacker-chosen address that will overwrite the return address.
The return address, the saved base pointer and ‘buffer[128]’ are allocated at successively lower addresses above ‘text’.
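The result is now easy to see. Copying the 132 bytes of ‘text’ into ‘buffer[128]’ overflows the buffer and overwrites the saved base pointer and return address that lie at higher addresses. If the input is engineered correctly, the return address is replaced, so that when ‘g()’ returns, control jumps to an address of the attacker’s choosing: at best some arbitrary address that crashes the program, at worst attacker code that executes malicious system calls.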
The ‘payload’ is attack code carried as part of the data fed into the overflow. Having the source code of the target (“blame.c”) helps in building the payload: running the program in a debugger reveals where in the input to place the address that the overwritten return address should point to.
To make the attack more reliable, the ‘payload’ can be built from a run of “NO-OP” instructions, followed by the attack code and attack data, and finally the same address repeated several times. That address should point back into the buffer, ideally at one of the NO-OPs in this ‘landing pad’, so that execution slides forward into the attack code even if the guessed address is slightly off.
The ‘gcc’ and ‘gdb’ documentation is a useful resource when building attack code. Remember that the Intel x86 processor is little-endian: the low-order byte of a multi-byte number is stored at the lowest memory address and the high-order byte at the highest, so an address must be written into the payload with its bytes in reverse order.
C and C++ (with C-style strings) use a null-terminated string representation and store no string length. String routines therefore assume that every string ends with the null character (‘\0’) and copy until they reach it, without checking the size of the destination buffer; this assumption is what makes buffer overflows possible.
Hence the first rule of thumb for defending against buffer overflow attacks in C and C++ is to use functions that take an explicit buffer size, such as ‘strncpy()’, ‘snprintf()’ and ‘fgets()’, instead of ‘strcpy()’, ‘sprintf()’ and ‘gets()’.
Many tools are available to help secure C and C++ programs, such as libsafe, Purify, Splint, StackGuard and PointGuard.
In conclusion, prefer modern memory-safe languages such as Java and C#, which check array bounds at run time and manage memory with garbage collection, over C and C++. In addition, recent operating systems can start the Stack at a (pseudo)random address. This makes buffer overflow attacks harder, because malicious code placed in the ‘payload’ no longer sits at a predictable address for the overwritten return address to point to.
-Amit Mohan Easow