This article is co-authored by Vikas Kumar and Sanjay Goel. It is the second in a series about software bugs. The third part is . Vikas can be reached at http://www.linkedin.com/pub/vikas-kumar/a/658/500
In the previous article, we started discussing software bugs. We proposed that software bugs can be classified according to the source of the underlying misconception. These misconceptions can be related to (i) programming fundamentals, (ii) operating system resources, (iii) the compiler, (iv) the database, or (v) software design.
In this article we list some common software bugs that are grounded in misconceptions about operating system resources and services. Code snippets in the C programming language are used to illustrate the bugs. We do not claim to have captured all the misconceptions related to operating system resources and services; hence, this catalogue is only partial.
These operating system related bugs are further classified into three categories: memory-related bugs, synchronization-related bugs, and inter-thread or inter-process bugs.
1. Memory related bugs
Memory is critical for any useful real-world program. Memory-related bugs are classified as memory corruption, invalid memory access, and memory leaks.
1.1 Memory Corruption
Memory used in a program comes either from the stack of the current task or from the heap. Thus, memory corruption bugs can be further classified as stack corruption and heap corruption.
1.1.1 Stack Corruption
Each task or thread has its own stack, whose size is limited by the value specified when the task or thread is created. An activation frame is associated with each function call within the current process. This activation frame contains enough memory for the call parameters, return address, saved registers, and local variables. As the call sequence proceeds, a stack of activation frames is built. Since an activation frame contains the return address and saved register values, any inadvertent tampering with it can result in anomalous program behavior. Some typical scenarios that result in stack corruption are:
1.1.1.1 Local buffer overrun
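This bug is quite common. A novice programmer may read user input into a fixed-size local array, for example a 40-byte buffer, using a function that does not limit how many characters it stores. If the input does not fit in the 40 bytes (including the terminating null character), the excess is written past the end of the array into the rest of the activation frame, possibly over the saved registers and the return address, and program behavior becomes unpredictable.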
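The safer pattern can be sketched as follows, assuming input is read line by line from a stdio stream. `fgets` is told the buffer size and stores at most that many bytes, including the terminating null character, so over-long input is truncated rather than written past the array. The helper name `read_line` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into `buf` (capacity `size`).
   fgets() writes at most size - 1 characters plus the terminating '\0',
   so input longer than the buffer is truncated instead of overrunning it. */
static int read_line(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL)
        return -1;                      /* end of file or read error */
    buf[strcspn(buf, "\n")] = '\0';     /* drop the trailing newline, if any */
    return 0;
}
```

For example, `char input[40]; if (read_line(input, sizeof input, stdin) == 0) ...`. Any part of an over-long line that did not fit stays in the stream for the next read.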
1.1.1.2 Returning pointer to a variable declared on stack
In this case the returned pointer is not valid, because it points into the activation frame of a function that has already returned and whose frame has been unwound. That memory can be reused by the next function call, so accessing it is not safe (Vipindeep & Jalote, 2005).
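A minimal sketch of the problem and one common fix, with hypothetical function names: the caller provides the storage (and its size), so the returned pointer refers to memory that outlives the call.

```c
#include <stdio.h>

/* BROKEN (kept as a comment): `name` lives in this function's activation
   frame, so the returned pointer dangles as soon as the function returns.

   char *make_greeting_bad(void)
   {
       char name[32] = "hello";
       return name;
   }
*/

/* One fix: the caller owns the storage and passes it in together with its
   size. The function name is hypothetical. */
static char *make_greeting(char *out, size_t size, const char *who)
{
    snprintf(out, size, "hello, %s", who);   /* bounded write into caller's buffer */
    return out;                              /* still valid after the return */
}
```

Other options are a `static` buffer (valid after the return, but shared between calls and threads) or a `malloc`ed buffer that the caller must later `free`.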
1.1.1.3 Declaring local storage which exceeds the size of stack of the process
The stack size of a process is determined when the process is created; it is specified either by the user or by a default set in the development environment. If the local variables declared in a function exceed the stack size of the process, a stack overflow occurs, usually crashing the program (and, on systems without memory protection, possibly the whole system).
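On POSIX systems, a thread's stack size can be requested explicitly when the thread is created, and storage too large for the stack can be taken from the heap instead. The following is a sketch under those assumptions; the function names and the 1 MiB and 8 MiB sizes are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define BIG_BUFFER_SIZE (8u * 1024u * 1024u)   /* 8 MiB: too big for a 1 MiB stack */

static int worker_failed;                      /* its address marks a failed run */

/* Thread body: a local `char buf[BIG_BUFFER_SIZE]` would overflow the 1 MiB
   stack requested below, so the large buffer comes from the heap. */
static void *worker(void *arg)
{
    (void)arg;
    char *buf = malloc(BIG_BUFFER_SIZE);
    if (buf == NULL)
        return &worker_failed;
    memset(buf, 0, BIG_BUFFER_SIZE);           /* use the storage */
    free(buf);
    return NULL;                               /* success */
}

/* Creates `worker` with an explicitly requested 1 MiB stack and waits for it.
   Returns 0 on success, -1 on any failure. */
static int run_worker_with_small_stack(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *result = NULL;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setstacksize(&attr, 1024u * 1024u) != 0 ||
        pthread_create(&tid, &attr, worker, NULL) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);               /* does not affect the running thread */
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == NULL ? 0 : -1;
}
```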
1.1.1.4 Function arguments passed are too large to be accommodated in stack
Sometimes many large arguments, such as structures, are passed by value; each call copies them onto the stack, which can again lead to stack overflow and a crash. To avoid this, large arguments can be passed by reference, which in C means passing pointers to them.
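A minimal sketch, assuming a hypothetical structure that embeds a large array: passing a pointer to it copies only an address onto the callee's stack instead of the whole structure.

```c
#include <stddef.h>

/* Hypothetical record with a large embedded array (about 4 KiB of ints). */
struct sample_block {
    int values[1024];
    size_t count;               /* number of valid entries, at most 1024 */
};

/* Passing `struct sample_block` by value would copy the whole structure onto
   the callee's stack on every call. Passing a pointer copies only an address;
   `const` documents that the function does not modify the caller's data. */
static long sum_block(const struct sample_block *blk)
{
    long total = 0;
    for (size_t i = 0; i < blk->count; i++)
        total += blk->values[i];
    return total;
}
```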
1.1.2 Heap Corruption
Variables declared on the stack are local in scope, and their size is normally fixed at compile time. To overcome these limitations, programs frequently use dynamic memory allocation from the heap. This memory is limited by the free memory available in the system. To manage the heap, the heap manager (in C, the allocator behind malloc() and free(), which in turn obtains memory from the OS) keeps additional bookkeeping information. A program must therefore respect the restrictions and fulfil the expectations of these memory management functions. Typical scenarios of heap corruption are:
1.1.2.1 Dynamically allocated buffer overrun or underrun
char* input = (char*) malloc (40);
In this case, if the input stored in the buffer needs more than 40 bytes (including the terminating null character), the extra characters are written to memory that may not belong to this task. In the worst case, this results in a system crash. Even if the memory belongs to this task, it may contain the heap manager's bookkeeping data for other previously allocated blocks, so a later attempt to free those blocks can fail.
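One way to avoid the fixed 40-byte guess is to size the allocation from the data itself. The sketch below assumes the input is already available as a null-terminated string; `copy_string` is a hypothetical name (POSIX `strdup` performs the same job).

```c
#include <stdlib.h>
#include <string.h>

/* Copies `src` into a freshly allocated buffer sized from the data itself
   (strlen + 1 for the terminating '\0') rather than a fixed 40 bytes.
   Returns NULL if allocation fails; the caller must free() the result. */
static char *copy_string(const char *src)
{
    size_t len = strlen(src);
    char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);          /* copies the '\0' as well */
    return dst;
}
```

When a fixed-size heap buffer is unavoidable, the read itself should be bounded, for example with `fgets(input, 40, stdin)`.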
1.1.2.2 Freeing already freed memory
Once dynamically allocated memory has been freed, the heap manager may discard all data corresponding to it and add it to the free memory available in the heap. Subsequently, this memory may be allocated again, either to some function in the same task or to a different task. In both cases, freeing the already freed memory a second time causes incorrect program behavior. In the first case, any subsequent access to the memory after the second free gives unpredictable results. In the second case, freeing the already freed memory can result in a system crash.
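A common defensive habit, sketched here as a hypothetical macro, is to set the pointer to NULL immediately after freeing it. A repeated free then becomes `free(NULL)`, which the C standard defines to do nothing.

```c
#include <stdlib.h>

/* Hypothetical helper macro: frees the block and clears the variable, so an
   accidental second use of the macro turns into free(NULL), a no-op. */
#define FREE_AND_CLEAR(p) do { free(p); (p) = NULL; } while (0)
```

This only protects the variable passed to the macro; other copies of the same pointer still dangle. The macro evaluates its argument twice, so it should be given a plain variable.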
1.1.2.3 Freeing memory not allocated dynamically
char* input = (char*) malloc (40);
In this scenario, the address returned by the malloc() library function has been changed, and free() is called with the changed address. free() expects its argument to be exactly an address previously returned by malloc(), with the heap manager's bookkeeping data at a known position relative to it. Since a modified address is passed, the behavior is undefined: the outcome depends on whatever data happens to be present near the passed address.
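A minimal sketch of the safe pattern, with a hypothetical function name: the address returned by `malloc` is kept unchanged in one variable, and a separate cursor walks through the buffer, so `free` always receives the original address.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns an uppercased heap copy of `src`. The pointer returned by malloc()
   stays in `start` and is never modified; the cursor `p` does the walking.
   The caller frees the returned pointer (never an advanced cursor). */
static char *upper_copy(const char *src)
{
    char *start = malloc(strlen(src) + 1);
    if (start == NULL)
        return NULL;
    strcpy(start, src);
    for (char *p = start; *p != '\0'; p++)      /* advance the cursor, not `start` */
        *p = (char)toupper((unsigned char)*p);
    return start;                               /* the address malloc() returned */
}
```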
1.2 Invalid Memory Access
Accessing memory that is not part of the process address space, or that is uninitialized, constitutes invalid memory access. The OS maintains the address space of each process, and checks for valid memory access are implemented through segmentation, paging, or a combination of both. If the OS detects an invalid access, it terminates the offending process (on POSIX systems, typically by delivering the SIGSEGV signal). The memory being accessed may belong to some other process, so killing the process that made the invalid access prevents corruption of the other process's memory. Typical scenarios of invalid memory access are:
1.2.1 NULL pointer access
Dereferencing a pointer whose value is NULL accesses memory location 0. On most systems this location is never mapped into the address space of any process (it is deliberately left unmapped or reserved by the OS), so the access falls outside the process's valid address space and the process is terminated.
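`malloc` returns NULL when it cannot satisfy a request, which is a common way for a NULL pointer to reach code that dereferences it. A minimal sketch of checking the result first; the function name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates n ints, fills them with squares, and hands the block to the
   caller through *out. The malloc() result is checked before the first
   dereference. Returns 0 on success, -1 if allocation failed. */
static int fill_squares(int **out, size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL) {
        fprintf(stderr, "out of memory\n");
        return -1;                       /* never touch a NULL pointer */
    }
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
    *out = a;                            /* caller must free() it */
    return 0;
}
```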
1.2.2 Uninitialized memory access
Programmers often forget to initialize variables used in a program. Reading such a variable returns whatever value happens to be in that memory, while the code reading it expects a valid value. This violated expectation can cause incorrect behavior. Hence, all variables should be initialized with valid default values. Constructors in C++ are meant for exactly this purpose.
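In C, which has no constructors, the same effect comes from initializers and zero-filling allocation. A small sketch of three common forms; the structure and function names are hypothetical.

```c
#include <stdlib.h>

struct counters {
    int hits;
    int misses;
    double ratio;
};

/* Starts every value from a known state instead of whatever happened to be
   in memory. Returns the sum of all counts read (0), or -1 on allocation
   failure. */
static int total_events(void)
{
    int sum = 0;                           /* scalar: explicit initializer */
    struct counters c = {0};               /* aggregate: remaining members zeroed too */
    int *table = calloc(8, sizeof *table); /* heap: calloc() zero-fills the block */
    if (table == NULL)
        return -1;
    for (int i = 0; i < 8; i++)
        sum += table[i];                   /* reads zeros, not garbage */
    sum += c.hits + c.misses;
    free(table);
    return sum;
}
```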
1.2.3 Dereferencing pointer to freed memory
Once allocated memory is freed by a process, it may be returned to the heap and reused, and the process address space may even shrink to exclude it. Subsequently, this freed memory may be associated with another process. Thus, dereferencing a pointer to freed memory is not safe and can lead to invalid memory access.
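A classic way to dereference freed memory is to free the nodes of a linked list while advancing through them. The sketch below, with hypothetical names, reads the next pointer before releasing the current node.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Frees every node of a singly linked list and returns how many were freed.
   The tempting loop
       for (p = head; p != NULL; p = p->next) free(p);
   reads p->next after p has been freed. Here `next` is saved first. */
static size_t free_list(struct node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        struct node *next = head->next;   /* read before the node is released */
        free(head);
        head = next;                      /* never dereference the freed node */
        freed++;
    }
    return freed;
}
```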
1.3 Memory Leaks
A memory leak happens when memory allocated from the heap is not returned to it even though it is no longer required. Memory leaks are a serious issue that can eventually bring down the system: once all heap memory is exhausted, every subsequent dynamic memory allocation fails. Typically, memory leaks happen in the following scenarios:
1.3.1 Memory allocated but not freed in all legs of error handling
int* a = (int*)malloc(10 * sizeof(int));
//Code to use a;
if(ERROR == fun1())
    return ERROR; //Free not done in error leg
free(a);
In the above code, if fun1() returns an error, the function returns without freeing a, thereby causing a memory leak.
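One widely used C idiom for this is a single cleanup path: every exit after the allocation jumps to one label that frees the memory. In the sketch below, `fun1`, `ERROR` and `OK` are stand-ins for the names in the snippet above; the extra `should_fail` parameter is hypothetical and only lets both legs be exercised.

```c
#include <stdlib.h>

#define OK     0
#define ERROR (-1)

/* Hypothetical stand-in for fun1(); `should_fail` selects the error leg. */
static int fun1(int should_fail)
{
    return should_fail ? ERROR : OK;
}

/* Every exit after the malloc() goes through the `cleanup` label, so the
   block is released on the error leg as well as on the normal path. */
static int process(int should_fail)
{
    int status = OK;
    int *a = malloc(10 * sizeof *a);
    if (a == NULL)
        return ERROR;

    a[0] = 42;                            /* code to use a */
    if (fun1(should_fail) == ERROR) {
        status = ERROR;
        goto cleanup;                     /* error leg: still frees a */
    }
    /* ... more work with a ... */

cleanup:
    free(a);
    return status;
}
```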
1.3.2 Allocated memory handle changed
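int* a = (int*)malloc(10 * sizeof(int));
//Code to use a;
//point a to something else
int k = 10;
a = &k;
//dynamically allocated memory can no longer be freed
In the above code, the value of the pointer to dynamically allocated memory is changed. No pointer to the allocated block remains, so it can never be passed to free(), and the memory leaks. (Calling free(a) at this point would itself be an error, because a no longer holds an address returned by malloc().)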
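Two simple ways to avoid losing the only handle are sketched below, with a hypothetical function name: free the block before re-aiming the pointer, or keep the allocation's pointer untouched and use a second pointer for the other purpose.

```c
#include <stdlib.h>

/* Returns 20 on success, -1 on allocation failure. */
static int reassign_safely(void)
{
    int k = 10;

    /* (1) Release the block before the pointer is re-aimed. */
    int *a = malloc(10 * sizeof *a);
    if (a == NULL)
        return -1;
    free(a);
    a = &k;                       /* safe: nothing is left to leak */

    /* (2) Or keep `buf` untouched and re-aim a second pointer instead. */
    int *buf = malloc(10 * sizeof *buf);
    if (buf == NULL)
        return -1;
    int *view = &k;
    buf[0] = *view + *a;          /* both pointers point at k: 10 + 10 */
    int result = buf[0];
    free(buf);                    /* the original handle is still intact */
    return result;
}
```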
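1.3.3 Freeing an array of pointers that hold dynamically allocated memory
char **a = (char**)malloc(10 * sizeof(char*));
for (int i = 0; i < 10; i++) a[i] = (char*) malloc(10);
free(a); //only the array of pointers is freed
In this case, the memory allocated by the first malloc() is freed, but the buffers allocated in the 'for' loop are leaked, because the only pointers to them were stored in the array that has just been released.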
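A minimal sketch of allocating and releasing such a structure correctly, with hypothetical names and the sizes taken from the snippet above: the inner buffers are freed first, then the array that held their addresses. The allocator also cleans up after a partial failure, which is another place where leaks commonly hide.

```c
#include <stdlib.h>

#define ROWS 10
#define ROW_SIZE 10

/* Allocates an array of ROWS pointers, each pointing at ROW_SIZE bytes.
   On failure, everything allocated so far is released and NULL returned. */
static char **alloc_rows(void)
{
    char **a = malloc(ROWS * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < ROWS; i++) {
        a[i] = malloc(ROW_SIZE);
        if (a[i] == NULL) {
            while (i-- > 0)               /* undo the rows already allocated */
                free(a[i]);
            free(a);
            return NULL;
        }
    }
    return a;
}

/* Frees the inner buffers first, then the pointer array itself; freeing
   only `a` would leak the ROWS buffers it points to. */
static void free_rows(char **a)
{
    if (a == NULL)
        return;
    for (size_t i = 0; i < ROWS; i++)
        free(a[i]);
    free(a);
}
```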
2. Synchronization related bugs
Correct and consistent synchronization is critical in multithreaded and multi-process applications for ensuring expected system behavior. Typical scenarios of synchronization-related bugs are:
2.1 Lack of or inconsistent synchronization
A process accessing a shared resource needs to check whether race conditions can occur, that is, whether two or more threads can update the shared resource at the same time. In a race condition, the shared resource can end up with a garbled value. If such a possibility exists, every access to the shared resource must be synchronized; otherwise, program behavior will be inconsistent.
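A minimal sketch using POSIX threads, with hypothetical names and counts: several threads increment one shared counter, and a mutex makes each read-modify-write step atomic with respect to the others.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define THREADS 4
#define INCREMENTS 100000

static long counter = 0;                          /* shared resource */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each increment is a read-modify-write; without the mutex two threads can
   read the same old value and one of the updates is lost. */
static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs THREADS workers, waits for all of them, and returns the counter. */
static long run_counter(void)
{
    pthread_t tids[THREADS];
    int started = 0;
    counter = 0;
    for (int i = 0; i < THREADS; i++) {
        if (pthread_create(&tids[i], NULL, add_many, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;                               /* safe to read after the joins */
}
```

Without the `pthread_mutex_lock`/`pthread_mutex_unlock` pair, the final value can end up lower than `THREADS * INCREMENTS`, because concurrent increments overwrite each other.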
2.2 Lock acquired but not released in all scenarios
A process that takes a lock on a shared resource must release it as soon as its access to the resource is over, and must do so in all scenarios. Sometimes a programmer forgets to release the lock in the error-handling section of the code; every other thread that later waits for the lock then blocks forever, which may result in a system hang.
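A minimal sketch, assuming POSIX threads and hypothetical names: the error leg does not return early, so both outcomes pass through the same unlock call.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define TABLE_SIZE 16

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table[TABLE_SIZE];                      /* shared resource */

/* Stores `value` at `index`. Returns 0 on success, -1 for a bad index.
   The lock is released on both paths by the single unlock below. */
static int table_store(int index, int value)
{
    int status = 0;
    pthread_mutex_lock(&table_lock);
    if (index < 0 || index >= TABLE_SIZE)
        status = -1;                               /* error leg: no early return */
    else
        table[index] = value;
    pthread_mutex_unlock(&table_lock);
    return status;
}
```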
2.3 Taking recursive lock which is not supported by OS
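If a lock does not support recursion, a thread that tries to take a lock it already holds waits for itself to release it, which never happens, so the result is certainly a deadlock. If the lock is implemented by busy-waiting (looping while the lock is not available), the thread additionally keeps spinning on the CPU. Both the spin locks of the Linux kernel (Love, 2007, chap. 9), which busy-wait, and the fast mutex provided by the “pthread” library (Mitchell, Oldham & Samuel, 2001, chap. 4) deadlock when the thread holding them tries to take them again.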
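Where nested acquisition by the same thread is genuinely needed, the "pthread" library can create a recursive mutex through a mutex attribute object. A minimal sketch with hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static pthread_mutex_t state_lock;
static int state = 0;

/* Initializes `state_lock` as a recursive mutex. With a non-recursive mutex
   (such as the default type on Linux), the nested lock taken in bump()
   while bump_twice() already holds it would deadlock. Returns 0 on success. */
static int init_state_lock(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&state_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

static void bump(void)
{
    pthread_mutex_lock(&state_lock);     /* nested acquisition by the owner */
    state++;
    pthread_mutex_unlock(&state_lock);
}

/* Holds the lock while calling bump(), which takes the same lock again. */
static void bump_twice(void)
{
    pthread_mutex_lock(&state_lock);
    bump();
    bump();
    pthread_mutex_unlock(&state_lock);   /* one unlock per lock */
}
```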
2.4 Taking recursive lock which is supported by OS but not unlocking correspondingly
The recursive mutex provided by the “pthread” library remembers how many times it has been locked by the thread that holds it (Mitchell et al., 2001, chap. 4). If that thread does not unlock it the same number of times, the mutex remains locked. Any other thread trying to acquire the mutex keeps waiting until the owner thread unlocks it. Hence, this scenario can lead to starvation of other threads.
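The counting behaviour can be observed directly. The sketch below, with hypothetical names, locks a recursive mutex twice, unlocks it once, and lets a second thread probe it with `pthread_mutex_trylock`, which returns `EBUSY` instead of blocking when another thread owns the mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Thread body: tries to take the mutex without blocking and returns the
   result code (0 = acquired, EBUSY = still held by another thread). */
static void *try_from_other_thread(void *arg)
{
    pthread_mutex_t *m = arg;
    int rc = pthread_mutex_trylock(m);
    if (rc == 0)
        pthread_mutex_unlock(m);                  /* give it back immediately */
    return (void *)(intptr_t)rc;
}

/* Runs one trylock attempt from a separate thread; -1 if it cannot start. */
static int trylock_from_other_thread(pthread_mutex_t *m)
{
    pthread_t tid;
    void *rc = NULL;
    if (pthread_create(&tid, NULL, try_from_other_thread, m) != 0)
        return -1;
    pthread_join(tid, &rc);
    return (int)(intptr_t)rc;
}

/* Locks a recursive mutex twice but unlocks it only once: the mutex is still
   owned, so another thread gets EBUSY. Only the matching second unlock
   releases it. Returns 1 if both observations hold, else 0. */
static int unbalanced_unlock_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return 0;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return 0;

    pthread_mutex_lock(&m);
    pthread_mutex_lock(&m);                       /* lock count: 2 */
    pthread_mutex_unlock(&m);                     /* lock count: 1, still owned */
    int while_held = trylock_from_other_thread(&m);
    pthread_mutex_unlock(&m);                     /* lock count: 0, released */
    int after_release = trylock_from_other_thread(&m);

    pthread_mutex_destroy(&m);
    return while_held == EBUSY && after_release == 0;
}
```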
2.5 Taking lock and not checking return value for success or failure
Some libraries provide locking primitives that return an error or success code when trying to acquire a lock, e.g. the error-checking mutex of the “pthread” library (Mitchell et al., 2001, chap. 4). If the programmer does not check the return value and take appropriate action, program behavior will be inconsistent: when the lock is acquired successfully the program runs correctly, but on error the program carries on as if it held the lock, and its results become inconsistent.
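A minimal sketch of checking the return value, assuming POSIX threads and hypothetical function names: an error-checking mutex reports `EDEADLK` when the thread that owns it tries to lock it again, and the caller sees that code instead of silently proceeding.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Takes `m` and reports a failure instead of carrying on as if the lock
   were held. Returns the pthread error code (0 on success). */
static int checked_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc != 0)
        fprintf(stderr, "pthread_mutex_lock failed: %s\n", strerror(rc));
    return rc;
}

/* Builds an error-checking mutex and locks it twice from the same thread.
   Returns the code of the second attempt, or -1 if setup fails. */
static int relock_errorcheck(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;

    if (checked_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    rc = checked_lock(&m);                  /* same thread again: EDEADLK */
    pthread_mutex_unlock(&m);               /* release the first, successful lock */
    pthread_mutex_destroy(&m);
    return rc;
}
```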
3. Inter process or inter thread related bugs
A thread has only the bare minimum resources necessary to maintain its own context: its stack, registers, scheduling information, pending and blocked signals, and thread-specific data. All other resources are shared with the other threads of the same parent process and with the parent process itself. Hence, the parent process and its threads must work synergistically, since any erroneous access to a shared resource can cause unpredictable program behavior. Typical scenarios of corrupting resources shared between the parent process and its threads are:
3.1 Global File or socket identifier closed by one thread and being accessed by other threads or parent process
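If one thread closes a global file or socket identifier while other threads or the parent continue to access it through the same identifier, these accesses will be unsuccessful. With a raw descriptor, proper error handling can detect such scenarios, because the call fails with EBADF, unless the descriptor number has meanwhile been reused by another open file or socket, in which case the data silently goes to the wrong place. With a stdio FILE pointer, using the stream after fclose() is undefined behavior, so error checking alone cannot be relied on.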
pthread_create (&thread_id, NULL, &thread, NULL); //thread() closes the global fp
if(EOF == fputs("Hello",fp))
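The above program illustrates a thread closing a global file pointer while the parent process is oblivious of any such possibility. Thus, file operations by the parent process may fail; since fp is no longer valid after fclose(), the fputs() call may even crash instead of returning EOF.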
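The same situation with a raw descriptor can be sketched as follows, assuming POSIX threads and using a pipe as a stand-in for a file or socket; the names are hypothetical. After the worker closes the shared descriptor, the main thread's `write` fails and the error is detected through `errno`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

static int shared_fd = -1;              /* global descriptor shared by threads */

/* Hypothetical worker that closes the shared descriptor when it finishes. */
static void *closing_thread(void *arg)
{
    (void)arg;
    close(shared_fd);
    return NULL;
}

/* Returns the errno seen by the main thread when it writes to the descriptor
   after the worker closed it, 0 if the write succeeded, -1 on setup failure. */
static int write_after_close(void)
{
    int fds[2];
    pthread_t tid;
    int err = 0;

    if (pipe(fds) != 0)
        return -1;
    shared_fd = fds[1];                  /* write end is the shared identifier */
    if (pthread_create(&tid, NULL, closing_thread, NULL) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    pthread_join(tid, NULL);             /* worker has closed shared_fd */

    if (write(shared_fd, "Hello", 5) == -1)
        err = errno;                     /* checked, not ignored */
    close(fds[0]);
    return err;
}
```

The check is reliable here only because nothing opens a new descriptor between the `close` and the `write`; in a real program the number may be reused, so the cleaner fix is a clear rule about which thread closes shared identifiers, and when.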
3.2 Heap memory corrupted by one thread results in malfunctioning of some other thread
The heap memory of a process is accessible to all its threads and to the parent process. Hence, a thread can write to heap memory allocated by another thread. This is often used deliberately for inter-thread message passing. However, a garbled pointer in one thread can also unintentionally corrupt heap memory used by some other thread.
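Deliberate message passing through the shared heap works when ownership is explicit. A minimal sketch, with hypothetical names: a worker allocates a message, hands the pointer to its creator through `pthread_join`, and never touches it again, so only one thread is responsible for the block at any time.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical message type passed from a worker thread to its creator. */
struct message {
    int id;
    char text[32];
};

/* The worker allocates the message on the shared heap and returns the
   pointer; after returning, it no longer touches the block. */
static void *producer(void *arg)
{
    int id = *(const int *)arg;
    struct message *msg = malloc(sizeof *msg);
    if (msg != NULL) {
        msg->id = id;
        snprintf(msg->text, sizeof msg->text, "result %d", id);
    }
    return msg;                           /* ownership moves to the joiner */
}

/* Receives one message; the caller becomes responsible for free(). */
static struct message *receive_message(int id)
{
    pthread_t tid;
    void *out = NULL;
    if (pthread_create(&tid, NULL, producer, &id) != 0)
        return NULL;
    pthread_join(tid, &out);              /* `id` stays valid until the join */
    return out;
}
```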
3.3 Memory leaks through zombie process
When a process or thread is spawned, resources are allocated to maintain its context. Once it finishes executing, the parent should clean up these resources: for a child process, by waiting for it (for example with wait() or waitpid()); for a thread, by joining it with pthread_join() unless it was detached. If the parent does not do this, the finished process or thread remains in the system (a finished but unreaped process is called a zombie process), consuming system memory for its data structures. If processes are spawned in a loop with no clean-up after they finish, the system will gradually accumulate many zombie processes, and available system memory will keep decreasing until the system eventually hangs.
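For child processes, the clean-up is done by waiting for each child. A minimal sketch assuming a POSIX system, with a hypothetical function name: children are spawned in a loop and each one is reaped with `waitpid` as soon as it exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawns `n` short-lived children in a loop and reaps each one with
   waitpid(), so none lingers as a zombie. Returns the number reaped. */
static int spawn_and_reap(int n)
{
    int reaped = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;                        /* fork failed: stop spawning */
        if (pid == 0)
            _exit(0);                     /* child: do its work, then exit */
        pid_t r;
        do {
            r = waitpid(pid, NULL, 0);    /* parent: collect the exit status */
        } while (r == -1 && errno == EINTR);
        if (r == pid)
            reaped++;
    }
    return reaped;
}
```

For threads, the equivalent is `pthread_join`, or `pthread_detach` when no result is needed.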
Separate catalogues of bugs grounded in misconceptions about compilers and software architecture will be posted in later articles. The programming community is invited to point out gaps in this catalogue. Faculty members are encouraged to use this catalogue in their courses. We would appreciate feedback from working professionals, faculty members, and enthusiastic students.