A Catalogue of Software Bugs-III: Bugs Grounded in Common Misconceptions About Compiler

Posted on March 22, 2011


This paper is co-authored by Vikas Kumar and Sanjay Goel. This article is the third in this series about software bugs.  The fourth part is A Catalogue of Software Bugs–IV: Bugs Grounded in Software Architecture.   Vikas can be approached at (http://www.linkedin.com/pub/vikas-kumar/a/658/500)

_______________________________________________________

In the two previous articles, we discussed software bugs and proposed that they can be classified by the source of the underlying misconception. These misconceptions can relate to (i) programming fundamentals, (ii) operating system resources, (iii) the compiler, (iv) databases, or (v) software design.

In this article we list some common software bugs that are grounded in misconceptions about the compiler. Code snippets in C and C++ are used to illustrate the bugs. We do not claim to have captured every misconception related to the compiler, so this catalogue is only partial.

This section lists common bugs that arise from compiler intervention during code generation. We group them into bugs related to data structure padding, bugs related to source code optimization, and bugs related to the object-oriented language support provided by the compiler.

1.  Bugs related to Data Structure padding

Data structure padding is the insertion of unused bytes between the end of one data member and the start of the next (and, where needed, after the last member). The compiler does this to ensure aligned memory access. An unaligned memory access is an access to an unaligned memory address (Drake, 2007). A memory address is said to be unaligned if we try to access N bytes and the address is not a multiple of N (i.e. addr % N != 0). The repercussions of unaligned memory access depend on the OS implementation and the processor architecture in use: it may cause a system crash, a system hang, incorrect behaviour, or poor performance. Bugs related to data structure padding can occur in the following scenarios:
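The effect of padding can be observed directly with sizeof and offsetof. The following is a minimal sketch; the structure ‘Sample’ and the helper names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int32_t

// Hypothetical record: a 1-byte tag followed by a 4-byte value.
struct Sample {
    char         tag;
    std::int32_t value;
};

// Number of padding bytes the compiler inserted into Sample.
constexpr std::size_t paddingBytes() {
    return sizeof(Sample) - (sizeof(char) + sizeof(std::int32_t));
}

// True when 'value' starts at an offset that is a multiple of its alignment.
constexpr bool valueIsAligned() {
    return offsetof(Sample, value) % alignof(std::int32_t) == 0;
}
```

On many common platforms sizeof(Sample) is larger than the 5 bytes its members need, but the exact amount of padding is up to the compiler and the target, so code should query it rather than assume it.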

1.1.  Access of structure members as raw memory

The bytes added by the compiler as structure padding may contain garbage data. Hence, code that treats a structure as raw memory (for example, comparing or copying it byte by byte) is likely to read that garbage, which can lead to surprising program behaviour. Also, accessing a structure's memory on the assumption that there is no padding will read members at the wrong offsets and can result in unaligned memory access (Rentzsch, 2005).
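As an illustration, compare a raw byte-wise comparison of two structures with a member-wise one. This is only a sketch under our own assumptions; ‘Record’, ‘rawEqual’ and ‘memberEqual’ are hypothetical names.

```cpp
#include <cassert>
#include <cstring>   // std::memcmp

// Hypothetical record; most compilers pad it after 'tag'.
struct Record {
    char tag;
    int  value;
};

// Raw comparison: also compares the padding bytes, whose contents are
// not guaranteed, so two logically equal records may compare unequal.
bool rawEqual(const Record& a, const Record& b) {
    return std::memcmp(&a, &b, sizeof(Record)) == 0;
}

// Member-wise comparison: looks only at the declared members.
bool memberEqual(const Record& a, const Record& b) {
    return a.tag == b.tag && a.value == b.value;
}
```

Member-wise comparison is the safer choice whenever the structure may contain padding.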

1.2.   Structures used for message communication among different machines

If structures are sent as messages over a socket between systems with different architectures, unexpected program behaviour may follow. This happens when the compiler on one machine pads the structure and the compiler on the other does not, or pads it differently. The receiving machine then interprets the bytes using its own layout and reads the members incorrectly. A language primitive that directs the compiler not to pad the structure (such as a compiler-specific packing directive) can be used to avoid this scenario.
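An alternative to packing directives, which is our suggestion rather than something the cited sources prescribe, is to never send the structure's memory at all and instead encode each member explicitly into a byte buffer with a fixed layout. A minimal sketch, with a hypothetical ‘Message’ type and a hypothetical 5-byte wire format:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message; its in-memory layout may differ between platforms.
struct Message {
    std::uint8_t  kind;
    std::uint32_t length;
};

// Assumed wire format: 1 byte kind + 4 bytes length (big-endian), no padding.
constexpr std::size_t kWireSize = 5;

// Encode member by member, so compiler padding never reaches the wire.
std::array<std::uint8_t, kWireSize> encode(const Message& m) {
    std::array<std::uint8_t, kWireSize> wire{};
    wire[0] = m.kind;
    wire[1] = static_cast<std::uint8_t>(m.length >> 24);
    wire[2] = static_cast<std::uint8_t>(m.length >> 16);
    wire[3] = static_cast<std::uint8_t>(m.length >> 8);
    wire[4] = static_cast<std::uint8_t>(m.length);
    return wire;
}

// Decode using the same fixed layout, independent of the local struct layout.
Message decode(const std::array<std::uint8_t, kWireSize>& wire) {
    Message m;
    m.kind   = wire[0];
    m.length = (static_cast<std::uint32_t>(wire[1]) << 24) |
               (static_cast<std::uint32_t>(wire[2]) << 16) |
               (static_cast<std::uint32_t>(wire[3]) << 8)  |
                static_cast<std::uint32_t>(wire[4]);
    return m;
}
```

Because both sides agree on the byte layout rather than on a compiler's structure layout, this approach also sidesteps byte-order differences between the machines.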

2.   Bugs related to source code optimization

The compiler aggressively optimizes any code segment in which it determines that a value stays constant. However, a pointer may point to a shared resource or to a memory-mapped input/output port, whose contents can change without the compiler's knowledge. In such scenarios, a language primitive such as the ‘volatile’ qualifier, or a compiler-specific pragma that turns off optimization, can be used to tell the compiler not to optimize these accesses. Otherwise, unpredictable behaviour will follow.

2.1 Accessing shared memory through pointer

In a multi-threaded application, a pointer can point to a shared memory location. If one thread tests the value at this location in a while loop and the loop body does not modify it, the compiler may optimize the loop by reading the value only once. But other threads can write to this location, so its content may well change while the loop runs. The looping thread then never observes the change, which results in inconsistent behaviour.
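For the multi-threaded case, note that since C++11 the standard tool for a flag shared between threads is std::atomic rather than ‘volatile’, which by itself does not provide inter-thread synchronization. The sketch below swaps in std::atomic for that reason; the flag ‘done’ and both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag shared between a worker thread and a waiting thread.
std::atomic<bool> done{false};

// Called by the worker thread when its result is available.
void signalDone() {
    done.store(true, std::memory_order_release);
}

// Called by the waiting thread. Every iteration performs a real load:
// the compiler may not hoist an atomic load out of the loop.
// Returns true if the flag was observed within maxSpins iterations.
bool waitUntilDone(long maxSpins) {
    for (long i = 0; i < maxSpins; ++i) {
        if (done.load(std::memory_order_acquire)) {
            return true;
        }
    }
    return false;
}
```

In a real program ‘signalDone’ and ‘waitUntilDone’ would run on different threads, and a condition variable is usually preferable to spinning.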

2.2 Accessing memory mapped input/output ports of device

Device driver code for a device's memory-mapped input/output ports checks a status bit to decide when a read or write operation can proceed. This is done through a pointer to the memory mapped for the device. Here the status bit changes in response to events outside the program's control, so the compiler must be told not to optimize code that reads through this kind of pointer (Sacks, 1998).
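A polling loop of this kind can be sketched as follows. The register layout and the ‘kReadyBit’ mask are hypothetical; a real driver would take them from the device's data sheet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask for the "ready" bit of a device status register.
constexpr std::uint32_t kReadyBit = 0x1u;

// Polls the status register through a pointer-to-volatile, so the compiler
// must re-read the register on every iteration instead of caching the value.
// Returns true if the ready bit was seen within maxPolls reads.
bool pollReady(const volatile std::uint32_t* status, int maxPolls) {
    for (int i = 0; i < maxPolls; ++i) {
        if (*status & kReadyBit) {
            return true;
        }
    }
    return false;
}
```

Without the ‘volatile’ qualifier, the compiler would be free to read ‘*status’ once and reuse the result for every iteration.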

3.   Bugs related to Object Oriented Language Support built in Compiler

Compiler intervention is required to support object-oriented language features. The compiler augments the source code to implement the features of object-oriented languages such as C++ and Java. This intervention, for features such as composition, inheritance, polymorphism, and exception handling, is typically arcane. Programmers can therefore be surprised by the behaviour of object-oriented systems, mainly because they hold a wrong notion of how the compiler supports a language feature. Here we list common bugs related to C++ language support in the compiler.

3.1.   Default Constructor Construction

It is a common assumption among programmers that the compiler always synthesizes a default constructor that initializes every member when the program does not declare one. This is not true. The compiler synthesizes a non-trivial default constructor only when the implementation needs one to support a C++ language feature (for example, a member or base class with its own default constructor, or virtual functions). Even if the semantics of the program require the members to be initialized, the compiler does not insert code to do so (Lippman, 1996, chap. 2).

    class Link { public: int val; Link *pnext; };

    void fun() {
        // Program semantics need l's members initialized
        Link l;
        if ( l.val || l.pnext ) {
            // ... do something
        }
        // ...
    }

In function ‘fun’, the ‘if’ statement requires object ‘l’ to be properly initialized. But because there is no explicit default constructor, ‘l.val’ and ‘l.pnext’ hold indeterminate values, which may result in surprising program flow.
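One way to remove the surprise, assuming a C++11 or later compiler, is to give the members default initializers so that every ‘Link’ starts in a known state. The helper ‘hasData’ below is a hypothetical stand-in for the ‘if’ test in ‘fun’.

```cpp
#include <cassert>

// Default member initializers guarantee a known state for every Link,
// whether or not a constructor is written by hand.
struct Link {
    int   val   = 0;
    Link* pnext = nullptr;
};

// Same test as in 'fun', now well defined for a freshly created Link.
bool hasData(const Link& l) {
    return l.val != 0 || l.pnext != nullptr;
}
```

An explicit default constructor with a member initialization list achieves the same result on older compilers.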

3.2.   Default Copy Constructor for class with pointer member

The compiler synthesizes a non-trivial copy constructor under conditions similar to those for the default constructor (Lippman, 1996, chap. 2). In all other cases, it performs a bitwise (member-by-member) copy of the source object. Hence, when a class has a pointer member and no explicit copy constructor that allocates separate memory for each copy, copying one object to another leaves both objects pointing to the same memory location. When the lifetime of one of the objects ends, its destructor is called, which will typically free the memory pointed to by its pointer member. The other object, whose lifetime may continue, then points to freed memory and thus holds a dangling pointer. Accessing a dangling pointer causes anomalous behaviour.
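The usual remedy is an explicit copy constructor (and copy assignment operator) that gives each object its own copy of the memory. A minimal sketch, using a hypothetical ‘Buffer’ class:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class owning a heap-allocated character buffer.
class Buffer {
public:
    explicit Buffer(const char* text)
        : size_(std::strlen(text) + 1), data_(new char[size_]) {
        std::memcpy(data_, text, size_);
    }
    // Deep copy: the new object gets its own allocation.
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new char[other.size_]) {
        std::memcpy(data_, other.data_, size_);
    }
    // Copy assignment: allocate first, then release the old buffer.
    Buffer& operator=(const Buffer& other) {
        if (this != &other) {
            char* fresh = new char[other.size_];
            std::memcpy(fresh, other.data_, other.size_);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }
    ~Buffer() { delete[] data_; }

    char*       data()       { return data_; }
    const char* data() const { return data_; }

private:
    std::size_t size_;  // declared before data_, so it is initialized first
    char*       data_;
};
```

With this in place, destroying one copy no longer invalidates the memory used by the other.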

3.3.   Constructor for class with compiler generated internal members

To support language features, the compiler augments class objects with internal members. This augmentation is implicit, and programmers unaware of it may unintentionally write code that conflicts with the compiler's intervention, resulting in surprising program behaviour.

    #include <cstring>

    class Base {
    private:
        int a;
    public:
        // Bug: memset also overwrites the compiler-generated internal members
        Base() { memset( this, 0, sizeof(Base) ); }
        virtual ~Base();
    };

Class ‘Base’ has a virtual destructor. To support polymorphism, the compiler creates a table, ‘vtbl’, containing the addresses of the virtual functions and other information required for Run Time Type Identification and virtual inheritance. A pointer, ‘vptr’, to this virtual table is added to each object of the class. The compiler changes the constructor code to:

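    // Pseudocode for the compiler-augmented constructor
    Base::Base() {
        __vptr__Base = __vtbl__Base;       // inserted by the compiler
        memset(this, 0, sizeof(Base));     // overwrites the vptr set above
    }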

Through the ‘memset’ statement, the programmer has unintentionally undone the compiler-generated code by overwriting ‘vptr’ with zero (Lippman, 1996, chap. 2).
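A safer constructor initializes each member by name and leaves the hidden members alone. The sketch below adds a hypothetical virtual function ‘name’ and a hypothetical ‘Derived’ class only to show that virtual dispatch still works afterwards.

```cpp
#include <cassert>
#include <cstring>

class Base {
public:
    Base() : a_(0) {}              // member-wise initialization; vptr is untouched
    virtual ~Base() {}
    virtual const char* name() const { return "Base"; }
    int a() const { return a_; }
private:
    int a_;
};

class Derived : public Base {
public:
    const char* name() const override { return "Derived"; }
};

// Calls name() through the virtual table of whatever object is passed in.
const char* nameOf(const Base& b) {
    return b.name();
}
```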

3.4.   Erroneous reference counting due to Named Return Value optimization

Named Return Value (NRV) optimization eliminates the copy constructor invocation when a function returns a named local class object by value (Lippman, 1996, chap. 2).

    X xx = fun ();

    X fun () {
        X x;
        //..
        return x;
    }

With NRV optimization, code is changed to:

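    // Pseudocode for the compiler-transformed program
    X xx;               // storage only; no constructor is applied here
    fun (xx);

    void fun (X& x) {
        x.X::X();       // default constructor applied directly to the result
        // do processing on x itself
        return;
    }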

If the program keeps a count of class objects by incrementing a counter in every copy constructor call and decrementing it in the destructor, the count will differ from what the programmer expects in this case, because NRV optimization skips the copy constructor call.
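A count stays consistent if it is maintained in every constructor and in the destructor, rather than in the copy constructor alone. In the sketch below (the class ‘Counted’ and its counters are hypothetical), the number of copies observed depends on whether the compiler applies the optimization, while the number of live objects does not.

```cpp
#include <cassert>

// Hypothetical class that tracks both live objects and copy-constructor calls.
struct Counted {
    static int live;     // objects currently alive
    static int copies;   // copy-constructor calls actually executed

    Counted()               { ++live; }
    Counted(const Counted&) { ++live; ++copies; }
    ~Counted()              { --live; }
};
int Counted::live   = 0;
int Counted::copies = 0;

// Returns a named local object; a candidate for NRV optimization.
Counted make() {
    Counted x;
    return x;
}
```

After ‘Counted c = make();’ the value of ‘Counted::copies’ may be zero or non-zero depending on the compiler and its settings, so program logic should not rely on it.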

3.5.   Initializing a class member with another class member using Member Initialization List

The order in which class members are initialized by a constructor's member initialization list depends on the order in which the members are declared in the class, not on their order in the list. This matters when one class member is initialized with the value of another (Lippman, 1996, chap. 2).

    class X {
        int i; int j;
    public:
        X( int val ) : j( val ), i( j ) {}
    };

In this case, ‘i’ is initialized before ‘j’, using the value of ‘j’. Since ‘j’ has not yet been initialized, ‘i’ gets a garbage value.
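A corrected version lists the initializers in declaration order and lets a later member depend only on an earlier one. The accessors ‘first’ and ‘second’ are hypothetical additions for illustration.

```cpp
#include <cassert>

class X {
    int i;   // declared first, therefore initialized first
    int j;   // declared second, so it may safely depend on i
public:
    // Initializer list written in declaration order.
    explicit X(int val) : i(val), j(i) {}

    int first()  const { return i; }
    int second() const { return j; }
};
```

Many compilers can warn when the initializer list order differs from the declaration order, which helps to catch this bug early.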

3.6.   Memory leak due to incorrect usage of C++ Compiler supported language primitive

The C++ language provides primitives that direct the compiler to generate the code needed to support C++ features. For creating and destroying objects on the heap, C++ provides the new and delete operators. These operators not only allocate and deallocate memory but also call the constructor and destructor, respectively, for proper initialization and cleanup. Some common memory leak scenarios are:

3.6.1.   Lack of virtual destructor in Base class

If an object of a derived class is deleted through a pointer of base class type and the base class does not have a virtual destructor, delete calls only the base class destructor, while the programmer expects the derived class destructor to be called (formally, the C++ standard leaves the behaviour undefined). Hence, if the derived class object owns memory through pointer members, that memory will not be freed. If the base class has a virtual destructor, however, the derived class destructor is called through the virtual mechanism.

    Base *bptr = new Derived();
    delete bptr;

For a base class without a virtual destructor, the compiler transforms the delete statement to (pseudocode):

    if ( 0 != bptr ) {
        Base::~Base( bptr );    // static call: only the Base part is destroyed
        free( bptr );
    }

For a base class with a virtual destructor, the compiler transforms it to (pseudocode):

    if ( 0 != bptr ) {
        ( *bptr->vptr[ 1 ] )( bptr );    // destructor found through the virtual table
        free( bptr );
    }
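The virtual-destructor case can be checked with a small experiment. In this sketch the counter ‘destroyed’, the ‘payload’ member and the helper ‘destroy’ are hypothetical.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}   // virtual: delete through Base* reaches ~Derived
};

struct Derived : Base {
    static int destroyed;   // number of Derived destructors that have run
    int* payload;           // heap memory owned by the derived part

    Derived() : payload(new int[16]) {}
    Derived(const Derived&) = delete;              // copying is not needed here
    Derived& operator=(const Derived&) = delete;
    ~Derived() override { delete[] payload; ++destroyed; }
};
int Derived::destroyed = 0;

// Deletes through a base class pointer, as in the snippet above.
void destroy(Base* p) {
    delete p;
}
```

Calling ‘destroy(new Derived())’ runs the derived destructor and releases ‘payload’; without the ‘virtual’ keyword in ‘Base’ that would not be guaranteed.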

3.6.2.    Incorrect usage of delete for deleting array of class objects

Deleting an array of class objects requires a special syntax of the delete operator. This syntax tells the compiler to invoke the code required to destroy every element of the array. If the programmer uses delete without it, typically only the first element is destructed before the memory is freed (formally, the behaviour is undefined) (Lippman, 1996, chap. 6). Hence, if the class has pointer members, the memory they own is released only for the first element, while the memory owned by the other elements leaks.

    Base *bptr = new Base [ 10 ];
    // Delete with special syntax
    delete [ ] bptr;

After compiler transformation (pseudocode):

    vec_delete( bptr, sizeof( Base ), no_of_objects, &Base::~Base );
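That the array form destroys every element can be verified by counting destructor calls. The class ‘Item’ and the helper ‘makeAndDestroy’ below are hypothetical.

```cpp
#include <cassert>

// Hypothetical element type that counts how many destructors have run.
struct Item {
    static int destroyed;
    ~Item() { ++destroyed; }
};
int Item::destroyed = 0;

// Allocates an array of 'count' items and releases it with the array form
// of delete, which runs the destructor of every element.
void makeAndDestroy(int count) {
    assert(count > 0);
    Item* items = new Item[count];
    delete[] items;
}
```

In new code, std::vector avoids the problem altogether because it remembers its own element count and destroys every element itself.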

3.6.3.  Throwing exception without proper cleanup

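When an exception is thrown, stack unwinding takes place and destroys the objects created on the stack. But memory allocated on the heap and held only through a raw pointer is not released by unwinding; it must be freed explicitly, for example in the catch block, or be owned by a stack object whose destructor frees it. Otherwise, a memory leak will occur.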
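One way to make the cleanup automatic, assuming C++11 or later, is to let a stack object own the heap memory, for instance a std::unique_ptr. In this sketch ‘Resource’, ‘process’ and ‘tryProcess’ are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical heap-allocated resource that counts live instances.
struct Resource {
    static int live;
    Resource()  { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// The unique_ptr is a stack object, so stack unwinding runs its destructor
// and the Resource is released even when an exception is thrown.
void process(bool fail) {
    std::unique_ptr<Resource> r(new Resource());
    if (fail) {
        throw std::runtime_error("processing failed");
    }
}

// Returns true if process() completed without throwing.
bool tryProcess(bool fail) {
    try {
        process(fail);
        return true;
    } catch (const std::runtime_error&) {
        return false;   // no manual cleanup needed here
    }
}
```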

3.7.   Deleting array of derived class objects with base class pointer

Deleting an array of derived class objects through a base class pointer is dangerous even if the base class has a virtual destructor. This is because of the way the compiler implements array delete using the ‘vec_delete’ function. The size of a class object is passed to ‘vec_delete’, and this size depends on the type of the pointer with which delete is invoked. ‘vec_delete’ uses the size to step from one element to the next, calling each element's destructor before releasing the memory (Lippman, 1996, chap. 6). If the size is incorrect, the address computed for every element after the first will be wrong, which can crash the system. This is exactly what happens when the derived class has extra members, because a derived object is then larger than a base object.

    Base *ptr = new Derived [ 10 ];
    // Dangerous way
    delete [ ] ptr;
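A safer arrangement, which is our suggestion and not taken from the cited source, is to hold each object through its own base class pointer, so that every object is deleted individually through the virtual destructor. The names below are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Base {
    virtual ~Base() {}
};

struct Derived : Base {
    static int destroyed;   // number of Derived destructors that have run
    int extra[4];           // extra members make Derived larger than Base
    ~Derived() override { ++destroyed; }
};
int Derived::destroyed = 0;

// Builds a collection of separately allocated Derived objects, each owned
// through a Base pointer; no pointer arithmetic over elements is involved.
std::vector<std::unique_ptr<Base>> makeDerived(int count) {
    std::vector<std::unique_ptr<Base>> items;
    for (int i = 0; i < count; ++i) {
        items.push_back(std::unique_ptr<Base>(new Derived()));
    }
    return items;
}
```

When the vector goes out of scope, each element is destroyed with the correct size and the correct destructor.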

3.8.   Lifetime of compiler generated temporary objects

The compiler generates temporary objects to hold partial results during expression evaluation. These temporary objects are destroyed once the full expression has been evaluated (Lippman, 1996, chap. 6).

    string s1, s2;
    const char *s = (s1 + s2).c_str( );

After compiler transformation (pseudocode):

    string temp;
    operator + (temp, s1, s2);
    s = temp.c_str();
    temp.string::~string();    // temp is destroyed; s now dangles

Now ‘s’ points to memory that has been released. Accessing memory through ‘s’ is problematic. If the memory has since been reallocated to another variable in the same program, ‘s’ will see incorrect data. If it has been handed to another task, the access can crash the system.
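The fix is to give the result a name, so that it lives as long as the pointer taken from it. A minimal sketch, with a hypothetical helper ‘joinedLength’:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Concatenates two strings and measures the result through a C-style pointer.
std::size_t joinedLength(const std::string& s1, const std::string& s2) {
    const std::string joined = s1 + s2;  // named object, not a temporary
    const char* s = joined.c_str();      // valid while 'joined' is alive and unmodified
    return std::strlen(s);
}
```

The pointer ‘s’ must still not be used after ‘joined’ goes out of scope, which is why the function returns a value rather than the pointer itself.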

3.9.   Throwing an exception from destructor

Throwing an exception from a destructor can cause program termination. When an exception is thrown, stack unwinding takes place, and during unwinding the destructors of local objects are called. If one of these destructors throws another exception, the C++ runtime cannot decide which exception to handle, and the terminate function is called.
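A common defensive pattern is to catch, inside the destructor, any exception raised by its cleanup code. The sketch below uses a hypothetical ‘Connection’ class whose ‘close’ always fails, to show that the original exception still reaches its handler.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical class whose cleanup step can fail.
struct Connection {
    static int suppressed;   // cleanup failures swallowed by the destructor

    void close() { throw std::runtime_error("close failed"); }

    // The destructor never lets an exception escape.
    ~Connection() {
        try {
            close();
        } catch (const std::exception&) {
            ++suppressed;    // record (or log) the failure instead of rethrowing
        }
    }
};
int Connection::suppressed = 0;

// Throws while a Connection is alive; unwinding runs ~Connection safely.
bool unwindSafely() {
    try {
        Connection c;
        throw std::logic_error("original error");
    } catch (const std::logic_error&) {
        return true;         // the original exception was delivered
    }
    return false;
}
```

If a caller needs to know about cleanup failures, an explicit ‘close’ call made before the object is destroyed is a better place to report them than the destructor.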

____________________________________________________

A separate catalogue of bugs grounded in misconceptions about software architecture shall be posted in a later article. The programming community is invited to point out gaps in this catalogue. Faculty members are encouraged to use this catalogue in their courses. We shall appreciate feedback from working professionals, faculty members, and enthusiastic students.

Also Refer:

A Catalogue of Software Bugs-I: Bugs Grounded in Common Misconceptions About Programming


Software Development Activities: A Catalogue of Technical and Technically Oriented Activities

Software Developers’ Desired Competencies: A Comprehensive Distilled View
