Pages

Tuesday, June 30, 2015

C pointers and memory management for generic containers

Generics and C looks far apart. But if we look closely we only need to understand little simple stuff. Then we are good to go except for one reason these compile without much of a problem but it doesn’t mean anything! So, what are those few things? Well, they are pointer arithmetic, memory management and bit about data types.

If we think of a data type such as int or float they will have size (number of bytes) required to store the data which is known at the compilation time. So, if we declare a pointer to any of these data types we know that the data that we are interested in is located in particular size of contiguous memory block and the address of the starting byte of that block is known to us. This information is used to perform pointer arithmetic. Let’s say that we have following,

int* i;
i+1;

C pointer arithmetic

The first statement declares an int pointer. Currently it has garbage. But let’s say that it points to a some meaningful address (EX: 5 – so it points to the byte which is addressed 5). Then if we have a statement like second one, which calculates an address. So, the value produced by i+1 would be i+1*sizeof(int). If the int is 4 bytes long this would point to the 9 th byte. When we do this compiler knows to multiply the added value by size of an int so that it points to the next int. Now that is simple enough.

Pointer arithmetic which involves void*

But when we need to support generics we have to use void* so that we can point to any type of data. But as soon as we do that we lose the size information of the data that we are pointing which was implicitly know at compilation for any given type like int, double etc. Therefore we cannot perform pointer arithmetic with a void* as with other pointers. In generic containers we store many data items in the memory block allocated for our container and we generally only keep track of the starting address of the memory block for our container. So, in order to find starting address of individual data items we must do pointer arithmetic. This is done with the following commonly used hack.

void* a;
(char*) a +2;

Here we have a void* named a. Second statement first cast our pointer to a char*, this tells the compiler to treat “a” as pointer to a character. Now the size of a character is 1 byte. Therefore if we add 2 as in the second statement we are effectively doing normal arithmetic. That is if ‘a’ points to the 5 th byte we would produce 7, this happens because 2 is multiplied by 1 (size of the character).
So, now we can do pointer arithmetic with the void*, but how do we calculate address of individual elements. We need the size of element type that is in our container, which means that we need to have in our struct.

So, what are the things that we need to maintain in our struct for the container? Well, we can see four items up to now. They are,

Pointer to the starting address of the container
Size of a single element in our container (in bytes – to calculate addresses of individual elements)
Size of the memory block for container (in bytes – for boundary checking)
Count of the number of elements currently in the container (boundary checking and insertions).

There is one more thing that we need. You would guess that it is about memory management. Well, that’s right but it would be in the next post.

Note: This is an old post moved here from different location.