Programming Junk: 2015

Sunday, August 9, 2015

logging debug information and continue at a given break point in gdb

When ever we put a break point it becomes a pain if it gets hit many many many times. In such case we can try couple of things without having to modify source recompile and run. one option is conditional break point. when that is not possible, other option is logging values and later search them in the file. we can do this by adding a command to the break point. when ever the break point gets hit given command will execute. That is cool. its cool because it allows us to log information (and do whatever ...). Here goes the example.

Example

list <string> ls;

ls.push_back("jp");

ls.push_back("mom");

ls.push_back("dad");

ls.push_back("grany");

ls.push_back("pip");

list<string>::iterator lb = ls.begin();

list<string>::iterator le = ls.end();

for (; lb != le; ++lb)

{

string val = (*lb);

cout << val<< endl;

}

then we add a break point at line number 23 and lets say its break point 1

the we do following in gdb prompt to add the commnad to log debug information

(gdb) command 1
>>set logging on
>>set logging file tfile
>>p val
>>set logging off
>>c
>>end
(gdb)

commad 1 - we are going to execute the command that we are about to define when the break point 1 is hit
set logging on - this will enable gdb logging
set logging file tfile - this will set the log file to be tfile
set logging off - this will disable logging - otherwise unwanted info will get added to log file
c - continue after logging is done - we dont need keep pressing buttons to continue
end - this says that we are don defining the command

now we just have to check the tfile after execution is done and then figure out the issue if possible :D

Saturday, July 4, 2015

Implementing a generic vector with C

Vector is a container that can grow as needed unlike arrays or the generic stack implementation that was published in an earlier post. So if you don’t know anything you better look at generic stack post. It discusses the pointer arithmetic with void*, general structure for generic containers and dynamic memory management responsibilities. This is only about new things that come up in memory management of the vector implementation.

First as usual the struct for the vector so that we can make sense of the functions.

#ifndef VECTOR_H_
#define VECTOR_H_
#include <stdio.h>
#include <stdlib.h>
typedef struct{
    int elementSize;
    int size;
    int count;
    void* items;
    void (*memclear)(void*);
}Vector;
void newVector(Vector* v,int
size, int elementSize,void (*memclear)(void*));
void add(Vector* v,void* item);
void get(Vector* v,int
position,void* destination);
void v_dispose(Vector* v);
void v_remove(Vector* v,int
position,void* destination);
#endif

Here elementSIze is the size of an item that would be inserted and size variable holds the current size of the vector. Count is number of items in the vector, items is a pointer to the first byte of memory in the heap allocated for our vector and memclear is a function pointer to deal with cleaning the memory of items held in the vector.

Now the struct is done. Although there are many functions I am only interested in only two they are add and remove functions, others are pretty much the same as in our generic stack. So, first let’s see what happens in the add function.

void add(Vector* v,void* item){
    if(v->elementSize+v->size<=v->count*v->elementSize){
       
v->size=v->size+v->elementSize;
       
v->items=realloc(v->items,v->size);
        add(v,item);
    }else{
        void* dest=(char*)
v->items+v->count*v->elementSize;
        memcpy(item,dest,v->elementSize);
    }
}

If block of the add function executes if we don’t have enough memory to hold the new item. Therefore we have to allocate enough memory to hold the new item and here I’m just allocating memory to hold one more item. A better way might be to double the current size of the vector so that frequent reallocation does not happen. Now the question is why use realloc instead of malloc function. The reason is quite simple; when we use realloc it would expand the size without relocating if there is enough contiguous memory blocks available on heap after the last byte of current allocation. This saves us from copying all items to new block and freeing the memory of older one. If there is not enough contiguous memory copying and freeing would be handled by the realloc function. The else block is simple enough right. Just copy the content of our new item to the next free block of memory in our vector.

Now let’s move to the remove function. Idea of the function is to remove an item from the given position, hand it back to the client so that client owns the item which implies that he is responsible for managing memory for that item and rearrange the vector so that there are no empty slots before the last element in the vector. So, here is the code.

void v_remove(Vector* v,int
position,void* item){
    if(position>=v->count){
        return;
    }
    void*
dest=(char*)v->items+v->elementSize*position;
    memcpy(dest,item,v->elementSize);
    if(v->count > position){
       
memmove(dest,(char*)dest+v->elementSize,(v->count-position)*v->elementSize);
    }
    v->count--;
}

First thing is we just return blindly if the given index is not filled in our vector using the first if block. Next if it is there we would copy the item to the given address using memcpy. Notice the pointer arithmetic with the void*. If you don’t understand it see the generic stack. So, now we have given it to the caller, then we should remove it from the vector. But if the item that we removed is not the last one in the vector it would create a haul or unused block of memory in the vector. Therefore we need to move or shift elements that are located after the removed element. That’s where the fancy little memmove function comes in. this function moves a block of memory. Of course we could have used memcopy in this situation, but memmove is fancier where we don’t have to worry about the overlapping of memory blocks. So, as a rule of thumb I usually consider using memcpy when we want to copy items in to a separate location where there is no possibility of overlapping.

That is that. If there are any errors pls let me know. Copilation happens gracefully when we use void type but there we cheat the compiler on its type system. Huh!

Friday, July 3, 2015

Implementing a generic C Stack

In an earlier post I included the header file and why all kinds of variables should be there in the struct of our stack. This post is about the implementation of the functions and their prototypes.

First lets look at the implementation and then we will see why this and that happens in a particular way.

#include <stdio.h>
#include <stdlib.h>
#include "stack.h"

void newStack(Stack* stack,int size,int elementSize,void
(*clearmem)(void* )){
   
stack->size=size;
    stack->currentIndex=0;
   
stack->items=malloc(size*elementSize);
   
stack->elementSize=elementSize;
   
stack->clearmem=clearmem;
}
void push(Stack* stack,void* item){
   
if(stack->currentIndex>=stack->size){
       
printf("not enough space");
        return;
    }else{
        void*
addr=(char*)stack->items+(stack->currentIndex*stack->elementSize);
       
memcpy(item,addr,stack->elementSize);
       
stack->currentIndex++;
    }
}
void pop(Stack* stack,void* destination){
   
if(stack->currentIndex<=0){
       
printf("sorry empty stack");
        return;
    }else{
       
stack->currentIndex--;
       
memcpy((char*)stack->items+(stack->currentIndex*stack->elementSize),destination,stack->elementSize);
    }
}
void dispose(Stack* stack){
    if(stack->clearmem!=NULL){
       
while(stack->currentIndex>=0){
           
stack->currentIndex--;
           
stack->clearmem((char*)stack->items+(stack->currentIndex*stack->elementSize));
        }
    }
   
free(stack->items);
}

Now the newStack function dosent do anything interesting instead it just initializes the stuff in our struct. The push function has some interesting stuff. The idea of the function is we copy everything from the starting address given with void* to the stack covering the length of a single item (we got it when we initialized our stack). Interesting thing here is how to compute the address within our stack memory block.

Void dosent associate size information. That is if you have int* I and you say i+1 you would get the address of the byte that is 4 bytes a way from the starting address. This happens because we know that int is 4 bytes long (normally). But since we are dealing with void first we cast it in to a char* so that it would be assumed as one byte long and pointer arithmetic would be like normal calculation. That is if we have char* c and we do c+1 it would give the address of the byte that is 1 byte away from the starting address.

Now to the pop function. Why that is needed and address to place the item in the stack instead of just returning the address within our stack. Well think of what happens when one push another item after the pop we would overwrite it. That’s not what we want right?. Other thing is the pointer arithmetic which is mentioned above.

Finally, the dispose function destroys the stack and its content. We are not responsible for freeing up memory for items that was poped instead we would do the cleanup for items that are still in the stack by using the memclear function given to us by our client at the initialization. After cleaning items we would free the memory that was allocated for our stack using free function.

Now its done

Thursday, July 2, 2015

Creating a generic C stack

Creating generic containers is little tough in c as compared to Java and C++. Templates in C++ would give you different containers for each type that you use where Java gives you only one container which is capable of holding any type due to its singly rooted object base class hierarchy. Generic containers in C are bit like Java containers in a sense that it would give you only one container as opposed to the container for each type in C++.

It is tricky because we have to deal with void type for variables where we lose the size information and malloc for memory allocation which only deals with raw number of bytes. But it is fun alright!

So first lets see the header file and see why this and that is there. I mean the purpose of having all kinds of stuff in header.

#ifndef STACK_H_
#define STACK_H_

#include <stdio.h>
#include <stdlib.h>

typedef struct {

    int size;

    int currentIndex;

    int elementSize;
    void* items;
    void
(*clearmem)(void*);
} Stack;

void newStack(Stack* stack ,int size,int elementSize,void
(*clearmem)(void* ));
void push(Stack* stack,void* item);
void pop(Stack* stack,void* destination);
void dispose(Stack* stack);
#endif

Let’s first look at the struct. The size variable defines max number of items that we are going to keep in our stack. The elementSize variable holds the size of a single item that would be kept in our stack. This is needed because we are going generic and we would be taking void type items, therefore we don’t have size information using lets say sizeof . The currentIndex keep track of the number of elements in the stack and will be used for pointer arithmetic to locate addresses of items in push and pop operations. Void* items refer to a memory location in the heap which points to the starting address of our stack. Last void (*clearmem)(void*) is a function pointer which knows how to free the memory used by items stored in the stack. This is needed because we don’t know how to free the memory if our stack is holding pointers to strings in heap or pointers to structs etc. if we only deals with data types such as int, double etc. this is not needed because it would totally be within our stack.

Let’s recap what we have at the moment. We have size and elementSize which collectively defines the amount of memory or number of bytes that is needed for our stack. Then we have currentIndex and items to find out addresses of the items inside the stack and elementSize tells us how far from these addresses that we need to consider as single item.

Wow! This is getting long. Function prototypes and implementation next time.

Note: this is an old post moved here from a different location.

Tuesday, June 30, 2015

C pointers and memory management for generic containers part 2

Previous post discussed pointer arithmetic and common elements that goes with any generic container. This post would look in to the memory management part and add another common element to the struct of the generic container.

Primary functions that are being used for dynamic memory management are malloc, realloc and free functions.

Malloc – void *malloc(size_t number) takes the number of bytes that needs to be allocated as the input parameter and return the starting address of the allocated block in heap.

Realloc – void *realloc(void* ptr, size_t size) takes a pointer returned by malloc and the required new size and returns the address of the block. Realloc is basically for resizing a block allocated by malloc function. This function extends the size if there is enough contiguous memory starting from the input pointer, if contiguous memory block starting at the input pointer is not available it would relocate the block to a new place and copy the content in the old block and return the starting address of the new block.

Free – void *free(void *ptr) free the memory and donate it back to the heap. We must free up the memory after we are done using it so that enough space is available for new memory requests.

So, when someone initialize our container we should allocate memory using malloc, if our container allows resizing like a Vector we should use realloc for resizing rather than malloc and copy and free sequence by ourselves and finally we should free the memory used by our stack when user want to dispose the container.

Its quite simple. But think of the disposal. If the data type stored in our container is something like int or double they would get vanish when we clear the memory block of our container. But if the data type is a pointer to some struct on the heap, we would only lose the references that were stored in our container and the memory for the struct will not be donated back to the heap. Therefore we need a way to clear them, but unfortunately we don’t know how so the user have to tell us how by giving us a function which would free the memory and we would use that function to free up memory at the disposal of our container.

So, we got a new element to the common list of things that a container struct need to maintain, that is a pointer to a function to free up the memory. If the container holds built in types such as int and float user could pass in NULL and we don’t have to worry about it.

When an item is inserted to the container I assumed that the ownership is given to us that is the reason that we had to worry about cleaning the things that were still there in the container at the disposal. The ownership also implies that we need to copy whatever the element inserted to the container using memcpy so that it is available throughout the lifetime of the container.

So now we have the common elements and pointer arithmetic knowledge. It is time for an example. First the simple generic stack, then the generic vector which also introduce the memmove function.

Note: This is an old post moved here from a different location.

C pointers and memory management for generic containers

Generics and C looks far apart. But if we look closely we only need to understand little simple stuff. Then we are good to go except for one reason these compile without much of a problem but it doesn’t mean anything! So, what are those few things? Well, they are pointer arithmetic, memory management and bit about data types.

If we think of a data type such as int or float they will have size (number of bytes) required to store the data which is known at the compilation time. So, if we declare a pointer to any of these data types we know that the data that we are interested in is located in particular size of contiguous memory block and the address of the starting byte of that block is known to us. This information is used to perform pointer arithmetic. Let’s say that we have following,

int* i;
i+1;

C pointer arithmetic

The first statement declares an int pointer. Currently it has garbage. But let’s say that it points to a some meaningful address (EX: 5 – so it points to the byte which is addressed 5). Then if we have a statement like second one, which calculates an address. So, the value produced by i+1 would be i+1*sizeof(int). If the int is 4 bytes long this would point to the 9 th byte. When we do this compiler knows to multiply the added value by size of an int so that it points to the next int. Now that is simple enough.

Pointer arithmetic which involves void*

But when we need to support generics we have to use void* so that we can point to any type of data. But as soon as we do that we lose the size information of the data that we are pointing which was implicitly know at compilation for any given type like int, double etc. Therefore we cannot perform pointer arithmetic with a void* as with other pointers. In generic containers we store many data items in the memory block allocated for our container and we generally only keep track of the starting address of the memory block for our container. So, in order to find starting address of individual data items we must do pointer arithmetic. This is done with the following commonly used hack.

void* a;
(char*) a +2;

Here we have a void* named a. Second statement first cast our pointer to a char*, this tells the compiler to treat “a” as pointer to a character. Now the size of a character is 1 byte. Therefore if we add 2 as in the second statement we are effectively doing normal arithmetic. That is if ‘a’ points to the 5 th byte we would produce 7, this happens because 2 is multiplied by 1 (size of the character).

So, now we can do pointer arithmetic with the void*, but how do we calculate address of individual elements. We need the size of element type that is in our container, which means that we need to have in our struct.

So, what are the things that we need to maintain in our struct for the container? Well, we can see four items up to now. They are,

Pointer to the starting address of the container

Size of a single element in our container (in bytes – to calculate addresses of individual elements)

Size of the memory block for container (in bytes – for boundary checking)

Count of the number of elements currently in the container (boundary checking and insertions).

There is one more thing that we need. You would guess that it is about memory management. Well, that’s right but it would be in the next post.

Note: This is an old post moved here from different location.

Saturday, June 20, 2015

cpp object size

If you have ever noticed that objects of empty classes have a non zearo size or variable reordering can change the size of the objects,
you probably wanted to figureout whats going on! Guess what, you are at the right place :D if you know this, you can probably write space efficient code.
So what are the things that affect a size of a object? as you would guess size of the non static member variables. But thats not all there are two other things. They are,
alignment requirements of those member variables and the alignment requirement of the struct or class (which depends on the alignment requirements of the member variables).
There are other things which are imposed by the langues. Those are order of the varibles, presence of virtual members and inheritance.

Size of empty objects
Now lets inspect what is the deal with non zearo size of the empty objects. The standard has the following.

Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage.
Base class subobjects may have zero size. An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage

Lets clarify what the above says with examples.

class A

{}; // size = 1



class B:public A

{}; // size =1;



class C

{   A a;  A b; }; // size = 2

Presence of virtual members

Now what happens when there are virtual functions? When virtual members are present each class that has them has virtual table and each instance of the of these classes contains a pointer to that table. Therefore the size would require the additional amout of pointer size irrespective of the number of virtual members. Here is an example.

class D

{   virtual void tt(){}; }; // size = 4



class E{ virtual void t1(){}; virtual void t2(){}; }; // size = 4

Alignment and Ordering of member variables

alignment requirements of data types are not enforced by the language rarther it is imposed by the processor architecture or OS. These requirements are

enforced for efficient data fetching from memory. The alignment requirement is influenced by the bus width. Processor can fetch a word in a single cycle,

that would be 4 bytes in a 32 bit machine. Alignments are enforced so that the processor can fetch data with minimum number of memory cycles. This would suggest

that alignment requirement of int would be 4 so that it would be stored in a single word rather than spread in two different words. Can you guess the alignment requirement

of a short type? it would be 2 otherwise if odd starting addresses were allowed it can spread to two different words requiring two memory cycles. What about a double?

"A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option)."

- http://en.wikipedia.org/wiki/Data_structure_alignment

You can try using google for the reason behind this type of differences :( similarly there are alignment requirements for structs as well. That is the alignment requirement of a

struct would be same as the alignment requirement of the member that has the largest requirement. That is if a struct has a int and a short alignment requirement of the struct would be

4(int has the largest alingment (4) in this struct ). This is needed to create array of struct because arrays elements occupy contigues memory without padding thus it could violate

alignments of memebers of the struct.

Having known these, now we can try to guess the size of different structs. Note that sizes and alignments of int = 4, short = 2, char = 1 in windows 32 bit machine. So lets go,

struct first{char a; int b;}; // size = 8

sizeof(char) + 3 byte padding (to aling int to multiple of 4) + sizeof(int)

struct first_c{int b; char a;}; // size = 8

sizeof(int) + sizeof(char) + 3 byte padding to make the size multipe of 4 (largest type is int)
This shows that reordering does not always improves the size.

struct second{char a; short b;}; // size = 4

sizeof(char) + 1 byte padding (to aling short to multiple of 2) + sizeof(short)

struct third{short a;int b;short c; int d;}; // size =16

sizeof(short) + 2 byte padding (to aling int to multiple of 4) + sizeof(int) + sizeof(short) + 2 byte padding (to aling int to multiple of 4) + sizeof(int)

Well how to reorder the elements so that the size is minimized? arrange them in the decending order of their aligning requirements. This way only the padding at the end of the

struct would be added to match the requirements of the struct and we can avoid padding in between the members.

Now can't the compilers do it by themselves. The standard has things like following to restrict the comilers.

Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object.

The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1)"

So thats about it! C U :D

Programming Junk

Pages