C and Assembly - Subtleties
Aug 25, 2019
4 minute read

    Should I write this

        int flag = 0;		// Boolean value, can only be 0 or 1
        while(some_condition){
            // …
            if(flag == 0)
                    flag = 1;
            // …
        }
    

    or

        int flag = 0;
        while(some_condition){
            // …
            flag = 1;
            // …
        }
    

    what performs better?

    While I was writing some C code I came up to this little question. In a while loop at a certain point we have to set a boolean flag’s value to 1. So, should we check it’s value first and eventually assign it to 1 or just assign it to 1 without checking first?

    One misleading reasoning could be: better the first version since we don’t make an assignment if it’s not necessary!

    Let’s take a look at the assembly code generated by the two versions.

    First version - Check and eventually set

        int main(){                                        movl    $42, -8(%rbp)
                int condition = 42;                        movl    $0, -4(%rbp)
                int flag = 0;                              jmp     .L2
                while(condition){                  .L3:
                        if(flag == 0)                      cmpl    $0, -4(%rbp)
                                flag = 1;                  jne     .L2
                }                                          movl    $1, -4(%rbp)
        }                                          .L2:
        ~                                                  cmpl    $0, -8(%rbp)
        ~                                                  jne     .L3
    

    This assembly code says the following to the CPU:

    Jump to L2.
    L3: load value of flag from memory, compare it to $0, if they aren’t equal jump to L2 else store 1 into flag’s memory address and then jump to L2.
    L2: load value of condition from memory, compare it to 0, if they aren’t equal jump to L3, else exit.

    Why did I write in bold “load” and “store”? Because they represent accesses to memory and these are the most expensive type of instructions for a CPU. Memory response times are generally an order of magnitude higher than CPU elaboration times.

    In this version of the program we have 3 accesses to memory in case flag equals 0 (two loads and one store) and 2 accesses in case flag equals 1(only two loads).

    Notice that we also have two jump instructions inside the loop, one to stay inside it until condition equals 0 and one to skip the assignment if flag is already equal to 1.

    Second version - Set to 1, don’t bother with checking the previous value

        int main(){                                       movl    $42, -4(%rbp)
                int condition = 42;                       movl    $0, -8(%rbp)
                int flag = 0;                             jmp     .L2
                while(condition){                 .L3:
                        flag = 1;                         movl    $1, -8(%rbp)
                }                                 .L2:
        }                                                 cmpl    $0, -4(%rbp)
        ~                                                 jne     .L3
    

    In this second version the assembly code says:

    Jump to L2.
    L3: store 1 into flag’s memory address.
    L2: load value of condition from memory, compare it to 0, if they aren’t equal jump to L3, else exit.

    Here we have only one load and one store per iteration, so only two memory accesses whether the flag value is 0 or 1. Also, notice that the only jump instruction is the one necessary to stay inside the loop if the variable condition isn’t equal to 0.

    So, which version is better?

    Since the second version implies only two memory accesses against three of the first one in case flag equals 0, and also one less jump instruction, in most real case scenarios it would be the right choice if performance is a concern.

    Notice that I labeled load and store as generic memory access instructions. This because the two instructions usually require the same computation time. But… what if the store instruction was way more expensive than the load instruction? In this particular case the first version would have been better but I personally don’t know any scenario in which this happens to be true.


    Update 26/08/19 - C Compiler Optimizations

    It’s interesting to observe that in a “real” program like the following one, compiling with gcc -O3 (level 3 optimizations enabled) yields to the identical assembly code for both versions.

    $ cat ex-v1.c
    #include <stdio.h>
    #define	IN	1
    #define	OUT	0
    int main(){
    	int c;
    	int state;
    	while((c = getchar()) != EOF){
    		if(c == '\n' || c == '\t' || c == ' '){
    			if(state == IN){
    				putchar('\n');
    				state = OUT;
    			}
    		}
    		else{
    			if(state == OUT)
    				state = IN;
    			putchar(c);
    		}
    	}
    	return 0;
    }
    $ cat ex-v2.c
    #include <stdio.h>
    #define	IN	1
    #define	OUT	0
    int main(){
    	int c;
    	int state;
    	while((c = getchar()) != EOF){
    		if(c == '\n' || c == '\t' || c == ' '){
    			if(state == IN){
    				putchar('\n');
    				state = OUT;
    			}
    		}
    		else{
    			state = IN;
    			putchar(c);
    		}
    	}
    	return 0;
    }
    $ gcc -O3 -S ex-v1.c ex-v2.c && diff ex-v1.s ex-v2.s
    1c1
    < 	.file	"ex-v1.c"
    ---
    > 	.file	"ex-v2.c"