Wednesday, January 7, 2009

Order of Operations and the Undefined Behavior

Recently I came across a topic on Stack Overflow that intrigued me. The question was thus:
Can someone explain to me why this code prints 14? I was just asked by another student and couldn't figure it out.


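#include <iostream>
int main() {
    int i = 5;
    i = ++i + ++i;  // undefined behavior: i is modified twice with no sequence point between
    std::cout << i;
}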


As it turns out, the expression
i = ++i + ++i;
is undefined behavior in both C and C++. It modifies i twice with no sequence point in between, so the standard places no requirements on what the program does. The same problem shows up in the simpler i = i++;. If we read i++ as i = i + 1, the statement contains two writes to i (the increment and the assignment), and nothing says which one happens last. Not too surprisingly, the creators of C / C++ simply resolved the issue with "undefined" - the ISO equivalent of a punt. See Stroustrup's explanation or the C++ Standard itself for more detail. (Note that, as Stroustrup points out, C++ more or less inherited this behavior from C.)
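If you actually want a predictable answer in C or C++, the usual remedy is to give each increment its own statement, since the end of each full expression is a sequence point. Below is a minimal sketch of the question's line rewritten that way. The function name sequenced_sum is hypothetical, and it is written as a C++14 constexpr function only so the compiler can check the arithmetic.

```cpp
// Hypothetical helper: the statement from the question, rewritten so that each
// increment is its own full expression. Every step below is well-defined.
constexpr int sequenced_sum(int i) {
    int first = ++i;     // first increment; first holds the new value
    int second = ++i;    // second increment; second holds the newer value
    i = first + second;  // exactly one write to i in this statement
    return i;
}
```

With the question's starting value, sequenced_sum(5) is 6 + 7 = 13. That is not a prediction of what any compiler does with the original line; it is simply the result you get once you choose the order yourself.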

But this made me curious: if C++ has this behavior as undefined, how does C# handle this same problem?

Let's look at a very simple case that is allowed by the C# compiler:


int i = 1;
i = (++i) + (++i);
System.Console.WriteLine(i);



Suspecting that the above code may be undefined (which, by the way, means I need to read the C# standard and find out whether it really is), I would expect the output to be either 5 or 6. Let's look at both.

Scenario 1:
If it is 5, we would have a sequence similar to this:

i = (2) + (++i);   // first ++i: i is now 2, and that value is saved
i = (2) + (3);     // second ++i: i is now 3
i = 5;


This assumes the value produced by the first ++i (2) is saved in a temporary before the second ++i runs, and the addition then uses that saved value along with the new value of i (3).

Scenario 2:
If, however, the output is 6, we would have a sequence similar to this:

i = (i) + (i);     // both ++i have already run: i is now 3
i = (3) + (3);
i = 6;


In this case, both increments happen first, leaving i at 3, and the addition reads i twice: i + i, as opposed to (saved value) + i, as it was for an output of 5.
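The two scenarios differ only in whether each ++i hands the addition a snapshot of the value or a reference to i itself. Here is a sketch in well-defined C++ (separate statements, so no undefined behavior) that makes each choice explicit. The helper names are hypothetical, and this models the two readings rather than describing what any particular compiler does internally. It is worth noting that C++'s built-in prefix ++ does yield an lvalue (i itself), which at least fits the 14 from the Stack Overflow question.

```cpp
// Scenario 1: the operand hands back a copy of the incremented value.
constexpr int pre_inc_copy(int& x) {
    ++x;
    return x;
}

// Scenario 2: the operand hands back x itself, so the addition reads
// whatever x holds at the moment of the addition.
constexpr int& pre_inc_alias(int& x) {
    ++x;
    return x;
}

constexpr int scenario1(int i) {
    int left = pre_inc_copy(i);     // snapshot taken now: start + 1
    int right = pre_inc_copy(i);    // start + 2
    return left + right;            // (start + 1) + (start + 2), assigned back to i
}

constexpr int scenario2(int i) {
    int& left = pre_inc_alias(i);   // an alias of i, not a snapshot
    int& right = pre_inc_alias(i);  // another alias of i
    return left + right;            // both read after both increments: 2 * (start + 2)
}
```

Starting from 1, scenario1 gives 5 and scenario2 gives 6; starting from the question's 5, they give 13 and 14 respectively.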

Running the above code in C#, however, outputs 5. It looks like Scenario 1 is the winner - but what happened?

To answer this, we can look at the x86 disassembly that the JIT compiler produced for that snippet:


int i = 1;
000000ff mov dword ptr [ebp-4Ch],1
i = (++i) + (++i);
00000106 inc dword ptr [ebp-4Ch]
00000109 mov esi,dword ptr [ebp-4Ch]
0000010c inc dword ptr [ebp-4Ch]
0000010f add dword ptr [ebp-4Ch],esi
System.Console.WriteLine(i);
00000112 mov ecx,dword ptr [ebp-4Ch]
00000115 call 747E2EA0
0000011a nop



We store the initial value (1) in i's stack slot, [ebp-4Ch]. We then increment i to 2 and copy it into esi. We then increment i in memory again, to 3, and add esi to it: 3 + 2 = 5. Note that while Scenario 1 is the winner here, the user on Stack Overflow got a result consistent with Scenario 2 from their C++ compiler (7 + 7 = 14)... interesting!
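To make the register shuffling easier to follow, here is a hypothetical C++ re-enactment of that listing, one statement per instruction. The names are my own: slot stands in for the stack slot [ebp-4Ch] that holds i, esi for the register, and start for the initial value.

```cpp
// Hypothetical re-enactment of the JIT's x86 listing for
//     int i = 1;  i = (++i) + (++i);
// 'slot' plays the stack slot [ebp-4Ch] that holds i; 'esi' plays the register.
// Values in the comments assume start == 1.
constexpr int trace_pre_pre(int start) {
    int slot = start;  // mov dword ptr [ebp-4Ch],1   -> slot = 1
    ++slot;            // inc dword ptr [ebp-4Ch]     -> slot = 2 (first ++i)
    int esi = slot;    // mov esi,dword ptr [ebp-4Ch] -> esi = 2 (saved left operand)
    ++slot;            // inc dword ptr [ebp-4Ch]     -> slot = 3 (second ++i)
    slot += esi;       // add dword ptr [ebp-4Ch],esi -> slot = 3 + 2 = 5
    return slot;       // mov ecx,dword ptr [ebp-4Ch] (argument to WriteLine)
}
```

trace_pre_pre(1) evaluates to 5, the same value the C# program prints.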

Even more interesting is what we get if we use post-increment operators in the statement: 3.

int i = 1;
000000ff mov dword ptr [ebp-4Ch],1
i = i++ + i++;
00000106 mov edi,dword ptr [ebp-4Ch]
00000109 inc dword ptr [ebp-4Ch]
0000010c mov esi,dword ptr [ebp-4Ch]
0000010f inc dword ptr [ebp-4Ch]
00000112 add edi,esi
00000114 mov dword ptr [ebp-4Ch],edi
System.Console.WriteLine(i);
00000117 mov ecx,dword ptr [ebp-4Ch]
0000011a call 747E2EA0
0000011f nop


Here we store the initial value in i's stack slot, [ebp-4Ch]. We then copy it into edi (1) and increment i. We copy that value (2) into esi and increment i again (3). We then add edi and esi together, getting 3, and finally move edi back into i's stack slot, overwriting the 3 left behind by the increments.
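The same kind of hypothetical re-enactment for the post-increment listing, again with my own names: slot for [ebp-4Ch], edi and esi for the registers, and start for the initial value.

```cpp
// Hypothetical re-enactment of the JIT's x86 listing for
//     int i = 1;  i = i++ + i++;
// 'slot' plays the stack slot [ebp-4Ch]; 'edi' and 'esi' play the registers.
// Values in the comments assume start == 1.
constexpr int trace_post_post(int start) {
    int slot = start;  // mov dword ptr [ebp-4Ch],1   -> slot = 1
    int edi = slot;    // mov edi,dword ptr [ebp-4Ch] -> edi = 1 (value of first i++)
    ++slot;            // inc dword ptr [ebp-4Ch]     -> slot = 2
    int esi = slot;    // mov esi,dword ptr [ebp-4Ch] -> esi = 2 (value of second i++)
    ++slot;            // inc dword ptr [ebp-4Ch]     -> slot = 3
    edi += esi;        // add edi,esi                 -> edi = 1 + 2 = 3
    slot = edi;        // mov dword ptr [ebp-4Ch],edi -> the assignment overwrites slot
    return slot;
}
```

trace_post_post(1) evaluates to 3. The final store is what discards the increments: starting from 5 instead, the increments leave the slot at 7, but the assignment replaces it with 5 + 6 = 11.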

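Looking at the order of precedence in C# for the post-increment operator, it makes sense that the post-increments happen before the addition. What looks odd at first is that the addition doesn't use both incremented values, but rather the original value of i (1) and the value left by the first increment (2). That is exactly what post-increment promises, though: i++ yields the value i had before the increment. The first i++ yields 1 and leaves i at 2, the second yields 2 and leaves i at 3, and the assignment then replaces i with 1 + 2 = 3.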

Let's look at what happens if we use a mix of pre- and post-increment operators:


int i = 1;
000000ff mov dword ptr [ebp-4Ch],1
i = (i++) + (++i);
00000106 mov esi,dword ptr [ebp-4Ch]
00000109 inc dword ptr [ebp-4Ch]
0000010c inc dword ptr [ebp-4Ch]
0000010f add dword ptr [ebp-4Ch],esi
System.Console.WriteLine(i);
00000112 mov ecx,dword ptr [ebp-4Ch]
00000115 call 747E2EA0
0000011a nop


This actually outputs 4. We declare i and store it in its stack slot. When we reach our i = (i++) + (++i); statement, we copy the value of i (1) into esi. So far, so good. The next two instructions increment i twice, to 3 - at this point, I would expect the output of the statement to be 5 or 6, right? Instead, C# adds that 3 to the original value of i saved in esi. Not what I expected at first, but it follows from evaluating the operands left to right: the left operand i++ yields the old value 1 (leaving i at 2), then the right operand ++i increments i to 3 and yields 3. What we are really calculating is i = 1 + 3.
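For the mixed case, the same style of hypothetical re-enactment (slot for [ebp-4Ch], esi for the register, start for the initial value) shows where each of the two inc instructions comes from.

```cpp
// Hypothetical re-enactment of the JIT's x86 listing for
//     int i = 1;  i = (i++) + (++i);
// 'slot' plays the stack slot [ebp-4Ch]; 'esi' plays the register.
// Values in the comments assume start == 1.
constexpr int trace_post_pre(int start) {
    int slot = start;  // mov dword ptr [ebp-4Ch],1   -> slot = 1
    int esi = slot;    // mov esi,dword ptr [ebp-4Ch] -> esi = 1 (value of i++)
    ++slot;            // inc dword ptr [ebp-4Ch]     -> slot = 2 (side effect of i++)
    ++slot;            // inc dword ptr [ebp-4Ch]     -> slot = 3 (++i; its value is slot)
    slot += esi;       // add dword ptr [ebp-4Ch],esi -> slot = 3 + 1 = 4
    return slot;
}
```

trace_post_pre(1) evaluates to 4, matching the C# output.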

Moral of the Story

  1. Undefined operations can have interesting results. As it turns out, this case is not undefined in C#: the C# specification states that the operands in an expression are evaluated from left to right, which matches all three listings above. Either way, people should be careful carrying something that is undefined in one language over to a different language.
  2. There really is a difference between pre- and post-increment (++i yields the new value, i++ the old one), and the order in which operands are evaluated matters a lot (unless, of course, 3 == 4 == 5).
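All three C# results can be reproduced by one rule: evaluate the left operand completely, including its increment, then the right operand, then add and assign. Below is a sketch of that rule in C++ with separate, well-defined statements. The Inc tag and both function names are hypothetical, and this models left-to-right operand evaluation rather than any real compiler's internals.

```cpp
// Hypothetical tag for which increment an operand uses.
enum class Inc { Pre, Post };

// Evaluates one operand of the addition, applying its side effect immediately.
constexpr int eval_operand(Inc op, int& i) {
    if (op == Inc::Pre) {
        return ++i;  // ++i: increment first, yield the new value
    }
    return i++;      // i++: yield the old value, then increment
}

// Models `i = a + b;` where a and b are each ++i or i++,
// with the left operand fully evaluated before the right one.
constexpr int left_to_right_sum(Inc a, Inc b, int i) {
    int left = eval_operand(a, i);   // left operand, including its increment
    int right = eval_operand(b, i);  // then the right operand
    i = left + right;                // the assignment discards what the increments left in i
    return i;
}
```

Starting from 1, left_to_right_sum(Inc::Pre, Inc::Pre, 1), left_to_right_sum(Inc::Post, Inc::Post, 1), and left_to_right_sum(Inc::Post, Inc::Pre, 1) give 5, 3, and 4, the same three values the C# programs printed.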
