
blmblm (08-08-18, 05:10 AM)
I teach C programming to undergraduates, and while I do my best
to teach them how to use the language correctly, the recent thread
about undefined behavior reminds me that I never know quite what to
say about overflow in arithmetic on signed integers. Usually I've
just muttered something about how several languages (Java and Scala
come to mind) just give "wrong" answers, and there's not a lot one
can easily do about it. But if in C it's undefined behavior, hm,
that's (strictly speaking!) more serious, and I'm curious about what
one *can* do. Consider this code fragment:

int a, b;
/* code to assign values to a and b */
printf("%d + %d is %d\n", a, b, a + b);

I noticed in the other thread a suggestion (I think -- I may have
misunderstood it) that one can avoid UB in addition of signed integers
by first casting to unsigned integers, adding, and then casting back
to signed. That seems like it ought to give a result consistent with
the behavior of those Other Languages (that quietly wrap around),
and is perhaps the best one can reasonably do. Is that what a truly
pedantic and careful programmer would do?

Mostly I'm curious, because I think for my students the most appropriate
thing to do is just to continue to say that many commonly-used programming
languages (not all!) don't deal very gracefully with this kind of thing
and that a full discussion is beyond the scope of the course. But it
would be nice to give some hints about how to write truly careful code.
Reinhardt Behm (08-08-18, 05:48 AM)
At Wednesday 08 August 2018 11:10, blmblm wrote:

[..]
> languages (not all!) don't deal very gracefully with this kind of thing
> and that a full discussion is beyond the scope of the course. But it
> would be nice to give some hints about how to write truly careful code.


I would word it in the following way:
What usually happens in the CPU (maybe not all, but most) is a silent
wrap-around. Many languages handle it the same way.
In C the compiler is free to do the same and ignore any overflow, but
it is also free to do any kind of nasty things (the nasal daemons...).

Ignoring overflow and wrapping around leads to mathematically incorrect
results. That can produce embarrassing output, like the websites we have
all seen showing totally nonsensical data ("Your subscription will
end in -32767 days"). In real life such results can lead to catastrophic
outcomes, even with people getting killed and the programmer going to jail.

The responsible way of handling this is to always check the possible range
of inputs and results of calculations - including intermediate ones - to be
prepared that such overflows can happen, and to choose your data types
accordingly to prevent them. In critical programs, document this, so that
nobody can accuse you of negligence and the next programmer modifying the
software knows about it.

A question to the standard experts here:
Arithmetic operations with int types can inevitably lead to overflow and
thus UB, which allows the compiler to do anything nasty. Can we expect that,
if the inputs are such that no overflow will happen, we are guaranteed the
mathematically correct values and no nasal daemons, even if the compiler has
no chance to deduce that no overflow will happen?
Malcolm McLean (08-08-18, 11:20 AM)
On Wednesday, August 8, 2018 at 4:48:08 AM UTC+1, Reinhardt Behm wrote:
> A question to the standard experts here:
> Arithmetic operations with int types can inevitably lead to overflow and
> thus UB. Which allows the compiler to do anything nasty. Can we expect that
> if the inputs are such that no overflow will happen, we are guaranteed the
> mathematically correct values and no nasal daemons, even if the compiler has
> no chance to deduce that no OV will happen.

Take this code:


#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    printf("%d\n", atoi(argv[1]) + atoi(argv[2]));
    return 0;
}

The compiler has no way of detecting overflow at compile time. However
if the inputs are within range, it must act as you would expect.
Otherwise practically any calculation would be UB, because it's generally
too hard to reason about values set in distant parts of the program,
even if the programmer knows that they must be small.

However if the results overflow, output can be anything. Generally it
will be a wrap, but a high quality implementation might terminate with
an error message, because, as you say, wrong results are often worse
than no results at all.
David Brown (08-08-18, 11:33 AM)
On 08/08/18 05:48, Reinhardt Behm wrote:

> A question to the standard experts here:
> Arithmetic operations with int types can inevitably lead to overflow and
> thus UB. Which allows the compiler to do anything nasty. Can we expect that
> if the inputs are such that no overflow will happen, we are guaranteed the
> mathematically correct values and no nasal daemons, even if the compiler has
> no chance to deduce that no OV will happen.


Yes.
David Brown (08-08-18, 11:40 AM)
On 08/08/18 05:10, blmblm wrote:
[..]
> languages (not all!) don't deal very gracefully with this kind of thing
> and that a full discussion is beyond the scope of the course. But it
> would be nice to give some hints about how to write truly careful code.


Teach them not to allow signed integer overflow in their programs.
There are, as you say, languages that define the behaviour of signed
integer overflow - but they give incorrect answers. Defined answers,
but wrong answers (for most purposes).

Teach them that they should not add two numbers if they might overflow -
just as they should not empty one bowl of apples into another bowl of
apples if that bowl is not big enough to hold them all. In C, doing so
may result in a troll coming along and eating you, and then the apples.
In Java, you'd be guaranteed a result - a negative number of apples in
the bowl. /Neither/ result is good. So don't do it.

Casting to unsigned integers, doing the arithmetic, and casting back is
equivalent to the Java "solution". It's adding some more apples to a
pile of apples and getting a negative result. Don't do that.
Consistently, predictably wrong and meaningless is still wrong and
meaningless.

There /are/ occasions when you actively want wraparound behaviour, and
unsigned integers give you that. But they are rare - very rare. In
almost all cases, if it overflows, it's a bug.
Bart (08-08-18, 12:21 PM)
On 08/08/2018 04:10, blmblm wrote:
[..]
> to signed. That seems like it ought to give a result consistent with
> the behavior or those Other Languages (that quietly wrap around),
> and is perhaps the best one can reasonably do.


This is what makes it so silly. You have to obfuscate your code just to
get to the starting point of those other languages. And do it in a
million places (eg. everywhere you might use ++i).

Those other languages had the benefit of being invented when pretty much
every machine used twos complement arithmetic, and they could afford to
ignore the oddball ones. (Although I think Java specifically makes its
behaviour identical on all machines.)

There's no reason why the same assumption couldn't be made with C, if
you are never going to run it on anything different. But no, C has to
play the UB card.

> Is that what a truly
> pedantic and careful programmer would do?


If you're that worried about it, you need to use a language like Python,
where overflow is impossible: the numbers just get bigger and bigger
until you run out of memory.

Or one like Ada, which surely has some facility to deal with overflow
systematically.

Otherwise, just remember this is a low level language and you're on your
own. There are no exceptions to trap an overflow and jump to a part of
your program to deal with that.

(And even if there were, it would make programs much more complicated.
How exactly could you proceed from such an event anyway? I think the
best is gcc's -ftrapv which will abort your program.)

If the correct result is absolutely crucial, and you can't ensure no
overflow at coding time, then you will need to put in validation checks,
for example in your example:

if (addoverflows(a,b)) ...

addoverflows() will check whether a+b will overflow. One way of doing
that involves performing the addition, but if it does overflow, that
itself invokes UB! So here you might use the unsigned version, but it
would make life much easier if it wasn't UB.

This is where I disagree with certain people here (although I
acknowledge that UB can be used to generate marginally more efficient code).

> Mostly I'm curious, because I think for my students the most appropriate
> thing to do is just to continue to say that many commonly-used programming
> languages (not all!) don't deal very gracefully with this kind of thing
> and that a full discussion is beyond the scope of the course. But it
> would be nice to give some hints about how to write truly careful code.


If specifically coding in such a low level language, one should be aware
of how overflows are handled inside a machine, which involves looking
beyond what the language says about it.

And actually, the way it works is that addition (signed /or/ unsigned,
it's usually exactly the same operation) wraps around. There may or may
not be internal flags set. But usually, INT_MAX+1 gives you INT_MIN.

One thing that bugs me with this, is that exactly the same limitation of
fixed width numbers also causes overflow with unsigned integers. Imagine
your program was like this:

unsigned int a, b;
/* code to assign values to a and b */
printf("%u + %u is %u\n", a, b, a + b);

Does making a and b unsigned magically make the problem of an incorrect
result disappear? It doesn't.

Except that C says a+b can NEVER overflow, and the result is ALWAYS
correct. Which means that UINT_MAX+1 resulting in zero is fine; it
doesn't represent any kind of overflow.

Good luck putting that across...
Bart (08-08-18, 12:25 PM)
On 08/08/2018 10:40, David Brown wrote:

> Teach them that they should not add two numbers if they might overflow -
> just as they should not empty one bowl of apples into another bowl of
> apples if that bowl is not big enough to hold them all.


Unless the bowls are unsigned.

Then the problem becomes one of explaining why it doesn't matter if the
numbers are unsigned, even though you still don't get the total of all
the apples. (And not even a saturated amount; you might end up with fewer
than in either bowl.)
Reinhardt Behm (08-08-18, 12:43 PM)
At Wednesday 08 August 2018 17:20, Malcolm McLean wrote:

[..]
> will be a wrap, but a high quality implementation might terminate with
> an error message, because, as you say, wrong results are often worse
> than no results at all.


Thank you and David for this clarification. I am no language lawyer, but
sometimes one can get the impression in this group that everything is UB.
Malcolm McLean (08-08-18, 12:44 PM)
On Wednesday, August 8, 2018 at 11:21:49 AM UTC+1, Bart wrote:
> unsigned int a, b;
> /* code to assign values to a and b */
> printf("%u + %u is %u\n", a, b, a + b);
> Does making a and b unsigned magically make the problem of an incorrect
> result disappear? It doesn't.
> Except that C says a+b can NEVER overflow, and the result is ALWAYS
> correct. Which means that UINT_MAX+1 resulting in zero is fine; it
> doesn't represent any kind of overflow.
> Good luck putting that across...

Another issue is that most integers in a program represent counts of
things in memory, indices into arrays of those things, or sizes of memory
in bytes. So they should be size_t, which means that C defines wrong
results on overflow. To be fair, there is ssize_t, but I have seldom seen
it used.
David Brown (08-08-18, 12:44 PM)
On 08/08/18 12:25, Bart wrote:
> On 08/08/2018 10:40, David Brown wrote:
>> Teach them that they should not add two numbers if they might overflow -
>> just as they should not empty one bowl of apples into another bowl of
>> apples if that bowl is not big enough to hold them all.

> Unless the bowls are unsigned.


Read the rest of my post.

Teach them not to add two numbers if they might overflow. That
/includes/ unsigned numbers. Unsigned overflow is defined behaviour in
C - but it still gives you the wrong result for almost all cases.

> Then the problem becomes one of explaining why it doesn't matter if the
> numbers are unsigned, even though you still don't get the total of all
> the apples. (And not even the saturated amount; you might end with less
> than in either bowl.)


No, the key point to explain is how to avoid overflow. That's all. It
doesn't really matter if the integers are signed or unsigned - if you
are overflowing, you have a bug in the code.

Once you get onto more advanced stuff, you can start looking into why
wraparound unsigned overflow is occasionally useful, but that is for
much later on.
Bart (08-08-18, 12:50 PM)
On 08/08/2018 11:44, David Brown wrote:
> On 08/08/18 12:25, Bart wrote:
> Read the rest of my post.
> Teach them not to add two numbers if they might overflow. That
> /includes/ unsigned numbers. Unsigned overflow is defined behaviour in
> C - but it still gives you the wrong result for almost all cases.


Thanks, that must be the first time anyone has agreed with me on that point.
mark.bluemel (08-08-18, 12:50 PM)
On Wednesday, 8 August 2018 11:25:55 UTC+1, Bart wrote:
> On 08/08/2018 10:40, David Brown wrote:
> > Teach them that they should not add two numbers if they might overflow -
> > just as they should not empty one bowl of apples into another bowl of
> > apples if that bowl is not big enough to hold them all.

> Unless the bowls are unsigned.


What on earth do you think you mean by this?

David's point is simple.
1) Integers, in practically any programming language, have finite capacity.
(Naturally there are various Big Number implementations, but native integers
are finite).
2) If you exceed that capacity the result cannot be arithmetically
correct.
3) Ergo, you should not exceed that capacity.

The fact that for certain languages, or certain implementations of languages,
the result of exceeding the capacity is a predictable, but still
arithmetically incorrect, value does not invalidate David's point.

The defined behaviour of signed arithmetic overflow in some languages is, I
suspect, largely a result of them developing in an environment where it was
reasonable to restrict the target hardware to that implementing
twos-complement arithmetic. I'd be interested to know if anyone has tried
implementing, say, Java on ones-complement or sign-and-magnitude environments
and if so how much that hurt.

The defined behaviour doesn't make the arithmetic results any more correct,
nor, in my mind, does it invalidate the decision of the designers of C (see
my other post "from the horse's mouth" for some background on the
design/evolution of C) that the behaviour of such overflow should not be
defined, so as to avoid the need for compiler-writers to jump through hoops
in environments where twos-complement wasn't the default.
Bart (08-08-18, 12:59 PM)
On 08/08/2018 11:44, Malcolm McLean wrote:
> On Wednesday, August 8, 2018 at 11:21:49 AM UTC+1, Bart wrote:
> Another issue is that most integers in a program represent either counts
> of things in memory, indices into those arrays, or sizes of memory in bytes.
> So they should be size_t. Which means that C defines wrong results on
> overflow. To be fair, there is ssize_t, but I have seldom seen it used.


C implements array indices as pointer offsets, which may be positive
or negative. So the use of size_t might be a problem.

Actually, it needs a signed type one bit wider than size_t: given a
pointer to location zero, you might need to access an element at the top
of memory via an offset.

And a pointer to the top of memory, and needing to access a location at
(or near) the bottom of memory.
James Kuyper (08-08-18, 01:01 PM)
On 08/07/2018 11:10 PM, blmblm wrote:
[..]
> the behavior or those Other Languages (that quietly wrap around),
> and is perhaps the best one can reasonably do. Is that what a truly
> pedantic and careful programmer would do?


No.

The problem is that any bit pattern that represents a negative value in
the signed type will represent a positive value in the corresponding
unsigned type that is too big to be represented in the signed type.
Therefore, when such a value is converted from the unsigned type to the
signed type "either the result is implementation-defined or an
implementation-defined signal is raised." (6.3.1.3p3). That's marginally
better than "undefined behavior": "implementation-defined" behavior is
unspecified behavior for which an implementation is required to document
which choice it makes. Therefore, if your code is targeted to only a
small number of implementations, you can read the documentation for each
one, determine what it does with signed overflow, and write your code
accordingly. But such code is not guaranteed to work as desired on all
implementations of C.

I've heard it claimed that there is no C compiler, anywhere, that takes
the "raise a signal" option. I'm not sure how anyone could be
justifiably certain about that (unjustified certainty, on the other
hand, is easy to come by). But it is the case that most implementations
choose "implementation-defined result", and most of them define the
result as 2's complement wrap-around, in precisely the same fashion as
the other languages you're talking about.

The right way to deal with signed integer overflow portably is to
prevent it from happening. This can be trivial if the numbers you're
working with have a limited range: if you use a week numbering
convention that guarantees that week_number is always positive and less
than 53, then you can calculate 7*week_number without having to worry
about overflow. However, avoiding overflow can be annoyingly difficult
if you need to write code that works with arbitrary values.
Bart (08-08-18, 01:07 PM)
On 08/08/2018 11:50, mark.bluemel wrote:
> On Wednesday, 8 August 2018 11:25:55 UTC+1, Bart wrote:
> What on earth do you think you mean by this? ....
> The defined behaviour doesn't make the arithmetic results any more correct,


All I ever remember reading on this group is that there is no such thing
as overflow with unsigned integers, so the wraparound results are always
correct.

That would presume that everyone writing such code always had modular
arithmetic in mind.

You're right of course in that the overflow of both signed and unsigned
numbers, when people are trying to emulate pen and paper results, is
what is important and what you should strive to avoid.

But discussion of modular arithmetic and UB tends to side-line that.