Date: Wed, 11 Dec 2024 08:31:07 +0000
This is going to be a tough thread to follow.
Oliver Hunt wrote:
> As this thread is _still_ going with numerous arguments for or against signed and unsigned I think that shows that signed vs unsigned is not a case with an objectively superior choice, and given the magnitude of this change it would need a much stronger justification.
> I want to be clear, I am not arguing “unsigned is better than signed” or vice versa, just saying that the *value* of the change is clearly not universally - or overwhelmingly - agreed upon, and changes of this magnitude would really require that.
I agree 100% that this will kill any chance of such a proposal ever being successful.
And that any discussion around this is purely of academic interest and will have no weight on the outcome of the proposal.
I only aim to do the following: to illustrate that not only is there no overwhelming reason to change the interface from unsigned to signed, but that if the interface were the other way around, there would be quite a significant reason to do the opposite (to change from signed to unsigned).
Tom Honermann wrote:
> The conversion happens before the bounds checking can be performed unless the index type is a template parameter of the operator[] declaration. That is not the case for the standard library containers.
You mean it happens after bounds checking is performed? Because if it happens before, the bounds check will catch the problem and trivially reject the invalid index.
The way I see it there are only 3 types of applications when it comes to indexing.
1. You can guarantee at compile time that the index is in range. In that case bounds checking is unnecessary; signed or unsigned, the index does the right thing.
2. You can’t guarantee at compile time that indexes are in range. In that case you should do a bounds check to make sure they are.
3. Bugged applications.
And when bounds checking needs to be done before a safe access, it is usually the responsibility of the user to check that they are within bounds and that they are using the right type.
The fact that integer promotion changes expectations and the properties of what is being represented is a problem of the promotion rules, not of the signed/unsigned API; that is what should be fixed.
While the promotion rules made some sense in the context of the technology that existed when they were first set up, technology has evolved since then while the standard has not, which leaves them very much broken now.
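As a generic illustration of what I mean by the rules being surprising (a minimal sketch, not tied to this proposal; it assumes the typical case where int is wider than 16 bits):

    #include <cstdint>
    #include <iostream>
    #include <type_traits>

    int main() {
        // Anything narrower than int is promoted before arithmetic, even if
        // both operands are unsigned: on typical platforms the sum of two
        // uint16_t values has type (signed) int.
        std::uint16_t a = 1, b = 2;
        static_assert(std::is_same_v<decltype(a + b), int>,
                      "uint16_t + uint16_t is a signed int here");

        // Mixed signed/unsigned comparison: the usual arithmetic conversions
        // turn -1 into a very large unsigned value, so this prints "false".
        std::cout << std::boolalpha << (-1 < 1u) << '\n';
    }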
> Correct, but that is only problematic for (unsigned) values that exceed the range of the signed integer type, which is an uncommon occurrence. Bounds checking trivially catches these cases by rejecting negative values.
Well, it does so only if you reject negative values. Make it unsigned and you don’t need to reject negative values, because unsigned values cannot be negative.
That’s the point.
To be honest, integer arithmetic in practice works the same regardless, but bounds checking does not, and I think that is being overlooked in this thread.
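Here is a minimal sketch of what I mean, using only standard containers (the checked_at helper is hypothetical, not a standard API): the caller’s negative int converts to std::size_t before the check, but the single upper-bound comparison still rejects it.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Hypothetical helper, not a standard API: bounds-checked access with an
    // unsigned index. One comparison is enough, because "negative" is not a
    // representable state for std::size_t.
    int* checked_at(std::vector<int>& v, std::size_t i) {
        if (i < v.size())          // the only precondition there is
            return &v[i];
        return nullptr;
    }

    int main() {
        std::vector<int> v{10, 20, 30};

        int n = -1;                // a buggy caller computes a negative index
        // The conversion to std::size_t happens before the check, but that is
        // harmless: -1 wraps to SIZE_MAX, which fails "i < v.size()" anyway.
        std::cout << (checked_at(v, n) ? "ok" : "rejected") << '\n';  // rejected
    }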
> This isn't an issue of right or wrong; it is an issue of tradeoffs given the problems caused by implicit integer promotions and conversions. Use of signed types enables more bugs to be detected.
I see it as a matter of not being able to represent invalid states in the first place, and that matters a lot to me.
While “too large” indexes are not a category of failure mode that is easily eliminated (and nobody is proposing to eliminate it), “negative” indexes are one that can be and has been eliminated.
And as for “enabling more bugs to be detected”, I have direct experience with this, especially when using APIs that use “int” for indexing (as Qt does).
Bounds checking is very often written as “n < container.size()” instead of “n < container.size() && n >= 0”, and it rarely ever happens that the type of n differs from the type of size().
Case in point: Qt is a good example of an API that uses signed indexing, and it is a very large pain point (for me) because it doesn’t match well with the things you want to use the containers for, introducing failure modes and pain points that simply do not exist with the standard containers, precisely because unsigned is, and should be, the right type for indexes.
It is also (in my experience) a main source (but not the only source) of the mantra “the best way to use Qt is to not use Qt”.
If you think you have issues now with signed/unsigned conversions, making the indexes signed will make the problem much worse.
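To make this concrete without dragging in Qt itself, here is a sketch with a hypothetical SignedVec whose size() returns a signed int, standing in for any signed-indexing API (it is not a real Qt class): the habitual one-sided check silently accepts a negative index, and only the two-sided check rejects it.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Hypothetical stand-in for a signed-indexing API; not a real Qt class.
    struct SignedVec {
        std::vector<int> data;
        int size() const { return static_cast<int>(data.size()); }
        int& operator[](int i) { return data[static_cast<std::size_t>(i)]; }
    };

    int main() {
        SignedVec v{{10, 20, 30}};
        int n = -1;

        // The check people habitually write: upper bound only.
        if (n < v.size()) {
            // With a signed size() this branch is taken for n == -1, and an
            // access like v[n] here would be out of bounds.
            std::cout << "one-sided check let a negative index through\n";
        }

        // With a signed API the check has to be two-sided.
        if (n >= 0 && n < v.size())
            std::cout << v[n] << '\n';
        else
            std::cout << "rejected\n";
    }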
Andrey Semashev wrote:
> Negative distances only take place when you want to move an iterator backwards. In every other use case, including sizes and indexes, you're always talking non-negative integers.
> I think using signed integers for sizes and offsets is wrong because those values are never negative. Signed types simply mis-document the interface and confuse the users. And then frustrates the users if they want to fix the issue in their end.
> […] No, the fix is to resolve problems of poor interaction between signed and unsigned integers in the language. And the OP doesn't propose that.
X2
This is also my experience.
Sebastian Wittmeier wrote:
> So besides the issues with implicit conversion between signed/unsigned and overflow behavior (whether UB or wrapping),
> why should 0 to UINT_MAX be how it should be,
> but neither INT_MIN to INT_MAX
> nor 0 to N-1?
Signed integers also overflow, except that signed overflow is UB in the standard, which makes it an argument against using signed values, not for them.
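For reference, a minimal illustration of that difference as the standard defines it:

    #include <climits>
    #include <iostream>

    int main() {
        unsigned int u = UINT_MAX;
        // Unsigned "overflow" is well-defined: arithmetic is modulo 2^N.
        std::cout << u + 1u << '\n';       // prints 0

        int s = INT_MAX;
        // Signed overflow is undefined behaviour; compilers may assume it
        // never happens, so the next line is commented out on purpose.
        // std::cout << s + 1 << '\n';
        (void)s;
    }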
> There can be historic and other reasons, but do not understand, why indices greater than the maximum element should be allowed, but indices smaller than the minimal element not. Especially in cases, where the boundaries are known at compile-time like for std::array.
But that’s the point: both “indices greater than maximal” and “indices smaller than minimal” are not allowed. The point of making them unsigned is that it makes “indices smaller than minimal” an unrepresentable state, leaving you with only “indices greater than maximal” to deal with.
Jonathan Wakely wrote:
> Always? I've definitely used iter[-1] when dealing with random access iterators and with pointers into arrays. It's more concise than *std::prev(iter), especially if other code nearby is using iter[0] and iter[1].
An iterator is not a container. While it makes sense to use signed values with random access iterators, the fact that you can use iterators in such a way is a whole other set of problems that I think are off topic for this discussion.
Thiago Macieira wrote:
> But you cannot say the same about the distance between two arbitrary elements in a sequence. You don't know the direction, therefore you must record it. You could just record it with an unsigned and a direction, but that would make APIs like operator+ impossible because of the inability to specify direction.
> In any case, the language says pointer distances are ptrdiff_t and thus signed.
I agree, distances between two arbitrary points should be signed. I don’t see what the confusion is here: a distance is not the same thing as an index. Is the problem that the math is complicated to reason about? Did you know that on most platforms the arithmetic operations are exactly the same regardless of signedness (i.e. signed or unsigned is irrelevant to the generated instructions)?
Maybe I don’t understand because it’s not confusing to me.
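To illustrate the point about the arithmetic itself, here is a small C++20 sketch (using std::bit_cast) showing that, as long as the signed computation does not overflow, addition produces the exact same bit pattern whether the operands are signed or unsigned:

    #include <bit>
    #include <cstdint>
    #include <iostream>

    int main() {
        std::int32_t  a = -5, b = 7;
        std::uint32_t ua = std::bit_cast<std::uint32_t>(a);
        std::uint32_t ub = std::bit_cast<std::uint32_t>(b);

        // Same bits in, same bits out: the "+" instruction does not care
        // about signedness. Unsigned arithmetic wraps by definition, and
        // C++20 guarantees two's complement representation for signed.
        std::int32_t  signed_sum   = a + b;      // 2
        std::uint32_t unsigned_sum = ua + ub;    // same bit pattern

        std::cout << std::boolalpha
                  << (std::bit_cast<std::uint32_t>(signed_sum) == unsigned_sum)
                  << '\n';                       // prints true
    }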
Received on 2024-12-11 08:31:19