Date: Thu, 3 Jul 2025 12:44:53 +0200
On 03/07/2025 02:15, Radu Ungureanu wrote:
> > Did you also look for the number of times identifiers like "u8" are
> > used for other things, such as an 8-bit variable? "uint8_t u8 = 123;".
> > Or for when the combination is used in connection with UTF-8? Or
> > someone has written "int u1, u2, u3, u4, u5, u6, u7, u8, u9, u10;"? Or
> > code for which "u1" and "u2" are variables, so that it would seem odd if
> > "u8" were a type? Or code for which the programmer had defined "u1" as
> > an alias for "bool", or had "u8" as an 8-bit bit-field in a struct?
>
> I see your point, but I consider the risk to be quite low considering
> it's in a separate namespace. Such issues would only arise if someone
> does `using namespace std::ints;`.
Obviously namespaces help - without them, the whole idea would clearly
be impossible. However, you are missing two key points as far as I can
tell.
One is that if you are /not/ writing a using clause, the names are
clearly much longer, bulkier and more inconvenient than the existing standard
type names. Putting such a using clause at the start of a file is
rightly frowned upon by most styles, and if you have to write it at the
start of each function, you will often add more clutter and extra typing
than you save.
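To make the comparison concrete, here is a rough sketch (the local
namespace "ints" is just a stand-in for wherever the proposed aliases
would live, such as the "std::ints" mentioned above):

    #include <cstdint>

    // Stand-in for the proposed namespace, for illustration only.
    namespace ints {
        using u32 = std::uint32_t;
    }

    void without_using_directive()
    {
        ints::u32     a = 0;   // the qualified "short" name is no shorter...
        std::uint32_t b = 0;   // ...than the existing, self-describing name
        (void)a; (void)b;
    }

    void with_using_directive()
    {
        using namespace ints;  // repeated in each function, this adds clutter
        u32 c = 0;             // to save a handful of characters per use
        (void)c;
    }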
The other is that coding is not about /writing/ code - it is about
/reading/ code. Your job as a programmer is not to write code in the
easiest manner, but to write code that is easiest to read and understand
later - either yourself or other programmers. Your job as a standards
contributor is not to make it easy for programmers to write code
quickly, but to make it easier for programmers to write clearer code.
So it really doesn't matter if there are technical ways to avoid
identifier clashes seen by the compiler. What matters is what a
programmer will think when they look at a piece of code, and see the
identifier "u8" or similar. How is /that/ person avoiding mixups? If
the code uses similar identifiers for other purposes, adding yet another
meaning for them makes matters worse.
If I see the identifier "u8" in a piece of code, I expect to see its
definition a few lines earlier in the code. Such identifiers are fine
for very local variables. If I saw it used as a type in wider scope,
I'd think the code writer was not really a C++ programmer. Perhaps they
are a Rust programmer who happened to write some C++ while still
thinking in Rust. Maybe they were a C90 programmer from before the
standardisation of [u]intN_t types. Maybe they are a smart-arse
programmer who thinks it is cool to use "modern" type names. (Yes, I
know that's a "no true Scotsman" argument - but the very fact that readers
end up speculating like this shows how such type names make code harder to
understand.)
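To illustrate the distinction (the "read_byte" helper is hypothetical,
purely for the sketch):

    #include <cstdint>

    std::uint8_t read_byte() { return 0x2a; }   // hypothetical stand-in

    void local_use_is_fine()
    {
        // As a very local variable, "u8" is harmless - its definition is
        // right there in view.
        std::uint8_t u8 = read_byte();
        (void)u8;
    }

    // As a wide-scope type alias, the same spelling tells the reader
    // nothing that "uint8_t" did not already say - and this is the use
    // that invites the speculation described above.
    using u8 = std::uint8_t;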
Even worse, what should readers think if some bits of code use "u32",
and other bits use "uint32_t"? Why is there a difference?
Local type aliases can be used for brevity to make code clearer.
Wider-scope type aliases should have names that add useful semantic
information, and can also give flexibility by letting the underlying type
be changed later in one place. "using IntVect = std::vector<int>;" can make
code easier to read. "using raw_colour_value = uint8_t;" adds semantic
information. "using u8 = uint8_t;" removes information, makes
negligible difference to code size, and reduces code legibility by
removing the clear type indication.
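Written out as a sketch, the contrast between those three aliases looks
like this:

    #include <cstdint>
    #include <vector>

    // Local brevity alias: shorter, and the reader can see what it is.
    using IntVect = std::vector<int>;

    // Semantic alias: says what the value means, and the underlying type
    // can later be changed in this one place.
    using raw_colour_value = std::uint8_t;

    // Pure renaming alias: no semantic gain, negligible size gain, and
    // the clear "uint8_t" indication is lost at every point of use.
    using u8 = std::uint8_t;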
>
> > We have keywords like "return" and "class" rather than "rt" and
> > "cls", for good reasons. Overly long names for common uses are wrong
> > (no one would call Cobol a "readable" language!), as are overly short names.
>
> > Using them for wide, global identifiers is a truly terrible idea.
>
> This feature would be /opt-in/, in its own namespace (not as global
> identifiers, unless explicitly doing a "using namespace") and would not
> add new keywords to the language.
>
> Yes there are cases where short identifiers can be unclear in certain
> contexts, but here it's not the case. These have already been widely
> adopted, and their definitions are quite easy to understand (*i*32 -> 32
> bit integer, *u*16 -> 16 bit unsigned integer). Also, this isn't the
> first time single letters have been used in the STL for this, take for
> example uint16_t, why not have it unsigned_int16_t? Or wchar_t, why not
> have it be wide_char_t? It is a trade-off between brevity and verbosity,
> and the opt-in namespace provides the user a choice.
>
Brevity and abbreviations certainly have their uses, especially for
commonly-used identifiers that are familiar to all. But your argument
in this case is clearly bogus. "int" has been established for over 50
years as meaning "integer". The suffix "_t" has been used for "type"
for almost as long. Thus "int32_t" was immediately recognisable as
"32-bit integer type" when it was standardised 25 years ago. The "u" in
"uint32_t" comes /in addition/ to the name "int32_t", and is immediately
clear from that context. The same applies to the "w" in "wchar_t". What
you are talking about, on the other hand, is a stand-alone "u".
Abbreviations require a context to have meaning - the "u" in "uint32_t"
has that context, while the "u" in "u32" does not.
Of course when a reader sees "u32" in the context of a type, it is
usually going to be obvious that it means "uint32_t". But the cognitive
load of interpreting it is greater. And in C++, it is not always clear
when an identifier is a type or something else - so you have to figure
that out too.
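A small illustration of that last point - the very same line of code
means two different things depending on what "u8" happens to be:

    #include <cstdint>

    namespace as_a_type {
        using u8 = std::uint8_t;
        void f() {
            u8 * p = nullptr;   // a declaration: p is a pointer to u8
            (void)p;
        }
    }

    namespace as_a_variable {
        int u8 = 8;
        int p  = 3;
        void f() {
            u8 * p;             // an expression: u8 times p, result discarded
        }                       // (compilers will typically warn about it)
    }

With "uint8_t * p;" there is no such doubt.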
> I agree with Jan Schultke, in that this should be put on the shelves for
> when _BitInt arrives, which would also add the additional semantic
> meaning like Andre Kostur mentioned.
There is scope for all sorts of different kinds of integers in C++.
However, I don't see much potential use for _BitInt types - I haven't
ever seen them used in C, and I work in a field that is much more "bits
and bytes" oriented than the majority of developers. They have three
main potential uses in C - allowing flexibility when defining a type
("constexpr bit_size = 16; typedef _BitInt(bit_size) raw_data_t;" -
something easily handled in C++ by templates), supporting weird integer
sizes on FGPA's (C++ is very rarely used on these), and allowing 24-bit
integer types on 8-bit devices and bigger integer types on small
devices. In addition, there are no integer promotions.
There is nothing about _BitInt types that could not be implemented today
in C++ with relatively simple templates. C does not have the expressive
power of templates, operator overloads, etc., and had to add _BitInt as a
core language feature. C++ can handle such types without that.
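For example, here is a rough sketch of the kind of template that could
stand in for unsigned _BitInt-style types today (the names are invented
here, and a real library would need much more care over signedness, the
full operator set, promotion rules and so on):

    #include <cstdint>
    #include <type_traits>

    template <unsigned Bits>
    class bit_uint {
        static_assert(Bits >= 1 && Bits <= 64, "sketch covers 1..64 bits");

        // Smallest standard unsigned type that can hold Bits bits.
        using storage =
            std::conditional_t<Bits <= 8,  std::uint8_t,
            std::conditional_t<Bits <= 16, std::uint16_t,
            std::conditional_t<Bits <= 32, std::uint32_t, std::uint64_t>>>;

        static constexpr storage mask =
            Bits >= 64 ? ~storage{0}
                       : static_cast<storage>((std::uint64_t{1} << Bits) - 1);

        storage value_ = 0;

    public:
        constexpr bit_uint() = default;
        constexpr explicit bit_uint(std::uint64_t v)
            : value_(static_cast<storage>(v) & mask) {}

        constexpr std::uint64_t value() const { return value_; }

        // Unsigned arithmetic wraps modulo 2^Bits, as with unsigned _BitInt.
        friend constexpr bit_uint operator+(bit_uint a, bit_uint b) {
            return bit_uint(static_cast<std::uint64_t>(a.value_) + b.value_);
        }
    };

    // The "flexible width" use mentioned above, handled by an ordinary
    // template parameter rather than a new core language feature:
    constexpr unsigned bit_size = 16;
    using raw_data_t = bit_uint<bit_size>;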
A much better direction to move in for integer types in the C++ standard
library would be explicit control of unintended behaviour such as
overflows, along with support for larger (but still fixed-size -
arbitrarily sized integers are a different matter) integers. This would
be a template, or set of templates, letting the user specify sizes
and/or limits, and overflow behaviours (UB, modulo wrapping, saturation,
NaN, exceptions, error functions, trapping, etc.).
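As a very rough sketch of what I mean (the names and the policy list are
invented here purely for illustration, and it covers unsigned types only):

    #include <cstdint>
    #include <limits>
    #include <stdexcept>
    #include <type_traits>

    // Possible overflow policies - the names are invented for this sketch.
    enum class on_overflow { wrap, saturate, raise };

    template <typename T, on_overflow Policy>
    class checked_uint {
        static_assert(std::is_unsigned_v<T>, "sketch covers unsigned types only");
        T value_ = 0;

    public:
        constexpr checked_uint() = default;
        constexpr checked_uint(T v) : value_(v) {}
        constexpr T value() const { return value_; }

        friend constexpr checked_uint operator+(checked_uint a, checked_uint b) {
            T sum = static_cast<T>(a.value_ + b.value_);       // wraps mod 2^N
            [[maybe_unused]] bool overflowed = sum < a.value_;

            if constexpr (Policy == on_overflow::wrap) {
                return sum;
            } else if constexpr (Policy == on_overflow::saturate) {
                return overflowed ? std::numeric_limits<T>::max() : sum;
            } else {
                if (overflowed)
                    throw std::overflow_error("unsigned addition overflowed");
                return sum;
            }
        }
    };

    // The same arithmetic, with three explicitly chosen behaviours:
    using wrap32 = checked_uint<std::uint32_t, on_overflow::wrap>;
    using sat32  = checked_uint<std::uint32_t, on_overflow::saturate>;
    using chk32  = checked_uint<std::uint32_t, on_overflow::raise>;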
The worst outcome of all would be to have C++ implement
_BitInt and then add a standard "using u8 = unsigned _BitInt(8);" to
give yet another incompatible type with a meaningless name.
David