Date: Wed, 27 Jan 2021 10:24:54 -0500
On Wed, Jan 27, 2021 at 5:22 AM BAILLY Yves <yves.bailly_at_[hidden]> wrote:
> I think I get your point. After more thinking about it (which I should
> have done before, sorry for that), [...]
>
No worries. I'm not expressing my points in the clearest and most direct
manner possible, either. I think we started at opposite vague extremes and
are slowly "haggling" our way to meet somewhere in the middle. I started at
"This is not possible"; you started at "This is possible." I'm still 100%
certain that when we meet it'll be on the "This is not possible" side of
the line... but we're still working our way there.
> > Is f<U> the same function as, or a different function from, f<T>? (I
> > don't know but I think so.)
>
> If *T* and *U* are two different types, then *f<U>* is a different
> function from *f<T>*.
>
> *However,* because *U* and *T* refer to the same Platonic type and a *U*
> can be seen as a *T*, then if *f<U>* is syntactically correct,
> well-formed with regard to the restrictions put on U (for example, inside
> *f<>* there’s no assignment of a *T* to a *U* without an explicit cast),
> then the actual instantiation of *f<U>* can be the same as the
> instantiation of *f<T>*
>
I know what you mean, but I hope we'll agree to ignore the possibility of
*compiler optimizations* and just focus on the *formal language semantics*.
For example, in C++20 if you write
    extern int i;
    int *f1(int *x) { return x+i; }
    float *f2(float *x) { return x+i; }
the compiler is perfectly well permitted to *optimize* those two functions
into
    f1:     nop
    f2:     movslq i(%rip), %rax
            leaq   (%rdi,%rax,4), %rax
            ret
No mainstream compiler actually bothers to do this, but there's nothing
*stopping* a "sufficiently smart compiler" from doing it. (Similar
optimizations at the linker/LTO level are more common.)
However, whatever happens behind the scenes in the compiler must still
respect the "As-If Rule": it isn't allowed to break any conforming program.
For example, if some program asks
    assert((void*)f1 != (void*)f2);
the optimized program must still answer correctly. (That's what that extra
`nop` is doing in the codegen — it's giving f1 and f2 distinct addresses at
the machine level, even though f1's *control flow* just flows straight into
f2.)
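Spelled out as a complete program (giving `i` a definition so it links; the
cast of a function pointer to `void*` is only conditionally-supported, but
works on mainstream platforms), the check looks like this:

    #include <cassert>

    int i = 0;  // stand-in for the `extern int i;` above, so this links
    int   *f1(int   *x) { return x + i; }
    float *f2(float *x) { return x + i; }

    int main() {
        // As-If Rule in action: even if the compiler merges the two bodies,
        // f1 and f2 are distinct functions with distinct addresses.
        assert((void*)f1 != (void*)f2);
    }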
So yes, of course a sufficiently smart compiler could also make your `f<T>`
and `f<U>` share the same code at the machine level. But it would still
have to ensure that `&f<T> != &f<U>` *in C++*, right? *That's* what I mean
by f<T> and f<U> being different functions.
For template instantiations in particular, there's another way to tell them
apart. Different template specializations have different sets of
function-local-static variables.
    template<class A>
    void f() {
        static int i = 0;
        printf(" %d", ++i);
    }
    struct T {};
    using U = new T;
    int main() {
        f<T>(); f<T>(); f<T>(); // prints 1 2 3
        f<U>(); f<U>(); f<U>(); // should print 1 2 3 — *not* 4 5 6,
                                // as it would with an ordinary type alias!
    }
If you remove the keyword `new`, so that U and T are just two names for the
same type, then C++'s formal semantics are that &f<T> == &f<U> and that the
output of the program is "1 2 3 4 5 6" (because if T and U are the same
type then we're just calling the same function 6 times, instead of two
different functions each 3 times).
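For contrast, here is the plain-alias version as a complete, valid C++
program; T and U name the same type, so f<T> and f<U> are the same
specialization sharing one `i`:

    #include <cstdio>

    template<class A>
    void f() {
        static int i = 0;
        std::printf(" %d", ++i);
    }
    struct T {};
    using U = T;  // ordinary alias: U is just another name for T
    int main() {
        f<T>(); f<T>(); f<T>(); // prints 1 2 3
        f<U>(); f<U>(); f<U>(); // prints 4 5 6: same function, same static i
    }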
> This would apply to the *std::hash<>* specialization: when required to
> instantiate *std::hash<U>*, if it has not been explicitly specialized,
> then as you said the compiler may realize that *std::hash<U>* is the same
> (has the same contents although it doesn’t have the same identity) as
> *std::hash<T>* - again, if and only if the code for *std::hash<U>* is
> well-formed and there’s no explicit specialization provided by the
> user.
>
Okay, consider this snippet, then:
    using size_t = unsigned long;
    template<> struct hash<size_t> {
        size_t operator()(const size_t& x) const { return x; }
    };
    using Width = new size_t;
    std::hash<Width> hasher;
    Width width = 42;
    size_t bytecount = 42;
    auto x = hasher(width); // OK?? What is decltype(x)?
    x = bytecount;          // OK if decltype(x) is size_t...
                            // but not OK if decltype(x) is Width, correct?
AIUI, you're postulating some yet-to-be-fleshed-out mechanism by which the
compiler is going to look at the C++ source code of the `hash<size_t>`
specialization and generate a copy substituting `Width` for `size_t` in
some yet-to-be-fleshed-out manner.
(1) Ideally the compiler would know that the signature of
hash<Width>::operator() should be `size_t operator()(const Width&) const`,
not `Width operator()(const Width&) const`. But how can it possibly know
that one of the size_ts should be substituted and not the other?
(2) What happens if the *declaration* of `hash<size_t>::operator()` is
visible in this TU, but the *definition* is not visible? That seems like a
problem similar to what would happen if you put a function template
definition into a .cpp file. What implications does that have for usability?
(3) How does the compiler even know that I *want* `Width` to have a `hash`
specialization? Will it apply the same logic to say, well, `std::rotl
<https://en.cppreference.com/w/cpp/numeric/rotl>` exists for size_t so it
should also exist for `Width`? That actually seems like the kind of thing
that I want strong typedefs to *protect* me from.
(4) Also, wait a minute, how does `Width width = 42;` even compile at all?
`42` is not a `Width`; it's an int. If implicitly converting size_t(42) to
Width is forbidden, then surely it should be equally forbidden to
implicitly convert int(42) to Width. So, how do you envision this facility
interacting with integer literals?
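(On #4, for comparison: the library-level workaround available today is a
hand-written wrapper class. The `Width` wrapper below is hypothetical, just a
sketch of that approach; its `explicit` constructor is what forces the
conversion from a literal to be spelled out.)

    #include <cstddef>

    struct Width {
        explicit Width(std::size_t v) : value(v) {}  // explicit: no implicit 42 -> Width
        std::size_t value;
    };

    int main() {
        // Width w1 = 42;  // ill-formed: implicit conversion from int is blocked
        Width w2{42};      // OK: the construction is written out explicitly
        (void)w2;
    }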
#1 is by far the most important issue here, because it strikes directly at
your vague "substitution" mechanism for creating new specializations of
`hash`.
#3 and #4 are relatively unexplored territory and I foresee them having
easy vague answers that would then have to be unpacked via further
discussion. So please don't use #3 and #4 as an excuse to procrastinate on
#1. #1 is important!
HTH,
Arthur
Received on 2021-01-27 09:25:14