Date: Tue, 2 Mar 2021 08:19:27 +0100
On 01/03/2021 16.24, Corentin via SG16 wrote:
> Hey folks!
> Last meeting we talked about the relation between the literal & execution encoding.
>
> I think there is pressure to solve this issue (encoding names, std::print, other features).
> In P2297, I suggested that we say the execution character set is a superset of the literal character set, such that any character in the literal character set results in the same code unit sequence
> whether it is encoded in the literal encoding or execution encoding.
My understanding of "character set" is more along the lines of
"set of characters", independent of encoding.
Thus, even if the execution character set is a superset of the
literal character set, this statement does not establish any
relationship between the corresponding encodings.
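To illustrate the point, here is a minimal sketch in which the literal and execution character sets are identical, yet no character has the same code unit in both encodings. The table and the function names (encode_literal, encode_execution, same_code_unit) are made up for this example. The code unit values are the well-known ASCII and EBCDIC code page 037 values for these four characters, and treating ASCII as the "literal" side is purely an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// One character set {' ', '0', 'A', 'a'}, two encodings of it.
// "literal" column: ASCII values; "execution" column: EBCDIC code page 037.
// Which encoding plays which role is an assumption made for illustration.
struct Entry {
    char32_t ch;
    std::uint8_t literal;
    std::uint8_t execution;
};

const Entry table[] = {
    {U' ', 0x20, 0x40},
    {U'0', 0x30, 0xF0},
    {U'A', 0x41, 0xC1},
    {U'a', 0x61, 0x81},
};

// Returns the code unit, or -1 if the character is not in the set.
int encode_literal(char32_t c) {
    for (const Entry& e : table)
        if (e.ch == c) return e.literal;
    return -1;
}

int encode_execution(char32_t c) {
    for (const Entry& e : table)
        if (e.ch == c) return e.execution;
    return -1;
}

// True only if the character is in both sets AND encodes identically.
bool same_code_unit(char32_t c) {
    const int lit = encode_literal(c);
    return lit != -1 && lit == encode_execution(c);
}

// Small self-check; call it from main() to exercise the sketch.
void check_character_set_sketch() {
    assert(encode_literal(U'A') != -1);    // 'A' is in the literal set
    assert(encode_execution(U'A') != -1);  // ... and in the execution set
    assert(!same_code_unit(U'A'));         // but the code units differ
}
```

Here the superset relation holds trivially (the sets are equal), so any requirement on the code unit sequences has to be stated about the encodings, not about the character sets.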
> Hubert was concerned this was too restrictive because some EBCDIC and ISO 646 variants have code points reserved for "national symbols".
> Even Shift-JIS is not 100% ASCII compatible (Yen instead of backslash, overline instead of tilde)
Yes.
> I've been thinking about that over the past few days. I think the solution is to not have requirements on the literal character set but rather on the literals themselves.
I don't think we need any constraints spelled out in the standard,
because the fact that the literal encoding and the execution encoding
are unrelated implies that certain library functions only work as
expected if you happen to use only characters that are encoded the
same in both, and where the encoding mechanism doesn't differ too
radically (e.g. Shift-JIS).
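A rough sketch of what "encoded the same in both" means in practice, using the Yen/backslash and overline/tilde differences mentioned above. It assumes, for illustration only, an ASCII literal encoding and an execution encoding whose single-byte range follows JIS X 0201 Roman; the function names are hypothetical and only bytes 0x00..0x7F are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decode one byte as ASCII (meaningful for 0x00..0x7F only).
char32_t decode_ascii(std::uint8_t b) {
    return b;
}

// Decode one byte as JIS X 0201 Roman: identical to ASCII except
// 0x5C is YEN SIGN (U+00A5) and 0x7E is OVERLINE (U+203E).
char32_t decode_jis_roman(std::uint8_t b) {
    if (b == 0x5C) return 0x00A5;
    if (b == 0x7E) return 0x203E;
    return b;
}

// True if every byte denotes the same character under both encodings,
// i.e. the string "happens to use only characters encoded the same in both".
bool reads_the_same(const std::string& bytes) {
    for (char ch : bytes) {
        const auto b = static_cast<std::uint8_t>(ch);
        if (b > 0x7F) return false;  // outside the range this sketch models
        if (decode_ascii(b) != decode_jis_roman(b)) return false;
    }
    return true;
}

// Small self-check; call it from main() to exercise the sketch.
void check_reinterpretation_sketch() {
    assert(reads_the_same("abc"));
    assert(!reads_the_same("\x5C"));  // backslash vs. Yen
    assert(!reads_the_same("\x7E"));  // tilde vs. overline
}
```

A library function that interprets its argument in the execution encoding would see the first string unchanged, while the other two would denote different characters than the ones written in the source.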
Any constraint we spell out should be on the implementation, i.e.
by establishing some required relationship between the encodings,
but it seems that's not in line with reality.
Jens
Received on 2021-03-02 01:19:36