Date: Mon, 15 Jun 2020 13:43:26 +0200
On 15/06/2020 13.14, Corentin via SG16 wrote:
> Hey.
>
> It occurred to me that, while we discuss terminology, we have not clearly stated our
> objectives.
>
> Ultimately, the observed behavior should be identical in most respects to what implementers do today. In particular, we don't want to change the mapping for a well-formed program, to avoid silent changes, and we want to keep supporting the set of source encodings supported by existing implementations.
>
> But (I think) we should
>
> - Tighten the specification to describe a semantic mapping rather than an implementation-defined one, while also making sure that mappings prescribed by vendors or by Unicode, such as UTF-EBCDIC, are not accidentally forbidden.
> - Make the program ill-formed if the source file contains invalid code unit sequences or unmappable characters (the latter of which is not a concern for existing implementations)
>
> - Specify that Unicode-encoded (and especially UTF-8-encoded) files preserve the sequence of code points in phase 1 (i.e., no normalization)
> - Mandate support for UTF-8 files
> - Find a better wording mechanism for the handling of raw string literals, which most likely means that we want to handle universal-character-names that appear in the source files as verbatim escape sequences differently from what phase 1 calls extended characters.
> - Neither actively prevent nor mandate that, if the source encoding and the execution or wide execution encoding are the same, the sequence of bytes in a string literal is preserved.
>
> Did I miss anything?
Rename "execution character set" to something involving encoding of literals.
Jens
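
For illustration only, a few of the objectives quoted above can be sketched in C++17. This is not wording or code from the thread: `is_well_formed_utf8` is a hypothetical helper that shows the kind of check implied by "ill-formed if the source file contains invalid code unit sequences", assuming the well-formed UTF-8 byte ranges of the Unicode Standard. The trailing `static_assert`s show, as an assumption about how the raw string literal and "no normalization" bullets play out on a conforming compiler, that a universal-character-name stays verbatim in a raw string literal and that a precomposed and a decomposed spelling remain distinct code point sequences.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from the thread): true if `s` is well-formed
// UTF-8, i.e. no overlong forms, no surrogates, nothing above U+10FFFF.
constexpr bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned char lo = 0x80;  // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (b0 <= 0x7F) {
            len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // rejects overlong 3-byte forms
            if (b0 == 0xED) hi = 0x9F;  // rejects UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // rejects overlong 4-byte forms
            if (b0 == 0xF4) hi = 0x8F;  // rejects values above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }
        if (s.size() - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            const unsigned char lo_k = (k == 1) ? lo : 0x80;
            const unsigned char hi_k = (k == 1) ? hi : 0xBF;
            if (b < lo_k || b > hi_k) return false;
        }
        i += len;
    }
    return true;
}

// Invalid code unit sequences are detected; valid ones pass.
static_assert(is_well_formed_utf8("\xC3\xA9"));       // U+00E9
static_assert(!is_well_formed_utf8("\xC0\xAF"));      // overlong '/'
static_assert(!is_well_formed_utf8("\xED\xA0\x80"));  // surrogate U+D800
static_assert(!is_well_formed_utf8("\xC3"));          // truncated

// In a raw string literal the universal-character-name is kept verbatim:
// six characters plus the terminating null.
static_assert(sizeof(R"(\u00E9)") == 7);

// No normalization: precomposed U+00E9 and 'e' followed by U+0301 are
// different code point sequences and encode to different UTF-8 lengths.
static_assert(sizeof(u8"\u00E9") == 3);
static_assert(sizeof(u8"e\u0301") == 4);
```

All checks are evaluated at compile time, so the fragment either compiles or it does not; a real implementation would of course diagnose the source file itself rather than a string inside it.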
Received on 2020-06-15 06:46:38