Subject: Re: What do we want from source to internal conversion?
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2020-06-15 06:43:26
On 15/06/2020 13.14, Corentin via SG16 wrote:
> It occurred to me that while we discuss terminology, we have not clearly stated our
> Ultimately the observed behavior should be identical in most respects to what implementers do today. In particular, we don't want to change the mapping for a well-formed program, to avoid silent changes, and we want to keep supporting the set of source encodings supported by existing implementations.
> But (I think) we should
> - Tighten the specification to describe a semantic mapping rather than an implementation-defined one, while also making sure that mappings prescribed by vendors or by Unicode, such as UTF-EBCDIC, are not accidentally forbidden.
> - Make the program ill-formed if the source file contains invalid code unit sequences or unmappable characters (the latter of which is not a concern for existing implementations)
> - Specify that Unicode-encoded (and especially UTF-8-encoded) files preserve the sequence of code points in phase 1 (that is, no normalization)
> - Mandate support for UTF-8 files
> - Find a better wording mechanism for the handling of raw string literals, which most likely means that we want to handle universal-character-names that appear verbatim in the source file as escape sequences differently from what phase 1 calls extended characters.
> - Neither actively prevent nor mandate that, if the source encoding and the execution (or wide execution) encoding are the same, the sequence of bytes in a string literal is preserved.
> Did I miss anything?
Rename "execution character set" to something involving encoding of literals.
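To make the quoted bullet about invalid code unit sequences concrete: for a UTF-8 source file, "invalid" can be read as "not well-formed per Table 3-7 of the Unicode Standard". The sketch below is a hypothetical helper, `is_valid_utf8`, written only for illustration; it is not taken from any compiler or from proposed wording, and a real front end would also need to report where the first bad sequence occurs.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (illustration only): returns true iff `s` is well-formed
// UTF-8 per Unicode Table 3-7, i.e. no overlong forms, no encoded surrogates
// (U+D800..U+DFFF), nothing above U+10FFFF, and no truncated sequence.
bool is_valid_utf8(std::string_view s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
        if (b0 <= 0x7F) { ++i; continue; }                   // ASCII
        else if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
        else if (b0 == 0xE0) { len = 3; lo = 0xA0; }         // reject overlong
        else if (b0 >= 0xE1 && b0 <= 0xEC) len = 3;
        else if (b0 == 0xED) { len = 3; hi = 0x9F; }         // reject surrogates
        else if (b0 >= 0xEE && b0 <= 0xEF) len = 3;
        else if (b0 == 0xF0) { len = 4; lo = 0x90; }         // reject overlong
        else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
        else if (b0 == 0xF4) { len = 4; hi = 0x8F; }         // reject > U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

        if (n - i < len) return false;                       // truncated
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {              // trailing bytes
            const unsigned char bk = static_cast<unsigned char>(s[i + k]);
            if (bk < 0x80 || bk > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

Under this reading, inputs such as the overlong `C0 AF`, the encoded surrogate `ED A0 80`, the out-of-range `F4 90 80 80`, and a truncated `E2 82` would all make the program ill-formed, while `C3 A9` (U+00E9) would be accepted.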
SG16 list run by firstname.lastname@example.org