On Mon, Jun 15, 2020 at 11:38 AM Corentin via SG16 <sg16@lists.isocpp.org> wrote:
On Mon, 15 Jun 2020 at 17:17, Tom Honermann <tom@honermann.net> wrote:
On 6/15/20 7:14 AM, Corentin via SG16 wrote:
But (I think) we should
- Tighten the specification to describe a semantic rather than implementation-defined mapping, while also making sure that mappings prescribed by vendors or by Unicode, such as UTF-EBCDIC, are not accidentally forbidden.
That sounds good to me. However, as we've recently learned, there are certain implementation shenanigans that need to be accounted for:
- gcc stripping white space following a line continuation character (I think you intend to adopt this behavior in general though).
- trigraphs
Whitespace stripping will be an EWG proposal in that mailing.
Hopefully trigraphs can either be handled in an undocumented phase 0 or be understood as part of the "semantic mapping".
Sorry for the late post. It's been a busy week. I believe the discussion has progressed such that reviving this thread might not be the most productive, but the point about trigraphs appears to have come up only on this thread.
The behaviour of trigraphs is still subject to "magic" reversal in raw strings, so a "phase 0" approach leaves the reversal as "magic".
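To illustrate why the reversal has to be "magic", here is a minimal sketch of a destructive "phase 0" replacement. The function names are hypothetical and do not come from any implementation; the point is only that once the replacement has happened, a trigraph and the character it denotes are indistinguishable, so a later raw string literal has nothing left to revert to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Maps the third character of a trigraph to its replacement,
// or returns '\0' when the three characters do not form a trigraph.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical "phase 0": replaces every trigraph and keeps no record
// of how the text was originally spelled.
std::string replace_trigraphs(const std::string& source) {
    std::string out;
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char r = trigraph_replacement(source[i + 2]);
            if (r != '\0') {
                out += r;
                i += 3;
                continue;
            }
        }
        out += source[i];
        ++i;
    }
    return out;
}

// Both spellings collapse to the same text, so the original is lost.
// The literal is written with an escaped question mark so that a compiler
// with trigraphs enabled does not replace it in this file.
void demonstrate_loss() {
    assert(replace_trigraphs("?\?=") == replace_trigraphs("#"));
}
```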
I think the model of extended characters discussed in the SG16
telecon this week can handle this. If we think of a trigraph as
an extended character distinct from any other spelling of an
extended character that denotes the same abstract character, then
this situation is analogous to the Shift-JIS case of the same
abstract character having multiple code point assignments in the
source input character set. By allowing the set of extended
characters to be implementation-defined (e.g., not simply the
Unicode character set), then extended characters carry the
original spelling through translation phases until a conversion to
another character set is required (e.g., until an identifier
requires conformance to Unicode NFC in phase 3, or string literal
encoding in phase 5).
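
As a rough sketch of that model (all names here are hypothetical, and the input is assumed to be ASCII for brevity), each element of the translated sequence can carry both the abstract character it denotes and the spelling it had in the source. A raw string literal would then read the spellings, while anything that needs conversion to another character set would read the abstract characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical extended character: the abstract character plus the
// source spelling that produced it.
struct ExtendedChar {
    char32_t abstract;
    std::string spelling;
};

// Hypothetical phase 1: maps trigraphs to extended characters without
// discarding how they were written.
std::vector<ExtendedChar> phase1(const std::string& source) {
    static const std::string thirds = "=/'()!<>-";
    static const std::string replacements = "#\\^[]|{}~";
    std::vector<ExtendedChar> out;
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const std::size_t pos = thirds.find(source[i + 2]);
            if (pos != std::string::npos) {
                const unsigned char r = static_cast<unsigned char>(replacements[pos]);
                out.push_back({static_cast<char32_t>(r), source.substr(i, 3)});
                i += 3;
                continue;
            }
        }
        const unsigned char c = static_cast<unsigned char>(source[i]);
        out.push_back({static_cast<char32_t>(c), std::string(1, source[i])});
        ++i;
    }
    return out;
}

// What a raw string literal would see: the original spellings.
std::string raw_spelling(const std::vector<ExtendedChar>& seq) {
    std::string out;
    for (const ExtendedChar& c : seq) out += c.spelling;
    return out;
}

// What a conversion to another character set would see: the abstract characters.
std::u32string abstract_text(const std::vector<ExtendedChar>& seq) {
    std::u32string out;
    for (const ExtendedChar& c : seq) out += c.abstract;
    return out;
}
```

With this shape no reversal step is needed: the raw string case just never discards the spelling in the first place.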
Tom.