On Thursday, 6 May 2021 12:14:35 PDT Ville Voutilainen wrote:
Of course it does. It always has.

Thanks, Ville. That was a strawman argument to show that the barrier to using the feature can be unreasonably high, making it as good as useless. That is what I'd like to see fixed next. Not only should there be an easy way to enable UTF-8 support, it should be enabled by something in the source file itself, not something external to it.
My plan is to submit a paper that discusses the following possibilities:
In all three cases, the intent is that differently encoded source
files will be usable within the same translation unit.
In the first two cases, there will be restrictions regarding
where the encoding declaration may appear; e.g., it must be
wholly contained within the first 4K bytes of the file. The paper
will discuss how implementations whose default encoding
differs from the encoding specified by the encoding declaration
will identify the declaration. This is really only relevant to
the ASCII-based vs. EBCDIC-based distinction.
My present intent is to propose the magic comment solution, since
it avoids the
but-my-compiler-warns-about-unrecognized-pragmas-even-though-it-shouldn't
issue. Per Corentin's paper, implementations will still be able
to rely on a command line option, BOM, pragma directive,
filesystem metadata, whatever, to determine an encoding in the
absence of an encoding declaration. The paper will also discuss
the what-if-the-encoding-declaration-doesn't-match-the-actual-file-encoding
issue (UB, of course).
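As a rough illustration of that fallback, here is how an implementation might pick a source encoding: an in-file declaration takes precedence, and only in its absence are the implementation's own mechanisms consulted. The struct and function names are hypothetical, only the UTF-8 BOM is sniffed, and checking the BOM before the command-line option (something like GCC's -finput-charset) is an arbitrary choice for the sketch; the real order would be up to the implementation. Since a declaration that doesn't match the file's actual encoding is UB, the resolver simply trusts the declaration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical inputs an implementation might consult; names are
// illustrative only and do not correspond to any real compiler's API.
struct EncodingSources {
    std::optional<std::string> declaration;   // from an in-file encoding declaration
    std::optional<std::string> command_line;  // from a command line option
    std::string_view file_prefix;             // first bytes of the file, for BOM sniffing
    std::string implementation_default = "utf-8";
};

bool has_utf8_bom(std::string_view bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

std::string resolve_source_encoding(const EncodingSources& s) {
    // 1. An encoding declaration in the file itself wins; a mismatch with
    //    the actual file encoding is UB, so it is not second-guessed here.
    if (s.declaration) return *s.declaration;
    // 2. Otherwise fall back to implementation-specific mechanisms, in an
    //    order chosen arbitrarily for this sketch.
    if (has_utf8_bom(s.file_prefix)) return "utf-8";
    if (s.command_line) return *s.command_line;
    return s.implementation_default;
}
```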
Tom.