sg16: Re: [SG16] P2295R3 Support for UTF-8 as a portable source file encoding

From: Steve Downey <sdowney_at_[hidden]>
Date: Thu, 6 May 2021 19:25:00 -0400

coding[=:]\s*([-\w.]+) - the python magic comment form covers both emacs
and apparently modern vim conventions.

// -*- mode: C++; coding: utf-8 -*-
is interpreted by Emacs as a C++ file encoded as UTF-8, whereas Vim looks
for
// vim:fileencoding=utf-8

Emacs will also accept -*-<mode>-*- for just setting the mode, but that's
not relevant, although it's why you'll see -*-c++-*- in many places.
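
As a rough illustration of the pattern at the top of this message, here is a
small Python sketch that applies it to both comment styles. The helper name
declared_encoding and the sample lines are made up for the example; only the
regular expression itself comes from the Python encoding declaration docs.

```python
import re

# The pattern quoted at the top of this message.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name declared on a comment line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


# Hypothetical sample lines: an Emacs-style declaration, a Vim-style
# declaration, and an Emacs mode-only line that declares no encoding.
emacs_line = "// -*- coding: utf-8 -*-"
vim_line = "// vim:fileencoding=utf-8"
mode_only = "// -*-c++-*-"

for line in (emacs_line, vim_line, mode_only):
    print(line, "->", declared_encoding(line))
```

The Vim form matches because "fileencoding=" ends in "coding=", which is why
the one pattern covers both conventions.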

On Thu, May 6, 2021 at 6:38 PM Tom Honermann via SG16 <sg16_at_[hidden]>
wrote:

> On 5/6/21 3:22 PM, Thiago Macieira via SG16 wrote:
>
> On Thursday, 6 May 2021 12:14:35 PDT Ville Voutilainen wrote:
>
> Of course it does. It always has.
>
> Thanks, Ville.
>
> That was a strawman argument to show that the barrier to the feature can be
> unreasonably high, thus making it as good as useless. That is what I'd like to
> see fixed next.
>
> Not only should there be an easy way to enable the UTF-8 support, it should be
> enabled by something in the source file itself, not something external to it.
>
>
> My plan is to submit a paper that discusses the following possibilities:
>
> - A new pragma directive. There is existing practice in the form of IBM's
>   #pragma filetag directive
>   <https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbclx01/zos_pragma_filetag.htm>.
>       #pragma encoding(encoding-name)
> - A magic comment. Very likely the Python encoding declaration
>   <https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations>.
>       // -*- coding: <encoding-name> -*-
> - Use of a BOM.
>
> In all three cases, the intent is that differently encoded source files
> will be usable within the same translation unit.
>
> In the first two cases, there will be restrictions regarding where the
> encoding declaration may appear; e.g., it must be wholly contained within
> the first 4k bytes of the file. The paper will discuss how implementations
> with a default encoding that differs from the encoding specified by the
> encoding declaration will identify the declaration. This is really only
> relevant for ASCII-based vs EBCDIC-based concerns.
>
> My present intent is to propose the magic comment solution since it avoids
> the
> but-my-compiler-warns-about-unrecognized-pragmas-even-though-it-shouldn't
> issue. Per Corentin's paper, implementations will still be able to rely on
> a command line option, BOM, pragma directive, filesystem metadata,
> whatever, to determine an encoding in the absence of an encoding
> declaration. The paper will also discuss the
> what-if-the-encoding-declaration-doesn't-match-the-actual-file-encoding
> issue (UB of course).
>
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
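
The detection order described in the quoted message (a BOM, then an encoding
declaration wholly contained in the first 4k bytes, then whatever default the
implementation uses) could be sketched roughly as follows. This is only an
illustration under stated assumptions: the function name, the BOM-first
priority, and the fallback parameter are invented here, not taken from any
paper, and the sketch only handles ASCII-compatible files, so the EBCDIC
question is left out entirely.

```python
import codecs
import re

# Bytes version of the magic-comment pattern; in a bytes pattern \w matches
# only ASCII letters, digits, and underscore.
CODING_RE = re.compile(rb"coding[=:]\s*([-\w.]+)")

# "First 4k bytes" is the example limit from the quoted message; the real
# limit, if any, is not settled.
SCAN_LIMIT = 4096


def detect_source_encoding(data, default="utf-8"):
    """Guess the encoding of a source file from its raw bytes (a sketch)."""
    # Assumption: a UTF-8 BOM wins over any declaration.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"

    window = data[:SCAN_LIMIT]
    match = CODING_RE.search(window)
    if match:
        # If the name runs up to the end of the window and the file goes on,
        # the declaration may be cut off, so it is not "wholly contained".
        truncated = match.end() == len(window) and len(data) > SCAN_LIMIT
        if not truncated:
            return match.group(1).decode("ascii")

    # No usable declaration: fall back to the implementation's own choice
    # (command line option, filesystem metadata, and so on).
    return default


print(detect_source_encoding(b"// -*- coding: latin-1 -*-\nint x;\n"))
print(detect_source_encoding(b"int x;\n", default="windows-1252"))
```

A mismatch between the declared and the actual encoding is not detected here;
the sketch simply reports what the file claims.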

Received on 2021-05-06 18:25:18