On 10/18/20 12:44 PM, Thiago Macieira wrote:
On Sunday, 18 October 2020 09:29:59 PDT Tom Honermann wrote:
I see it a little differently.  I think the marker is most useful to enable
use of UTF-8 projects by non-UTF-8 projects that can't move to UTF-8 for
"reasons".  Such projects could use the marker in a few ways: 1) to mark
their own non-UTF-8 source files, 2) to mark UTF-8 source files, or 3) to
mark all source files.  I see little motivation for the third option;
defaults are useful.  Likewise, the second option competes with the
long-term direction to make UTF-8 the default (thanks to Corentin for
making the point that marking UTF-8 files would continue the C++ trend of
getting the defaults wrong), so that would not be considered a best
practice, but could make sense for projects that have "ANSI" or EBCDIC
encoded source files and complicated non-UTF-8 build/CI/CD processes, but
where those processes are less complicated for third party dependencies. 
Option 1 is then left as best practice.
That I agree with.

To be clear what I'm agreeing with: UTF-8 source files aren't generally marked 
in any special way, but non-UTF-8 source files get a marker that indicates 
their actual encoding. That is aligned with the general direction of not using 
BOM and of defaulting to UTF-8 unless otherwise instructed.
Good, we're on the same page there.

Corollary:

Said marker must be a no-op for backwards compatibility as we're talking about 
tools and environments that "for reasons" can't update to tools that can handle 
UTF-8 properly. That means we can't have a C++ attribute:

	[[encoding: cp1252]]

Ditto for a pragma:

	#pragma encoding "cp1252"

That leaves out-of-band markers (filesystem metadata) or specially-formatted 
comments.
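
For illustration only, a specially-formatted comment could be picked up by a tool roughly as sketched below. The marker spelling (`// encoding: NAME`), the two-line search window (borrowed in spirit from Python's PEP 263), and the function name `find_encoding_marker` are all assumptions made for this sketch; none of them comes from a proposal or an existing compiler.

```cpp
#include <sstream>
#include <string>

// Hypothetical marker format, assumed for this sketch only:
//   // encoding: cp1252
// Returns the encoding name found in a "//" comment on the first or second
// line of `source`, or an empty string when no such marker is present.
std::string find_encoding_marker(const std::string& source) {
    static const std::string key = "encoding:";
    std::istringstream in(source);
    std::string line;
    for (int i = 0; i < 2 && std::getline(in, line); ++i) {
        const std::string::size_type comment = line.find("//");
        if (comment == std::string::npos) continue;
        const std::string::size_type pos = line.find(key, comment + 2);
        if (pos == std::string::npos) continue;
        // Skip blanks after the key, then take the run of non-blank characters.
        const std::string::size_type begin =
            line.find_first_not_of(" \t", pos + key.size());
        if (begin == std::string::npos) continue;
        const std::string::size_type end = line.find_first_of(" \t\r", begin);
        return line.substr(begin,
                           end == std::string::npos ? end : end - begin);
    }
    return std::string();
}
```

Because the marker lives in a comment, a compiler that knows nothing about it simply skips it, which is the no-op property asked for above.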

Mostly agreed.  The standard states that unrecognized pragma directives and unrecognized attributes are to be ignored, but we know that doesn't reflect existing practice.  That doesn't mean we can't use them; it means they would have to be guarded by a predefined macro:

#if defined(__cpp_encoding_attribute)
[[ encoding: cp1252 ]];
#endif

or

#if defined(__cpp_encoding_pragma)
#pragma encoding "cp1252"
#endif

That would complicate both the implementability and usability of the feature, so I wouldn't expect WG21 to want to go that route.
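
To make the usability cost concrete, here is a minimal sketch of what every non-UTF-8 source file would have to carry. `__cpp_encoding_pragma` is the hypothetical macro from above (no implementation defines it), and `encoding_marker_active` and `marker_status` are names invented for this example.

```cpp
#include <string>

// __cpp_encoding_pragma is hypothetical; no compiler defines it today.
// On such compilers the whole guarded group is skipped by the preprocessor,
// so the unknown pragma is never seen and cannot trigger a diagnostic.
#if defined(__cpp_encoding_pragma)
#pragma encoding "cp1252"
constexpr bool encoding_marker_active = true;
#else
constexpr bool encoding_marker_active = false;
#endif

// Reports which branch of the guard was taken when this file was compiled.
std::string marker_status() {
    return encoding_marker_active ? "honored" : "skipped";
}
```

On any current compiler the guard evaluates to false, so `marker_status()` returns "skipped" and the file is read in whatever encoding the compiler was already configured to assume.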

Tom.