For most programs there is no default execution character set nor default
execution encoding. A binary program is designed to run with the run time
execution character set of the locale it runs with. So the same binary
program can run with a Japanese encoding, a Danish encoding, or an Arabic encoding.
There is no knowledge at compilation time of what encoding will be used at run time.
The standard assumes there is one. It has to. You cannot not have an encoding.
(Of course it is broken but it's a very old assumption).
Also, strictly speaking, there is no such thing as a Danish encoding or a Japanese encoding. There is a Danish locale and an encoding attached to that locale (UTF-8, ISO 8859). The standard doesn't always make the distinction; it should.
But yeah, all of that prevents people from having non-ASCII characters in their source, as ASCII is currently the only thing that will work portably.
This is not inherent to C++, which is one reason other languages converged on UTF-8 as the default/only encoding.
(The primary reason being the Unicode character set is actually useful to store text)
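
To make the run-time point concrete, here is a minimal sketch, assuming a POSIX system: nl_langinfo(CODESET) comes from <langinfo.h> and is not part of ISO C++, and the helper names current_codeset, codeset_for and print_codesets are made up for illustration. It only shows that the same binary reports a different codeset depending on the locale it runs with.

```cpp
#include <clocale>
#include <cstdio>
#include <string>
#include <langinfo.h>   // POSIX only (assumption): nl_langinfo, CODESET

// Hypothetical helper: codeset name of the currently active LC_CTYPE locale.
std::string current_codeset() {
    return nl_langinfo(CODESET);
}

// Hypothetical helper: switch the global C locale, then report its codeset.
// Returns an empty string if the requested locale cannot be selected.
std::string codeset_for(const char* locale_name) {
    if (std::setlocale(LC_ALL, locale_name) == nullptr) {
        return {};
    }
    return current_codeset();
}

// Prints the codeset of the startup "C" locale, then the codeset of the
// locale named by the environment (LANG, LC_ALL, ...). Nothing here is
// known at compile time; the answers depend on where the binary runs.
void print_codesets() {
    std::printf("\"C\" locale:         %s\n", codeset_for("C").c_str());
    std::printf("environment locale: %s\n", codeset_for("").c_str());
}
```

Calling print_codesets() from main and running the same binary under different LANG or LC_ALL settings should print different codeset names, provided those locales are installed.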
On Tue, Aug 13, 2019 at 04:10:29PM -0400, Steve Downey wrote:
> Getting back to the original question. I think execution character set and
> execution encoding would refer to the encoding specified by the default
> locale, the "C" locale. We do not change the execution encoding via calls
> to setlocale(), we change the global default locale to a new locale.
> Any name is going to be confusing. I think it's better to just get an
> explicit definition to go together with the term. Something like that the
> execution encoding is the same as the default character set associated with
> the default "C" locale, and that it is IFNDR if the actual default
> character set is different than the presumed translation from source
> encoding to execution encoding, or if translation units with different
> execution encodings are linked together. IFNDR because I don't see how it
> could always be detected but it can quickly turn into ODR violations where
> the same named object has different definitions.
> On Tue, Aug 13, 2019 at 1:22 PM Corentin <firstname.lastname@example.org> wrote:
> > On Tue, Aug 13, 2019, 7:08 PM Thiago Macieira <email@example.com> wrote:
> >> On Tuesday, 13 August 2019 09:55:07 PDT Corentin wrote:
> >> > (if anyone is thinking about that, I don't recommend it. You're going
> >> > to run into size limits: ICC at 512kB and MSVC at 256kB. Use something
> >> > like xxd -i to generate a brace-delimited array instead)
> >> >
> >> > Afaik that works if you use \x to escape every byte otherwise some
> >> > implementation will mess with your data. Nothing is guaranteed to be
> >> > passthrough otherwise
> >> That would be ideal, but the problem I had was the unavailability of
> >> proper tools to convert the input into a form that the C++ compiler could
> >> consume. I was trying to do with a simple concatenation of a header, data,
> >> and footer.
> >> The end result is a shell script, a Perl script and a powershell script:
> >> https://codereview.qt-project.org/c/qt/qtbase/+/263548
> > Interesting! std::embed could be useful there (we are going a bit off
> > script). Some kind of raw bytes literals or an implementation that would
> > optimize parsing arrays of literals such that it is as efficient at compile
> > time as strings would also be nice.
> >> --
> >> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> >> Software Architect - Intel System Software Products
> >> _______________________________________________
> > SG16 Unicode mailing list
> > Unicode@isocpp.open-std.org
> > http://www.open-std.org/mailman/listinfo/unicode