SG16

Subject: Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
From: Corentin (corentin.jabot_at_[hidden])
Date: 2019-08-13 15:49:09


On Tue, Aug 13, 2019, 10:34 PM <keld_at_[hidden]> wrote:

> For most programs there is no default execution character set nor default
> execution encoding. A binary program is designed to run with the run time
> execution character set of the locale it runs with. So the same binary
> program can run with a Japanese encoding or a Danish encoding or Arabic
> encoding.
> There is no knowledge at compilation time what encoding will be used at
> run time
>

The standard assumes there is one. It has to. You cannot not have an
encoding.
(Of course it is broken but it's a very old assumption).
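To make that assumption concrete, here is a minimal sketch (the function
name is made up for illustration). The bytes of a narrow string literal are
picked when the program is translated, so switching the global locale at run
time cannot change them. Only the locale-dependent library functions behave
differently afterwards.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: returns true if the bytes of a narrow literal are the
// same before and after the global C locale is switched at run time.
bool literal_bytes_survive_setlocale() {
    static const char text[] = "abc";  // encoded at translation time
    const std::string before(text, sizeof text);

    // Switch to the locale named by the environment. The call may fail and
    // return a null pointer; either way the literal itself is untouched.
    std::setlocale(LC_ALL, "");
    const std::string after(text, sizeof text);

    std::setlocale(LC_ALL, "C");  // go back to the startup locale
    return before == after;
}
```

What the sketch does not show is the broken part: nothing checks that the
encoding chosen at translation time matches the one the run-time locale
expects.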

Also, there is no such thing as a Danish encoding or a Japanese encoding.
There is a Danish locale, and an encoding attached to that locale (UTF-8, ISO
8859). The standard doesn't always make that distinction, but it should.
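The split is visible in POSIX-style locale names, which follow the pattern
language[_territory][.codeset][@modifier]. As a rough sketch, assuming that
naming convention (it comes from POSIX, not from the C++ standard), the same
Danish locale can be paired with different encodings. The type and function
names below are invented for illustration, and whether a given locale name is
installed on a system is a separate question.

```cpp
#include <string>

// Hypothetical illustration type: the parts of a POSIX-style locale name.
struct LocaleParts {
    std::string language_territory;  // e.g. "da_DK"
    std::string codeset;             // e.g. "UTF-8"; empty if not spelled out
};

// Splits "language[_territory][.codeset][@modifier]" into the locale part
// and the encoding part. The modifier, if any, is dropped.
LocaleParts split_locale_name(const std::string& name) {
    const std::string::size_type at = name.find('@');
    const std::string base = name.substr(0, at);  // npos keeps the whole name
    const std::string::size_type dot = base.find('.');

    LocaleParts parts;
    parts.language_territory = base.substr(0, dot);
    if (dot != std::string::npos) {
        parts.codeset = base.substr(dot + 1);
    }
    return parts;
}
```

With this, "da_DK.UTF-8" and "da_DK.ISO-8859-1" give the same
language_territory ("da_DK") but different codesets, which is the distinction
I mean.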

But yeah, all of that prevents people from having non-ASCII characters in
their source, as ASCII-only source is currently the only thing that will work
portably.
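For illustration, a small sketch of how non-ASCII text can still be produced
from ASCII-only source: hex escapes spell out the code units directly, and a
universal-character-name in a u8 literal asks the compiler for UTF-8. The
variable and function names are invented for this example, and it assumes
C++14 or later.

```cpp
// U+00E9 (LATIN SMALL LETTER E WITH ACUTE), spelled with ASCII-only source.

// Hex escapes name the code units directly; 0xC3 0xA9 is the UTF-8 form.
constexpr char raw_bytes[] = "\xC3\xA9";

// A universal-character-name inside a u8 literal is encoded as UTF-8.
// (The element type is char before C++20 and char8_t from C++20 on.)
constexpr auto& utf8_text = u8"\u00E9";

static_assert(sizeof(raw_bytes) == 3, "two bytes plus the terminating NUL");
static_assert(sizeof(utf8_text) == 3, "two code units plus the terminating NUL");

// Compares the two spellings code unit by code unit.
constexpr bool same_bytes() {
    for (unsigned i = 0; i < sizeof(raw_bytes); ++i) {
        if (static_cast<unsigned char>(raw_bytes[i]) !=
            static_cast<unsigned char>(utf8_text[i])) {
            return false;
        }
    }
    return true;
}

static_assert(same_bytes(), "both spellings yield 0xC3 0xA9 0x00");
```

Neither spelling depends on how the compiler decodes the source file, which
is the portability point. It is also not something anyone wants to write by
hand.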

This is not inherent to C++, which is one reason other languages converged
on UTF-8 as the default/only encoding.
(The primary reason being that the Unicode character set is actually useful
for storing text.)

> keld
>
> On Tue, Aug 13, 2019 at 04:10:29PM -0400, Steve Downey wrote:
> > Getting back to the original question. I think execution character set and
> > execution encoding would refer to the encoding specified by the default
> > locale, the "C" locale. We do not change the execution encoding via calls
> > to setlocale(), we change the global default locale to a new locale.
> >
> > Any name is going to be confusing. I think it's better to just get an
> > explicit definition to go together with the term. Something like that the
> > execution encoding is the same as the default character set associated with
> > the default "C" locale, and that it is IF NDR if the actual default
> > character set is different than the presumed translation from source
> > encoding to execution encoding, or if translation units with different
> > execution encodings are linked together. IF NDR because I don't see how it
> > could always be detected but it can quickly turn into ODR violations where
> > the same named object has different definitions.
> >
> > On Tue, Aug 13, 2019 at 1:22 PM Corentin <corentin.jabot_at_[hidden]> wrote:
> >
> > >
> > >
> > > On Tue, Aug 13, 2019, 7:08 PM Thiago Macieira <thiago_at_[hidden]> wrote:
> > >
> > >> On Tuesday, 13 August 2019 09:55:07 PDT Corentin wrote:
> > >> > (if anyone is thinking about that, I don't recommend it. You're going
> > >> > to run into size limits: ICC at 512kB and MSVC at 256kB. Use something
> > >> > like xxd -i to generate a brace-delimited array instead)
> > >> >
> > >> > Afaik that works if you use \x to escape every byte otherwise some
> > >> > implementation will mess with your data. Nothing is guaranteed to be
> > >> > passthrough otherwise
> > >>
> > >> That would be ideal, but the problem I had was the unavailability of
> > >> proper tools to convert the input into a form that the C++ compiler
> > >> could consume. I was trying to do with a simple concatenation of a
> > >> header, data, and footer.
> > >>
> > >> The end result is a shell script, a Perl script and a powershell script:
> > >> https://codereview.qt-project.org/c/qt/qtbase/+/263548
> > >
> > >
> > > Interesting ! std::embed could be useful there (we are going a bit off
> > > script). Some kind of raw bytes literals or an implementation that would
> > > optimize parsing arrays of literals such that it is as efficient at
> > > compile time as strings would also be nice.
> > >
> > >>
> > >> --
> > >> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> > >> Software Architect - Intel System Software Products
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > > SG16 Unicode mailing list
> > > Unicode_at_[hidden]
> > > http://www.open-std.org/mailman/listinfo/unicode
> > >
>
>
>



SG16 list run by sg16-owner@lists.isocpp.org