Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

From: keld_at <keld_at_[hidden]>
Date: Tue, 13 Aug 2019 22:34:56 +0200
For most programs there is neither a default execution character set nor a
default execution encoding. A binary program is designed to run with the
run-time execution character set of the locale it runs with. So the same binary
program can run with a Japanese encoding, a Danish encoding, or an Arabic
encoding. There is no knowledge at compilation time of what encoding will be
used at run time.
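
A minimal sketch of that point, assuming a POSIX system where nl_langinfo()
from <langinfo.h> is available (it is not part of ISO C++). The helper name
runtime_codeset is hypothetical and only serves the illustration: it selects a
locale for LC_CTYPE at run time and reports the codeset that locale uses, which
is information the compiler never had.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX only: nl_langinfo, CODESET

// Hypothetical helper: select the locale `name` for LC_CTYPE and return the
// name of the codeset that locale uses. Returns an empty string if the
// locale cannot be selected. Passing "" selects the locale named by the
// environment (LANG, LC_ALL, LC_CTYPE).
std::string runtime_codeset(const char* name) {
    if (std::setlocale(LC_CTYPE, name) == nullptr) {
        return std::string();
    }
    return std::string(nl_langinfo(CODESET));
}
```

Calling runtime_codeset("") from the same binary under two different
environment locales (for example a Japanese one and a Danish one, if such
locales are installed) would be expected to report two different codesets.
The exact codeset names are implementation-defined.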


keld

On Tue, Aug 13, 2019 at 04:10:29PM -0400, Steve Downey wrote:
> Getting back to the original question. I think execution character set and
> execution encoding would refer to the encoding specified by the default
> locale, the "C" locale. We do not change the execution encoding via calls
> to setlocale(), we change the global default locale to a new locale.
>
> Any name is going to be confusing. I think it's better to just get an
> explicit definition to go together with the term. Something like that the
> execution encoding is the same as the default character set associated with
> the default "C" locale, and that it is IFNDR if the actual default
> character set is different than the presumed translation from source
> encoding to execution encoding, or if translation units with different
> execution encodings are linked together. IFNDR because I don't see how it
> could always be detected but it can quickly turn into ODR violations where
> the same named object has different definitions.
>
> On Tue, Aug 13, 2019 at 1:22 PM Corentin <corentin.jabot_at_[hidden]> wrote:
>
> >
> >
> > On Tue, Aug 13, 2019, 7:08 PM Thiago Macieira <thiago_at_[hidden]> wrote:
> >
> >> On Tuesday, 13 August 2019 09:55:07 PDT Corentin wrote:
> >> > (if anyone is thinking about that, I don't recommend it. You're going to
> >> > run into size limits: ICC at 512kB and MSVC at 256kB. Use something like
> >> > xxd -i to generate a brace-delimited array instead)
> >> >
> >> > Afaik that works if you use \x to escape every byte otherwise some
> >> > implementation will mess with your data. Nothing is guaranteed to be
> >> > passthrough otherwise
> >>
> >> That would be ideal, but the problem I had was the unavailability of proper
> >> tools to convert the input into a form that the C++ compiler could consume.
> >> I was trying to do with a simple concatenation of a header, data, and
> >> footer.
> >>
> >> The end result is a shell script, a Perl script and a powershell script:
> >> https://codereview.qt-project.org/c/qt/qtbase/+/263548
> >
> >
> > Interesting! std::embed could be useful there (we are going a bit off
> > script). Some kind of raw bytes literals or an implementation that would
> > optimize parsing arrays of literals such that it is as efficient at compile
> > time as strings would also be nice.
> >
> >>
> >> --
> >> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> >> Software Architect - Intel System Software Products
> >>
> >>
> >>
> >> _______________________________________________
> > SG16 Unicode mailing list
> > Unicode_at_[hidden]
> > http://www.open-std.org/mailman/listinfo/unicode
> >


Received on 2019-08-13 22:34:56