sg16: Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Steve Downey <sdowney_at_[hidden]>
Date: Sun, 8 Sep 2019 14:29:00 -0400

Oh dear. Does JSON specify a locale for floating point?

And this is why LC_ALL=C is common in scripts that have to run in hostile
environments, like France, isn't it? It's been too long.

Our internal internationalization APIs don't use locale, and are almost
always explicit. That does occasionally lead to failures when someone
forgets what the input format was. Now that almost everything is Unicode
we have fewer problems.
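
For what it's worth, here is a minimal sketch of one way to meet the
challenge quoted below on a POSIX.1-2008 system: switch only the calling
thread to the "C" locale with uselocale() while formatting, then switch
back. The helper name json_array_of_double is made up for illustration, and
some platforms may need <xlocale.h> for these declarations. This is a sketch
under those assumptions, not a claim about what anyone's tool actually does.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes v as a one-element JSON array into buf.
 * Returns 0 on success, -1 on failure (non-finite input, no "C" locale,
 * or buffer too small). */
static int json_array_of_double(char *buf, size_t size, double v)
{
    if (!isfinite(v))
        return -1;                      /* JSON has no NaN or Infinity */

    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    /* uselocale() changes the locale of the calling thread only, so the
     * setlocale(LC_ALL, "") done for the rest of the process is untouched. */
    locale_t old = uselocale(c_loc);

    /* Find the shortest precision that round-trips; 17 significant digits
     * are always enough for an IEEE 754 double. */
    char num[32];
    for (int prec = 1; prec <= 17; ++prec) {
        snprintf(num, sizeof num, "%.*g", prec, v);
        if (strtod(num, NULL) == v)
            break;
    }
    int n = snprintf(buf, size, "[ %s ]", num);

    uselocale(old);                     /* restore the thread's previous locale */
    freelocale(c_loc);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Creating and freeing the locale object on every call keeps the sketch short;
a real program would probably create it once and reuse it.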

On Sat, Sep 7, 2019, 12:44 Thiago Macieira <thiago_at_[hidden]> wrote:

> On Saturday, 7 September 2019 08:26:00 PDT Lyberta wrote:
> > Thiago Macieira:
> > > On Friday, 6 September 2019 19:17:00 PDT Lyberta wrote:
> > >> I think if the machine-readable output depends on locale, the author of
> > >> the program seriously messed up.
> > >
> > > Oh, I agree with you. The problem is that the standard C library (as
> > > extended by POSIX) does not provide the API to make that happen *and*
> > > support internationalisation. And that's assuming the tool even has a
> > > "machine readable" format in the first place. In the Unix tradition, you
> > > just scrape the output of tools.
> >
> > Then don't use the standard C library. On POSIX use open(), read() and
> > write(), have your own Unicode layer on top and read/write UTF-8 JSON if
> > you want to output anything machine-readable.
>
> Challenge: produce this JSON thread-safely in machine-readable format in C,
> with setlocale(LC_ALL, ""); at the top of the file, from input
> double v = 1.1.
>
> [ 1.1 ]
>
>
> > There is no such thing as plain text and Unix philosophy is dead.
>
> Great! Let's drop JSON then.
>
> > I have a C++ proposal for binary IO/serialization here:
> >
> > https://github.com/Lyberta/cpp-io
> >
> > It was already reviewed by Niall twice and hopefully by C++23 we'll have
> > sane binary IO in the standard. I don't have plans to fix C at this
> > point though because it doesn't have an analog of std::byte yet.
>
> That looks very nice, aside from the obligatory bike-shedding of the class
> and namespace names, of course.
>
> > > But the input is not Unicode, it's file paths. On Unix, it is possible to
> > > pass binary input on the command line. With some effort, you can even
> > > pass NULs to specially crafted receiver applications. The std::filesystem
> > > API appears to have a way to retrieve the native raw format, which some
> > > applications may need.
> > Yeah, it's all because C decided to have char as both bytes and
> > characters and doomed us all to ~50 years of pain. We need to decide if
> > main() should get text or characters and fix it.
>
> I think the decision was made for us: bytes on Unix, bytes on Windows if
> Niall can get the runtimes to use UTF-8, wchar_t otherwise.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel System Software Products

Received on 2019-09-08 20:29:15