sg16: Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Lyberta <lyberta_at_[hidden]>
Date: Sun, 08 Sep 2019 18:07:00 +0000

>> Thiago Macieira:
>>> Challenge: produce this JSON thread-safely in machine-readable format in C,
>>> with setlocale(LC_ALL, ""); at the top of the file, from input double v = 1.1.
>>>
>>> [ 1.1 ]
>>
>> In C, huh? I'd rather write in C++ and in that case it's not that hard
>> because of std::mutex, char8_t, std::byte, templates.
>
> Ok, so how would you do this in C++?

First, you implement a binary IO layer. My GCC 9-compatible implementation
is about 2300 lines of code. I'd say a proper implementation of what is
proposed is about 3000 lines of code.

Then, you implement Unicode streams on top. Right now I have the code unit
and scalar value layers mostly implemented. That's about 3400 lines of
code. But for streams you need the UCD, and to use the UCD you need to parse
it. My pessimistic estimate for a UCD parser is about 5000 lines of code.
That puts Unicode streams somewhere in the ballpark of 10,000 lines of code.

Then you need a JSON reader/writer. It's hard to estimate without
writing one. Maybe 3000-5000 lines of code?

So, all in all, the entire thing is within 20,000 lines of code. I can
give better figures later when I have more of that implemented. I need
JSON myself.

> Because it isn't about JSON, it's about everything else. Here's a failure mode
> your code should be able to deal with: printing the localised error message of
> why it couldn't open the output file for writing.

Either write output for humans or machines. Don't mix the two. JSON is
for machines.

> I asked for C because my argument was that the C library is deficient in
> handling machine-readable floating point text forms. C++ has the necessary
> forms, but it's also very easy to fall into a trap and write wrong code.

C is deficient for almost all tasks. No need to state the obvious.
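
For the number-formatting part of the challenge alone, a minimal C++17
sketch might look like the following. It is not the stream design described
above; it only illustrates that std::to_chars ignores the locale, so
setlocale(LC_ALL, "") cannot turn "1.1" into "1,1". The helper names
format_json_number and make_json_array are hypothetical, and the code assumes
a standard library with floating-point <charconv> support (for example
GCC 11 or later).

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: formats a double as the shortest decimal text that
// parses back to the same value. std::to_chars never consults the C or C++
// locale and uses no shared state, so concurrent calls do not interfere.
// Note: NaN and infinity are not valid JSON; callers must reject them first.
std::string format_json_number(double v) {
    char buf[64];
    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, v);
    assert(ec == std::errc{});  // 64 bytes is ample for any double
    return std::string(buf, end);
}

// Hypothetical helper: wraps one number in the array from the challenge.
std::string make_json_array(double v) {
    return "[ " + format_json_number(v) + " ]";
}
```

Calling make_json_array(1.1) yields the text "[ 1.1 ]" regardless of the
locale selected with setlocale.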

> It's not binary, though. This whole argument is that the moment you need to
> write or parse text forms of numbers and strings, you introduce new failure
> modes because programmers aren't aware of the pitfalls.

I agree, that's why I argue for exceptions-by-default on parse errors so
programmers are hit hard if they ignore them.
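
As a rough illustration of exceptions-by-default, here is a sketch of a
number parser that throws instead of returning an error code that can be
ignored. The name parse_number_or_throw is hypothetical and this is not the
proposed stream API; it uses a standard string stream imbued with the
classic locale so that the decimal point is always '.'.

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses the whole string as a double and throws
// std::invalid_argument on any failure, so bad input cannot pass silently.
double parse_number_or_throw(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());  // '.' is the decimal point, always
    double value = 0.0;
    if (!(in >> value))
        throw std::invalid_argument("not a number: " + text);
    // Anything left after the number (such as ",1") is also an error.
    if (in.peek() != std::istringstream::traits_type::eof())
        throw std::invalid_argument("trailing characters in: " + text);
    return value;
}
```

With this shape, parse_number_or_throw("1.1") returns 1.1, while "1,1" or
"abc" raises an exception the caller has to handle explicitly.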

Received on 2019-09-08 20:08:12