Date: Fri, 16 Oct 2020 15:02:34 -0700
On Friday, 16 October 2020 13:59:15 PDT Tom Honermann wrote:
> I strongly agree that any recommendation that is not aligned with that
> approach is doomed to fail. I'd like to enable a solution that supports
> a best practice in which files that are *not* UTF-8 are marked in some
> way while acknowledging and supporting exceptional circumstances in
> which it makes sense to mark UTF-8 files instead, hopefully as a short
> term solution.
I can sympathise with the desire, but I have to question whether it's worth
it.
That requires:
* tooling be updated to support the markers (not just compilers, though they're the main concern)
* older source code be updated to carry said markers
* codebases using that older third-party code be updated to adopt them
Isn't it far more likely for the older source to be updated to UTF-8 instead?
It seems this marker would be added to source code that can't move to UTF-8
because some important customer can't make the switch. So it's added to tell
everyone else that the source is in some other encoding and that they need to
update their compilers. And because said customer won't upgrade, the marker
has to be a no-op for them, unlike for everyone else.
It's far easier to simply keep the source code 7-bit US-ASCII clean instead
and cater to all non-IBM needs.
EBCDIC is a completely different story. EBCDIC is not widespread at all and it
also has a completely different solution, because mojibake is not possible. A
compiler failing to make the correct decision on EBCDIC- vs ASCII-based
encoding will encounter hard syntax errors. But this is a very limited
environment, where I expect recoding and other marker techniques are already
in use. It's also the only environment that retains trigraphs and I say that,
like them, we shouldn't all be made to suffer -- "the needs of the many".
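To illustrate the point about EBCDIC misdetection producing hard errors rather
than silent mojibake, here's a quick sketch using Python's "cp037" codec (one
common EBCDIC code page, chosen purely for illustration):

```python
# ASCII-encoded C source, misread as EBCDIC (code page 037).
# Every byte still decodes, but the result is not remotely valid C,
# so a compiler fails immediately instead of silently garbling text.
src = b"int main() { return 0; }"

as_ascii = src.decode("ascii")
as_ebcdic = src.decode("cp037")

print(as_ascii)   # the intended source
print(as_ebcdic)  # gibberish: keywords and punctuation are all remapped
```

Contrast that with two ASCII-compatible 8-bit encodings (say Latin-1 vs
Windows-1252), where the keywords survive and only string literals and
comments quietly come out wrong.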
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Software Architect - Intel DPG Cloud Engineering
Received on 2020-10-16 17:02:44