Date: Wed, 14 Aug 2019 09:59:13 -0600
> u8"é" is ambiguous. Both people and the compiler may interpret that in a
> variety of ways. Notably if I have utf-8 in that file, which I wrote on
> Linux, but then the msvc compiler thinks it's windows 1252...
> Mojibake.
We have a recursive example of bytes/characters confusion here. If you
want to say that the bytes 75 38 22 c3 a9 22 (because you "have utf-8 in
that file") are ambiguous, of course they are, but so are 5c 41 unless
you restrict to ASCII/Latin-*/UTF-8. You always have to arrange for
your compiler to know which characters are signified by the bytes in
your source file, and having some of them be non-ASCII doesn't
fundamentally change anything (even though in practice it makes it harder).
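
To make the point concrete, here is a minimal sketch (Python chosen purely for illustration; the variable names are hypothetical) that decodes the same bytes under different assumed encodings. The EBCDIC code page cp037 is just one example of a non-ASCII-compatible encoding; nothing in the thread names it.

```python
# The six bytes discussed above, as they would sit in a UTF-8 source file.
raw = bytes.fromhex("75 38 22 c3 a9 22")

# Same bytes, two different assumptions about the source encoding.
as_utf8 = raw.decode("utf-8")      # what the author meant
as_cp1252 = raw.decode("cp1252")   # what a compiler assuming Windows-1252 sees

# Pure "ASCII-looking" bytes are only unambiguous once you pick an encoding.
# cp037 (an EBCDIC code page) is an assumed example of one that disagrees.
plain = bytes.fromhex("5c 41")
as_ascii = plain.decode("ascii")
as_ebcdic = plain.decode("cp037")

print(repr(as_utf8), len(as_utf8))
print(repr(as_cp1252), len(as_cp1252))
print(repr(as_ascii), repr(as_ebcdic))
```

Under UTF-8 the bytes c3 a9 become one character; under Windows-1252 they become two, which is the mojibake described in the quoted text. The 5c 41 pair reads as a backslash and "A" only under an ASCII-compatible assumption.
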
Your message doesn't contain those bytes anyway; since it contains a header
    Content-Type: text/plain; charset="UTF-8"
it's appropriate to say that you wrote 5 (abstract) characters: LATIN
SMALL LETTER U, DIGIT EIGHT, QUOTATION MARK, LATIN SMALL LETTER E WITH
ACUTE, and QUOTATION MARK again. (Of course, you could also have
written LATIN SMALL LETTER E and COMBINING ACUTE ACCENT; that's a
different sort of ambiguity.)
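
The "different sort of ambiguity" can be sketched as follows, again in Python as an illustration only (the names are hypothetical). It compares the precomposed and decomposed spellings of the same abstract letter and shows that Unicode normalization, here NFC, maps one onto the other.

```python
import unicodedata

# Two spellings of what a reader perceives as the same letter.
precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# They differ as code point sequences and therefore as UTF-8 bytes.
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Normalizing to NFC composes the pair into the single code point.
normalized = unicodedata.normalize("NFC", decomposed)

print([unicodedata.name(c) for c in precomposed])
print([unicodedata.name(c) for c in decomposed])
print(precomposed_bytes.hex(" "), "vs", decomposed_bytes.hex(" "))
print(normalized == precomposed)
```

Whether a compiler normalizes source text before forming the literal is a separate question that this sketch does not address.
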
Davis
-- This product is sold by volume, not by mass. If it appears too dense or too sparse, it is because mass-energy conversion has occurred during shipping.
Received on 2019-08-14 17:59:21