Date: Thu, 07 Feb 2019 10:20:00 +0000
Corentin:
> Ideally, we need 3 separate things:
>
> 1/ A way to read/write byte streams
> 2/ A way to transcode to/from non-unicode encoding
> 3/ A way to determine the encoding expected by a given stream.
>
> The later is in the general case not possible, and it might not be
> generically possible for simple things like console i/o.
> I think the only sane way forward is to by default assume utf8 everywhere
> and work with os vendors to ensure they have the same defaults.
>
> I think 1/ falls entirely outside of the scope of SG16
What about transforming text to bytes?
I maintain a C++20 serialization library and have plans to offer it for
standardization: https://gitlab.com/ftz/serialization
The key problem is that I'm not sure exactly how it would work with text.
Since the execution character set and UTF-8 use bytes as code units,
[de]serializing them is effectively a memcpy. UTF-16 and UTF-32, on the
other hand, require handling of endianness. My streams store a
user-supplied endianness, so the conversion happens automatically during IO.
Consider this syntax used by my library:
BinaryOutputStream& stream;
std::u16string string{ ... };
stream.SetEndianness(std::endian::big);
Serialization::Write(stream, string);
In my opinion, the default behavior should be to byte-swap each code
unit on a little-endian system before writing, since the stream is set
to big endian. What about the BOM? That would require something explicit.
I think the generic way would be to have a strong type that has special
BOM handling during IO.
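
To make the endianness point concrete, here is a minimal sketch of what such a write could do underneath. It is not the API of my library: write_utf16, ByteOrder and the with_bom flag are hypothetical names made up for illustration, and ByteOrder only stands in for std::endian so the sketch also builds before C++20.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for std::endian, so the sketch also builds before C++20.
enum class ByteOrder { little, big };

// Hypothetical helper: appends UTF-16 code units to `out` in the requested
// byte order. Shifting instead of byte-swapping makes the result independent
// of the host's own endianness.
inline void write_utf16(std::vector<std::uint8_t>& out,
                        const std::u16string& text,
                        ByteOrder order,
                        bool with_bom = false)
{
    auto put = [&out, order](char16_t unit) {
        const auto high = static_cast<std::uint8_t>(unit >> 8);
        const auto low  = static_cast<std::uint8_t>(unit & 0xFF);
        if (order == ByteOrder::big) {
            out.push_back(high);
            out.push_back(low);
        } else {
            out.push_back(low);
            out.push_back(high);
        }
    };
    if (with_bom) put(char16_t{0xFEFF});  // U+FEFF, written only when asked for
    for (char16_t unit : text) put(unit);
}

// Usage: "A" in big endian with an explicit BOM gives the bytes FE FF 00 41.
inline void write_utf16_example()
{
    std::vector<std::uint8_t> bytes;
    write_utf16(bytes, u"A", ByteOrder::big, /*with_bom=*/true);
    assert(bytes.size() == 4);
}
```

The BOM is opt-in here through a plain flag; a strong type wrapping the string would express the same choice in the type system instead.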
On point 2/: I think this is easily done using ranges-like
customization points. My serialization library uses them for serializing
user-defined types. For text conversion, a user would simply need to
customize something like std::to_unicode_code_point and
std::from_unicode_code_point; the rest of the code would simply use those.
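
A rough sketch of how such customization points might look, assuming they are found by argument-dependent lookup. Everything here is hypothetical: the legacy namespace, the latin1_char type and the decode function are invented for the example, and the two function names are only borrowed from the suggestion above, not from any existing standard facility. ISO-8859-1 is used because its code units map one-to-one to U+0000..U+00FF.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace legacy {

// Hypothetical user-defined code unit type for ISO-8859-1 (Latin-1) text.
struct latin1_char { unsigned char value; };

// User-provided customization: every Latin-1 code unit is the code point
// with the same numeric value.
inline char32_t to_unicode_code_point(latin1_char c)
{
    return static_cast<char32_t>(c.value);
}

// User-provided customization: returns false if the code point cannot be
// represented in Latin-1.
inline bool from_unicode_code_point(char32_t cp, latin1_char& out)
{
    if (cp > 0xFF) return false;
    out.value = static_cast<unsigned char>(cp);
    return true;
}

} // namespace legacy

// Generic code: knows nothing about Latin-1 and relies only on the
// customization point, which is found through ADL on the element type.
template <class Char>
std::u32string decode(const std::vector<Char>& input)
{
    std::u32string result;
    for (const Char& c : input)
        result.push_back(to_unicode_code_point(c));
    return result;
}

// Usage: 0xE9 in Latin-1 decodes to U+00E9.
inline void decode_example()
{
    std::vector<legacy::latin1_char> input;
    input.push_back(legacy::latin1_char{0xE9});
    assert(decode(input).size() == 1);
}
```

This only covers single-unit encodings; a multi-unit legacy encoding would need a customization point that consumes a range of code units rather than one at a time.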
On point 3/: byte streams don't have a text encoding. It's up to the
user which encoding to write.
Is there a way for me to reach a wider audience without having a Google
account? The std-proposals page gives me a text-only list of topics without
any help on how to participate, probably because I have JavaScript disabled.
Received on 2019-02-07 11:20:35