Date: Thu, 07 Feb 2019 11:41:00 +0000
Corentin:
> 1/ Text _is_ bytes. Unicode specifies BOM and little/big endian versions
> so I think prepending a bom ( should you want to ) and byte-swapping can be
> done along transcoding
Maybe separate BOM handling into a format state of the stream. Consider
this:
enum class std::bom_handling
{
    none,     ///< Do not read or write BOM.
    not_utf8, ///< Read and write BOM only in UTF-16 and UTF-32.
    all       ///< Read and write BOM in all 3 encoding forms.
};
BinaryOutputStream& stream;
stream.GetFormat().SetBOMHandling(std::bom_handling::all);
This feels like a more elegant design because it separates concerns better.
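
To make the idea a bit more concrete, here is a rough sketch of how a stream
could pick the BOM bytes from its format state. Everything here is
hypothetical: encoding_form and bom_bytes are names I made up for
illustration, and bom_handling is placed outside namespace std so the snippet
compiles. Only the BOM byte sequences themselves come from Unicode.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical policy enum, mirroring the proposal above.
enum class bom_handling { none, not_utf8, all };

// Hypothetical tag for the encoding scheme the stream writes.
enum class encoding_form { utf8, utf16_be, utf16_le, utf32_be, utf32_le };

// Returns the BOM bytes a stream would emit for the given scheme and policy.
// An empty vector means "write no BOM".
std::vector<std::uint8_t> bom_bytes(encoding_form form, bom_handling policy)
{
    if (policy == bom_handling::none)
        return {};

    switch (form) {
    case encoding_form::utf8:
        // UTF-8 gets a BOM only under the "all" policy.
        if (policy == bom_handling::all)
            return {0xEF, 0xBB, 0xBF};
        return {};
    case encoding_form::utf16_be: return {0xFE, 0xFF};
    case encoding_form::utf16_le: return {0xFF, 0xFE};
    case encoding_form::utf32_be: return {0x00, 0x00, 0xFE, 0xFF};
    case encoding_form::utf32_le: return {0xFF, 0xFE, 0x00, 0x00};
    }
    return {};
}
```

The transcoder never has to know about the policy; the stream would call
something like this once, before the first write.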
>
> 3/ stream don't have an explicit encoding ( or locale, for that matter ),
> but they might expect one - think for example about writing to stdout.
stdout is a special case and we have a way to handle it: the execution
character set (ECS). The only problem is that in order to go from Unicode to
the ECS right now you'd need to use std::codecvt, which is a horrible mess.
Still, there are some utilities that treat stdin/stdout as a stream of
raw bytes, so it doesn't always represent a text stream.
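
For reference, this is roughly what the std::codecvt route looks like today.
It is only a sketch under two assumptions: that wchar_t holds the Unicode
text, and that the narrow encoding of the given locale stands in for the ECS.
The function name narrow is made up, and returning an empty string on failure
is a simplification.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Converts a wide string to the narrow encoding of `loc` via std::codecvt.
// Returns an empty string if the conversion does not complete cleanly.
std::string narrow(const std::wstring& in,
                   const std::locale& loc = std::locale())
{
    using facet_t = std::codecvt<wchar_t, char, std::mbstate_t>;
    const facet_t& cvt = std::use_facet<facet_t>(loc);

    std::mbstate_t state{};
    // Worst case: every wide character needs max_length() narrow chars.
    std::string out(in.size() * static_cast<std::size_t>(cvt.max_length()), '\0');

    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    const std::codecvt_base::result r = cvt.out(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        return {};

    // Trim the buffer to the number of bytes actually written.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Six arguments of pointer bookkeeping plus an explicit shift state for what
should be a one-line call is the kind of mess I mean.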
Received on 2019-02-07 12:42:01