Date: Tue, 7 May 2024 18:51:58 +0000
> The design needs to be driven by use cases (and ideally not by theoretical concerns that should inform the specification but not necessarily the interface)
> Python will not understand our custom escape sequence, so it will just randomly fail on some identifiers until someone does the work to implement an "escaped C++ identifiers to UTF-8" conversion in their Python framework, or JS framework, or any of the hundreds of tools that will exist.
> Same for runtime reflection.
> [… Edited for brevity…]
> Ultimately, I agree with Victor. If we really want to optimize for round-tripping, and support both the narrow encoding and UTF-8 semi-transparently, a magic object is not the worst idea.
> But if we are trying to find a text encoding scheme that does not lose data, I would suggest we use the one we already have :)
I would lean more towards not having an escape sequence.
Providing at least the "literal encoding" should be the baseline, with u8 very much optional: if you want it, you could generate it by transcoding, which is why I think the transcoder issue is important. I assume this could be used, for example, to resolve symbol names. Correct me if I'm wrong, but symbol names exist on every platform; they are derived from identifier names and have rules about what can be resolved as a symbol name.
Unicode is quite an extensive collection of characters. Even if it doesn't have everything, if you have written an identifier that doesn't convert back to UTF-8 and you get into trouble… maybe don't use those characters to write code?
Received on 2024-05-07 18:52:04