Date: Mon, 25 Feb 2019 10:44:41 +0100
"I have GCC writing out JSON-like syntax right now. It isn't 100% valid since it isn't UTF-8, but I don't want *that* in these files either."
I am aware it is a slight pain to handle UTF-8 in its entirety; however, as a humble end user I would really like a world where using an OS localization with non-ASCII characters isn't going to break things. Paths sometimes contain UTF-8 characters, even if I really don't want them to. On Windows, for instance, providing my real first and last name as my user name results in my home directory having UTF-8 in it, so all my docs, source code, etc. are reachable only through that path. I got really fed up with some software randomly choking on sh*t like that, so instead I forfeit my name: on new OS installs I create a local user with an ASCII name and "imbue" the account with my MS credentials later.
The same goes for installing games. Ordinary people install them under C:\Games, whereas a fellow Hungarian might be tempted by C:\Játékok, which leaves some games unable to load or save, etc. It generally works fine until you grab software from 15-20 years ago. C:\Tools becomes C:\Kellékek, but ultimately one gets fed up with software not working outside the ASCII table and just starts mixing languages or leaving the accents off words (I do the latter).
TL;DR: Paths containing UTF-8 are sometimes not the user's choice but the OS's or another software vendor's. Please keep that in mind. (In the 21st century, this really should not cause headaches on the end-user side.)
From: Ben Boeckel
Sent: Monday, 25 February 2019 6:39
To: Mathias Stearn
Cc: modules_at_[hidden]; WG21 Tooling Study Group SG15
Subject: Re: [Tooling] [isocpp-modules] Path to modules with old bad buildsystems
On Sat, Feb 23, 2019 at 22:36:28 -1000, Mathias Stearn wrote:
> I actually think that we should use this opportunity to switch to a
> standardized data format, such as JSON, that has parsers for basically
> every language (even make could use something like jq) for exchanging
> metadata between the compiler, build systems and other tools.
I have GCC writing out JSON-like syntax right now. It isn't 100% valid
since it isn't UTF-8, but I don't want *that* in these files either.
Note that right now, extra information is still necessary for
communicating with the build tool since the compiler is unaware of the
build's root directory, so paths may need to be prefixed before being
handed off to the build tool for accuracy.
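Here is a minimal sketch of that prefixing step. It assumes the build tool knows the directory it launched the compiler in. The helper name `rebase_for_build_tool` and its parameters are hypothetical and are not part of any GCC or build-tool interface. The helper anchors relative paths reported by the compiler at that directory before they are handed on:

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: the compiler reports paths relative to the directory
// it ran in, which it does not know is (or is not) the build's root. The build
// tool anchors relative paths at the compiler's working directory; absolute
// paths are kept as-is. Both are lexically normalized, which touches no files
// and does not resolve symlinks.
fs::path rebase_for_build_tool(const fs::path& compiler_cwd,
                               const fs::path& reported) {
    if (reported.is_absolute()) {
        return reported.lexically_normal();
    }
    return (compiler_cwd / reported).lexically_normal();
}
```

This is purely lexical: it neither checks that the file exists nor resolves symlinks, which a real build tool may or may not want.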
> I really don't want to have to teach every tool to read Makefile syntax. It
> is also extremely limited in what it can tell you, and we may need/want a
> lot more kinds of data than it can provide, e.g. the hash that should be
> used to detect that the interface has changed in a way that doesn't need a
> rebuild of importers, or the list of BMI-altering flags from the current
> command.
Agreed, especially given the ambiguities in GCC's output for fun cases like:
#include "path with spaces.h"
#include <path\with\slash.h>
#include <path\ending\in\\> // Needs two backslashes to avoid confusing the escape, but the depfile is still broken.
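The ambiguity can be shown with a simplified, hypothetical Make-style escaping scheme. In this scheme, spaces become `\ `, `$` becomes `$$`, and backslashes are left alone. This is an assumption for illustration, not a transcription of GCC's exact rules. Under it, two different prerequisite lists serialize to the identical line:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified Make-style escaping (NOT claimed to be GCC's exact
// algorithm): spaces become "\ ", '$' becomes "$$", and backslashes are left
// untouched, as is common so that Windows-style paths stay readable.
std::string make_escape(const std::string& path) {
    std::string out;
    for (char c : path) {
        if (c == ' ') {
            out += "\\ ";
        } else if (c == '$') {
            out += "$$";
        } else {
            out += c;
        }
    }
    return out;
}

// Joins escaped prerequisites with single spaces, as in "target: a b c".
std::string prerequisites(const std::vector<std::string>& paths) {
    std::string line;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        if (i != 0) {
            line += ' ';
        }
        line += make_escape(paths[i]);
    }
    return line;
}
```

The two paths `dir\` and `x.h` produce the line `dir\ x.h`. The single path `dir x.h` produces the same line. No reader of the depfile can recover which was meant. A format with explicit string delimiters, such as JSON, avoids this by construction.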
> lot more kinds of data than it can provide, e.g. the hash that should be
> used to detect that the interface has changed in a way that doesn't need a
> rebuild of importers, or the list of BMI-altering flags from the current
> command.
Well, you can't know whether the BMI has changed until you actually
compile it. The best we can ask for is "only update the file if its
contents have changed". Getting that for .o files would be nice as well.
Ninja can then optimize no-change compilations via `restat`.
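Here is a minimal sketch of the "only update if the contents have changed" behaviour. It is written as a standalone helper, not as anything GCC actually provides, and `write_if_changed` is a hypothetical name. The helper rewrites the output only when it is missing or its bytes differ, so an unchanged file keeps its old modification time:

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: rewrites `path` only if it is missing or its bytes
// differ from `contents`, so an unchanged BMI or .o keeps its old mtime.
// Returns true if the file was (re)written, false if it was left alone.
bool write_if_changed(const fs::path& path, const std::string& contents) {
    {
        // Scoped so the input stream is closed before we reopen for writing.
        std::ifstream in(path, std::ios::binary);
        if (in) {
            const std::string existing((std::istreambuf_iterator<char>(in)),
                                       std::istreambuf_iterator<char>());
            if (existing == contents) {
                return false;  // identical: do not touch the file or its mtime
            }
        }
    }
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
    if (!out) {
        throw std::runtime_error("failed to write " + path.string());
    }
    return true;
}
```

Pair this with `restat = 1` on the corresponding Ninja rule. Ninja then re-checks the outputs' modification times after the command finishes and treats untouched outputs as up to date, so their dependents are not rebuilt. This only pays off if the compiler produces byte-identical output for unchanged input.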
> And for the love of $deity, don't put any locale-sensitive strings in this
> metadata!
I'd rather have it just be "a series of bytes that is a valid lookup on
the filesystem". The `\` and `"` characters are escaped using `\` for
obvious reasons. Maybe we should escape control characters as well. Is that
good enough for a specification?
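Here is one possible reading of that proposal, as a sketch. The path is treated as raw bytes, `\` and `"` are backslash-escaped, and control bytes are escaped too. The `\xHH` form for control bytes and both function names are assumptions for illustration; nothing here is an agreed specification:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical encoding for a path inside "...": the path is raw bytes, not
// text. '\' and '"' get a backslash; control bytes (0x00-0x1F, 0x7F) become
// "\xHH" (an assumed convention); all other bytes, including non-UTF-8
// bytes >= 0x80, are copied through unchanged.
std::string escape_path_bytes(const std::string& raw) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(raw.size());
    for (unsigned char c : raw) {
        if (c == '\\' || c == '"') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (c < 0x20 || c == 0x7F) {
            out += "\\x";
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}

// Inverse of escape_path_bytes; returns std::nullopt for malformed input
// (a dangling backslash, an unknown escape, or a bad "\xHH" sequence).
std::optional<std::string> unescape_path_bytes(const std::string& esc) {
    auto hex_value = [](char h) -> int {
        if (h >= '0' && h <= '9') return h - '0';
        if (h >= 'A' && h <= 'F') return h - 'A' + 10;
        if (h >= 'a' && h <= 'f') return h - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < esc.size(); ++i) {
        if (esc[i] != '\\') {
            out += esc[i];
            continue;
        }
        if (++i == esc.size()) return std::nullopt;  // dangling backslash
        const char kind = esc[i];
        if (kind == '\\' || kind == '"') {
            out += kind;
        } else if (kind == 'x' && i + 2 < esc.size()) {
            const int hi = hex_value(esc[i + 1]);
            const int lo = hex_value(esc[i + 2]);
            if (hi < 0 || lo < 0) return std::nullopt;
            out += static_cast<char>(hi * 16 + lo);
            i += 2;  // skip the two hex digits
        } else {
            return std::nullopt;  // unknown escape sequence
        }
    }
    return out;
}
```

Bytes at or above 0x80 pass through untouched. A path like `C:\Játékok` in a legacy 8-bit code page therefore survives the round trip exactly. However, the output is then not guaranteed to be valid UTF-8. JSON also has no `\x` escape. Output like this is therefore JSON-like rather than strictly valid JSON.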
--Ben
_______________________________________________
Tooling mailing list
Tooling_at_[hidden]
http://www.open-std.org/mailman/listinfo/tooling
Received on 2019-02-25 10:44:45