Date: Mon, 25 Feb 2019 22:11:44 -0500
On Mon, Feb 25, 2019 at 08:42:10 -1000, Mathias Stearn wrote:
> On Sun, Feb 24, 2019, 7:39 PM Ben Boeckel <ben.boeckel_at_[hidden]> wrote:
> > I have GCC writing out JSON-like syntax right now. It isn't 100% valid
> > since it isn't UTF-8, but I don't want *that* in these files either.
>
> It seems reasonable to have non-ascii in user-provided data fields. We
> should figure out how to handle the case where the user's path is invalid
> utf8, like on linux where it can be a random bag-o-bytes or on UCS2
> platforms that allow mismatched surrogates. If the compiledb format handles
> these cases, we should probably just do whatever they do.
Looking at the spec, it just says "JSON", so I would assume that either
it doesn't support non-Unicode command line strings (or values of any
other field), or all the readers of those files are very lax.
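
To illustrate the concern, here is a minimal sketch of what a strict reader
would do with a path that is not valid UTF-8. The `file` key and the helper
name are made up for the example; nothing here is taken from the compiledb
spec itself.

```python
import json


def is_strict_json(raw: bytes) -> bool:
    """Return True only if raw is valid UTF-8 and parses as JSON."""
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True


# The same file name in two encodings. The "file" key is hypothetical.
utf8_entry = '{"file": "caf\u00e9.cpp"}'.encode("utf-8")
# Latin-1: a lone 0xE9 byte is not a complete UTF-8 sequence.
latin1_entry = b'{"file": "caf\xe9.cpp"}'
```

A strict reader accepts `utf8_entry` and rejects `latin1_entry`, even though
both are perfectly good paths on a filesystem that treats names as bytes.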
> > Well, you can't know until you actually compile the BMI whether it has
> > changed or not. The best we can ask for is "only update if contents are
> > unchanged". Getting that for .o files would be nice as well. Ninja can
> > then optimize no-change compilations via `restat`.
>
> I didn't just mean for the scan phase. The BMI can change in ways that
> don't require the downstream stuff to be recompiled, e.g. a comment string
> was changed on a line of source included only for better error reporting.
> Similarly, I could see something like that happening with the .o and
> split-dwarf / osx-style unsplit-split-dwarf.
The format I have for GCC allows for saying "these files are also
output". It needs to be wired up in GCC (and CMake for that matter), but it
sounds OK to me.
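
For reference, the "only update if contents are unchanged" behavior quoted
above can be sketched as follows. This is an assumption about how a tool
might do it, not what GCC does today; `write_if_changed` is a hypothetical
helper name.

```python
import os
import tempfile


def write_if_changed(path: str, data: bytes) -> bool:
    """Write data to path unless the file already holds exactly data.

    Returns True if the file was (re)written, False if it was left alone.
    """
    try:
        with open(path, "rb") as f:
            if f.read() == data:
                return False  # identical contents: keep the old mtime
    except FileNotFoundError:
        pass  # no previous output, fall through and write it

    # Write to a temporary file in the same directory, then rename it into
    # place so readers never observe a partially written file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Because the timestamp is untouched when nothing changed, a Ninja rule marked
`restat = 1` can then skip the downstream rebuilds.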
> > > And for the love of $deity, don't put any locale-sensitive strings in
> > > this metadata!
> >
> > I'd rather have it just be "a series of bytes that is a valid lookup on
> > the filesystem". The `\` and `"` characters are escaped using `\` for
> > obvious reasons. Maybe we do it for control characters as well. Is that
> > good enough for a specification?
> >
>
> I think I made my point poorly, and was misinterpreted. I was just making a
> joke about /showIncludes. The equivalent behavior would be to make the
> field names in the JSON file match the user's language. I hope no vendor is
> mean enough to actually do that! Obviously users need to be able to use
> their language in their files and paths. I'm not suggesting we limit that
> in any way, just that the field names are predictable.
I thought you were alluding to that. I want the spec to use just plain
ASCII keys and have a version field. Something to remember for the TR.
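
A rough sketch of the escaping rule quoted above, combined with plain ASCII
keys and a version field. The key names `version` and `path`, the helper
names, and the `\u00NN` form for control characters are all assumptions made
for illustration; none of this is settled.

```python
def escape_path(raw: bytes) -> bytes:
    """Escape a filesystem path byte-for-byte, without assuming an encoding."""
    out = bytearray()
    for b in raw:
        if b in (0x5C, 0x22):  # backslash and double quote
            out += b"\\" + bytes([b])
        elif b < 0x20:  # control characters (assumed \u00NN form)
            out += b"\\u%04x" % b
        else:  # everything else, including non-UTF-8 bytes, passes through
            out.append(b)
    return bytes(out)


def render_entry(raw_path: bytes) -> bytes:
    """Render one record with fixed ASCII keys (hypothetical names)."""
    return b'{"version": 1, "path": "' + escape_path(raw_path) + b'"}'
```

The keys stay the same regardless of locale; only the value carries whatever
bytes the filesystem handed back.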
--Ben
Received on 2019-02-26 04:11:55