Re: [std-proposals] A Minimal JSON Support Library for C++

From: Yexuan Xiao <bizwen_at_[hidden]>
Date: Wed, 14 Feb 2024 09:40:25 +0000
Did you really open the link and read it? Although my paper currently lacks some detailed explanations, it already answers some of your many questions in advance. In my design, json is similar to a container adapter (the only exception is that it needs to allocate the memory of the nodes by itself), so it is closer to the data structure itself than to a huge black box. In addition, JSON has its own unique value, null, which is not part of C++, and it uses dynamic data structures, which C++ does not provide any support for.

________________________________
From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Darrell Wright via Std-Proposals <std-proposals_at_[hidden]>
Sent: Wednesday, February 14, 2024 12:55
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Darrell Wright <darrell.wright_at_[hidden]>
Subject: Re: [std-proposals] A Minimal JSON Support Library for C++

Here are my thoughts on the standardization of JSON, along with some of what I have learned about JSON from surveying the ecosystem over the years. First, I don’t think a std::json should be standardized until, at the least, reflection is in place and we know how to write idiomatic libraries with it. Reflection won’t be a panacea for JSON and other serialization, because in many cases it only really solves the aggregate DT object problem, but it makes a lot of sense for the common cases. So post C++26 at the earliest. JSON libraries are hard to write in ways that are unopinionated, fast, low memory, and constexpr.

Also, I don’t think the current norm for C++ JSON libraries is the way to go; it has been shaped by what was available. Prior to C++17, a lot of useful techniques for exploiting the types we are (de)serializing were not available. What has been useful to build in that world has centred around parsing directly to a json_value object (nlohmann json/Boost JSON/…) that provides a map-like interface. This, in general, cannot be low in memory or fast, because it is often an intermediary type and it allocates for each branch. Support for custom allocation or facilities like memory resources (Boost JSON/RapidJSON) can reduce the allocations made via the default allocator, but at heart it is a node-based data structure with the characteristics of one. In general this cannot be constexpr either; however, P2738 may help with making it constexpr. Transferring data to and from a json_value has a lot of potential for errors and requires code to check both member availability and data type in the JSON document. One can use reflection, or reflection-like libraries, to ease this, but it is still an extra and costly step. Another reason I think we should wait is that, in a post-reflection world, I think the use of an intermediary json_value object for most JSON tasks will fall out of favour, replaced by a Rust Serde-like model where we can more easily inspect the data structures and let the libraries have good defaults for how to (de)serialize them. This allows normal C++ data structures to be (de)serialized to/from JSON and other formats with little or no code.
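To make the "extra step" concrete, here is a minimal sketch of pulling a C++ struct out of a json_value-style object. The types json_value, json_object, Person and the function to_person are hypothetical stand-ins invented for this illustration, not the API of nlohmann json, Boost.JSON, or any other library, and nesting is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <variant>

// Hypothetical, flattened stand-in for a DOM-style value (no nested objects or arrays).
using json_value = std::variant<std::nullptr_t, bool, double, std::string>;
using json_object = std::map<std::string, json_value>;

struct Person {
    std::string name;
    double age;
};

// Every member needs two checks: is it present, and does it hold the expected type?
std::optional<Person> to_person(const json_object& obj) {
    const auto name_it = obj.find("name");
    const auto age_it = obj.find("age");
    if (name_it == obj.end() || age_it == obj.end()) {
        return std::nullopt;  // member missing
    }
    const auto* name = std::get_if<std::string>(&name_it->second);
    const auto* age = std::get_if<double>(&age_it->second);
    if (name == nullptr || age == nullptr) {
        return std::nullopt;  // member present but of the wrong JSON type
    }
    return Person{*name, *age};
}
```

Even for two members, the hand-written checks outnumber the lines that actually build the Person, which is the cost the paragraph above describes.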

Another common approach is to provide an event-based parser interface where the user supplies callbacks (rapidjson/JSON Link) for the start/end of classes/arrays, and for when members of bool/null/string types are encountered. This is generally seen as hard to work with, as it requires the user to track all state themselves, with more code that introduces more potential for coding errors. One area where it can work well, though I don’t think it is really a concern here, is tooling like JSON minification. This style of JSON library can reduce memory requirements and is generally faster than a json_value-like approach.
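A rough sketch of the callback style, under heavy assumptions: the handler interface (on_array_start, on_array_end, on_number) and the driver parse_events are made up for this example and only understand nested arrays of non-negative integers. It is not the interface of rapidjson or JSON Link.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical event sink. Note that depth tracking is the user's job.
struct SumHandler {
    int depth = 0;
    int max_depth = 0;
    long sum = 0;

    void on_array_start() {
        ++depth;
        if (depth > max_depth) max_depth = depth;
    }
    void on_array_end() { --depth; }
    void on_number(long value) { sum += value; }
};

// Toy event driver: handles only '[', ']', and runs of digits, and skips
// everything else. It does no validation; a real parser covers the full grammar.
template <typename Handler>
void parse_events(std::string_view text, Handler& handler) {
    std::size_t i = 0;
    while (i < text.size()) {
        const char c = text[i];
        if (c == '[') {
            handler.on_array_start();
            ++i;
        } else if (c == ']') {
            handler.on_array_end();
            ++i;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            long value = 0;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
                value = value * 10 + (text[i] - '0');
                ++i;
            }
            handler.on_number(value);
        } else {
            ++i;  // separators and whitespace
        }
    }
}
```

No tree is ever built, which is where the memory savings come from, but anything the caller wants to know about structure (here, depth) has to be maintained by hand in the handler.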

The on-demand/JSON iterator/lazy parser approach has been successful for projects like simdjson (considered the fastest library for reading)/JSON Link and is one of the fastest/lowest-memory methods. This method inverts event parsing and only parses when asked for a member. It can work well with simple data structures in arrays. The difficulty is that it is generally read-only (I am fairly confident, but not certain, that one cannot use it for serialization) and works best for simple data structures. One could use constructors/customization points to convert the library’s iterator to one’s own data structures, but this gets into complexities similar to those of the json_value model.
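As a very loose illustration of "only parse when asked", the sketch below does no work at construction and converts a single member's value on request. LazyObject and get_int are hypothetical names; the lookup assumes a flat, well-formed object with integer values, no escapes, and keys that do not also occur as string values. It is far simpler than what simdjson or JSON Link actually do.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Toy lazy view over a JSON document held in contiguous memory.
class LazyObject {
public:
    // Construction does not parse anything; it only remembers the text.
    explicit LazyObject(std::string_view doc) : doc_(doc) {}

    // Locate one member and convert only its value.
    std::optional<long> get_int(std::string_view key) const {
        const std::string needle = "\"" + std::string(key) + "\"";
        std::size_t pos = doc_.find(needle);
        if (pos == std::string_view::npos) {
            return std::nullopt;
        }
        pos += needle.size();
        // Skip the colon and any spaces between the key and its value.
        while (pos < doc_.size() && (doc_[pos] == ' ' || doc_[pos] == ':')) {
            ++pos;
        }
        long value = 0;
        const char* first = doc_.data() + pos;
        const char* last = doc_.data() + doc_.size();
        const auto result = std::from_chars(first, last, value);
        if (result.ec != std::errc{}) {
            return std::nullopt;  // value is not an integer
        }
        return value;
    }

private:
    std::string_view doc_;
};
```

Members that are never requested are never converted, which is the source of the speed and memory advantage; the read-only nature is also visible, since the view has nothing to write into.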

The model I am a fan of, though I am quite biased since it is how I approached this with JSON Link, is to declaratively map JSON objects to one’s data structures via their constructors. JSON Link is a constexpr, non-allocating JSON library. This model has the benefit of using the type’s constructors to enforce preconditions, and it allows a declarative mapping approach. With reflection much of the mapping can be automated, but there will be cases that fall outside that, so a way to explicitly describe the relationship between the JSON object and the C++ data structure is needed. Without explicit SIMD, the library is close in performance to simdjson and has the benefit of being able to write too. The gist of the library is that it knows how to parse a class/array/boolean/null/number (with special paths for integrals/floating point, and custom number types like GNU MP bignums through customization), and the declarative mappings tell the library how to parse a type with those building blocks. This model doesn’t prevent a json_value-like type, but I have not approached this yet. Generally, though, without a json_value some tasks, like small modifications of the original document, are more tedious because a new class must be written.
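The general shape of a declarative mapping can be sketched as follows. This is not JSON Link's actual API: json_number, json_string, json_mapping, raw_members and from_members are all hypothetical names, and a std::map of raw member text stands in for the real tokenizer so the example stays short.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>

// Hypothetical building blocks: each knows how to parse one JSON kind.
struct json_number {
    const char* name;
    static double parse(const std::string& raw) { return std::stod(raw); }
};
struct json_string {
    const char* name;
    static std::string parse(const std::string& raw) { return raw; }
};

// Users specialize this to describe how a type maps onto JSON members.
template <typename T>
struct json_mapping;

// Stand-in for tokenizer output: member name -> raw member text.
using raw_members = std::map<std::string, std::string>;

// Parse each mapped member, then hand the results to T's constructor.
template <typename T>
T from_members(const raw_members& raw) {
    return std::apply(
        [&raw](const auto&... m) { return T{m.parse(raw.at(m.name))...}; },
        json_mapping<T>::members());
}

// The constructor enforces the type's preconditions.
struct Point {
    double x;
    double y;
    Point(double x_, double y_) : x(x_), y(y_) {
        if (x_ < 0 || y_ < 0) throw std::invalid_argument("negative coordinate");
    }
};

// The declarative part: no parsing code, only a description.
template <>
struct json_mapping<Point> {
    static constexpr auto members() {
        return std::make_tuple(json_number{"x"}, json_number{"y"});
    }
};
```

Because the parsed values go straight into the constructor, invalid data is rejected by the type's own invariants rather than by separate validation code, and no intermediate json_value is created.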

Some other challenges we need to think about, which others have touched on a bit, include how we want to input JSON to the library. I prefer in-memory contiguous characters, and most libraries follow this. It allows one to use the document structure and reduces allocations. Another approach is to allow streaming (Boost.JSON allows this and may be the only one that does), where one feeds the parser state object and it builds the json_value as it can. This can have the benefit of allowing servers to work on/inspect the data on the fly, and it can reduce temporary storage for the document as the json_value is built. Another challenge is that JSON parsers can be complex, and they are quite commonly an attack surface, with updates needed to remediate. For output, there are customizations around formatting that are generally not just nice-to-haves, along with the question of how to actually write the serialized data. JeanHeyd Meneide wrote about output ranges, https://thephd.dev/output-ranges , and something like that, or some other way to abstract writing data to memory/devices/network, is needed.
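One existing way to abstract the destination is to write through an output iterator, sketched below. The function write_json_string is a hypothetical name for this example, and it escapes only the quote and backslash characters; a conforming serializer must also escape control characters.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Writes text as a JSON string literal to any output iterator accepting char.
// The serializer never learns whether the sink is a string, a buffer, or a stream.
template <typename OutputIt>
OutputIt write_json_string(std::string_view text, OutputIt out) {
    *out++ = '"';
    for (const char c : text) {
        if (c == '"' || c == '\\') {
            *out++ = '\\';  // escape the two characters handled by this sketch
        }
        *out++ = c;
    }
    *out++ = '"';
    return out;
}
```

The same function can fill a std::string via std::back_inserter, a std::vector<char>, or a std::ostreambuf_iterator<char> wrapping a stream, which is the kind of decoupling between formatting and writing that the paragraph above asks for.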

The other question is, do we want a JSON library or a more general (de)serialization library akin to Rust’s Serde?

In summary, I don’t think the C++ standard is the place for JSON, at least not yet. I think we can get something good, and potentially more general, after C++26 and reflection.


Cheers
Darrell Wright

On Feb 13, 2024, at 22:20, René Ferdinand Rivera Morell via Std-Proposals <std-proposals_at_[hidden]> wrote:

On Tue, Feb 13, 2024 at 11:47 AM Yexuan Xiao via Std-Proposals
<std-proposals_at_[hidden]> wrote:
Any suggestions are welcome and I hope you like it.

I haven't seen this suggestion in the replies so here goes.. Instead
of writing a paper to avoid using external JSON libraries, write a
paper to make it easier for people to use external JSON libraries (or
any other external library).

--
-- René Ferdinand Rivera Morell
-- Don't Assume Anything -- No Supone Nada
-- Robot Dreams - http://robot-dreams.net
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2024-02-14 09:40:31