Binary Formats Are Better Than JSON in Browsers

adamfaulkner.github.io

75 points by adamkf 2 months ago

pjmlp 2 months ago

They are better everywhere, it makes no sense replicating CORBA and DCOM, parsing text all the time.

It is always a matter of tooling, being able to easily look into whatever they might contain.

I guess at least JSON-RPC (which most REST is actually about) isn't as bad as SOAP.

guidedlight 2 months ago

I remember working in a large corporate on a central integration broker (from memory it was Java based).
The CPU required for marshalling and demarshalling JSON and SOAP was easily over 80% of the workload. And yes, you are right SOAP is worse.
I always thought is probably an opportunity to create hardware accelerators for this type of workloads.

jiehong 2 months ago

Wondering is Apache Parquet would compare well.

I’m unsure of the quality of parquet implementations for browsers [0], though.

[0]: https://github.com/hyparam/hyparquet

codelikeawolf 2 months ago

I've been working with the VS Code codebase for the past few months and I noticed they're encoding/decoding all the IPC messages using a custom binary buffer implementation (or at least I think that's the case, some of the code is hard to follow). It struck me as odd because most of the messages are pretty small (definitely not 100s of MB). I always wondered: at what point does it become more performant to encode messages to binary and decode them on the other end? Does it really take that much longer to just send strings?

actionfromafar 2 months ago

I don't know, but I'm struck by the observation that binary formats are "just" a special case of compression.
- adammarples 2 months ago
  
  Well, they're not, they're just an efficient way of encoding a subset of data. They can't compress that data, like, for example, compressing 100 x "a", unless they have a compression algorithm attached, they will just represent "a" 100 times, but more efficiently than unicode or whatever.
  - actionfromafar 2 months ago
    
    It's almost as you'd have compressed that string 100 into a single byte.

victorNicollet 2 months ago

We are using binary formats in production, also for data visualization and analysis. We went for a simple custom format: in addition to the usual JSON types (string, number, array, boolean, object), the serialized value can contain the standard JSON types, but can also contain JavaScript typed arrays (Uint8Array, Float32Array, etc). The serialized data contains the raw data of all the typed arrays, followed by a single JSON-serialized block of the original value with the typed arrays replaced by reference objects pointing to the appropriate parts of the raw data region.

For most data visualization tasks, the dataset will be composed of 5% of JSON data and 95% of a small number of large arrays (usually Float32Array) representing data table columns. The JSON takes time to parse, but it is small, and the large arrays can be created in constant time from the ArrayBuffer of the HTTP response (on big-endian machines, this will be linear time for all except Uint8Array).

For situations where hundreds of thousands of complex objects must be transferred, we will usually pack those objects as several large arrays instead. For example, using a struct-of-arrays instead of an array-of-structs, and/or by having an Uint8Array contain the result of a binary serialization of each object, with an Uint32Array containing the bounds of each object. The objective is to have the initial parsing be nearly instantaneous, and then to extract the individual objects on demand: this minimizes the total memory usage in the browser, and in the (typical) case where only a small subset of objects is being displayed or manipulate, the parsing time is reduced to only that subset instead of the entire response.

The main difficulty is that the browser developer tools "network" tab does not properly display non-JSON values, so investigating an issue requires placing a breakpoint or console.log right after the parsing of a response...

pjmlp 2 months ago

Usually for such scenarios, in the context of distributed systems, I tend to rely on general purpose monitoring tools, or a quick and dirty mini gateway if I can't reach for them.

naikrovek 2 months ago

> During this process, I benchmarked a number of different binary encodings. To my surprise, I found that all of them were much slower than JSON in browsers.

I doubt this a lot. Even turning a small JSON message into a pre-defined character-delimited string is at least an order of magnitude faster than JSON. In my testing, at least. in my testing over the years, JSON has always been the slowest thing to extract information out of, no matter how you approach it. And any binary format will be faster than just about any text format.

developers think binary stuff is scary, or somehow harder than text, but it isn't. binary is faster, smaller, and depending on your use case, more flexible.

I use type-length-value for just about everything binary. it's easy to understand later with a hex editor if you need to, but obviously documentation helps understanding this greatly. and no documentation for these format is the code which reads it.

binary formats. good. try it. tell your friends.

zigzag312 2 months ago

MessagePack seems to use string field names. I wonder how it would perform with integer field keys defined (which MessagePack also supports).

chris_pie 2 months ago

Here's a benchmark of int keys in .NET https://github.com/MessagePack-CSharp/MessagePack-CSharp#des...

broken_broken_ 2 months ago

I initially did not see the use case for sending hundreds of MiB to the browser, but near the end, data visualization is mentioned which is fair.

I could also see video games and observability platforms needing to do this, for example Datadog, which must send lots of stack traces with long function names along with numerical data.

avmich 2 months ago

Isn't this question better to delegate to compression layer?

mannyv 2 months ago

Uh, where is the comparison? Is the final one deserialize and verify latency?

bdhcuidbebe 2 months ago

In the very first graph